Vector Databases on Kubernetes: Qdrant vs Milvus vs pgvector

Deploy and compare vector databases on Kubernetes for AI applications. Performance benchmarks, scaling patterns, and production configuration for RAG and semantic search.

Luca Berton
· 3 min read

Why Vector Databases for AI?

Every RAG pipeline, semantic search engine, and recommendation system needs vector storage:

User Query → Embedding Model → Vector Search → Top-K Results → LLM Context

The vector database stores millions of embeddings and returns the nearest neighbors in milliseconds.
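
To make the "Vector Search" step concrete, here is a minimal brute-force sketch in plain Python. It is illustrative only: real vector databases use approximate indexes such as HNSW instead of scanning every vector, and the function names (`cosine_similarity`, `top_k`) and the toy 3-dimensional embeddings are invented for this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]

# Toy "embeddings"; a real embedding model would produce hundreds of dimensions.
store = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}
nearest = top_k([1.0, 0.05, 0.0], store, k=2)
print(nearest)
```

The ids returned here are what a RAG pipeline would use to fetch the text passed to the LLM as context.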

Quick Comparison

| Feature | Qdrant | Milvus | pgvector |
|---|---|---|---|
| Architecture | Rust, purpose-built | Go/C++, distributed | PostgreSQL extension |
| Scaling | Horizontal sharding | Distributed native | Single node (+ Citus) |
| Max vectors | Billions | Billions | ~10M practical |
| Query speed (1M, 768d) | 2ms | 3ms | 15ms |
| Memory mode | Disk + memory-mapped | Tiered storage | Shared buffers |
| Filtering | Native payload filters | Attribute filtering | SQL WHERE |
| Kubernetes | Helm chart, operator | Helm chart, operator | Any PG operator |
| License | Apache 2.0 | Apache 2.0 | PostgreSQL |
| Managed service | Qdrant Cloud | Zilliz Cloud | Neon, Supabase, RDS |

Qdrant on Kubernetes

Qdrant is a Rust-built vector database optimized for speed and reliability.

Helm Deployment

helm repo add qdrant https://qdrant.github.io/qdrant-helm
helm repo update
helm install qdrant qdrant/qdrant \
  --namespace vector-db \
  --create-namespace \
  --set replicaCount=3 \
  --set persistence.size=100Gi \
  --set resources.limits.memory=16Gi \
  --set config.storage.performance.optimizer_cpu_budget=4

Production Configuration

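apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: qdrant
spec:
  # serviceName and selector are required fields for a StatefulSet.
  serviceName: qdrant-headless  # assumed name; create a matching headless Service separately
  replicas: 3
  selector:
    matchLabels:
      app: qdrant
  template:
    metadata:
      labels:
        app: qdrant  # must match spec.selector
    spec:
      containers:
        - name: qdrant
          image: qdrant/qdrant:v1.12.0
          ports:
            - containerPort: 6333  # REST
            - containerPort: 6334  # gRPC
          resources:
            requests:
              memory: "8Gi"
              cpu: "4"
            limits:
              memory: "16Gi"
              cpu: "8"
          volumeMounts:
            - name: storage
              mountPath: /qdrant/storage
  volumeClaimTemplates:
    - metadata:
        name: storage
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi
        storageClassName: gp3  # AWS EBS class; substitute your cluster's StorageClass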

Create Collection and Index

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, OptimizersConfigDiff

client = QdrantClient(host="qdrant.vector-db.svc", port=6333)

client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(
        size=1536,  # OpenAI ada-002 dimensions
        distance=Distance.COSINE,
    ),
    optimizers_config=OptimizersConfigDiff(
        indexing_threshold=20000,  # Build HNSW once a segment exceeds ~20,000 KB of vectors
        memmap_threshold=50000,   # Memory-map segments larger than ~50,000 KB
    ),
    shard_number=3,  # Distribute across 3 nodes
    replication_factor=2,  # 2 copies for HA
)
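
With the collection in place, a filtered search can be sent to Qdrant's REST search endpoint. The sketch below only builds the JSON request body, so it runs without a cluster. The payload field `source`, its value `handbook`, and the helper name `build_search_request` are hypothetical; the service host and collection name are taken from the example above.

```python
import json

def build_search_request(query_vector, field, value, limit=10):
    """Build the JSON body for a filtered Qdrant search over REST."""
    return {
        "vector": query_vector,
        "limit": limit,
        "with_payload": True,
        # The filter condition is evaluated as part of the search itself.
        "filter": {"must": [{"key": field, "match": {"value": value}}]},
    }

# Placeholder query vector; in practice this comes from the embedding model.
body = build_search_request([0.1] * 1536, field="source", value="handbook", limit=5)
print(json.dumps(body["filter"]))

# The body would be POSTed to (assuming the in-cluster service name used above):
#   http://qdrant.vector-db.svc:6333/collections/documents/points/search
```

Recent Qdrant releases also provide a newer query endpoint, so check the API reference for the version you deploy.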

Qdrant Strengths

  • Fastest queries: Rust + HNSW give low-millisecond latency at scale
  • Payload filtering: filter by metadata during the search, without post-filtering
  • Memory-mapped storage: handle datasets larger than RAM
  • Snapshots + WAL: restore from snapshots, with a write-ahead log for durability
  • Quantization: scalar quantization (roughly 4x memory reduction) and product quantization (higher compression)
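
The 4x figure for scalar quantization follows from simple arithmetic: each float32 component (4 bytes) is stored as a single byte. A rough sketch, counting raw vector data only and ignoring the HNSW graph, payloads, and any full-precision copies kept for rescoring (the function name is made up for illustration):

```python
def vector_memory_gb(num_vectors, dimensions, bytes_per_component):
    """Raw vector storage in GB (10^9 bytes), ignoring index and payload overhead."""
    return num_vectors * dimensions * bytes_per_component / 1e9

full_precision = vector_memory_gb(1_000_000, 1536, 4)    # float32 components
scalar_quantized = vector_memory_gb(1_000_000, 1536, 1)  # int8 components

print(f"float32: {full_precision:.2f} GB")
print(f"int8:    {scalar_quantized:.2f} GB")
print(f"ratio:   {full_precision / scalar_quantized:.0f}x")
```

The same formula gives a quick lower bound on RAM when sizing the memory requests and limits for the pods.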

Milvus on Kubernetes

Milvus is a cloud-native distributed vector database designed for billion-scale deployments.

Helm Deployment

helm repo add milvus https://zilliztech.github.io/milvus-helm/
helm install milvus milvus/milvus \
  --namespace vector-db \
  --set cluster.enabled=true \
  --set etcd.replicaCount=3 \
  --set minio.mode=distributed \
  --set pulsar.enabled=true \
  --set queryNode.replicas=3

Architecture

┌──────────────────────────────────────────┐
│              Milvus Cluster              │
│                                          │
│  ┌─────────┐  ┌────────┐  ┌─────────┐    │
│  │  Proxy  │  │  Coord │  │  Coord  │    │
│  │ (Load   │  │ (Query)│  │ (Data)  │    │
│  │ Balance)│  └────────┘  └─────────┘    │
│  └────┬────┘                             │
│       │      ┌──────────────────────┐    │
│       ├─────▶│    Query Nodes (3)   │    │
│       │      └──────────────────────┘    │
│       │      ┌──────────────────────┐    │
│       └─────▶│    Data Nodes (3)    │    │
│              └──────────────────────┘    │
│                                          │
│  ┌──────┐  ┌────────┐  ┌───────────┐     │
│  │ etcd │  │ MinIO  │  │  Pulsar   │     │
│  └──────┘  └────────┘  └───────────┘     │
└──────────────────────────────────────────┘

Milvus Strengths

  • True distributed: scales to billions of vectors
  • Multi-index: IVF, HNSW, DiskANN, GPU-IVF
  • GPU acceleration: NVIDIA GPU for index building and search
  • Tiered storage: hot/warm/cold data management
  • Multi-vector: store and search multiple vector fields

Milvus Considerations

  • Complex deployment: requires etcd, MinIO, Pulsar
  • Resource heavy: minimum 3 nodes recommended for production
  • Operational overhead: more components to monitor and maintain

pgvector: PostgreSQL Extension

For teams already running PostgreSQL, pgvector adds vector search without a new database:

Installation

CREATE EXTENSION vector;

CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(1536),  -- OpenAI dimensions
    metadata JSONB,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- HNSW index (recommended for most use cases)
CREATE INDEX ON documents
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
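
Querying the table is then an ordinary `ORDER BY ... LIMIT`, using pgvector's `<=>` cosine distance operator so that it matches the `vector_cosine_ops` index. The sketch below prepares the SQL and its parameters in Python without connecting to a database. The helper name `to_vector_literal` is invented here, and the `%s` placeholders assume a psycopg-style driver.

```python
def to_vector_literal(embedding):
    """Format a Python list as a pgvector text literal such as '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"

# Ordering directly by the distance expression lets the planner use the HNSW index.
NEAREST_SQL = """
SELECT id, content
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""

query_embedding = [0.1, 0.2, 3]  # placeholder; real embeddings here have 1536 values
params = (to_vector_literal(query_embedding), 10)
print(params[0])

# With a psycopg-style cursor (assumed, not shown):
#   cur.execute(NEAREST_SQL, params)
```

Recall can be traded against speed per session with pgvector's `hnsw.ef_search` setting.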

Kubernetes with CloudNativePG

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: vectordb
spec:
  instances: 3
  postgresql:
    parameters:
      shared_buffers: "4GB"
      effective_cache_size: "12GB"
      maintenance_work_mem: "2GB"
      max_parallel_workers_per_gather: "4"
    # pgvector needs no shared_preload_libraries entry; CREATE EXTENSION vector is enough
  storage:
    size: 200Gi
    storageClass: gp3

pgvector Strengths

  • Zero new infrastructure: extension to existing PostgreSQL
  • SQL joins: combine vector search with relational queries
  • ACID transactions: vector updates are transactional
  • Existing tooling: pg_dump, replication, and monitoring all work
  • Hybrid search: full-text search + vector search in one query

pgvector Limitations

  • Single-node scaling: practical limit ~5-10M vectors
  • No sharding: Citus adds complexity
  • Slower queries: 5-10x slower than purpose-built solutions at scale
  • Memory-bound: needs enough RAM for the HNSW graph

Performance Benchmarks

1M vectors, 1536 dimensions, top-10 search, single node:

| Database | QPS | P50 Latency | P99 Latency | Memory |
|---|---|---|---|---|
| Qdrant | 12,500 | 1.8ms | 4.2ms | 6.2GB |
| Milvus | 9,800 | 2.5ms | 6.1ms | 7.8GB |
| pgvector (HNSW) | 3,200 | 8.5ms | 22ms | 9.1GB |

At 100M vectors (distributed):

| Database | QPS | P99 Latency | Nodes |
|---|---|---|---|
| Qdrant (3 nodes) | 35,000 | 12ms | 3 |
| Milvus (5 nodes) | 45,000 | 15ms | 5 |
| pgvector | N/A (single-node limit) | - | - |
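
Because the two distributed clusters differ in size, total QPS is not directly comparable. A small sketch that normalizes the figures above per node, under the assumption (which the numbers themselves do not establish) that throughput is spread evenly across nodes:

```python
# QPS and node counts copied from the 100M-vector table above.
clusters = {
    "Qdrant": {"qps": 35_000, "nodes": 3},
    "Milvus": {"qps": 45_000, "nodes": 5},
}

per_node = {name: c["qps"] / c["nodes"] for name, c in clusters.items()}

for name, qps in per_node.items():
    print(f"{name}: {qps:,.0f} QPS per node")
```

On these figures Qdrant serves more queries per node, while Milvus reaches the higher total by adding nodes.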

Decision Framework

Choose Qdrant when:

  • ✅ Need fastest possible query latency
  • ✅ Dataset under 100M vectors
  • ✅ Want simple deployment (single binary)
  • ✅ Need payload filtering in searches
  • ✅ Rust ecosystem preference

Choose Milvus when:

  • ✅ Billion-scale vector datasets
  • ✅ Need GPU-accelerated index building
  • ✅ Multi-vector search requirements
  • ✅ Enterprise distributed requirements
  • ✅ Team has operational capacity for complex stack

Choose pgvector when:

  • ✅ Already running PostgreSQL
  • ✅ Dataset under 5-10M vectors
  • ✅ Need SQL joins with vector search
  • ✅ Want minimal operational complexity
  • ✅ Prototype or early-stage AI product
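
The checklists can be condensed into a first-pass heuristic. The function below is a hypothetical sketch: its name, parameters, and exact cut-offs (10M and 100M vectors, taken from the thresholds mentioned in this article) are simplifications, and it ignores factors such as team experience or filtering needs.

```python
def recommend(num_vectors, runs_postgres=False, needs_gpu_indexing=False):
    """Map the checklists above to a first-pass suggestion."""
    # Billion-scale data or GPU index builds point to Milvus.
    if needs_gpu_indexing or num_vectors >= 100_000_000:
        return "Milvus"
    # Small datasets on an existing PostgreSQL stack fit pgvector.
    if runs_postgres and num_vectors <= 10_000_000:
        return "pgvector"
    # Otherwise default to Qdrant for mid-sized, latency-sensitive workloads.
    return "Qdrant"

print(recommend(2_000_000, runs_postgres=True))
print(recommend(50_000_000))
print(recommend(1_000_000_000))
```

Treat the result as a starting point for a proof of concept, not a final decision.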
