Why Vector Databases for AI?
Every RAG pipeline, semantic search, and recommendation system needs vector storage:
User Query → Embedding Model → Vector Search → Top-K Results → LLM Context
The vector database stores millions of embeddings and returns the nearest neighbors in milliseconds.
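To make the "Vector Search → Top-K Results" step concrete, here is a minimal sketch of exact (brute-force) nearest-neighbor search over a tiny in-memory store. The document IDs and 3-dimensional vectors are made up for illustration. A real vector database replaces the full scan with an approximate index such as HNSW.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Exact search: score every stored vector, keep the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings"; real models emit hundreds of dimensions.
store = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}

results = top_k([1.0, 0.05, 0.0], store, k=2)
print([doc_id for doc_id, _ in results])  # ['doc-a', 'doc-b']
```

The full scan costs one similarity computation per stored vector, which is why dedicated indexes matter once the store holds millions of embeddings.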
Quick Comparison
| Feature | Qdrant | Milvus | pgvector |
|---|---|---|---|
| Architecture | Rust, purpose-built | Go/C++, distributed | PostgreSQL extension |
| Scaling | Horizontal sharding | Distributed native | Single node (+ Citus) |
| Max vectors | Billions | Billions | ~5-10M practical |
| Query speed (1M, 768d) | 2ms | 3ms | 15ms |
| Memory mode | Disk + memory-mapped | Tiered storage | Shared buffers |
| Filtering | Native payload filters | Attribute filtering | SQL WHERE |
| Kubernetes | Helm chart, operator | Helm chart, operator | Any PG operator |
| License | Apache 2.0 | Apache 2.0 | PostgreSQL |
| Managed service | Qdrant Cloud | Zilliz Cloud | Neon, Supabase, RDS |
Qdrant on Kubernetes
Qdrant is a Rust-built vector database optimized for speed and reliability.
Helm Deployment
helm repo add qdrant https://qdrant.github.io/qdrant-helm
helm install qdrant qdrant/qdrant \
  --namespace vector-db \
  --create-namespace \
  --set replicaCount=3 \
  --set persistence.size=100Gi \
  --set resources.limits.memory=16Gi \
  --set config.storage.performance.optimizer_cpu_budget=4
Production Configuration
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: qdrant
spec:
  serviceName: qdrant # required field; assumes a headless Service named "qdrant"
  replicas: 3
  selector: # required field; must match the pod template labels
    matchLabels:
      app: qdrant
  template:
    metadata:
      labels:
        app: qdrant
    spec:
      containers:
        - name: qdrant
          image: qdrant/qdrant:v1.12.0
          ports:
            - containerPort: 6333 # REST
            - containerPort: 6334 # gRPC
          resources:
            requests:
              memory: "8Gi"
              cpu: "4"
            limits:
              memory: "16Gi"
              cpu: "8"
          volumeMounts:
            - name: storage
              mountPath: /qdrant/storage
  volumeClaimTemplates:
    - metadata:
        name: storage
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi
        storageClassName: gp3
Create Collection and Index
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, OptimizersConfigDiff

client = QdrantClient(host="qdrant.vector-db.svc", port=6333)
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(
        size=1536,  # OpenAI text-embedding-ada-002 dimensions
        distance=Distance.COSINE,
    ),
    optimizers_config=OptimizersConfigDiff(
        # Both thresholds are measured in kilobytes of vector data per segment, not in vector counts
        indexing_threshold=20000,  # Build HNSW once a segment exceeds ~20,000 KB
        memmap_threshold=50000,  # Memory-map segments larger than ~50,000 KB
    ),
    shard_number=3,  # Distribute across 3 nodes
    replication_factor=2,  # 2 copies for HA
)
Qdrant Strengths
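- Fastest queries: Rust + HNSW gives low single-digit millisecond latency (1.8ms P50 at 1M vectors in the benchmarks below)
- Payload filtering: filter by metadata during the search instead of post-filtering the results
- Memory-mapped storage: handle datasets larger than RAM
- Snapshot + WAL: snapshots for backup and restore, plus a write-ahead log for crash recovery
- Quantization: scalar quantization (float32 to int8) cuts vector memory by about 4x; product quantization compresses further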
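The difference between filtering during the search and post-filtering is easiest to see on a toy dataset. The sketch below is pure Python with made-up points and a hypothetical `lang` payload field. It only illustrates the idea and does not show how Qdrant applies filters inside its HNSW index.

```python
def dot(a, b):
    """Dot product used as the similarity score."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical points: (id, vector, payload)
points = [
    ("p1", [1.0, 0.0], {"lang": "en"}),
    ("p2", [0.9, 0.1], {"lang": "de"}),
    ("p3", [0.8, 0.2], {"lang": "de"}),
    ("p4", [0.1, 0.9], {"lang": "en"}),
]


def post_filter_search(query, k, lang):
    """Take the top-k by score first, then drop points that fail the filter."""
    ranked = sorted(points, key=lambda p: dot(query, p[1]), reverse=True)[:k]
    return [pid for pid, _, payload in ranked if payload["lang"] == lang]


def filtered_search(query, k, lang):
    """Apply the payload condition while selecting candidates, then rank."""
    candidates = [p for p in points if p[2]["lang"] == lang]
    ranked = sorted(candidates, key=lambda p: dot(query, p[1]), reverse=True)[:k]
    return [pid for pid, _, _ in ranked]


query = [1.0, 0.0]
print(post_filter_search(query, k=2, lang="en"))  # ['p1']
print(filtered_search(query, k=2, lang="en"))     # ['p1', 'p4']
```

Post-filtering returns only one result even though two matching points exist, because the non-matching points used up slots in the top-k. Filtering as part of the search avoids that shortfall.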
Milvus on Kubernetes
Milvus is a cloud-native distributed vector database designed for billion-scale deployments.
Helm Deployment
helm repo add milvus https://zilliztech.github.io/milvus-helm/
helm install milvus milvus/milvus \
  --namespace vector-db \
  --create-namespace \
  --set cluster.enabled=true \
  --set etcd.replicaCount=3 \
  --set minio.mode=distributed \
  --set pulsar.enabled=true \
  --set queryNode.replicas=3
Architecture
+--------------------------------------------+
|               Milvus Cluster               |
|                                            |
|  +----------+  +---------+  +---------+    |
|  |  Proxy   |  |  Coord  |  |  Coord  |    |
|  |  (Load   |  | (Query) |  | (Data)  |    |
|  | Balance) |  +---------+  +---------+    |
|  +----+-----+                              |
|       |      +----------------------+      |
|       +----->|   Query Nodes (3)    |      |
|       |      +----------------------+      |
|       |      +----------------------+      |
|       +----->|   Data Nodes (3)     |      |
|              +----------------------+      |
|                                            |
|  +------+  +--------+  +-----------+       |
|  | etcd |  | MinIO  |  |  Pulsar   |       |
|  +------+  +--------+  +-----------+       |
+--------------------------------------------+
Milvus Strengths
- True distributed: scales to billions of vectors
- Multi-index: IVF, HNSW, DiskANN, GPU-IVF
- GPU acceleration: NVIDIA GPU for index building and search
- Tiered storage: hot/warm/cold data management
- Multi-vector: store and search multiple vector fields
Milvus Considerations
- Complex deployment: requires etcd, MinIO, Pulsar
- Resource heavy: minimum 3 nodes recommended for production
- Operational overhead: more components to monitor and maintain
pgvector: PostgreSQL Extension
For teams already running PostgreSQL, pgvector adds vector search without a new database:
Installation
CREATE EXTENSION vector;
CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(1536), -- OpenAI dimensions
    metadata JSONB,
    created_at TIMESTAMPTZ DEFAULT NOW()
);
-- HNSW index (recommended for most use cases)
CREATE INDEX ON documents
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
Kubernetes with CloudNativePG
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: vectordb
spec:
  instances: 3
  postgresql:
    parameters:
      shared_buffers: "4GB"
      effective_cache_size: "12GB"
      maintenance_work_mem: "2GB"
      max_parallel_workers_per_gather: "4"
    shared_preload_libraries:
      - "vector"
  storage:
    size: 200Gi
    storageClass: gp3
pgvector Strengths
- Zero new infrastructure: an extension to existing PostgreSQL
- SQL joins: combine vector search with relational queries
- ACID transactions: vector updates are transactional
- Existing tooling: pg_dump, replication, and monitoring all work
- Hybrid search: full-text search + vector search in one query
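As a sketch of what combining a relational predicate with vector ordering can look like against the `documents` table defined earlier, the snippet below prepares a parameterized query. `<=>` is pgvector's cosine-distance operator, matching the `vector_cosine_ops` index. The `source` metadata key and the helper names are hypothetical, and the query is meant to be run through a driver such as psycopg, for example `cur.execute(SEARCH_SQL, params)`.

```python
def to_vector_literal(embedding):
    """Format a Python list as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


# Named placeholders use the psycopg "%(name)s" style.
# The metadata key 'source' is a made-up example field.
SEARCH_SQL = """
SELECT id, content, embedding <=> %(query)s::vector AS distance
FROM documents
WHERE metadata->>'source' = %(source)s
ORDER BY embedding <=> %(query)s::vector
LIMIT %(limit)s;
"""


def build_search_params(embedding, source, limit=10):
    """Bundle the query parameters expected by SEARCH_SQL."""
    return {"query": to_vector_literal(embedding), "source": source, "limit": limit}


params = build_search_params([0.1, 0.2, 0.3], source="handbook", limit=5)
print(params["query"])  # [0.1,0.2,0.3]
```

Passing the embedding as a bound parameter rather than interpolating it into the SQL string keeps the statement safe to reuse and lets PostgreSQL plan the metadata predicate and the vector ordering together.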
pgvector Limitations
- Single-node scaling: practical limit of ~5-10M vectors
- No sharding: Citus can add it, at the cost of extra complexity
- Slower queries: several times slower than purpose-built solutions (about 4-5x in the 1M-vector benchmark below)
- Memory-bound: needs enough RAM for the HNSW graph
Performance Benchmarks
1M vectors, 1536 dimensions, top-10 search, single node:
| Database | QPS | P50 Latency | P99 Latency | Memory |
|---|---|---|---|---|
| Qdrant | 12,500 | 1.8ms | 4.2ms | 6.2GB |
| Milvus | 9,800 | 2.5ms | 6.1ms | 7.8GB |
| pgvector (HNSW) | 3,200 | 8.5ms | 22ms | 9.1GB |
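A quick back-of-the-envelope calculation helps put the Memory column in context. The sketch below assumes vectors are stored as 4-byte float32 components (the benchmark setup does not state this) and uses decimal gigabytes. It counts raw vector data only, so index structures and process overhead come on top.

```python
def raw_vector_bytes(num_vectors, dimensions, bytes_per_component):
    """Bytes needed for the vector components alone (no index, no payload)."""
    return num_vectors * dimensions * bytes_per_component


n, d = 1_000_000, 1536

float32_gb = raw_vector_bytes(n, d, 4) / 1e9  # 4 bytes per float32 component
int8_gb = raw_vector_bytes(n, d, 1) / 1e9     # 1 byte per int8 component (scalar quantization)

print(f"float32: {float32_gb:.2f} GB")  # float32: 6.14 GB
print(f"int8:    {int8_gb:.2f} GB")     # int8:    1.54 GB
```

Under these assumptions the raw float32 data already accounts for roughly 6.1 GB, so the differences between the three systems in the table mostly reflect index and runtime overhead. The int8 figure shows where the 4x reduction from scalar quantization comes from.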
At 100M vectors (distributed):
| Database | QPS | P99 Latency | Nodes |
|---|---|---|---|
| Qdrant (3 nodes) | 35,000 | 12ms | 3 |
| Milvus (5 nodes) | 45,000 | 15ms | 5 |
| pgvector | N/A (single-node limit) | - | - |
Decision Framework
Choose Qdrant when:
- Need fastest possible query latency
- Dataset under 100M vectors
- Want simple deployment (single binary)
- Need payload filtering in searches
- Rust ecosystem preference
Choose Milvus when:
- Billion-scale vector datasets
- Need GPU-accelerated index building
- Multi-vector search requirements
- Enterprise distributed requirements
- Team has operational capacity for complex stack
Choose pgvector when:
- Already running PostgreSQL
- Dataset under 5-10M vectors
- Need SQL joins with vector search
- Want minimal operational complexity
- Prototype or early-stage AI product
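The checklist above can be condensed into a small rule-of-thumb function. This is only a sketch: the function name and parameters are hypothetical, the thresholds are the rough figures quoted in this article (10M for pgvector, 100M for Qdrant), and the fallback to Milvus for anything larger is an assumption rather than a hard rule.

```python
def recommend(num_vectors, runs_postgres=False, needs_gpu_indexing=False):
    """Return a starting-point recommendation based on the article's rough thresholds."""
    # GPU index building and billion-scale datasets point to Milvus.
    if needs_gpu_indexing or num_vectors >= 1_000_000_000:
        return "Milvus"
    # Small datasets on an existing PostgreSQL stack fit pgvector.
    if runs_postgres and num_vectors <= 10_000_000:
        return "pgvector"
    # Up to roughly 100M vectors, favor Qdrant for latency and simple deployment.
    if num_vectors < 100_000_000:
        return "Qdrant"
    # Assumption: anything larger defaults to the distributed option.
    return "Milvus"


print(recommend(2_000_000, runs_postgres=True))  # pgvector
print(recommend(50_000_000))                     # Qdrant
print(recommend(2_000_000_000))                  # Milvus
```

Real decisions also depend on factors the function ignores, such as filtering needs, team experience, and operational capacity, so treat the output as a first guess to validate with your own benchmarks.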