The difference between "vector search works" and "vector search is production-ready" often comes down to three things: metadata filtering, hybrid search, and operational fit. Relational databases excel at exact matches; they were not built for high-dimensional similarity. Vector databases store embeddings and support approximate nearest-neighbor (ANN) search—cosine similarity, Euclidean distance, algorithms like HNSW and IVF—at scale. But not all vector databases are alike, and the right choice depends on your workload, your ops tolerance, and whether you need keyword search alongside semantic search.
Through building RAG systems for enterprise clients, we've evaluated several options and landed on a tiered strategy. Here's what we learned.
How we got here
We started with pgvector. It was the obvious choice: PostgreSQL extension, no new infrastructure, familiar tooling. For small corpora—under roughly 100K vectors—it worked fine. Past that, latency degraded. We hit the limits of what a general-purpose relational engine can do for high-dimensional similarity search. That pushed us to evaluate dedicated vector databases.
We looked at Pinecone (managed, zero-ops), Weaviate (hybrid search native), and Milvus (scale and open-source). Each trades off differently on ops burden, cost, and features. The choice depends on how much you want to manage, how large your corpus will grow, and whether hybrid search—combining semantic and keyword matching—is core to your use case.
What vector databases actually do
Vector databases store embeddings and support ANN search. They differ in how they index (HNSW, IVF, DiskANN), how they handle metadata filtering, and whether they integrate keyword search. Relational databases can store vectors and run similarity queries, but they are optimized for exact matches and structured filters, not for finding the nearest neighbors in a 1536-dimensional space at millisecond latency.
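To make that concrete, here is a minimal brute-force sketch of the operation a vector database performs: exact top-k by cosine similarity. ANN indexes like HNSW and IVF exist precisely to approximate this ranking without the linear scan, which is what a relational engine effectively ends up doing.

```python
import numpy as np

def cosine_top_k(query, vectors, k=5):
    """Exact nearest-neighbor search by cosine similarity.

    This is the ground truth that ANN indexes (HNSW, IVF, DiskANN)
    approximate; it scans every vector, which is why it does not
    scale to large corpora at millisecond latency.
    """
    # Normalize so the dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = v @ q
    top = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in top]

# Toy corpus at the dimensionality mentioned above (1536).
corpus = np.random.rand(10_000, 1536).astype(np.float32)
hits = cosine_top_k(corpus[42], corpus, k=3)
# Querying with a vector from the corpus returns that vector first,
# with similarity ~1.0.
```

Measuring how long this scan takes at your corpus size is a useful baseline for judging what an ANN index is buying you.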
One lesson we learned early: metadata filtering is non-negotiable for production RAG. Pure vector search without filters is too blunt. You almost always need to scope queries—"engineering docs only," "last 30 days," "this tenant's data." A database that supports efficient metadata filtering at query time is table stakes. Verify that your chosen option does this well before you commit.
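The semantics of filtered vector search can be sketched in a few lines: restrict the candidate set by a metadata predicate, then rank only those candidates by similarity. (Production engines push the filter into the index rather than pre-filtering in application code; this sketch only illustrates the behavior you should verify.)

```python
import numpy as np

def filtered_search(query, vectors, metadata, where, k=5):
    """Vector search scoped by a metadata predicate.

    Only vectors whose metadata satisfies `where` are eligible
    nearest neighbors. Real databases evaluate the filter inside
    the index; doing it in application code, as here, is the slow
    fallback you want to avoid.
    """
    eligible = [i for i, m in enumerate(metadata) if where(m)]
    sub = vectors[eligible]
    sub = sub / np.linalg.norm(sub, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = sub @ q
    order = np.argsort(-scores)[:k]
    return [(eligible[j], float(scores[j])) for j in order]

# Toy corpus: doc 1 is the true nearest neighbor but belongs to "sales".
vectors = np.eye(4, dtype=np.float32)
metadata = [{"team": "eng"}, {"team": "sales"},
            {"team": "eng"}, {"team": "eng"}]
query = np.array([0.0, 1.0, 0.0, 0.0], dtype=np.float32)
hits = filtered_search(query, vectors, metadata,
                       lambda m: m["team"] == "eng", k=2)
# The filter scopes results to "eng" docs, so doc 1 never appears.
```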
Pinecone: managed, zero-ops
Pinecone is fully managed. Create an index, ingest vectors, query. No servers, no capacity planning. For startups and small teams, it offers minimal ops and fast iteration. Queries typically land under 100ms; at 1M vectors (1536 dimensions), expect costs in the low hundreds of dollars per month. The tradeoff: cost scales with vector count. At 10M+ vectors, the bill becomes prohibitive for many teams, and we've seen migrations off Pinecone when that threshold is crossed.
Pinecone supports metadata filtering and hybrid search via a separate keyword index. If you need both semantic and keyword search, you'll need to maintain and merge results from two indexes. For many applications, that's acceptable; for others, a single system that does both is preferable.
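When you do run two indexes, the merge step can be as simple as reciprocal rank fusion, which combines ranked lists without needing comparable scores. A sketch (the smoothing constant k=60 is the conventional default from the RRF literature, not anything vendor-specific):

```python
def rrf_merge(rankings, k=60):
    """Reciprocal rank fusion over ranked result lists.

    Each list (e.g. one from the vector index, one from the keyword
    index) contributes 1 / (k + rank) per document; documents that
    rank well in both lists rise to the top. Works on ranks alone,
    so cosine similarities and BM25 scores never need reconciling.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["d3", "d1", "d7"]   # from the vector index
keyword  = ["d1", "d9", "d3"]   # from the full-text index
merged = rrf_merge([semantic, keyword])
# d1 ranks first: it appears near the top of both lists.
```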
Weaviate: hybrid search native
Weaviate combines vector search and traditional full-text search in one system. Its schema is document-oriented; vectors are one property among many. When hybrid search is core—technical docs, support knowledge bases, product names where exact matches matter—Weaviate simplifies the architecture: no separate full-text index, no custom merge logic. It can also generate embeddings at ingest time through its model integrations, which removes a step from the pipeline.

In our benchmarks at 1M vectors, Weaviate returned results in 10–15ms versus Pinecone's 5–8ms. The gap is small; for most applications, both are fast enough. The deciding factor is whether you need hybrid search as a first-class feature. If yes, Weaviate is a strong fit. Self-hosted or Weaviate Cloud—your choice based on ops preference.
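Hybrid engines typically expose a weighting knob between the two signals; Weaviate calls it alpha. As a simplified sketch of the idea—min-max normalization is an assumption here, and the exact fusion method varies by engine and version—blending looks like this:

```python
def hybrid_rank(vec_scores, kw_scores, alpha=0.5):
    """Blend normalized vector and keyword (BM25-style) scores.

    alpha=1.0 is pure semantic search, alpha=0.0 pure keyword.
    Min-max normalization is one simple way to make the two score
    scales comparable; real engines differ in the details.
    """
    def minmax(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {doc: (s - lo) / span for doc, s in scores.items()}

    v, kw = minmax(vec_scores), minmax(kw_scores)
    blended = {doc: alpha * v.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
               for doc in set(v) | set(kw)}
    return sorted(blended, key=blended.get, reverse=True)

vec_scores = {"a": 0.91, "b": 0.85, "c": 0.20}  # cosine similarities
kw_scores = {"b": 12.0, "c": 5.0, "a": 1.0}     # BM25-style scores
ranked = hybrid_rank(vec_scores, kw_scores, alpha=0.5)
# "b" wins the blend: strong on both signals.
```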
Milvus: scale and open-source
Milvus is built for scale. Ingestion throughput of 20K+ vectors per second; single-digit millisecond p50 latency at 1M vectors; support for billions of vectors. At 100M+ vectors, you pay for hardware rather than per vector. For large-scale deployments where cost matters, Milvus and similar open-source options are worth evaluating.
The tradeoff is operational complexity. Self-hosting requires etcd, MinIO or S3, and optionally Pulsar or Kafka. The learning curve is steeper than Pinecone's, and you need in-house expertise or a dedicated platform team. Don't over-provision with Milvus for 100K vectors; don't under-provision with Pinecone for 50M. Match the database to the workload.
What we recommend
Given the landscape, we recommend:
Start with managed if you can. Pinecone or Weaviate Cloud reduce operational burden. Optimize for iteration speed early; migrate later if cost or scale demands it.
Require metadata filtering. Production RAG almost always needs it—filter by source, date, tenant, and so on. Verify your chosen database supports it efficiently before you build on top of it.
Benchmark at your scale. Vendor benchmarks use specific configurations. Run your own at 1x and 10x your expected vector count. Latency and throughput can degrade nonlinearly as you grow.
Plan for hybrid search if your content is keyword-heavy. Technical documentation, support KBs, product names—keyword matching often complements semantic search. Weaviate has it built in; others require a separate full-text index and merge logic.
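On the benchmarking point: a minimal latency harness is enough to start. Wall-clock your own queries and look at tail percentiles, not just the median—the brute-force stand-in below is hypothetical and exists only so the sketch runs; swap in your actual database client call and repeat at 1x and 10x your expected corpus size.

```python
import time
import numpy as np

def latency_percentiles(search_fn, queries, percentiles=(50, 95, 99)):
    """Wall-clock each query and report latency percentiles in ms.

    p50 alone hides tail latency; p95/p99 are what your slowest
    users experience, and they often degrade first as you scale.
    """
    times_ms = []
    for q in queries:
        t0 = time.perf_counter()
        search_fn(q)
        times_ms.append((time.perf_counter() - t0) * 1000.0)
    return {p: float(np.percentile(times_ms, p)) for p in percentiles}

# Stand-in for a real client call: brute-force top-10 over 50K vectors.
# Replace with your database query; run at 1x and 10x corpus size.
corpus = np.random.rand(50_000, 256).astype(np.float32)
queries = np.random.rand(25, 256).astype(np.float32)
stats = latency_percentiles(lambda q: np.argsort(-(corpus @ q))[:10],
                            queries)
```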
The Meterra approach
We've landed on a tiered strategy. For new projects and small corpora (under 1M vectors), we use Pinecone—fast to ship, minimal ops. For applications where hybrid search is critical, we use Weaviate. For large-scale deployments (10M+ vectors) where cost matters, we evaluate Milvus or similar open-source options.
The key is matching the database to the workload. At the margins, the difference between "vector search works" and "vector search is production-ready" comes down to filtering, hybrid search, and operational fit. Choose based on your scale and ops tolerance—then measure.