Tuning HNSW vs IVFFlat Indexes in pgvector
You have vectors in PostgreSQL and a working query, but recall is inconsistent, build times are painful, or the planner keeps falling back to a sequential scan. The fix is in the index parameters: HNSW’s m, ef_construction, and ef_search, or IVFFlat’s lists and probes. Each trades build time, memory, and recall against query latency, and the right setting depends on dataset size and write pattern. This guide tunes both index types and verifies the planner actually uses them. It assumes the schema and extension setup from the pgvector implementation guide — that setup is not repeated here. Both live within vector search integration strategies and the architecture frameworks under Search Engine Selection & Architecture. If your dimension or model is still unsettled, fix that first via choosing an embedding model for search, since changing it forces a full reindex.
Prerequisites
- pgvector 0.5.0+ (HNSW requires 0.5.0; vector_ip_ops HNSW requires 0.5.1).
- A populated vector(N) column and a chosen distance operator (<=> cosine, <-> L2, <#> inner product).
- A labeled query set to measure recall against an exact (no-index) baseline.
- maintenance_work_mem raised for index builds (HNSW especially).
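Since every tuning decision in this guide is judged against recall, it helps to pin down the metric first. The following is a minimal sketch of recall@k against an exact baseline; the function names and the dictionary layout (query id mapped to a ranked list of ids) are assumptions made for illustration, not part of pgvector.

```python
def recall_at_k(exact_ids, approx_ids, k=10):
    """Fraction of the exact top-k that the approximate top-k also returned."""
    truth = list(exact_ids)[:k]
    if not truth:
        raise ValueError("exact baseline is empty for this query")
    found = set(list(approx_ids)[:k])
    return sum(1 for i in truth if i in found) / len(truth)


def mean_recall(baseline, results, k=10):
    """Average recall@k over every query id present in the baseline.

    baseline: {query_id: ids from the no-index (exact) query, best first}
    results:  {query_id: ids from the indexed query, best first}
    """
    if not baseline:
        raise ValueError("baseline has no queries")
    total = sum(recall_at_k(baseline[q], results.get(q, []), k) for q in baseline)
    return total / len(baseline)
```

The baseline ids would come from running the same ORDER BY query with index scans disabled or before the index exists, so the comparison isolates what the approximate index loses.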
Diagnosis: When Each Index Applies
IVFFlat partitions vectors into lists clusters via k-means and, at query time, scans only the nearest probes clusters. It builds fast and uses little memory, but recall depends on the data being well clustered, and — critically — it must be rebuilt after large inserts because the centroids drift. It suits large, mostly static corpora where build speed matters.
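The partition-then-probe idea can be illustrated with a toy in-memory version. This is a sketch of the general technique only, not pgvector's implementation: it takes the centroids as given (here a random sample stands in for k-means), uses L2 distance, and all function names are hypothetical.

```python
import math
import random


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_lists(vectors, centroids):
    """Assign each vector index to the list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        lists[nearest].append(i)
    return lists


def ivf_search(query, vectors, centroids, lists, probes, k):
    """Scan only the `probes` lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [i for c in order[:probes] for i in lists[c]]
    return sorted(candidates, key=lambda i: l2(query, vectors[i]))[:k]


if __name__ == "__main__":
    random.seed(0)
    vectors = [[random.random(), random.random()] for _ in range(500)]
    centroids = random.sample(vectors, 10)  # stand-in for k-means centroids
    lists = build_lists(vectors, centroids)
    query = [0.5, 0.5]
    exact = set(ivf_search(query, vectors, centroids, lists, probes=10, k=5))
    for probes in (1, 2, 5, 10):
        got = set(ivf_search(query, vectors, centroids, lists, probes, k=5))
        print(f"probes={probes}: overlap with exact top-5 = {len(got & exact)}/5")
```

Scanning every list is equivalent to an exact search, which is why raising probes toward lists recovers recall at the cost of reading more vectors.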
HNSW builds a layered proximity graph. It delivers higher recall at a given latency and degrades gracefully as you insert rows (no centroid drift), but it builds slowly and consumes far more memory. It suits dynamic workloads and recall-sensitive queries — the default choice unless build time or memory is the binding constraint.
A common symptom that sends people tuning is recall that silently drops on a growing IVFFlat index:
-- IVFFlat built when the table had 50k rows, now holds 2M
recall@10 = 0.71 -- was 0.94 at build time; centroids no longer represent the data
The other frequent symptom is the planner ignoring the index entirely, which no parameter will fix until you diagnose it with EXPLAIN.
The three-way trade among build time, memory, and recall is the core of the decision. HNSW build time scales with m and ef_construction and can run minutes-to-hours on a multi-million-row table, and its graph roughly doubles the on-disk footprint of the raw vectors; in exchange it answers at high recall with a small ef_search and tolerates continuous inserts. IVFFlat builds in a fraction of that time because k-means clustering is cheap relative to graph construction, and it adds almost no memory beyond the centroid table, but it pays for that at query time — recall is sensitive to probes, and it erodes as the table grows past its build-time row count. As a rule of thumb: reach for HNSW when the workload is read-latency-sensitive and the table is written continuously; reach for IVFFlat when the corpus is large, mostly static, and you need the index to build quickly inside a maintenance window. If you are unsure, build HNSW first and only fall back to IVFFlat when build time or memory pressure proves it necessary.
Both indexes expose exactly one query-time recall knob, and treating it as a per-transaction setting rather than a global one is what keeps a pool healthy. hnsw.ef_search widens the graph search frontier; ivfflat.probes scans more clusters. Raising either lifts recall and latency together, monotonically, so the tuning task is a one-dimensional sweep: find the smallest value that clears your recall target and stop. The mistake is to set it high “to be safe” — that tax is paid on every query, including the cheap ones that did not need it.
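Because recall rises monotonically with the knob, the sweep can stop at the first value that clears the target. A minimal sketch, assuming a caller-supplied `measure` callable that returns recall for a given knob value (the helper name and signature are hypothetical):

```python
def smallest_passing(values, measure, target):
    """Return (value, recall) for the smallest value whose recall meets target.

    values:  candidate knob settings (ef_search or probes)
    measure: callable taking one value and returning measured recall
    target:  minimum acceptable recall, e.g. 0.95
    Returns None when no candidate reaches the target.
    """
    for value in sorted(values):
        recall = measure(value)
        if recall >= target:
            # Larger values would only add latency, so stop here.
            return value, recall
    return None
```

Returning None rather than the largest candidate makes a missed target explicit, which usually signals a build-time problem (too few lists, low m or ef_construction, or a stale IVFFlat index) rather than a query-time one.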
Solution Steps
1. Build an HNSW index with tuned construction parameters
m is the number of graph connections per node; ef_construction is the candidate-list size during build. Higher values raise recall and build cost.
-- HNSW: m=16 is a solid default; raise to 32 for >1M rows or high dimensions
SET maintenance_work_mem = '2GB'; -- avoid spilling the build to disk
CREATE INDEX idx_emb_hnsw
ON search_embeddings
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
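If index builds are scripted, the statement can be generated with the parameters checked up front. The sketch below is a hypothetical helper; the numeric bounds (m between 2 and 100, ef_construction between 4 and 1000, and ef_construction at least 2 * m) reflect limits pgvector is understood to enforce, but treat them as assumptions and confirm against your installed version.

```python
def hnsw_index_sql(table, column, name, opclass="vector_cosine_ops",
                   m=16, ef_construction=64):
    """Build a CREATE INDEX statement for an HNSW index after sanity checks."""
    for ident in (table, column, name, opclass):
        # Identifiers cannot be bound as parameters, so accept only plain names.
        if not ident.isidentifier():
            raise ValueError(f"unsafe identifier: {ident!r}")
    # Assumed pgvector limits; verify for your version.
    if not 2 <= m <= 100:
        raise ValueError("m is expected to be between 2 and 100")
    if not 4 <= ef_construction <= 1000:
        raise ValueError("ef_construction is expected to be between 4 and 1000")
    if ef_construction < 2 * m:
        raise ValueError("ef_construction is expected to be at least 2 * m")
    return (
        f"CREATE INDEX {name} ON {table} "
        f"USING hnsw ({column} {opclass}) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )
```

Calling `hnsw_index_sql("search_embeddings", "embedding", "idx_emb_hnsw")` reproduces the statement above; passing `m=32` keeps `ef_construction=64` valid under the assumed 2 * m rule, while `m=48` would require raising it.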
2. Build an IVFFlat index with a sized list count
The accepted heuristic is lists = rows / 1000 up to ~1M rows, then sqrt(rows) beyond. Too few lists means coarse clusters; too many means each probe scans little and recall falls unless you raise probes.
-- IVFFlat for a 2M-row static table: sqrt(2_000_000) ≈ 1414
SET maintenance_work_mem = '1GB';
CREATE INDEX idx_emb_ivf
ON search_embeddings
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 1414);
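The sizing heuristic is easy to encode so that lists is recomputed from the current row count rather than copied from an old migration. A small sketch with hypothetical helper names; the probes helper uses the sqrt(lists) starting point mentioned in this guide.

```python
import math


def ivfflat_lists(rows):
    """lists = rows / 1000 up to 1M rows, sqrt(rows) beyond that."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    if rows <= 1_000_000:
        return max(1, rows // 1000)
    return int(math.sqrt(rows))


def starting_probes(lists):
    """A first guess for ivfflat.probes: roughly sqrt(lists), at least 1."""
    return max(1, round(math.sqrt(lists)))
```

The two branches meet at one million rows, where both give 1000 lists, so the heuristic has no jump at the boundary.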
3. Tune recall at query time with SET LOCAL
Both algorithms expose a query-time knob. Raise it to trade latency for recall on a per-transaction basis without rebuilding. Use SET LOCAL so the change is scoped to the current transaction and never leaks into the pool.
-- HNSW: ef_search must be >= LIMIT; higher = more recall, more latency
BEGIN;
SET LOCAL hnsw.ef_search = 100;
SELECT id, 1 - (embedding <=> $1) AS score
FROM search_embeddings
ORDER BY embedding <=> $1
LIMIT 10;
COMMIT;
-- IVFFlat: probes is how many lists to scan; 1 is the (low-recall) default
BEGIN;
SET LOCAL ivfflat.probes = 20; -- a first guess; ~sqrt(lists), about 38 for lists = 1414, is a common starting point
SELECT id, embedding <=> $1 AS distance
FROM search_embeddings
ORDER BY embedding <=> $1
LIMIT 10;
COMMIT;
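From application code, the same scoping can be done with PostgreSQL's `set_config(name, value, is_local)` function, which accepts bound parameters and behaves like SET LOCAL when the third argument is true. The sketch below assumes a psycopg2-style connection, where `with conn:` commits on success and rolls back on error without closing the connection; the helper name and the allow-list are hypothetical.

```python
ALLOWED_KNOBS = {"hnsw.ef_search", "ivfflat.probes"}


def search_with_knob(conn, knob, value, query_sql, params):
    """Run one query with a recall knob scoped to a single transaction."""
    if knob not in ALLOWED_KNOBS:
        raise ValueError(f"unexpected setting: {knob!r}")
    with conn:  # psycopg2 semantics: one transaction, committed on exit
        with conn.cursor() as cur:
            # is_local = true: the value is discarded when the transaction ends
            cur.execute("SELECT set_config(%s, %s, true)", (knob, str(int(value))))
            cur.execute(query_sql, params)
            return cur.fetchall()
```

Because the setting ends with the transaction, the connection returns to the pool with its default value regardless of whether the query succeeded.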
4. Sweep the query-time parameter against recall
Run your labeled queries at increasing ef_search / probes and stop at the smallest value that meets your recall target — anything higher just adds latency.
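# Find the minimal probes/ef_search that hits recall@10 >= 0.95
import psycopg2

def measure(conn, knob_sql, queries, relevant, k=10):
    """Mean recall@k: share of each query's baseline ids found in the top k."""
    total = 0.0
    with conn.cursor() as cur:
        for qid, vec in queries.items():
            # psycopg2 opens a transaction implicitly on the first execute,
            # so SET LOCAL is scoped to it without an explicit BEGIN
            cur.execute(knob_sql)
            cur.execute(
                "SELECT id FROM search_embeddings "
                "ORDER BY embedding <=> %s::vector LIMIT %s",
                ("[" + ",".join(str(float(x)) for x in vec) + "]", k),
            )
            got = {r[0] for r in cur.fetchall()}
            conn.commit()  # ends the transaction and discards the SET LOCAL value
            total += len(relevant[qid] & got) / len(relevant[qid])
    return total / len(queries)

# QUERIES: {query_id: embedding as a sequence of floats}
# RELEVANT: {query_id: non-empty set of ids from the exact (no-index) baseline}
conn = psycopg2.connect("dbname=search")
for p in (5, 10, 20, 40):
    r = measure(conn, f"SET LOCAL ivfflat.probes = {p}", QUERIES, RELEVANT)
    print(f"probes={p}: recall@10={r:.3f}")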
Verification
Confirm the planner uses the index — an Index Scan using idx_emb_hnsw line, not a Seq Scan. The ORDER BY ... LIMIT must match the index’s operator class exactly or Postgres silently falls back.
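-- Substitute a literal vector for $1 (or EXPLAIN EXECUTE a prepared statement);
-- a bare $1 has nothing bound to it when run directly
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM search_embeddings
ORDER BY embedding <=> $1
LIMIT 10;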
Expected plan shows the index in use:
Limit (cost=... rows=10)
  -> Index Scan using idx_emb_hnsw on search_embeddings
       Order By: (embedding <=> $1)
       Buffers: shared hit=... read=...
Planning Time: 0.2 ms
Execution Time: 3.4 ms
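In a CI job or a deploy check, the plan text can be inspected automatically instead of by eye. A minimal sketch that works on the text output of EXPLAIN; the function names are hypothetical, and the patterns assume the default text plan format.

```python
import re


def plan_uses_index(plan_text, index_name):
    """True if the plan contains an index scan on the named index."""
    pattern = r"Index (?:Only )?Scan using " + re.escape(index_name) + r"\b"
    return re.search(pattern, plan_text) is not None


def plan_seq_scans(plan_text):
    """Names of relations the plan reads with a sequential scan."""
    return re.findall(r"Seq Scan on (\S+)", plan_text)
```

A check such as `plan_uses_index(plan, "idx_emb_hnsw") and not plan_seq_scans(plan)` would fail fast when a query change or an operator mismatch pushes the planner back to a sequential scan.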
Then confirm the index is actually being hit over time:
-- idx_scan should climb as queries run; if it stays 0, the planner isn't using it
SELECT indexrelid::regclass AS index, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE indexrelid::regclass::text IN ('idx_emb_hnsw', 'idx_emb_ivf');
Common Pitfalls
Operator class mismatch forces a sequential scan
An index built with vector_cosine_ops is invisible to a query that orders by <-> (L2) or <#> (inner product). The ORDER BY operator must match the index’s operator class. Run EXPLAIN after any query change; a Seq Scan where you expected an Index Scan almost always means the operators disagree.
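The pairing between operator classes and distance operators is fixed, so a mismatch can be caught before the query ever reaches the planner. A small sketch with hypothetical helper names; the regular expression only handles the simple `ORDER BY column <op> ...` shape used in this guide.

```python
import re

# Each pgvector operator class serves exactly one distance operator.
OPERATOR_FOR_OPCLASS = {
    "vector_l2_ops": "<->",
    "vector_ip_ops": "<#>",
    "vector_cosine_ops": "<=>",
}


def order_by_operator(sql):
    """Extract the distance operator from a simple ORDER BY clause, or None."""
    match = re.search(r"ORDER\s+BY\s+[\w.\"]+\s*(<->|<#>|<=>)", sql, re.IGNORECASE)
    return match.group(1) if match else None


def operator_matches(opclass, sql):
    """True if the query orders by the operator the index was built for."""
    return OPERATOR_FOR_OPCLASS[opclass] == order_by_operator(sql)
```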
IVFFlat recall decays as the table grows
IVFFlat centroids are computed once at build time, so heavy inserts after the build leave clusters that no longer represent the data and recall slides. Rebuild the index periodically (or REINDEX INDEX CONCURRENTLY) after large ingests, or switch to HNSW, which has no centroids to drift.
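One way to make the rebuild routine rather than reactive is to compare the current row count with the count recorded at build time. The sketch below is illustrative only: the helper names are hypothetical, and the 25% growth threshold is an arbitrary placeholder to be replaced by whatever growth your own recall measurements tolerate.

```python
def growth_since_build(rows_at_build, rows_now):
    """Relative table growth since the index was built (1.0 means doubled)."""
    if rows_at_build <= 0:
        raise ValueError("rows_at_build must be positive")
    return (rows_now - rows_at_build) / rows_at_build


def rebuild_statement(index_name, rows_at_build, rows_now, max_growth=0.25):
    """Return a REINDEX statement once growth reaches max_growth, else None."""
    if not index_name.isidentifier():
        raise ValueError(f"unsafe identifier: {index_name!r}")
    if growth_since_build(rows_at_build, rows_now) < max_growth:
        return None
    return f"REINDEX INDEX CONCURRENTLY {index_name};"
```

REINDEX recomputes the centroids but keeps the original lists value, so if the table has grown enough that lists itself should change, the index needs to be dropped and created again with the new setting.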
Setting ef_search or probes below LIMIT or session-wide
hnsw.ef_search caps the size of the candidate list, so it must be at least your LIMIT; below that, the scan can return fewer than k rows. Setting these knobs without LOCAL leaves the value on a pooled connection, where it inflates latency for unrelated queries. Always scope tuning with SET LOCAL inside the query transaction.
Related
- Vector Search Integration Strategies — the parent area covering hybrid retrieval and pipeline design.
- Choosing an embedding model for search: settle dimension and metric before building an index, since changing either forces a reindex.
- Ranking Algorithms & Relevance Tuning — combine ANN recall with lexical scoring for hybrid relevance.