Query Autocomplete & Suggestions

Autocomplete is the highest-traffic surface in any search product: it fires on nearly every keystroke, runs against the full corpus, and must answer in single-digit milliseconds or the dropdown feels broken. The engineering decision this guide resolves is which suggestion mechanism to back the dropdown with — an edge-ngram analyzer, Elasticsearch’s completion suggester (an in-memory finite-state transducer), or the search_as_you_type field type — and how to layer typo tolerance, popularity weighting, and query-log mining on top without blowing your latency or memory budget. It sits under Search Frontend & UX Patterns, and pairs tightly with the broader search-as-you-type interfaces work that governs debouncing, request cancellation, and result rendering. Get the index-side data structure wrong and no amount of frontend polish recovers the latency.

The three mechanisms are not interchangeable. They differ in what they match (prefixes of terms vs prefixes of whole phrases), where the matching structure lives (on disk in the inverted index vs in heap as an FST), and how much index bloat they cost. This page treats them as engineering tradeoffs, not features, and gives you the configuration to ship each one.

A useful framing before the detail: autocomplete answers come in two distinct shapes, and the mechanism follows from the shape. The first shape is query completion — “finish the search phrase the user is typing,” drawn from a curated list of query strings, ranked by how often real users search them. That is the completion suggester’s home turf. The second shape is document prefixing — “show me the live records whose title or name starts with this input,” which must respect availability filters, locale, and scoring. That is where edge-ngrams and search_as_you_type belong. Many products want both at once: a query-completion row at the top of the dropdown and a few document hits below it. Building that means running two queries (or a multi_search) and merging client-side, which is fine — just keep the two suggestion sources mechanically separate rather than forcing one structure to do both jobs poorly.
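The two-source dropdown described above can be sketched as a small client-side merge. This is an illustration under stated assumptions, not a prescribed implementation: the function name mergeDropdown, the row shape, and the default limits are hypothetical, and both inputs are assumed to be already parsed out of the two responses.

```javascript
// Hypothetical helper: merge query completions and document hits into one dropdown.
// completions: strings from the completion suggester, already weight-ordered.
// docHits: [{ id, title }] from the document-prefix query, already score-ordered.
function mergeDropdown(completions, docHits, { maxCompletions = 3, maxDocs = 5 } = {}) {
  const completionRows = completions
    .slice(0, maxCompletions)
    .map((text) => ({ kind: "completion", text }));
  const docRows = docHits
    .slice(0, maxDocs)
    .map((hit) => ({ kind: "document", id: hit.id, text: hit.title }));
  // Keep the two sources separate and ordered: completions first, documents below.
  return [...completionRows, ...docRows];
}

const rows = mergeDropdown(
  ["washing machine", "washington dc flights"],
  [{ id: "p1", title: "Washer Dryer Combo" }],
  { maxCompletions: 1 }
);
console.log(rows.map((r) => `${r.kind}: ${r.text}`));
```

Tagging each row with its source keeps rendering decisions (icons, click behavior) in the frontend without coupling the two backends.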

Prerequisites

Elasticsearch 8.x or OpenSearch 2.x reachable at localhost:9200 with at least 2GB heap free for FST caching.
An index whose mapping you control (autocomplete fields must be declared at create time; you cannot retrofit a completion field without reindexing).
A baseline understanding of analyzers, tokenizers, and the inverted index — see Elasticsearch fundamentals for engineers if custom analyzer chains are unfamiliar.
A source of suggestion weights: a popularity column, a click counter, or an aggregated query log. Unweighted suggestions rank alphabetically, which users read as random.
curl and jq for verification steps.

Concept Deep-Dive

Autocomplete reduces to one question: given a partial input string, return the top N completions, ranked. The three Elasticsearch mechanisms answer it with different data structures.

Edge-ngrams tokenize each term, then emit every prefix as its own token at index time. The term search becomes the tokens s, se, sea, sear, searc, search. Those tokens are written into the ordinary inverted index. At query time you run a plain match against that field with the user’s raw input, so sea finds the document because sea is a literal token in the index. The matching structure is the inverted index itself — it lives on disk, scales to any corpus size, and supports full scoring, filtering, and field collapsing. The cost is index bloat (every term explodes into N prefix tokens) and the discipline of using a different analyzer at search time than at index time, or the user’s sea input would itself be edge-ngrammed into s, se, sea and match far too much.
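To make the index-time explosion concrete, here is a minimal sketch of what an edge-ngram filter emits for a single term. It is a hand-written illustration, not the Lucene implementation; the function name edgeNgrams is an assumption, and it slices by UTF-16 code unit, which is only safe for simple text.

```javascript
// Illustrative re-implementation of the tokens an edge_ngram filter emits for one term.
// minGram / maxGram mirror the filter's min_gram / max_gram settings.
function edgeNgrams(term, minGram = 1, maxGram = 20) {
  const tokens = [];
  const longest = Math.min(term.length, maxGram);
  for (let len = minGram; len <= longest; len++) {
    tokens.push(term.slice(0, len));
  }
  return tokens;
}

// Index time: the term explodes into prefixes.
const indexedTokens = new Set(edgeNgrams("search"));
// Search time: the raw input stays whole and is looked up as one token.
console.log(indexedTokens.has("sea")); // true
// What the query would become if it were ngrammed too: s, se, sea.
console.log(edgeNgrams("sea"));
```

The last line shows why the search-time analyzer has to differ: the one-character token s would match almost every document.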

The completion suggester takes the opposite approach. At index time it builds a finite-state transducer (FST): a compact, in-memory automaton that maps input prefixes directly to weighted outputs. There is no scoring, no match, no inverted-index lookup. A prefix walk over the FST returns the top-weighted completions in near-constant time, which is why it is the fastest option for raw prefix lookups. The FST stores whole-input strings (it suggests phrases, not term prefixes) and supports per-suggestion weight, contexts (category/geo filters), and a built-in fuzzy mode. Payloads were removed in Elasticsearch 5.0; on the versions this guide targets, each option carries the matching document’s _source instead. The catch: the FST lives in heap and must be loaded per shard. On a large suggestion corpus it consumes real memory, it has no concept of match against document fields, and edits require a refresh to rebuild affected FST segments.
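The "walk the prefix, return the top-weighted outputs" idea can be illustrated with a toy structure. This sketch uses a plain trie, not an FST: a real FST shares suffixes and is far more compact, and the class name PrefixIndex is hypothetical. It only shows why lookup cost depends on prefix length rather than on corpus size.

```javascript
// Toy weighted prefix index (a plain trie, NOT a real FST).
class PrefixIndex {
  constructor() {
    this.root = { children: new Map(), entries: [] };
  }
  add(input, weight) {
    let node = this.root;
    for (const ch of input) {
      if (!node.children.has(ch)) node.children.set(ch, { children: new Map(), entries: [] });
      node = node.children.get(ch);
      // Record the entry on every node along the path so lookup is a walk plus a sort.
      node.entries.push({ input, weight });
    }
  }
  suggest(prefix, size = 5) {
    let node = this.root;
    for (const ch of prefix) {
      node = node.children.get(ch);
      if (!node) return []; // no stored input starts with this prefix
    }
    return [...node.entries]
      .sort((a, b) => b.weight - a.weight)
      .slice(0, size)
      .map((e) => e.input);
  }
}

const idx = new PrefixIndex();
idx.add("washing machine", 80);
idx.add("washington dc flights", 60);
idx.add("wasabi peas", 10);
console.log(idx.suggest("was", 2)); // washing machine, washington dc flights
console.log(idx.suggest("wax"));    // empty: nothing starts with "wax"
```

Storing entries on every node trades memory for lookup speed, which mirrors the heap-versus-latency tradeoff the completion suggester makes.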

search_as_you_type is a convenience field type that, under the hood, generates a small family of subfields with shingle and edge-ngram analyzers (._2gram, ._3gram, ._index_prefix). You query it with a multi_match of type bool_prefix. It is the lowest-effort way to get phrase-prefix matching that also respects ordinary scoring and filtering, because it is still backed by the inverted index. It is the right default when suggestions are really “show me documents whose title starts with what I typed” rather than “show me curated query strings.”

A worked comparison: a user types was. With edge-ngrams over a title field, you get every document whose title contains a word beginning with was (washington, washer, wasabi), scored by BM25 plus any boosts you apply. With the completion suggester over a curated query list, you get the top-weighted query strings that start with was (washing machine, washington dc flights), in weight order, with no per-document scoring. With search_as_you_type queried through bool_prefix you get documents whose title contains the typed terms, with the last term matched as a prefix; titles that keep the typed word order score higher through the shingle subfields, and full filtering still applies. Same input, three different answer shapes: choose the mechanism by the shape you want.

The three mechanisms line up against the dimensions that actually drive the choice:

Dimension	Edge-ngram	Completion suggester	`search_as_you_type`
Matches	term prefixes in text	whole-input prefixes	phrase prefixes + last-word
Structure	inverted index (disk)	FST (heap)	inverted index (disk)
Scoring & filters	full BM25, filters, collapse	none (weight only)	full BM25, filters, collapse
Typo tolerance	via `fuzzy` query (costly)	built-in `fuzzy` block	via `fuzziness` on `multi_match`
Memory cost	larger index, low heap	heap scales with unique inputs	~3 subfields, low heap
Best for	prefixes inside long text	curated weighted query lists	document-title prefixing

De-duplication is a cross-cutting concern for all three. Edge-ngram and search_as_you_type return documents, so two products named “iPhone 15” produce two rows; collapse them with field collapsing (collapse: { field: "title.keyword" }) or a terms aggregation. The completion suggester has built-in skip_duplicates: true, which suppresses identical completion strings within a single lookup. De-duplication is not free: skip_duplicates makes the suggester gather more candidates than size and discard collisions, which raises per-shard work, and collapse on a high-cardinality keyword field adds a memory cost. On small suggestion corpora the overhead is invisible; on large ones, measure it.
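The extra work that de-duplication causes can be seen in a small sketch. This is not Elasticsearch's internal algorithm; the function name topUnique is hypothetical, and it simply shows that filling size unique slots means scanning more than size candidates whenever collisions occur.

```javascript
// Sketch of "gather extra, discard collisions".
// candidates must already be ordered best-first.
// Returns the unique texts plus how many candidates had to be scanned.
function topUnique(candidates, size) {
  const seen = new Set();
  const unique = [];
  let scanned = 0;
  for (const text of candidates) {
    if (unique.length === size) break;
    scanned++;
    if (seen.has(text)) continue; // collision: costs a scan but fills no slot
    seen.add(text);
    unique.push(text);
  }
  return { unique, scanned };
}

const deduped = topUnique(["iphone 15", "iphone 15", "iphone 15 case", "iphone 14"], 2);
console.log(deduped); // unique: iphone 15, iphone 15 case; scanned: 3
```

The gap between scanned and size is the overhead to watch: it grows with the duplicate rate of the corpus, not with its absolute size.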

Personalization layers on top of any of the three. The cheapest form is the completion suggester’s category contexts — attach a locale, category, or coarse user segment to each suggestion and pass a matching context (optionally boosted) at query time, so one index serves many audiences without per-user indices. Finer personalization (per-user recency, click history) is better applied as a re-rank after the suggester returns its candidates: fetch the top 20 by popularity, then reorder the dropdown client-side or in a thin service using the signals you already hold. Pushing per-user state into the index itself rarely pays for the cardinality explosion it causes.
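The re-rank step can be sketched as a pure function over the suggester's candidates. Everything here is an assumption for illustration: the function name rerankForUser, the recencyBoost multiplier, and the idea that the user's recent queries are available as a simple array.

```javascript
// Hypothetical re-rank: blend popularity with a per-user signal held outside the index.
// candidates: [{ text, weight }] best-first from the suggester.
// recentQueries: the user's recently issued queries (assumed to be available to the caller).
function rerankForUser(candidates, recentQueries, { recencyBoost = 1.5, limit = 8 } = {}) {
  const recent = new Set(recentQueries.map((q) => q.toLowerCase()));
  return candidates
    .map((c, position) => ({
      ...c,
      position,
      score: recent.has(c.text.toLowerCase()) ? c.weight * recencyBoost : c.weight,
    }))
    // Higher score first; ties keep the suggester's original order.
    .sort((a, b) => b.score - a.score || a.position - b.position)
    .slice(0, limit)
    .map((c) => c.text);
}

const reranked = rerankForUser(
  [
    { text: "washing machine", weight: 80 },
    { text: "washer dryer", weight: 60 },
    { text: "wasabi peas", weight: 10 },
  ],
  ["Washer Dryer"]
);
console.log(reranked); // washer dryer, washing machine, wasabi peas
```

Because the per-user state never enters the index, the same suggestion index serves every user and the boost can be tuned without a reindex.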

Suggestion sourcing is the quiet quality lever. Catalog-derived suggestions (titles, brand names) are easy but mirror your data, not your users’ intent — they will happily suggest products nobody searches for. Query-log-derived suggestions reflect demand directly and self-prune dead phrases as traffic shifts, but require a feedback loop: aggregate the log, drop zero-result and low-frequency queries, normalize casing and whitespace, and reload on a schedule. The best dropdowns blend both — query-log strings for the completion row, catalog documents for the result rows — which is why the two-shape framing above matters in practice.

Step-by-Step Implementation

1. Stand up an edge-ngram analyzer

Define a custom analyzer that edge-ngrams at index time and a plain analyzer for search. The asymmetry is the whole point — full detail lives in prefix autocomplete with edge-ngrams.

PUT localhost:9200/products
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": { "type": "edge_ngram", "min_gram": 2, "max_gram": 20 }
      },
      "analyzer": {
        "autocomplete_index":  { "type": "custom", "tokenizer": "standard", "filter": ["lowercase", "autocomplete_filter"] },
        "autocomplete_search": { "type": "custom", "tokenizer": "standard", "filter": ["lowercase"] }
      }
    }
  },
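  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "autocomplete_index",
        "search_analyzer": "autocomplete_search",
        "copy_to": "title_sayt",
        "fields": { "keyword": { "type": "keyword" } }
      },
      "title_sayt": { "type": "search_as_you_type" }
    }
  }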
}

Verify: confirm the index analyzer explodes a term into prefixes but the search analyzer does not.

curl -s 'localhost:9200/products/_analyze' -H 'Content-Type: application/json' \
  -d '{"analyzer":"autocomplete_index","text":"search"}' | jq '.tokens[].token'
# => "se", "sea", "sear", "searc", "search"

2. Add a completion suggester field

For curated, weighted query suggestions, declare a completion field. It builds the FST automatically.

PUT localhost:9200/suggestions
{
  "mappings": {
    "properties": {
      "suggest": {
        "type": "completion",
        "analyzer": "simple",
        "preserve_separators": true,
        "preserve_position_increments": true,
        "max_input_length": 50
      }
    }
  }
}

Index a few weighted suggestions. The weight drives FST ordering directly.

curl -s -X POST 'localhost:9200/suggestions/_doc?refresh' -H 'Content-Type: application/json' \
  -d '{"suggest":{"input":["washing machine","washer dryer"],"weight":80}}'

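Verify: a prefix query returns weight-ordered completions. The suggester returns at most one option per matching document, so the single document above yields one option even though both of its inputs start with was; index further documents with different weights to see the ordering.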

curl -s 'localhost:9200/suggestions/_search' -H 'Content-Type: application/json' -d '{
  "suggest": { "s": { "prefix": "was", "completion": { "field": "suggest", "size": 5, "skip_duplicates": true } } }
}' | jq '.suggest.s[0].options[].text'

3. Add fuzzy / typo-tolerant suggestions

The completion suggester accepts an inline fuzzy block. Fuzziness is bounded by Damerau-Levenshtein edit distance, with a configurable prefix length that must match exactly before fuzziness kicks in — this keeps the FST walk cheap.

curl -s 'localhost:9200/suggestions/_search' -H 'Content-Type: application/json' -d '{
  "suggest": { "s": { "prefix": "wasing", "completion": {
    "field": "suggest", "size": 5,
    "fuzzy": { "fuzziness": "AUTO", "prefix_length": 1, "transpositions": true }
  } } }
}'

Verify: the typo wasing still surfaces washing machine. If it does not, your prefix_length is too high or the FST does not contain that input.
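The matching rule behind the fuzzy block can be illustrated outside the cluster. The sketch below is a plain dynamic-programming edit distance (the optimal-string-alignment variant of Damerau-Levenshtein), not the automaton Lucene actually uses, and fuzzyPrefixMatch is a hypothetical helper that mimics the exact-prefix requirement.

```javascript
// Edit distance where insert, delete, substitute, and adjacent transposition each cost 1.
function editDistance(a, b) {
  const d = Array.from({ length: a.length + 1 }, (_, i) => [i, ...Array(b.length).fill(0)]);
  for (let j = 0; j <= b.length; j++) d[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost);
      if (i > 1 && j > 1 && a[i - 1] === b[j - 2] && a[i - 2] === b[j - 1]) {
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1); // adjacent transposition
      }
    }
  }
  return d[a.length][b.length];
}

// Hypothetical check: the first prefixLength characters must match exactly, and the
// typed text must be within maxEdits of SOME prefix of the candidate.
function fuzzyPrefixMatch(typed, candidate, { maxEdits = 1, prefixLength = 1 } = {}) {
  if (typed.slice(0, prefixLength) !== candidate.slice(0, prefixLength)) return false;
  const lo = Math.max(0, typed.length - maxEdits);
  const hi = Math.min(candidate.length, typed.length + maxEdits);
  for (let len = lo; len <= hi; len++) {
    if (editDistance(typed, candidate.slice(0, len)) <= maxEdits) return true;
  }
  return false;
}

console.log(editDistance("wasing", "washing"));             // 1 (one missing letter)
console.log(fuzzyPrefixMatch("wasing", "washing machine")); // true
console.log(fuzzyPrefixMatch("xasing", "washing machine")); // false: first character differs
```

The exact-prefix check runs before any distance computation, which is the same reason a non-zero prefix_length keeps the real traversal cheap.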

4. Layer in `search_as_you_type` for document-backed prefixing

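When suggestions should be live documents (with filtering and scoring), use the field type and a bool_prefix query. The query below assumes the products mapping declares title_sayt as a search_as_you_type field and a title.keyword subfield for collapse; like every mapping change in this guide, both must exist before documents are indexed.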

curl -s 'localhost:9200/products/_search' -H 'Content-Type: application/json' -d '{
  "size": 8,
  "query": { "multi_match": {
    "query": "wash", "type": "bool_prefix",
    "fields": ["title_sayt", "title_sayt._2gram", "title_sayt._3gram"]
  } },
  "collapse": { "field": "title.keyword" }
}'

Verify: results include documents whose title phrase begins with wash, de-duplicated by exact title via collapse.
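Since the point of the document-backed path is that filters still apply, here is a hedged sketch of a request builder that wraps the same bool_prefix query in a bool with non-scoring filter clauses. The function name buildSaytQuery and the in_stock field are hypothetical; the title_sayt and title.keyword names are taken from the query above.

```javascript
// Builds a search_as_you_type request body with optional non-scoring filters.
function buildSaytQuery(input, { size = 8, filters = [] } = {}) {
  return {
    size,
    query: {
      bool: {
        must: [
          {
            multi_match: {
              query: input,
              type: "bool_prefix",
              fields: ["title_sayt", "title_sayt._2gram", "title_sayt._3gram"],
            },
          },
        ],
        filter: filters, // e.g. availability or locale; does not affect scoring
      },
    },
    collapse: { field: "title.keyword" },
  };
}

// in_stock is an assumed field name used only for this example.
const saytBody = buildSaytQuery("wash", { filters: [{ term: { in_stock: true } }] });
console.log(JSON.stringify(saytBody, null, 2));
```

Putting the availability clause in filter rather than must keeps it cacheable and out of the relevance score.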

5. Mine the query log for popularity weighting

The strongest suggestions come from real queries, not the catalog. Aggregate the search log, threshold by frequency, and bulk-load into the completion index with frequency as weight. Boosting recency or popularity here is the same family of problem covered in query-time boosting strategies.

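// Roll up the last 30 days of queries and reload the suggestion FST.
// Assumes node-postgres and the Elasticsearch JS client v8, run as an ES module.
import pg from "pg";
import { Client } from "@elastic/elasticsearch";

const db = new pg.Pool();                                   // connection settings come from PG* env vars
const es = new Client({ node: "http://localhost:9200" });

const rollup = await db.query(`
  SELECT lower(trim(query)) AS q, count(*)::int AS freq     -- cast: pg returns bigint counts as strings
  FROM search_log
  WHERE ts > now() - interval '30 days' AND result_count > 0 AND trim(query) <> ''
  GROUP BY 1 HAVING count(*) >= 5
`);
const bulk = rollup.rows.flatMap(({ q, freq }) => [
  { index: { _index: "suggestions", _id: q } },              // stable _id so a reload overwrites instead of duplicating
  { suggest: { input: q, weight: Math.min(freq, 1000) } },  // cap weight so one viral query can't dominate
]);
const res = await es.bulk({ refresh: true, operations: bulk });
if (res.errors) throw new Error("bulk load reported item failures");
await db.end();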

Verify: the most-searched prefix returns the highest-frequency query first.

curl -s 'localhost:9200/suggestions/_search' -H 'Content-Type: application/json' \
  -d '{"suggest":{"s":{"prefix":"ip","completion":{"field":"suggest","size":5}}}}' \
  | jq '.suggest.s[0].options[] | {text, score: ._score}'

6. Personalize with contexts

Attach a context (category, locale, or user segment) to bias or filter completions per user without maintaining separate indices.

"suggest": {
  "type": "completion",
  "contexts": [{ "name": "locale", "type": "category", "path": "locale" }]
}

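A query then passes contexts: { locale: [{ context: "en-GB", boost: 2 }] }. Listing a single context restricts results to completions tagged with it; to prefer a locale rather than filter on it, list several contexts with different boosts, for example en-GB at boost 2 alongside en-US at boost 1. Verify: the same prefix returns different top suggestions for en-GB vs en-US.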
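A request body with boosted contexts can be assembled like this. The sketch is illustrative: the function name buildContextSuggest and the boost values are assumptions, while the suggest field and locale context names come from the mapping fragment above.

```javascript
// Builds a completion-suggester request body with boosted category contexts.
// localeBoosts maps a locale value to its boost, e.g. { "en-GB": 2, "en-US": 1 }.
function buildContextSuggest(prefix, localeBoosts, size = 5) {
  return {
    suggest: {
      s: {
        prefix,
        completion: {
          field: "suggest",
          size,
          contexts: {
            locale: Object.entries(localeBoosts).map(([context, boost]) => ({ context, boost })),
          },
        },
      },
    },
  };
}

const contextBody = buildContextSuggest("was", { "en-GB": 2, "en-US": 1 });
console.log(JSON.stringify(contextBody, null, 2));
```

Passing both locales with different boosts biases the ordering toward en-GB while still letting en-US completions appear.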

Configuration Reference

Name	Default	Type	Effect
`min_gram`	1	integer	Shortest prefix the edge-ngram filter emits; raise to 2–3 to cut index size and noise on single-letter input.
`max_gram`	2	integer	Longest prefix emitted; set to your longest expected term (e.g. 20) or longer inputs silently stop matching.
`search_analyzer`	(index analyzer)	string	Analyzer applied to query text; must be a plain analyzer so user input is not itself edge-ngrammed.
`max_input_length`	50	integer	Per-input character cap for `completion` fields; longer inputs are truncated before FST insertion.
`preserve_separators`	true	boolean	Whether separators between tokens are significant in the FST; false lets `foof` match `Foo Fighters`.
`skip_duplicates`	false	boolean	Suppresses identical completion strings in one lookup; raises per-shard work, so measure the overhead on large corpora.
`fuzzy.fuzziness`	AUTO	string/int	Max edit distance for typo tolerance (`AUTO`, 0, 1, 2); each level multiplies FST traversal cost.
`fuzzy.prefix_length`	1	integer	Leading characters that must match exactly before fuzziness applies; 1–2 protects latency on short input.

Failure Modes & Debugging

Symptom: every keystroke returns thousands of results and latency spikes

Root cause: the same edge-ngram analyzer is applied at both index and search time, so the user’s sea is expanded into s, se, sea and the s token matches nearly every document. Remediation — confirm the mapping declares a separate search_analyzer:

curl -s 'localhost:9200/products/_mapping/field/title' | jq '.products.mappings.title.mapping.title'
# search_analyzer must differ from analyzer; if absent, reindex with autocomplete_search set

Symptom: completion suggester returns nothing for a prefix you know was indexed

Root cause: the FST is rebuilt on refresh, and the documents were indexed without ?refresh (or before the refresh interval elapsed), so the new inputs are not yet in any loaded FST segment. Remediation:

curl -s -X POST 'localhost:9200/suggestions/_refresh'
curl -s 'localhost:9200/suggestions/_search' \
  -d '{"suggest":{"s":{"prefix":"was","completion":{"field":"suggest"}}}}' \
  -H 'Content-Type: application/json' | jq '.suggest.s[0].options | length'

Symptom: node heap pressure and circuit-breaker trips after adding completion fields

Root cause: the completion FST is held in heap per shard and scales with the number of unique inputs; a multi-million-row suggestion corpus can consume gigabytes. Remediation — measure FST memory and shrink the corpus by thresholding query-log frequency:

curl -s 'localhost:9200/suggestions/_stats/completion?fields=suggest' \
  | jq '.indices.suggestions.total.completion'
# If size_in_bytes is large, raise the HAVING count(*) threshold in the rollup query

Symptom: duplicate rows fill the dropdown

Root cause: document-backed mechanisms (edge-ngram, search_as_you_type) return one hit per matching document, so catalog duplicates surface as repeated suggestions. Remediation — collapse on a keyword field, or for the completion suggester enable skip_duplicates: true:

# document-backed: collapse on exact title
# completion suggester: add "skip_duplicates": true to the completion block

Performance & Scale Notes

Benchmark against your real corpus with _search took, adding profile: true for the document-backed queries (the Profile API does not report on suggesters). Warm caches first with ~1000 representative prefixes drawn from the query log rather than synthetic strings: the latency distribution of real prefixes (heavy on common two- and three-character stems) differs sharply from a uniform sample. Track p50 and p99: autocomplete fires on every keystroke, so a 5ms p50 with a 90ms p99 still produces a visibly janky dropdown for a meaningful slice of inputs.
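Computing the two percentiles from collected took values is straightforward. The sketch below uses the nearest-rank method, which is one of several reasonable definitions; the sample array is made-up data chosen to mirror the 5ms/90ms case, not a measurement.

```javascript
// Nearest-rank percentile over latency samples in milliseconds.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// tookMs stands in for "took" values gathered from repeated _search calls (illustrative data).
const tookMs = [3, 4, 4, 5, 5, 5, 6, 7, 9, 90];
console.log({ p50: percentile(tookMs, 50), p99: percentile(tookMs, 99) }); // p50: 5, p99: 90
```

One slow outlier in ten samples leaves the median untouched but dominates the tail, which is why the median alone hides a janky dropdown.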

Completion suggester answers prefix lookups in roughly 1–5ms p99 on a warm FST because it is an in-memory automaton walk, independent of corpus document count — but its heap footprint scales with unique input count. Budget on the order of tens to a few hundred bytes per unique input; a 1M-input corpus is comfortable, a 50M-input one is not.
Edge-ngrams add up to (max_gram − min_gram + 1) tokens per term (fewer for terms shorter than max_gram), inflating the autocomplete field’s posting lists severalfold; expect the field to occupy 3–6× a plain text field. Query latency tracks ordinary match (single-digit ms with filter caching) but the index is larger and slower to merge.
search_as_you_type generates 3–4 subfields, so it roughly triples the field’s storage versus a single text field, while keeping query latency in the same band as a normal bool_prefix multi_match.
Fuzzy completion with fuzziness: 2 and prefix_length: 0 can multiply FST traversal cost by an order of magnitude on dense automata; keep prefix_length ≥ 1 and reserve fuzziness: 2 for inputs of 6+ characters.
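The sizing arithmetic in the notes above can be turned into a quick back-of-envelope calculator. Both function names are hypothetical, and bytesPerInput is an assumed figure picked from the "tens to a few hundred bytes" range, not a measured constant; measure your own corpus before trusting the result.

```javascript
// Tokens an edge_ngram filter emits for one term of the given length.
function edgeNgramTokenCount(termLength, minGram, maxGram) {
  return Math.max(0, Math.min(termLength, maxGram) - minGram + 1);
}

// Rough completion-suggester memory estimate under an assumed per-input cost.
function estimateCompletionBytes(uniqueInputs, bytesPerInput = 100) {
  return uniqueInputs * bytesPerInput;
}

console.log(edgeNgramTokenCount(6, 2, 20));          // 5 tokens for a 6-letter term
console.log(edgeNgramTokenCount(30, 2, 20));         // 19, capped by max_gram
console.log(estimateCompletionBytes(1e6) / 1e6);     // 100 (MB) at 100 bytes per input
console.log(estimateCompletionBytes(50e6) / 1e9);    // 5 (GB) at the same assumed rate
```

At the assumed rate, the jump from 1M to 50M inputs is the difference between a rounding error and a large share of a node's heap.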

For the popularity feedback loop, the reload cadence is itself a tuning decision. Rebuilding the completion index hourly keeps suggestions fresh as trends move but rebuilds FST segments and briefly raises heap; a daily rebuild cuts that churn by a factor of 24 at the cost of staleness. A reasonable default is hourly incremental upserts of changed query strings plus a nightly full rebuild that also reapplies the frequency threshold, so dead phrases age out instead of accumulating. Measure the rebuild’s effect on p99 by running the warm-prefix benchmark immediately after a reload, not just on a quiescent node: the reload is when latency risk is highest.

Rule of thumb: curated, weighted query suggestions → completion suggester; document-title prefixing with filters and scoring → search_as_you_type; term-prefix matching inside long text bodies → edge-ngrams. When in doubt, start with search_as_you_type — it is the cheapest to wire up, degrades gracefully, and keeps the full query DSL — and graduate to the completion suggester only once query-log mining gives you a curated, weighted list worth holding in heap.

Prefix autocomplete with edge-ngrams — the analyzer-level recipe for the edge-ngram mechanism described here.
Search-as-you-type interfaces — the frontend debouncing and cancellation layer that drives these suggestion queries.
Synonym & stopword management — synonyms and stopwords change which prefixes resolve, so they interact with every suggester.
Elasticsearch fundamentals for engineers — analyzer chains, the inverted index, and refresh mechanics underpinning all three approaches.
Query-time boosting strategies — how popularity and recency weights flow into suggestion ranking.