Managing Synonyms Without Reindexing

The specific failure this guide prevents is the production outage-by-rebuild: a stakeholder asks for one new synonym rule, and answering it triggers a multi-hour reindex of a billion-document index because the synonym filter was wired into the index analyzer. The fix is structural — search-time synonym_graph filters marked updateable, refreshed through the _reload_search_analyzers API — and it lets you ship rule changes in seconds without touching stored documents. This is the operational companion to synonym & stopword management, and it sits within the wider practice of Ranking Algorithms & Relevance Tuning where every recall change must be deployable without downtime.

Prerequisites

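  1. Elasticsearch 7.3+ or OpenSearch 2.x (the updateable flag and reload API landed in Elasticsearch 7.3; OpenSearch exposes the equivalent refresh through its Index Management plugin under a different endpoint, so check its documentation before copying the commands below).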
  2. A synonym filter defined with synonyms_path and updateable: true, used only in a search_analyzer.
  3. The synonyms file deployed to the config/analysis/ directory on every data node.
  4. The manage index privilege on the target index, which _reload_search_analyzers requires when security features are enabled, so you can call it and read its per-node response.

Diagnosis / Context

updateable: true is only legal in a search-time analyzer. The reason is mechanical: index-time tokens are written into immutable Lucene segments, so changing them requires rewriting documents, and there is nothing to “reload.” Search-time filters run per query, so swapping the rule set behind them takes effect on the next query once the analyzer is reloaded. If you try to mark an index-analyzer filter updateable, the request is rejected with an error along these lines (the exact wording varies by version):

{
  "type": "illegal_argument_exception",
  "reason": "Can't apply updateable synonyms to an index time analyzer. Use 'analyzer' to set a search time analyzer instead."
}

That message is the diagnostic signal: it means the synonym filter is on the wrong side of the pipeline. The remedy is to move it into the field’s search_analyzer, leaving the analyzer (index-time) clean — exactly the search-time posture argued for in the parent guide, which also keeps your IDF table honest for downstream BM25 Tuning & Weights.
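As an illustration of that split, here is a minimal sketch of an index body with the synonym filter confined to the search analyzer. The names listings, geo_synonyms, and search_analyzer follow the examples in this guide, and the synonyms_path value is inferred from the file location used in the solution steps. The index_analyzer name, the description field, the standard tokenizer, the lowercase filter, and the localhost:9200 address are assumptions for illustration only.

```shell
# Sketch: print an index body whose synonym filter is search-time only.
# index_analyzer and the description field are hypothetical names.
index_body() {
  cat <<'JSON'
{
  "settings": {
    "analysis": {
      "filter": {
        "geo_synonyms": {
          "type": "synonym_graph",
          "synonyms_path": "analysis/geo_synonyms.txt",
          "updateable": true
        }
      },
      "analyzer": {
        "index_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase"]
        },
        "search_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "geo_synonyms"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "analyzer": "index_analyzer",
        "search_analyzer": "search_analyzer"
      }
    }
  }
}
JSON
}

# Create the index (assumes a cluster at localhost:9200):
#   index_body | curl -s -X PUT "localhost:9200/listings" \
#     -H 'Content-Type: application/json' --data-binary @-
```

The index-time analyzer carries no synonym filter at all, so documents are stored with their original tokens and only the query side sees the expansion.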

Why does this matter operationally rather than just theoretically? Synonym sets are living artifacts. A search team fields a steady stream of requests: a marketing campaign introduces a new product alias, support notices users searching cancellation when documents say refund, an analyst spots a zero-result spike for a regional spelling. Each of these is a one-line rule change, and each should ship in minutes. If your only deployment path is a reindex, the synonym backlog grows because nobody wants to trigger a multi-hour rebuild for a single line, and relevance quietly degrades between rebuild windows. The updateable + _reload_search_analyzers pattern turns synonym management into a routine config push, decoupling the cadence of rule changes from the cost of touching stored data.

The reload is also cheap on the cluster: it rebuilds the in-memory analyzer chain per shard, which is orders of magnitude faster than re-analyzing and rewriting every document. On a large index the difference is seconds versus hours, and because the reload avoids the I/O and merge pressure a reindex imposes, it is generally safe to run during peak traffic.

One important constraint: the reload affects the search analyzer only, so it changes how queries are interpreted from the next request onward, but it cannot retroactively change anything already written to disk. That is exactly the property you want for synonyms — and exactly why the same mechanism cannot help you when the change you need is to the indexed tokens themselves.

Solution Steps

1. Confirm the filter is updateable and search-time only

curl -s "localhost:9200/listings/_settings" \
  | jq '.listings.settings.index.analysis.filter.geo_synonyms'
# Expect: { "type": "synonym_graph", "synonyms_path": "...", "updateable": "true" }

2. Edit the synonyms file on every data node

Push the new rules to config/analysis/geo_synonyms.txt on all nodes (config management, a shared mount, or your image build). The reload reads each node’s local copy, so the file must be identical everywhere or nodes will diverge.

# Append a rule, then sync to every node before reloading
echo 'condo, apartment, flat' >> config/analysis/geo_synonyms.txt
# (deploy the file to all data nodes here — Ansible, k8s configmap, etc.)
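Before syncing, a quick lint can catch lines that would not define any expansion. The check below is a heuristic sketch, not the Elasticsearch parser: it assumes Solr-format rules (comma-separated equivalents, or an explicit mapping with =>), treats lines starting with # as comments, and flags single-term lines, empty terms, and mappings with an empty side. The function name lint_synonyms is hypothetical.

```shell
# Heuristic lint for a Solr-format synonyms file (not the real parser).
# Prints "line N: reason" for each suspect line; exit status 1 if any found.
lint_synonyms() {
  awk '
    /^[ \t]*(#|$)/ { next }                  # skip comments and blank lines
    /=>/ {
      n = split($0, side, "=>")
      lhs = side[1]; rhs = side[2]
      gsub(/[ \t]/, "", lhs); gsub(/[ \t]/, "", rhs)
      if (n != 2 || lhs == "" || rhs == "") {
        printf "line %d: malformed mapping\n", NR; bad = 1
      }
      next
    }
    !/,/ { printf "line %d: single term, no expansion\n", NR; bad = 1; next }
    {
      # flag empty entries such as "condo,,flat" or a trailing comma
      m = split($0, term, ",")
      for (i = 1; i <= m; i++) {
        t = term[i]; gsub(/[ \t]/, "", t)
        if (t == "") { printf "line %d: empty term\n", NR; bad = 1; break }
      }
    }
    END { exit bad + 0 }
  ' "$1"
}

# Usage: lint_synonyms config/analysis/geo_synonyms.txt || exit 1
```

Running this in the same commit hook or pipeline stage that ships the file keeps an obviously broken rule from reaching any node.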

3. Reload search analyzers

# Picks up the new file contents cluster-wide; no _close/_open, no reindex
curl -s -X POST "localhost:9200/listings/_reload_search_analyzers" | jq .

The response reports which analyzers were reloaded and the list of nodes that performed the reload. Treat reloaded_node_ids as a deployment assertion: compare its length against the number of data nodes that hold a shard of the index, and fail your deploy pipeline if they disagree, rather than discovering the divergence later through inconsistent search results. The list confirms only that a node reloaded, not that it read the new file: a node with a stale copy reloads the old rules and still appears, so verify the file contents separately (for example, by checksum).

{
  "_shards": { "total": 2, "successful": 2, "failed": 0 },
  "reload_details": [
    {
      "index": "listings",
      "reloaded_analyzers": ["search_analyzer"],
      "reloaded_node_ids": ["node-a", "node-b"]
    }
  ]
}
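The assertion on reloaded_node_ids can be scripted. This is a minimal sketch that reads a response shaped like the example above on stdin and compares the number of distinct node IDs against a count you supply. The function name assert_reloaded is hypothetical, the expected count is an input you must determine for your own cluster, and jq is assumed to be installed.

```shell
# Sketch: fail a deploy step when the number of reloaded nodes is unexpected.
# Reads a _reload_search_analyzers response on stdin; requires jq.
# Arguments: index name, expected number of reloaded nodes.
assert_reloaded() {
  local index="$1" expected="$2" actual
  actual=$(jq -r --arg idx "$index" '
    [.reload_details[] | select(.index == $idx) | .reloaded_node_ids[]]
    | unique | length') || return 2
  if [ "$actual" -ne "$expected" ]; then
    echo "reload mismatch for $index: $actual of $expected nodes" >&2
    return 1
  fi
}

# Usage (assumes a cluster at localhost:9200 and 2 expected nodes):
#   curl -s -X POST "localhost:9200/listings/_reload_search_analyzers" \
#     | assert_reloaded listings 2
```

A non-zero exit status is enough to stop most pipeline runners, so the function prints the mismatch to stderr and leaves the decision to the caller.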

Verification

Analyze the new term through the search analyzer and confirm expansion, then run a live query:

curl -s "localhost:9200/listings/_analyze" -H 'Content-Type: application/json' -d '{
  "analyzer": "search_analyzer", "text": "condo"
}' | jq '[.tokens[].token]'
# Expected output:
# [ "condo", "apartment", "flat" ]

A query for condo should now return documents indexed with apartment. No reindex occurred, and the reload took effect on the next request. If results look unchanged, clear the index’s request cache (POST listings/_cache/clear?request=true), since cached responses can predate the reload.

For a fuller safety net, run the reload and verification as a single CI step against a staging cluster before promoting the synonyms file to production: analyze the changed terms, assert the expected expansion, then run a handful of known-good queries and compare hit counts against a baseline. This catches the two failure classes that matter before they reach users: a malformed rule that silently parses to nothing, and a node that did not receive the file. Keep the synonyms file itself in version control alongside this test so every rule change is reviewed and reproducible, the same way you would manage any other relevance-affecting configuration.
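The expansion assertion in such a CI step could look like the following sketch. It reads an _analyze response on stdin and checks that every expected token is present, ignoring order and any extra tokens. The function name assert_expansion is hypothetical, and jq is assumed to be installed.

```shell
# Sketch: assert that an _analyze response contains every expected token.
# Reads the response JSON on stdin; expected tokens are the arguments.
assert_expansion() {
  local tokens missing=0 want
  tokens=$(jq -r '.tokens[].token') || return 2
  for want in "$@"; do
    # -x matches the whole line, -F treats the token as a fixed string
    if ! printf '%s\n' "$tokens" | grep -qxF -- "$want"; then
      echo "missing token: $want" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (assumes a staging cluster at localhost:9200):
#   curl -s "localhost:9200/listings/_analyze" \
#     -H 'Content-Type: application/json' \
#     -d '{"analyzer": "search_analyzer", "text": "condo"}' \
#     | assert_expansion condo apartment flat
```

Checking membership rather than exact output keeps the test stable if other filters in the chain add or reorder tokens.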

Common Pitfalls

Pitfall: reload succeeds on some nodes, query results are inconsistent

The file was synced to only a subset of data nodes. _reload_search_analyzers reads each node’s local file, so a partial sync leaves shards on different nodes expanding the same query differently. Always deploy the file to every node before reloading, confirm the copies match (for example, by checksum), and check reloaded_node_ids in the response against the nodes you expect.
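One way to confirm the copies match before reloading is to collect a checksum from each node and require that they all agree. The sketch below covers only the comparison step: it reads "node checksum" pairs on stdin and fails if more than one distinct checksum appears, or if no input arrives. The function name checksums_agree, the hostnames, the ssh collection loop, and the /etc/elasticsearch path are all assumptions for illustration.

```shell
# Sketch: succeed only if every node reports the same file checksum.
# Reads "node checksum" pairs on stdin, one per line.
checksums_agree() {
  awk '
    { count[$2]++; nodes[$2] = nodes[$2] " " $1 }
    END {
      n = 0
      for (sum in count) n++
      if (n == 1) exit 0
      # report which nodes hold which version, then fail
      for (sum in count) printf "%s:%s\n", sum, nodes[sum] > "/dev/stderr"
      exit 1
    }
  '
}

# Usage with a hypothetical inventory of data-node hostnames:
#   for host in es-data-1 es-data-2; do
#     printf '%s %s\n' "$host" \
#       "$(ssh "$host" sha256sum /etc/elasticsearch/analysis/geo_synonyms.txt | cut -d' ' -f1)"
#   done | checksums_agree
```

Gating the reload on this check turns a partial sync into a failed deploy step instead of inconsistent results in production.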

Pitfall: index-time synonyms can never be reloaded

If the synonym filter sits in the index analyzer, no reload exists — the tokens are baked into segments. You must reindex to change them. Moving the filter to the search_analyzer is the one-time structural fix; after that, all future rule changes are live. See synonym & stopword management for the analyzer split.

Pitfall: changing a token-altering rule still needs a reindex

Reload only swaps query-time expansion. If you change something that alters how documents were indexed — a tokenizer change, a stemming filter, a new character filter, or moving synonyms back to index time for storage reasons — existing documents reflect the old analysis until rebuilt. The clean pattern is to reindex into a new index behind an alias and atomically swap the alias on completion, so reads never see a half-built state. Reserve this for genuine index-time changes; for the synonym use case it should be rare, which is the whole point of keeping expansion at search time.
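The alias swap described above can be sketched as follows. The _aliases API applies all actions in one request atomically, so the remove and add happen together. The index names listings_v1 and listings_v2, the assumption that listings is an alias rather than a concrete index, the function name alias_swap_body, and the localhost:9200 address are hypothetical.

```shell
# Sketch: print the _aliases body that moves an alias in one atomic request.
# Arguments: alias name, old index, new index (all hypothetical names).
alias_swap_body() {
  printf '{"actions":[{"remove":{"index":"%s","alias":"%s"}},{"add":{"index":"%s","alias":"%s"}}]}\n' \
    "$2" "$1" "$3" "$1"
}

# Usage (assumes a cluster at localhost:9200, that listings is an alias
# pointing at listings_v1, and that listings_v2 already exists with the
# new analysis settings):
#   curl -s -X POST "localhost:9200/_reindex?wait_for_completion=true" \
#     -H 'Content-Type: application/json' \
#     -d '{"source":{"index":"listings_v1"},"dest":{"index":"listings_v2"}}'
#   alias_swap_body listings listings_v1 listings_v2 \
#     | curl -s -X POST "localhost:9200/_aliases" \
#         -H 'Content-Type: application/json' --data-binary @-
```

Because clients query the alias, the old index can be kept until the new one is validated and deleted afterward.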