How Vector Caching Works
Overview
The melder uses a sentence-transformer model (default:
all-MiniLM-L6-v2) to convert each record’s text fields into dense
numeric vectors — fingerprints that capture meaning rather than
characters. Two records about the same entity produce vectors that point
in nearly the same direction, even if the wording differs completely.
This is how `method: embedding` scoring works.
Encoding is expensive: running 10,000 records through the ONNX model takes around 8 seconds. To avoid repeating this work, the melder caches the encoded vectors to disk after the first run.
The combined embedding index
Rather than storing a separate vector index for every embedding field, the melder builds a single combined index per side. For each record, the vectors for all embedding fields are scaled by the square root of their weights and concatenated into one long vector. This combined vector has a useful property: searching for the nearest combined vectors is exactly equivalent to finding the records with the highest weighted cosine similarity across all embedding fields — the same ranking that full scoring would produce. A single nearest-neighbour search therefore retrieves the right candidates without needing multiple per-field lookups.
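To see why the square-root scaling preserves the full-scoring ranking: for unit-length per-field vectors, the dot product of two combined vectors equals the weighted sum of per-field cosines. A minimal NumPy sketch (field names and weights are made up for illustration; it assumes per-field vectors are unit-normalized, as sentence-transformer embeddings typically are):

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Hypothetical spec: two embedding fields with weights (made-up names/values).
weights = {"name": 0.7, "address": 0.3}
dim, n_a = 8, 50

# Unit-normalized per-field vectors for 50 A records and one B record.
a_vecs = {f: normalize(rng.normal(size=(n_a, dim))) for f in weights}
b_vecs = {f: normalize(rng.normal(size=dim)) for f in weights}

# Combined vectors: scale each field by sqrt(weight), then concatenate.
a_combined = np.hstack([np.sqrt(w) * a_vecs[f] for f, w in weights.items()])
b_combined = np.hstack([np.sqrt(w) * b_vecs[f] for f, w in weights.items()])

# Ranking by a single dot product against the combined index...
rank_combined = np.argsort(-(a_combined @ b_combined))
# ...matches ranking by the weighted sum of per-field cosine similarities.
weighted_cos = sum(w * (a_vecs[f] @ b_vecs[f]) for f, w in weights.items())
rank_per_field = np.argsort(-weighted_cos)
assert (rank_combined == rank_per_field).all()
```

Because each per-field vector has unit length, the cosine of two combined vectors is the weighted average of the per-field cosines, so nearest-neighbour order is identical to full weighted scoring.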
The cache filename encodes a hash of the field names, their order, and
their weights. Changing any of these produces a different filename, so
the old cache is ignored and a fresh one is built automatically.
`meld cache clear` (without `--all`) uses this same hash to identify
and delete only the now-unreachable stale files.
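The exact hashing scheme isn't specified here; the following is a minimal sketch of the idea, with a hypothetical `spec_hash` helper and SHA-256 standing in for whatever hash the melder actually uses:

```python
import hashlib

def spec_hash(fields):
    # Hypothetical sketch: digest field names, order, and weights into a short
    # hex token for the cache filename. The melder's real scheme may differ.
    payload = "|".join(f"{name}:{weight}" for name, weight in fields)
    return hashlib.sha256(payload.encode()).hexdigest()[:8]

# Any change to names, order, or weights yields a different token,
# so the old cache file simply stops being looked up.
a = spec_hash([("name", 0.7), ("address", 0.3)])
b = spec_hash([("address", 0.3), ("name", 0.7)])  # order changed
c = spec_hash([("name", 0.6), ("address", 0.4)])  # weights changed
assert len({a, b, c}) == 3
```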
Config options
```yaml
embeddings:
  model: all-MiniLM-L6-v2
  a_cache_dir: cache   # required
  b_cache_dir: cache   # optional — omit to skip B-side caching
```
`a_cache_dir` is required — the A-side combined index is always cached
to disk. On first run the directory is created and populated; on
subsequent runs the index loads in milliseconds (~170 ms for 100k
records).
`b_cache_dir` is optional. When set, the B-side combined index is also
saved to disk. When omitted, B vectors are re-encoded from scratch on
every run.
Batch mode lifecycle
```
First run:
  A index: check manifest → missing → cold build → save index + manifest + texthash
  B index: same

Subsequent runs (same data, same config):
  A index: manifest fresh → load index + texthash → diff: 0 changed → return (ms)
  B index: same

Subsequent runs (some records changed):
  A index: manifest fresh → load → diff: N records changed → re-encode N → save
  B index: same

Config changed (model / spec / blocking):
  A index: manifest mismatch → log reason → cold build
  B index: same
```
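The branching above can be sketched as a toy, self-contained load path. All names here (`load_or_build`, the in-memory `DISK` dict, the `encode` stand-in) are illustrative, not the melder's API; the 90% cold-rebuild cutoff from the staleness rules is included as `rebuild_fraction`:

```python
# Toy sketch of the per-index lifecycle; DISK stands in for the cache
# directory and encode() for the ONNX model. Names are illustrative.
DISK = {}

def encode(texts):
    return {k: f"vec({v})" for k, v in texts.items()}  # stand-in for encoding

def load_or_build(key, config_hash, records, rebuild_fraction=0.9):
    cached = DISK.get(key)
    if cached is None or cached["config"] != config_hash:
        DISK[key] = {"config": config_hash, "texts": dict(records),
                     "vecs": encode(records)}
        return DISK[key]["vecs"], "cold"            # first run / config changed
    changed = {k: v for k, v in records.items() if cached["texts"].get(k) != v}
    if not changed:
        return cached["vecs"], "warm"               # manifest fresh, 0 changed
    if len(changed) > rebuild_fraction * len(records):
        DISK[key] = {"config": config_hash, "texts": dict(records),
                     "vecs": encode(records)}
        return DISK[key]["vecs"], "cold"            # bulk re-encode is cheaper
    cached["vecs"].update(encode(changed))          # incremental re-encode
    cached["texts"].update(changed)
    return cached["vecs"], "incremental"

records = {1: "acme corp", 2: "globex"}
_, s1 = load_or_build("a", "cfg1", records)   # cold
_, s2 = load_or_build("a", "cfg1", records)   # warm
records[2] = "globex inc"
_, s3 = load_or_build("a", "cfg1", records)   # incremental
_, s4 = load_or_build("a", "cfg2", records)   # cold (config changed)
assert (s1, s2, s3, s4) == ("cold", "warm", "incremental", "cold")
```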
Scoring (per B record):
1. Look up B record's combined vector in the B index
2. Search the A combined index for the top_n nearest neighbours (O(log N)
with usearch, O(N) with flat)
3. Score each candidate across all match fields
4. Classify and output
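Steps 1–2 can be illustrated with a flat (O(N)) search over combined vectors; the function name and data below are made up, and full per-field scoring (step 3) is omitted:

```python
import numpy as np

def top_candidates(a_combined, b_combined, top_n):
    # Flat backend: brute-force O(N) dot products (cosine, for unit rows).
    sims = a_combined @ b_combined
    return np.argsort(-sims)[:top_n]

rng = np.random.default_rng(1)
a = rng.normal(size=(100, 16))
a /= np.linalg.norm(a, axis=1, keepdims=True)     # unit-normalize A rows
b = a[7] + 0.01 * rng.normal(size=16)             # a B record near A record 7
b /= np.linalg.norm(b)

cands = top_candidates(a, b, top_n=5)
assert cands[0] == 7   # nearest neighbour is the near-duplicate
```

A usearch HNSW index replaces this O(N) scan with an approximate O(log N) search; the retrieved candidates then go through full per-field scoring and classification.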
Setting `b_cache_dir` for threshold tuning
Setting `b_cache_dir` is especially valuable when tuning thresholds or
weights — encoding is done once and the score distribution can be
explored cheaply on subsequent runs.
Live mode lifecycle
Startup:
A index: load from a_cache_dir (or encode + build if stale)
B index: load from b_cache_dir (or encode; only saved if configured)
Both indices live in memory for fast concurrent access.
During operation (upsert / try-match):
Encode the record's embedding fields into a combined vector.
Upsert the combined vector into the in-memory index.
Search the opposite side's index for top_n nearest neighbours.
Shutdown:
Save combined indices to their cache directories.
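The upsert / try-match steps above can be sketched as a toy in-memory side index; the class and its methods are illustrative only, not the melder's API:

```python
import numpy as np

class SideIndex:
    """Toy in-memory index for one side; names are illustrative only."""
    def __init__(self, dim):
        self.keys, self.vecs = [], np.empty((0, dim))

    def upsert(self, key, vec):
        vec = vec / np.linalg.norm(vec)
        if key in self.keys:
            self.vecs[self.keys.index(key)] = vec   # replace in place
        else:
            self.keys.append(key)
            self.vecs = np.vstack([self.vecs, vec])

    def search(self, vec, top_n):
        sims = self.vecs @ (vec / np.linalg.norm(vec))
        order = np.argsort(-sims)[:top_n]
        return [(self.keys[i], float(sims[i])) for i in order]

a_index = SideIndex(4)
a_index.upsert("a1", np.array([1.0, 0.0, 0.0, 0.0]))
a_index.upsert("a2", np.array([0.0, 1.0, 0.0, 0.0]))
# try-match a B record's combined vector against the opposite (A) side:
hits = a_index.search(np.array([0.9, 0.1, 0.0, 0.0]), top_n=1)
assert hits[0][0] == "a1"
```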
Staleness and invalidation
Cache validation is multi-layered and runs automatically on every startup. No manual intervention is needed.
Layer 1 — Config hash (manifest check)
A `.manifest` sidecar is stored alongside each cache file. It records a
hash of the embedding field spec (field names, order, weights), the
blocking configuration, and the model name. On load, these hashes are
compared against the current config. Any mismatch triggers an immediate
cold rebuild with a clear log message explaining what changed:
```
Warning: A combined index cache invalidated (blocking config changed), rebuilding from scratch.
Warning: B combined index cache invalidated (embedding model changed), rebuilding from scratch.
```
Layer 2 — Text-hash deduplication (incremental encoding)
After the manifest check passes, the engine computes a FNV-1a hash of
each record’s source text and compares it against the stored hashes in a
`.texthash` sidecar. Records whose text has not changed are skipped —
their cached vectors are reused. Only records whose text actually
changed (or that are new) are re-encoded through the ONNX model.
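FNV-1a is a standard, well-documented hash; a 64-bit width is assumed in this sketch (the doc doesn't state which width the melder uses). The hash plus the skip logic:

```python
def fnv1a_64(data: bytes) -> int:
    # Standard 64-bit FNV-1a (offset basis and prime are the published values).
    h = 0xcbf29ce484222325
    for byte in data:
        h = ((h ^ byte) * 0x100000001b3) & 0xFFFFFFFFFFFFFFFF
    return h

# Re-encode only records whose source-text hash differs from the sidecar.
stored = {1: fnv1a_64(b"acme corp"), 2: fnv1a_64(b"globex")}
current = {1: b"acme corp", 2: b"globex inc"}
to_reencode = [k for k, text in current.items()
               if fnv1a_64(text) != stored.get(k)]
assert to_reencode == [2]
```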
This means recurring batch jobs where most records are stable only re-encode the changed minority. If more than 90% of records change in a single run, a full cold rebuild is triggered instead, since bulk encoding batches more efficiently than incremental re-encoding at that scale.
Layer 3 — Spec hash in the filename
If you change a field’s weight, rename a field, or add/remove an
embedding field, the spec hash embedded in the cache filename changes
and the old cache file becomes unreachable. The engine builds a fresh
index automatically; `meld cache clear` (smart mode) finds and removes
the now-unreachable old file along with its sidecars.
Cache files produced per index
| File | Contents |
|---|---|
| `*.index` | Flat backend: combined vectors (binary). Usearch: key-mapping manifest only. |
| `*.usearchdb/` | Usearch backend: HNSW graph files, one per block. |
| `*.index.manifest` | Config hashes, model name, record count, build timestamp. |
| `*.index.texthash` | Per-record FNV-1a hashes of source text. |
meld cache status
`meld cache status` prints the model, spec hash, blocking hash,
record count, and build timestamp from each manifest:
```
A cache benchmarks/batch/100kx100k_usearch/warm/cache (1 index files, 52.3 MB)
  model=all-MiniLM-L6-v2 spec=a3f7c2b1 blocking=deadbeef records=100000 built=2026-03-10T14:22:05Z
```
meld cache clear
`meld cache clear` and `meld cache clear --all` both delete the
sidecars alongside the index files they belong to. See
CLI Reference for full usage.