CLI Reference
All commands accept --log-format json for structured log output.
Logs go to stderr; command output goes to stdout.
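Because logs and command output travel on separate streams, a wrapper script can capture stderr and parse it independently of stdout. The following is a minimal sketch, assuming that --log-format json emits one JSON object per line; the key names in the sample (level, message) are hypothetical and are not taken from meld's actual log schema.

```python
import json

def parse_json_logs(stderr_text):
    """Return the JSON objects found in captured stderr, one per line.

    Lines that are not valid JSON objects are skipped rather than raising.
    """
    records = []
    for line in stderr_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            continue  # plain-text line; ignore it
        if isinstance(parsed, dict):
            records.append(parsed)
    return records

# Hypothetical captured stderr; the keys are illustrative only.
sample = (
    '{"level": "info", "message": "loaded"}\n'
    'not json\n'
    '{"level": "warn", "message": "slow"}\n'
)
logs = parse_json_logs(sample)
```

In practice the text would come from the stderr of a meld invocation (for example via subprocess.run with capture_output=True), while stdout remains free for the command's own output.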
meld validate
Parse and validate a config file without loading data or running anything. Catches missing fields, invalid method names, bad threshold values, and malformed blocking rules. Use this to check a config before committing to a long batch run.
meld validate --config config.yaml
| Flag | Short | Description |
|---|---|---|
| --config | -c | Path to YAML config file (required) |
meld run
Run batch matching: load both datasets, score every B record against the A-side pool, and write three output CSV files (results, review, unmatched). The cross-map is updated with auto-matched pairs so that re-running skips already-resolved records.
meld run --config config.yaml
meld run --config config.yaml --dry-run
meld run --config config.yaml --limit 500 --verbose
| Flag | Short | Description |
|---|---|---|
| --config | -c | Path to YAML config file (required) |
| --dry-run | | Validate config, load data, print what would be processed, then exit. No matching or output files. |
| --limit | | Process only the first N B records. Useful for quick sanity checks on large datasets. |
| --verbose | -v | Print job metadata, dataset paths, and threshold values at startup. |
meld serve
Start the live-mode HTTP server. Datasets are loaded (into memory or
SQLite, depending on whether live.db_path is set), embedding and
blocking indices are built, and the write-ahead log is replayed for
crash recovery. Once ready, the server accepts requests on the
configured port. See API Reference for endpoint
details.
meld serve --config config.yaml --port 8090
| Flag | Short | Description |
|---|---|---|
| --config | -c | Path to YAML config file (required) |
| --port | -p | TCP port to listen on (default: 8080) |
meld enroll
Start the enroll-mode HTTP server for single-pool entity resolution.
Records are enrolled into one growing pool and scored against everything
already there. Designed for graph-based ER workflows and deduplication.
Uses a simplified config format with field: instead of
field_a:/field_b:. See Enroll Mode for full
details.
meld enroll --config enroll_config.yaml --port 8090
| Flag | Short | Description |
|---|---|---|
| --config | -c | Path to enroll-mode YAML config file (required) |
| --port | -p | TCP port to listen on (default: 8080) |
meld tune
Run the full batch pipeline without writing any output files, then print a diagnostic report: score distribution histogram, per-field statistics (min/max/mean/median/stddev), threshold analysis showing how the current thresholds split your records, and suggested threshold values based on percentiles.
meld tune --config config.yaml
meld tune --config config.yaml --verbose
| Flag | Short | Description |
|---|---|---|
| --config | -c | Path to YAML config file (required) |
| --verbose | -v | Show current threshold values at startup. |
See Accuracy & Tuning for a detailed guide on interpreting the tune output, a worked example with the benchmark dataset, and the recommended weight-tuning workflow.
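To make the idea of percentile-based suggestions concrete, here is a minimal sketch, not meld's actual algorithm. The function name suggest_thresholds and the choice of the 50th and 90th percentiles are assumptions made purely for illustration; the percentiles meld uses are not stated in this reference.

```python
import statistics

def suggest_thresholds(scores, review_pct=50, auto_pct=90):
    """Derive candidate thresholds from a list of match scores.

    review_pct and auto_pct are illustrative defaults, not meld's values.
    """
    # With n=100, quantiles() returns 99 cut points;
    # index p - 1 holds the p-th percentile.
    cuts = statistics.quantiles(scores, n=100, method="inclusive")
    return {
        "review_floor": cuts[review_pct - 1],
        "auto_match": cuts[auto_pct - 1],
    }

# Synthetic, evenly spaced scores standing in for real tune output.
scores = [i / 100 for i in range(101)]
suggested = suggest_thresholds(scores)
```

Real score distributions are rarely uniform, so the suggested values are best treated as a starting point to check against the histogram in the tune report.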
meld cache build
Pre-build embedding index caches for one or both sides. Encodes all
records through the ONNX model and writes the resulting vectors to disk
so that subsequent meld run or meld serve invocations load the cached
vectors at startup instead of re-encoding. This is especially useful when the
same dataset is matched repeatedly with different configs or thresholds.
meld cache build --config config.yaml
| Flag | Short | Description |
|---|---|---|
| --config | -c | Path to YAML config file (required) |
meld cache status
Show the status of each cache file: whether it exists, its size on disk, and the number of records it contains (for index files).
meld cache status --config config.yaml
meld cache clear
Delete stale cache files. The default behaviour is a smart clear: it computes the cache filename that the current config expects (derived from a hash of the embedding field names, their order, and their weights) and deletes only files that do not match, i.e. files left over from a previous config that are now unreachable. The current valid cache is left untouched.
Use --all to delete everything regardless.
# Smart clear: delete stale files only (safe to run before any rebuild)
meld cache clear --config config.yaml
# Full wipe: delete all cache files including the current valid ones
meld cache clear --config config.yaml --all
| Flag | Description |
|---|---|
| --all | Delete all cache files, including the current valid cache. Forces a cold rebuild on the next run. |
When to use --all: after changing the embedding model, or when
you want to reclaim disk space and are happy to re-encode from scratch.
When the smart default is enough: after changing field weights, adding a new match field, or renaming fields. Each of these changes the spec hash, so the old cache files become unreachable automatically, and the smart clear finds and removes them without touching anything current.
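The relationship between the spec hash and stale files can be sketched as follows. This illustrates the idea only: the hash function (SHA-256), the serialization, the 16-character truncation, and the index_<hash>.bin filename pattern are all assumptions, not meld's actual scheme.

```python
import hashlib
import json

def spec_hash(fields):
    """Hash an ordered list of (field_name, weight) pairs.

    Serializing as a list keeps field order significant, so reordering,
    renaming, or reweighting any field yields a different hash.
    """
    payload = json.dumps([[name, weight] for name, weight in fields])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]

def stale_files(existing, fields):
    """Return cache filenames that do not carry the current spec hash."""
    current = spec_hash(fields)
    return [name for name in existing if current not in name]

# Hypothetical embedding specs before and after a weight change.
old_spec = [("name", 0.6), ("address", 0.4)]
new_spec = [("name", 0.7), ("address", 0.3)]

# Hypothetical cache directory listing: one file per spec.
cache_dir = [
    f"index_{spec_hash(old_spec)}.bin",
    f"index_{spec_hash(new_spec)}.bin",
]
to_delete = stale_files(cache_dir, new_spec)
```

Under these assumptions, only the file built from old_spec is selected for deletion, which mirrors the behaviour described for the default clear.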
meld review list
Print the review queue as a formatted table. The review CSV is produced
by meld run and contains borderline pairs that scored between
review_floor and auto_match. This command reads that file and
displays it with aligned columns for easy scanning.
meld review list --config config.yaml
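How the two thresholds route a scored pair into the three outputs of meld run can be sketched as below. The function name classify, the sample threshold values, and the inclusive treatment of both boundaries are assumptions; this reference does not state which side of each threshold is inclusive.

```python
def classify(score, auto_match, review_floor):
    """Route a score to one of the three output files.

    Boundary handling (>=) is an assumption made for this sketch.
    """
    if score >= auto_match:
        return "results"    # auto-matched; also recorded in the cross-map
    if score >= review_floor:
        return "review"     # borderline; queued for a human decision
    return "unmatched"

# Hypothetical threshold values, for illustration only.
AUTO_MATCH = 0.85
REVIEW_FLOOR = 0.60

buckets = [classify(s, AUTO_MATCH, REVIEW_FLOOR) for s in (0.92, 0.70, 0.41)]
```

Raising review_floor shrinks the review queue at the cost of sending more borderline pairs straight to unmatched.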
meld review import
Import human decisions on review pairs. The decisions file is a CSV with
columns a_id, b_id, and decision (either accept or reject).
Accepted pairs are added to the cross-map. Both accepted and rejected
pairs are removed from the review CSV, shrinking the queue.
meld review import --config config.yaml --file decisions.csv
| Flag | Short | Description |
|---|---|---|
| --config | -c | Path to YAML config file (required) |
| --file | -f | Path to decisions CSV (required) |
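A decisions file in the format described above can be generated with a short script. This is a minimal sketch: the helper name write_decisions and the record IDs are made up for illustration, and only the three column names and the two allowed decision values come from this reference.

```python
import csv
import os
import tempfile

VALID_DECISIONS = {"accept", "reject"}

def write_decisions(path, decisions):
    """Write (a_id, b_id, decision) tuples to a CSV with the expected header."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["a_id", "b_id", "decision"])
        for a_id, b_id, decision in decisions:
            if decision not in VALID_DECISIONS:
                raise ValueError(f"decision must be accept or reject, got {decision!r}")
            writer.writerow([a_id, b_id, decision])

# Hypothetical record IDs; written to a temporary directory for the demo.
out_path = os.path.join(tempfile.mkdtemp(), "decisions.csv")
write_decisions(out_path, [
    ("A-001", "B-117", "accept"),
    ("A-002", "B-045", "reject"),
])
```

The resulting file can then be passed to meld review import with --file.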
meld crossmap stats
Show cross-map statistics: the total number of matched pairs, and coverage as a percentage of each of the A and B datasets. Loads the datasets to compute totals.
meld crossmap stats --config config.yaml
meld crossmap export
Export the cross-map to a CSV file at a specified path. Useful for backing up the current state or transferring matches to another system.
meld crossmap export --config config.yaml --out matches.csv
| Flag | Short | Description |
|---|---|---|
| --out | -o | Output file path (required) |
meld crossmap import
Import match pairs from a CSV file into the cross-map. The CSV must have
columns matching the configured a_id_field and b_id_field. Pairs are
merged with any existing cross-map entries; duplicates are ignored.
meld crossmap import --config config.yaml --file pairs.csv
| Flag | Short | Description |
|---|---|---|
| --file | -f | Input CSV file path (required) |
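The merge semantics described for meld crossmap import (incoming pairs are added, duplicates are ignored) can be sketched as a set-backed merge. This is an illustration under the assumption that a duplicate means an identical (a_id, b_id) pair; how meld handles a pair that conflicts with an existing match for the same ID is not covered by this reference.

```python
def merge_pairs(existing, incoming):
    """Merge incoming (a_id, b_id) pairs into existing ones, skipping duplicates.

    The order of existing entries is preserved; new pairs are appended.
    """
    merged = list(existing)
    seen = set(existing)
    for pair in incoming:
        if pair not in seen:
            seen.add(pair)
            merged.append(pair)
    return merged

# Hypothetical IDs for illustration.
current = [("A-001", "B-117"), ("A-002", "B-045")]
imported = [("A-002", "B-045"), ("A-003", "B-201"), ("A-003", "B-201")]
result = merge_pairs(current, imported)
```

Under this assumption the import is idempotent: importing the same file twice leaves the cross-map unchanged after the first pass.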