# Benchmarking

This guide describes how to run laurus benchmarks, how to capture and compare baselines, and how to report results in pull requests.

The benchmark suite lives in `laurus/benches/` and is built on Criterion. Hygiene rules (deterministic seeds, file-level documentation, sanity asserts, and `sample_size` policy) are codified in `laurus/benches/common.rs`.

## Suite overview

| File | Scope |
| --- | --- |
| `bkd_bench.rs` | BKD tree range search, intersect, and build (1D / 2D / 3D, 10k / 100k / 1M points) |
| `distance_bench.rs` | `DistanceMetric::distance` for cosine, Euclidean, Manhattan, dot product (single dimension today; sweep tracked in #424) |
| `lexical_search_bench.rs` | End-to-end lexical search through `Engine::search` for term, boolean, phrase, fuzzy, and DSL queries |
| `search_perf.rs` | Posting iterator `skip_to`, `BM25Scorer::score`, SIMD batch scoring, compact posting conversion |
| `spell_correction_bench.rs` | `SpellingCorrector::correct` with a fresh corrector per iteration (cold-state measurement) |
| `synonym_bench.rs` | `SynonymDictionary::get_synonyms` lookup at 100 / 1k / 10k groups, plus build cost |
| `text_analysis_bench.rs` | `StandardAnalyzer::analyze` single-document and batch (100 docs) |
| `vector_search_bench.rs` | Flat / IVF / HNSW construction and search at 1k / 5k vectors, dim 128, top-10 |

Each file declares its scope, scenarios, and how to filter in its top-of-file `//!` doc comment. Read it before running.
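
For orientation, such a header looks roughly like this (wording illustrative, not copied from the repository; the details are drawn from the `distance_bench.rs` row above):

```rust
//! Benchmarks for DistanceMetric::distance.
//!
//! Scope: cosine, Euclidean, Manhattan, dot product.
//! Scenarios: single fixed dimension (dimension sweep tracked in #424).
//!
//! Run:    cargo bench -p laurus --bench distance_bench
//! Filter: cargo bench -p laurus --bench distance_bench -- cosine
```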

## Running benchmarks

Run a single bench file:

```bash
cargo bench -p laurus --bench distance_bench
```

Filter by Criterion id (substring match):

```bash
cargo bench -p laurus --bench distance_bench -- cosine
cargo bench -p laurus --bench vector_search_bench -- "HNSW Search/top10"
```
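
The filter string matches against the full Criterion id, which has the form `<group name>/<function id>`. As a purely illustrative sketch (these names are invented here, not copied from `vector_search_bench.rs`), an id such as `HNSW Search/top10` comes from a registration shaped like:

```rust
use criterion::{criterion_group, criterion_main, Criterion};

fn hnsw_search(c: &mut Criterion) {
    // Group name "HNSW Search" plus function id "top10"
    // produce the Criterion id "HNSW Search/top10".
    let mut group = c.benchmark_group("HNSW Search");
    group.bench_function("top10", |b| {
        b.iter(|| {
            // ...the top-10 search call under test...
        })
    });
    group.finish();
}

criterion_group!(benches, hnsw_search);
criterion_main!(benches);
```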

Compile-only smoke check (useful in CI or during refactors):

```bash
cargo bench -p laurus --bench distance_bench --no-run
```

Run every bench file in the laurus package:

```bash
cargo bench -p laurus
```

## Saving and comparing baselines

Criterion supports named baselines so you can compare a feature branch against main (or any other reference run).

Save a baseline named main from your current state:

```bash
cargo bench -p laurus --bench distance_bench -- --save-baseline main
```

Compare a subsequent run against that baseline:

```bash
cargo bench -p laurus --bench distance_bench -- --baseline main
```

The output prints a `change:` line per benchmark with a percentage and a verdict (`No change in performance detected.`, `Performance has improved.`, `Performance has regressed.`). Criterion stores baselines under `target/criterion/<bench-id>/<baseline>/`.
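
A comparison run prints, per benchmark, output of roughly this shape (numbers illustrative):

```text
distance_metrics/cosine time:   [4.19 µs 4.21 µs 4.24 µs]
                        change: [-26.3% -25.9% -25.4%] (p = 0.00 < 0.05)
                        Performance has improved.
```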

Recommended flow for a perf PR:

  1. On main (or before any change): `cargo bench -p laurus --bench RELEVANT -- --save-baseline main`.
  2. Make the change on a branch.
  3. On the branch: `cargo bench -p laurus --bench RELEVANT -- --baseline main`.
  4. Copy the `change:` lines into the PR description.

Microbenchmarks at the µs / ns scale are sensitive to system noise. For meaningful numbers:

- CPU governor: set to `performance` (Linux):

  ```bash
  sudo cpupower frequency-set -g performance
  ```

- Turbo boost: disable it so frequency scaling does not skew results:

  ```bash
  echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo   # Intel
  ```

  AMD systems and BIOS-level switches differ; consult vendor docs.

- Background load: close browsers, IDEs, build watchers, and Docker. Anything sharing the CPU skews short-running benches.

- Pinning (optional): pin the run to a fixed core if available:

  ```bash
  taskset -c 2 cargo bench -p laurus --bench distance_bench
  ```

- Repeat: re-run twice and compare. Differences below ~5 % on a tuned machine are noise; differences above that on a shared workstation may also be noise. Do not over-interpret a single run.

If you cannot stabilise the environment, say so explicitly in the PR (e.g. “measured on a shared laptop, expect ~10 % noise”) rather than presenting unstable numbers as authoritative.

## Make targets

The Makefile exposes the common entry points:

```bash
make bench             # cargo bench -p laurus
make bench-baseline    # cargo bench -p laurus -- --save-baseline main
make bench-compare     # cargo bench -p laurus -- --baseline main
```

For a single bench, pass `BENCH=name`:

```bash
make bench BENCH=distance_bench
make bench-baseline BENCH=distance_bench
make bench-compare BENCH=distance_bench
```

## PR description template

When a PR claims a measurable performance change, paste a table like the following into the description:

```markdown
## Performance

Environment: <CPU model>, governor=performance, turbo disabled, dedicated machine.

Baseline: `main` at <commit-sha>. After: this branch at <commit-sha>.

| Bench | Before | After | Δ | Verdict |
| --- | --- | --- | --- | --- |
| `distance_metrics/cosine` | 4.20 µs | 3.10 µs | -26 % | improved |
| `distance_metrics/euclidean` | 2.18 µs | 2.16 µs | -1 % | no change |

Reproduce: `cargo bench -p laurus --bench distance_bench -- --baseline main`
```

Always include the commit SHAs of the baseline and the after-state so the comparison is reproducible. State the environment explicitly even when running on a tuned machine.

## Adding a new benchmark

When adding a new bench file, follow the suite-wide hygiene rules from `laurus/benches/common.rs` (a minimal sketch follows the list):

  1. Use a deterministic seed via `common::DEFAULT_SEED` (or the `lcg_*` helpers). Never call `rand::rng()`.
  2. Add a top-of-file `//!` doc comment listing scope, scenarios, run command, and filter examples.
  3. Add a one-time sanity `assert!` outside the timed `b.iter` block so a regression that produces empty output cannot pass silently.
  4. Pick `SAMPLE_SIZE_FAST` (the default, for sub-50 ms operations) or `SAMPLE_SIZE_SLOW` (construction paths). Do not invent intermediate values.
  5. Register the file in `laurus/Cargo.toml` with `[[bench]] name = "..." harness = false`. The crate sets `autobenches = false`, so files in `benches/` are not picked up automatically.
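
Put together, a new bench file following these rules might look like the sketch below. `common::DEFAULT_SEED` and `common::SAMPLE_SIZE_FAST` are the names documented above; `lcg_data`, `my_feature`, and `my_feature_bench` are hypothetical stand-ins for your own helpers and code under test:

```rust
//! Benchmarks for my_feature (hypothetical example). (Rule 2)
//!
//! Run:    cargo bench -p laurus --bench my_feature_bench
//! Filter: cargo bench -p laurus --bench my_feature_bench -- 10k

use criterion::{criterion_group, criterion_main, Criterion};

mod common; // benches/common.rs: seeds, sample-size policy, shared helpers

/// Stand-in for the real code under test.
fn my_feature(input: &[u64]) -> Vec<u64> {
    input.iter().map(|x| x.wrapping_mul(31)).collect()
}

fn bench_my_feature(c: &mut Criterion) {
    // Rule 1: deterministic input; `lcg_data` stands in for an actual lcg_* helper.
    let input = common::lcg_data(common::DEFAULT_SEED, 10_000);

    // Rule 3: one-time sanity assert outside the timed block.
    assert!(!my_feature(&input).is_empty(), "bench produced empty output");

    let mut group = c.benchmark_group("my_feature");
    group.sample_size(common::SAMPLE_SIZE_FAST); // Rule 4
    group.bench_function("10k", |b| b.iter(|| my_feature(&input)));
    group.finish();
}

criterion_group!(benches, bench_my_feature);
criterion_main!(benches);

// Rule 5: register in laurus/Cargo.toml (the crate sets autobenches = false):
//
// [[bench]]
// name = "my_feature_bench"
// harness = false
```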

If your bench needs to share helpers across files, extend `benches/common.rs` rather than duplicating code.

## Continuous integration

CI does not currently run a regression-detection bench job. Each perf-changing PR is expected to post baseline-vs-after numbers manually, captured under the environment guidance above.

A future iteration may add a smoke-set bench job that fails on large regressions; this is tracked under the umbrella issue #429.