Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Deletions & Compaction

Laurus uses a two-phase deletion strategy: fast logical deletion followed by periodic physical compaction.

Deleting Documents

#![allow(unused)]
fn main() {
// Delete a document by its external ID
engine.delete_documents("doc-1").await?;
engine.commit().await?;
}

Logical Deletion

When a document is deleted, it is not immediately removed from the index files. Instead:

graph LR
    Del["delete_documents('doc-1')"] --> Bitmap["Add internal ID\nto Deletion Bitmap"]
    Bitmap --> Search["Search skips\ndeleted IDs"]
  1. The document’s internal ID is added to a deletion bitmap
  2. The bitmap is checked during every search, filtering out deleted documents from results
  3. The original data remains in the segment files

Why Logical Deletion?

BenefitDescription
SpeedO(1) — flipping a bit is instant
Immutable segmentsSegment files are never modified in place, simplifying concurrency
Safe recoveryIf a crash occurs, the deletion bitmap can be reconstructed from the WAL

Upserts (Update = Delete + Insert)

When you index a document with an existing external ID, Laurus performs an automatic upsert:

  1. The old document is logically deleted (its ID is added to the deletion bitmap)
  2. A new document is inserted with a new internal ID
  3. The external-to-internal ID mapping is updated
#![allow(unused)]
fn main() {
// First insert
engine.put_document("doc-1", doc_v1).await?;
engine.commit().await?;

// Update: old version is logically deleted, new version is inserted
engine.put_document("doc-1", doc_v2).await?;
engine.commit().await?;
}

Physical Compaction

Over time, logically deleted documents accumulate and waste space. Compaction reclaims this space by rewriting segment files without the deleted entries.

graph LR
    subgraph "Before Compaction"
        S1["Segment 0\ndoc-1 (deleted)\ndoc-2\ndoc-3 (deleted)"]
        S2["Segment 1\ndoc-4\ndoc-5"]
    end

    Compact["Compaction"]

    subgraph "After Compaction"
        S3["Segment 0\ndoc-2\ndoc-4\ndoc-5"]
    end

    S1 --> Compact
    S2 --> Compact
    Compact --> S3

What Compaction Does

  1. Reads all live (non-deleted) documents from existing segments
  2. Rebuilds the inverted index and/or vector index without deleted entries
  3. Writes new, clean segment files
  4. Removes the old segment files
  5. Resets the deletion bitmap

Cost and Frequency

AspectDetail
CPU costHigh — rebuilds index structures from scratch
I/O costHigh — reads all data, writes new segments
BlockingSearches continue during compaction (reads see the old segments until the new ones are ready)
FrequencyRun when deleted documents exceed a threshold (e.g., 10-20% of total)

When to Compact

  • Low-write workloads: Compact periodically (e.g., daily or weekly)
  • High-write workloads: Compact when the deletion ratio exceeds a threshold
  • After bulk updates: Compact after a large batch of upserts

Deletion Bitmap

The deletion bitmap tracks which internal IDs have been deleted:

  • Storage: HashSet of deleted document IDs (AHashSet<u64>)
  • Lookup: O(1) — hash set lookup

The bitmap is persisted alongside the index segments and is rebuilt from the WAL during recovery.

Next Steps