Deletions & Compaction
Laurus uses a two-phase deletion strategy: fast logical deletion followed by periodic physical compaction.
Deleting Documents
#![allow(unused)]
fn main() {
// Delete a document by its external ID
engine.delete_documents("doc-1").await?;
engine.commit().await?;
}
Logical Deletion
When a document is deleted, it is not immediately removed from the index files. Instead:
graph LR
Del["delete_documents('doc-1')"] --> Bitmap["Add internal ID\nto Deletion Bitmap"]
Bitmap --> Search["Search skips\ndeleted IDs"]
- The document’s internal ID is added to a deletion bitmap
- The bitmap is checked during every search, filtering out deleted documents from results
- The original data remains in the segment files
Why Logical Deletion?
| Benefit | Description |
|---|---|
| Speed | O(1) — flipping a bit is instant |
| Immutable segments | Segment files are never modified in place, simplifying concurrency |
| Safe recovery | If a crash occurs, the deletion bitmap can be reconstructed from the WAL |
Upserts (Update = Delete + Insert)
When you index a document with an existing external ID, Laurus performs an automatic upsert:
- The old document is logically deleted (its ID is added to the deletion bitmap)
- A new document is inserted with a new internal ID
- The external-to-internal ID mapping is updated
#![allow(unused)]
fn main() {
// First insert
engine.put_document("doc-1", doc_v1).await?;
engine.commit().await?;
// Update: old version is logically deleted, new version is inserted
engine.put_document("doc-1", doc_v2).await?;
engine.commit().await?;
}
Physical Compaction
Over time, logically deleted documents accumulate and waste space. Compaction reclaims this space by rewriting segment files without the deleted entries.
graph LR
subgraph "Before Compaction"
S1["Segment 0\ndoc-1 (deleted)\ndoc-2\ndoc-3 (deleted)"]
S2["Segment 1\ndoc-4\ndoc-5"]
end
Compact["Compaction"]
subgraph "After Compaction"
S3["Segment 0\ndoc-2\ndoc-4\ndoc-5"]
end
S1 --> Compact
S2 --> Compact
Compact --> S3
What Compaction Does
- Reads all live (non-deleted) documents from existing segments
- Rebuilds the inverted index and/or vector index without deleted entries
- Writes new, clean segment files
- Removes the old segment files
- Resets the deletion bitmap
Cost and Frequency
| Aspect | Detail |
|---|---|
| CPU cost | High — rebuilds index structures from scratch |
| I/O cost | High — reads all data, writes new segments |
| Blocking | Searches continue during compaction (reads see the old segments until the new ones are ready) |
| Frequency | Run when deleted documents exceed a threshold (e.g., 10-20% of total) |
When to Compact
- Low-write workloads: Compact periodically (e.g., daily or weekly)
- High-write workloads: Compact when the deletion ratio exceeds a threshold
- After bulk updates: Compact after a large batch of upserts
Deletion Bitmap
The deletion bitmap tracks which internal IDs have been deleted:
- Storage: HashSet of deleted document IDs (
AHashSet<u64>) - Lookup: O(1) — hash set lookup
The bitmap is persisted alongside the index segments and is rebuilt from the WAL during recovery.
Next Steps
- How data is persisted: Persistence & WAL
- ID management and internal/external ID mapping: ID Management