Persistence & WAL
Laurus uses a Write-Ahead Log (WAL) to ensure data durability. Every write operation is persisted to the WAL before the in-memory structures are modified, so a write that has reached the WAL can be recovered even if the process crashes.
Write Path
sequenceDiagram
participant App as Application
participant Engine
participant WAL as DocumentLog (WAL)
participant Mem as In-Memory Buffers
participant Disk as Storage (segments)
App->>Engine: add_document() / delete_documents()
Engine->>WAL: 1. Append operation to WAL
Engine->>Mem: 2. Update in-memory buffers
Note over Mem: Document is buffered but\nNOT yet searchable
App->>Engine: commit()
Engine->>Disk: 3. Flush segments to storage
Engine->>WAL: 4. Truncate WAL
Note over Disk: Documents are now\nsearchable and durable
Key Principles
- WAL-first: Every write (add or delete) is appended to the WAL before updating in-memory structures
- Buffered writes: In-memory buffers accumulate changes until commit() is called
- Atomic commit: commit() flushes all buffered changes to segment files and truncates the WAL
- Crash safety: If the process crashes between writes and commit, the WAL is replayed on the next startup
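The principles above can be illustrated with a toy model. This is a minimal sketch, not Laurus's implementation: `ToyEngine`, `Op`, and the use of plain in-memory collections for the WAL, buffers, and segments are all assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// One logged operation (hypothetical shape, for illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Op {
    Upsert { id: String, body: String },
    Delete { id: String },
}

/// Toy engine: `wal` stands in for engine.wal, `buffer` for the in-memory
/// buffers, and `segments` for committed, searchable storage.
#[derive(Default)]
struct ToyEngine {
    wal: Vec<Op>,
    buffer: Vec<Op>,
    segments: BTreeMap<String, String>,
}

impl ToyEngine {
    /// WAL-first: log the operation, then buffer it.
    fn write(&mut self, op: Op) {
        self.wal.push(op.clone()); // 1. append to the WAL
        self.buffer.push(op); // 2. update in-memory buffers
    }

    /// Apply buffered changes to the "segments", then truncate the WAL.
    fn commit(&mut self) {
        for op in self.buffer.drain(..) {
            match op {
                // 3. flush buffered changes
                Op::Upsert { id, body } => {
                    self.segments.insert(id, body);
                }
                Op::Delete { id } => {
                    self.segments.remove(&id);
                }
            }
        }
        self.wal.clear(); // 4. truncate the WAL
    }

    /// Only committed documents are visible to reads.
    fn get(&self, id: &str) -> Option<&str> {
        self.segments.get(id).map(String::as_str)
    }
}
```

In this model a document written with `write` is present in the WAL and the buffer but is not returned by `get` until `commit` runs, which mirrors the "buffered but not yet searchable" state in the diagram.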
Write-Ahead Log (WAL)
The WAL is managed by the DocumentLog component and stored at the root level of the storage backend (engine.wal).
WAL Entry Types
| Entry Type | Description |
|---|---|
| Upsert | Document content + external ID + assigned internal ID |
| Delete | External ID of the document to remove |
WAL File
The WAL file (engine.wal) is an append-only binary log. Each entry is self-contained with:
- Operation type (upsert or delete)
- Sequence number
- Payload (document data or ID)
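To make "self-contained entry" concrete, here is a sketch of one possible binary framing. The byte layout (1-byte type tag, 8-byte little-endian sequence number, 4-byte payload length, payload) and the names `WalRecord`, `encode`, and `decode` are assumptions for illustration; the actual on-disk format of engine.wal is not specified here.

```rust
/// Hypothetical self-contained WAL record: type tag, sequence number, payload.
#[derive(Debug, Clone, PartialEq)]
struct WalRecord {
    op: u8, // assumed tags: 0 = upsert, 1 = delete
    seq: u64,
    payload: Vec<u8>,
}

/// Assumed header: 1 byte tag + 8 bytes sequence + 4 bytes payload length.
const HEADER_LEN: usize = 1 + 8 + 4;

/// Append one record to the end of `out`.
fn encode(rec: &WalRecord, out: &mut Vec<u8>) {
    out.push(rec.op);
    out.extend_from_slice(&rec.seq.to_le_bytes());
    out.extend_from_slice(&(rec.payload.len() as u32).to_le_bytes());
    out.extend_from_slice(&rec.payload);
}

/// Decode one record from the front of `buf`, returning it together with
/// the number of bytes consumed. Returns None if the record is truncated,
/// for example after a torn write at the tail of the log.
fn decode(buf: &[u8]) -> Option<(WalRecord, usize)> {
    if buf.len() < HEADER_LEN {
        return None;
    }
    let op = buf[0];
    let mut seq_bytes = [0u8; 8];
    seq_bytes.copy_from_slice(&buf[1..9]);
    let mut len_bytes = [0u8; 4];
    len_bytes.copy_from_slice(&buf[9..13]);
    let len = u32::from_le_bytes(len_bytes) as usize;
    let end = HEADER_LEN.checked_add(len)?;
    let payload = buf.get(HEADER_LEN..end)?.to_vec();
    Some((WalRecord { op, seq: u64::from_le_bytes(seq_bytes), payload }, end))
}
```

Because each record carries its own length, a reader can walk an append-only log record by record and stop at the first incomplete one.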
Recovery
When an engine is built (Engine::builder(...).build().await), it automatically checks the WAL for remaining entries and replays them. Because the WAL is truncated on commit, any remaining entries come from a session that ended before committing:
graph TD
Start["Engine::build()"] --> Check["Check WAL for\nuncommitted entries"]
Check -->|"Entries found"| Replay["Replay operations\ninto in-memory buffers"]
Replay --> Ready["Engine ready"]
Check -->|"No entries"| Ready
Recovery is transparent — you do not need to handle it manually.
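The replay step can be sketched as a fold over the leftover log entries, applied in log order. The `Entry` type, the `Pending` map, and the `replay` function are hypothetical names used only for this illustration and do not reflect Laurus's internal types.

```rust
use std::collections::BTreeMap;

/// Hypothetical WAL entry, mirroring the Upsert/Delete entry types.
#[derive(Debug, Clone, PartialEq)]
enum Entry {
    Upsert { id: String, body: String },
    Delete { id: String },
}

/// Uncommitted state rebuilt from the log, keyed by external ID.
/// `Some(body)` is a pending upsert, `None` is a pending delete.
type Pending = BTreeMap<String, Option<String>>;

/// Replay entries in log order; a later entry for the same ID wins.
fn replay(entries: &[Entry]) -> Pending {
    let mut pending = Pending::new();
    for entry in entries {
        match entry {
            Entry::Upsert { id, body } => {
                pending.insert(id.clone(), Some(body.clone()));
            }
            Entry::Delete { id } => {
                pending.insert(id.clone(), None);
            }
        }
    }
    pending
}
```

An empty log yields an empty map, which corresponds to the "No entries" branch of the diagram.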
The Commit Lifecycle
// 1. Add documents (buffered, not yet searchable)
engine.add_document("doc-1", doc1).await?;
engine.add_document("doc-2", doc2).await?;

// 2. Commit: flush to persistent storage
engine.commit().await?;
// Documents are now searchable

// 3. Add more documents
engine.add_document("doc-3", doc3).await?;

// 4. If the process crashes here, doc-3 is in the WAL
//    and will be recovered on next startup
When to Commit
| Strategy | Description | Use Case |
|---|---|---|
| After each document | Maximum durability, minimum search latency | Real-time search with few writes |
| After a batch | Good balance of throughput and latency | Bulk indexing |
| Periodically | Maximum write throughput | High-volume ingestion |
Tip: Commits are relatively expensive because they flush segments to storage. For bulk indexing, batch many documents before calling commit().
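The "after a batch" strategy can be sketched as a small counter that tells the caller when to commit. `BatchPolicy` and `count_commits` are hypothetical helpers, not part of Laurus; the sketch only counts commits so that it stays self-contained.

```rust
/// Decides when to commit under a "commit after a batch" policy.
struct BatchPolicy {
    batch_size: usize,
    pending: usize,
}

impl BatchPolicy {
    fn new(batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch_size must be at least 1");
        BatchPolicy { batch_size, pending: 0 }
    }

    /// Record one buffered write; returns true when the batch is full
    /// and the caller should commit now.
    fn record_write(&mut self) -> bool {
        self.pending += 1;
        if self.pending >= self.batch_size {
            self.pending = 0;
            true
        } else {
            false
        }
    }

    /// True if some writes have not been covered by a commit yet.
    fn has_pending(&self) -> bool {
        self.pending > 0
    }
}

/// Count the commits needed to index `total_docs` documents,
/// including a final commit for a partially filled last batch.
fn count_commits(total_docs: usize, batch_size: usize) -> usize {
    let mut policy = BatchPolicy::new(batch_size);
    let mut commits = 0;
    for _ in 0..total_docs {
        if policy.record_write() {
            commits += 1;
        }
    }
    if policy.has_pending() {
        commits += 1;
    }
    commits
}
```

In an indexing loop, the caller would invoke engine.commit() whenever `record_write` returns true, and once more at the end if `has_pending` is true. A batch size of 1 degenerates to the "after each document" strategy.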
Storage Layout
The engine uses PrefixedStorage to organize data:
<storage root>/
├── lexical/ # Inverted index segments
│ ├── seg-000/
│ │ ├── terms.dict
│ │ ├── postings.post
│ │ └── ...
│ └── metadata.json
├── vector/ # Vector index segments
│ ├── seg-000/
│ │ ├── graph.hnsw
│ │ ├── vectors.vecs
│ │ └── ...
│ └── metadata.json
├── documents/ # Document storage
│ └── ...
└── engine.wal # Write-ahead log
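The idea of prefix-based organization can be sketched as a thin wrapper that namespaces every key under a component prefix. This is not the actual PrefixedStorage API; `PrefixedKeys` and its methods are hypothetical and only show how names such as lexical/seg-000/terms.dict could be derived.

```rust
/// Hypothetical helper that namespaces storage keys under a fixed prefix.
struct PrefixedKeys {
    prefix: String,
}

impl PrefixedKeys {
    /// Normalize the prefix by dropping leading and trailing slashes.
    fn new(prefix: &str) -> Self {
        PrefixedKeys { prefix: prefix.trim_matches('/').to_string() }
    }

    /// Build the full key for a name relative to this prefix.
    fn key(&self, name: &str) -> String {
        format!("{}/{}", self.prefix, name.trim_start_matches('/'))
    }
}

/// Example: keys for the lexical and vector components of one segment.
fn example_keys() -> Vec<String> {
    let lexical = PrefixedKeys::new("lexical");
    let vector = PrefixedKeys::new("vector/");
    vec![
        lexical.key("seg-000/terms.dict"),
        vector.key("seg-000/graph.hnsw"),
    ]
}
```

With this kind of wrapper, each index component can read and write relative names while sharing a single storage root with the others.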
Next Steps
- How deletions are handled: Deletions & Compaction
- Storage backends: Storage