Storage

Laurus uses a pluggable storage layer that abstracts how and where index data is persisted. All components — lexical index, vector index, and document log — share a single storage backend.

The Storage Trait

All backends implement the Storage trait:

pub trait Storage: Send + Sync + Debug {
    fn loading_mode(&self) -> LoadingMode;
    fn open_input(&self, name: &str) -> Result<Box<dyn StorageInput>>;
    fn create_output(&self, name: &str) -> Result<Box<dyn StorageOutput>>;
    fn file_exists(&self, name: &str) -> bool;
    fn delete_file(&self, name: &str) -> Result<()>;
    fn list_files(&self) -> Result<Vec<String>>;
    fn file_size(&self, name: &str) -> Result<u64>;
    // ... additional methods
}

This interface is file-oriented: all data (index segments, metadata, WAL entries, documents) is stored as named files accessed through streaming StorageInput / StorageOutput handles.
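
To make the file-oriented model concrete, here is a minimal, self-contained sketch of a backend behind a reduced version of the trait. It does not use the laurus crate: MiniStorage, ToyMemoryStorage, write_file, and roundtrip are hypothetical names, whole-buffer writes stand in for the streaming StorageOutput handle, and std::io::Result stands in for the crate's own Result type.

```rust
use std::collections::HashMap;
use std::io::{Cursor, Read};
use std::sync::Mutex;

// Hypothetical, reduced stand-in for the Storage trait (not the laurus API).
pub trait MiniStorage: Send + Sync {
    fn open_input(&self, name: &str) -> std::io::Result<Box<dyn Read>>;
    fn write_file(&self, name: &str, data: &[u8]) -> std::io::Result<()>;
    fn file_exists(&self, name: &str) -> bool;
    fn delete_file(&self, name: &str) -> std::io::Result<()>;
    fn list_files(&self) -> std::io::Result<Vec<String>>;
    fn file_size(&self, name: &str) -> std::io::Result<u64>;
}

// Toy backend: every named "file" is a byte buffer in a map.
#[derive(Default)]
pub struct ToyMemoryStorage {
    files: Mutex<HashMap<String, Vec<u8>>>,
}

fn not_found(name: &str) -> std::io::Error {
    std::io::Error::new(std::io::ErrorKind::NotFound, format!("no such file: {}", name))
}

impl MiniStorage for ToyMemoryStorage {
    fn open_input(&self, name: &str) -> std::io::Result<Box<dyn Read>> {
        let files = self.files.lock().unwrap();
        // Copy the bytes out so the reader does not hold the lock.
        let data = files.get(name).cloned().ok_or_else(|| not_found(name))?;
        Ok(Box::new(Cursor::new(data)))
    }

    fn write_file(&self, name: &str, data: &[u8]) -> std::io::Result<()> {
        self.files.lock().unwrap().insert(name.to_string(), data.to_vec());
        Ok(())
    }

    fn file_exists(&self, name: &str) -> bool {
        self.files.lock().unwrap().contains_key(name)
    }

    fn delete_file(&self, name: &str) -> std::io::Result<()> {
        self.files.lock().unwrap().remove(name).map(|_| ()).ok_or_else(|| not_found(name))
    }

    fn list_files(&self) -> std::io::Result<Vec<String>> {
        let mut names: Vec<String> = self.files.lock().unwrap().keys().cloned().collect();
        names.sort();
        Ok(names)
    }

    fn file_size(&self, name: &str) -> std::io::Result<u64> {
        self.files.lock().unwrap().get(name).map(|d| d.len() as u64).ok_or_else(|| not_found(name))
    }
}

// Writes a named file, then streams it back through the input handle.
pub fn roundtrip(storage: &dyn MiniStorage, name: &str, data: &[u8]) -> std::io::Result<Vec<u8>> {
    storage.write_file(name, data)?;
    let mut buf = Vec::new();
    storage.open_input(name)?.read_to_end(&mut buf)?;
    Ok(buf)
}

fn main() -> std::io::Result<()> {
    let storage = ToyMemoryStorage::default();
    let bytes = roundtrip(&storage, "segments/seg-001.dict", b"terms")?;
    println!("read back {} bytes; files: {:?}", bytes.len(), storage.list_files()?);
    Ok(())
}
```

Callers only ever see names and byte streams, which is what lets the same index code run against memory or disk.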

Storage Backends

MemoryStorage

All data lives in memory. Fast and simple, but not durable.

use std::sync::Arc;
use laurus::Storage;
use laurus::storage::memory::MemoryStorage;

let storage: Arc<dyn Storage> = Arc::new(
    MemoryStorage::new(Default::default())
);

| Property | Value |
|---|---|
| Durability | None (data lost on process exit) |
| Speed | Fastest |
| Use case | Testing, prototyping, ephemeral data |

FileStorage

Standard file-system based persistence. Each key maps to a file on disk.

use std::sync::Arc;
use laurus::Storage;
use laurus::storage::file::{FileStorage, FileStorageConfig};

let config = FileStorageConfig::new("/tmp/laurus-data");
// `?` propagates the error, so this snippet must run inside a function that returns a Result.
let storage: Arc<dyn Storage> = Arc::new(FileStorage::new("/tmp/laurus-data", config)?);

| Property | Value |
|---|---|
| Durability | Full (persisted to disk) |
| Speed | Moderate (disk I/O) |
| Use case | General production use |

FileStorage with Memory Mapping

FileStorage supports memory-mapped file access via the use_mmap configuration flag. When enabled, the OS manages paging between memory and disk. The lexical posting decoder (Issue #504) then takes a zero-copy path through StorageInput::as_slice: PFOR-bit-packed blocks are handed directly to bitpacking::decompress*, with no intermediate Vec<u8> allocation and no copy through Read.

The default is platform-specific:

  • Unix (Linux / macOS / BSD): true as of Issue #504. To fall back to buffered file I/O (for debug sessions or hosts where mmap misbehaves), set the LAURUS_NO_MMAP=1 environment variable; it is read when the config is constructed via FileStorageConfig::new.
  • Windows: false as of Issue #508. Windows holds an exclusive lock on memory-mapped files (ERROR_USER_MAPPED_FILE, os error 1224), so the writer cannot truncate or delete a segment file while a reader still holds an mmap of it. The current segment-file lifecycle is incompatible with that lock. Set LAURUS_USE_MMAP=1 to opt in for read-only or read-mostly workloads where commits are infrequent. Full Windows mmap support is tracked in Issue #508.

use std::sync::Arc;
use laurus::Storage;
use laurus::storage::file::{FileStorage, FileStorageConfig};

// mmap is on by default on Unix; on Windows it is off unless
// LAURUS_USE_MMAP=1 is set.
let config = FileStorageConfig::new("/tmp/laurus-data");
let storage: Arc<dyn Storage> = Arc::new(FileStorage::new("/tmp/laurus-data", config)?);

// Explicit opt-out without touching the env var (works on any OS).
let mut buffered_config = FileStorageConfig::new("/tmp/laurus-data");
buffered_config.use_mmap = false;

// Explicit opt-in (works on any OS, including Windows).
let mut mmap_config = FileStorageConfig::new("/tmp/laurus-data");
mmap_config.use_mmap = true;

| Property | Value |
|---|---|
| Durability | Full (persisted to disk) |
| Speed | Fast (OS-managed memory mapping; zero-copy posting decode) |
| Use case | Default for any production-scale workload |
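
The platform-specific default described above can be sketched as a small pure function. This is an illustration, not the crate's implementation: default_use_mmap and env_flag are hypothetical names, and two behaviors are assumptions the text does not confirm, namely that a variable counts as set only when its value is exactly "1", and that each variable is consulted only on the platform where it is documented.

```rust
// Hypothetical sketch of how the use_mmap default might be resolved.
// Platform and environment are passed in explicitly so the rule is testable.
pub fn default_use_mmap(is_unix: bool, no_mmap_set: bool, use_mmap_set: bool) -> bool {
    if is_unix {
        // Unix: on by default, LAURUS_NO_MMAP=1 opts out.
        !no_mmap_set
    } else {
        // Windows: off by default, LAURUS_USE_MMAP=1 opts in.
        use_mmap_set
    }
}

// Assumption: only the exact value "1" counts as "set".
fn env_flag(name: &str) -> bool {
    std::env::var(name).map(|v| v == "1").unwrap_or(false)
}

fn main() {
    let use_mmap = default_use_mmap(
        cfg!(unix),
        env_flag("LAURUS_NO_MMAP"),
        env_flag("LAURUS_USE_MMAP"),
    );
    println!("use_mmap default on this host: {}", use_mmap);
}
```

Setting use_mmap on the config directly, as in the snippet above, bypasses this resolution on any OS.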

StorageFactory

You can also create storage via configuration:

use laurus::storage::{StorageConfig, StorageFactory};
use laurus::storage::memory::MemoryStorageConfig;

// `?` propagates the error, so this snippet must run inside a function that returns a Result.
let storage = StorageFactory::create(
    StorageConfig::Memory(MemoryStorageConfig::default())
)?;
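
The value of a factory is that the backend choice becomes data rather than code. The sketch below shows the general pattern with a config enum dispatched to a trait object. Every name in it (ToyStorageConfig, ToyBackend, ToyMemory, ToyFile, create) is hypothetical, and the File variant with its fields is an assumption for illustration, not the laurus StorageConfig.

```rust
use std::path::PathBuf;

// Hypothetical config enum, loosely modeled on the idea of StorageConfig.
#[derive(Debug, Clone, PartialEq)]
pub enum ToyStorageConfig {
    Memory,
    File { path: PathBuf, use_mmap: bool },
}

// Minimal backend interface, just enough to show the dispatch.
pub trait ToyBackend {
    fn describe(&self) -> String;
}

struct ToyMemory;

struct ToyFile {
    path: PathBuf,
    use_mmap: bool,
}

impl ToyBackend for ToyMemory {
    fn describe(&self) -> String {
        "memory".to_string()
    }
}

impl ToyBackend for ToyFile {
    fn describe(&self) -> String {
        format!("file at {} (mmap: {})", self.path.display(), self.use_mmap)
    }
}

// The factory: one match turns a config value into a boxed backend.
pub fn create(config: ToyStorageConfig) -> Box<dyn ToyBackend> {
    match config {
        ToyStorageConfig::Memory => Box::new(ToyMemory),
        ToyStorageConfig::File { path, use_mmap } => Box::new(ToyFile { path, use_mmap }),
    }
}

fn main() {
    let configs = vec![
        ToyStorageConfig::Memory,
        ToyStorageConfig::File { path: PathBuf::from("/tmp/laurus-data"), use_mmap: true },
    ];
    for config in configs {
        println!("{}", create(config).describe());
    }
}
```

With this shape, a test suite can pass the memory variant while production passes a file variant, and the calling code stays the same.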

PrefixedStorage

The engine uses PrefixedStorage to isolate components within a single storage backend:

graph TB
    E["Engine"]
    E --> P1["PrefixedStorage\nprefix = 'lexical/'"]
    E --> P2["PrefixedStorage\nprefix = 'vector/'"]
    E --> P3["PrefixedStorage\nprefix = 'documents/'"]
    P1 --> S["Storage Backend"]
    P2 --> S
    P3 --> S

When the lexical store writes a key segments/seg-001.dict, it is actually stored as lexical/segments/seg-001.dict in the underlying backend. This ensures no key collisions between components.

You do not need to create PrefixedStorage yourself — the EngineBuilder handles this automatically.
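
The key rewriting can be illustrated with a toy wrapper over a shared map. This is a sketch of the idea only: ToyPrefixed and Backend are hypothetical names, and the real PrefixedStorage wraps a Storage backend rather than a BTreeMap.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};

// Hypothetical shared backend: a flat key-to-bytes map.
pub type Backend = Arc<Mutex<BTreeMap<String, Vec<u8>>>>;

// Toy wrapper that prepends a fixed prefix to every key it touches.
pub struct ToyPrefixed {
    prefix: String,
    backend: Backend,
}

impl ToyPrefixed {
    pub fn new(prefix: &str, backend: Backend) -> Self {
        ToyPrefixed { prefix: prefix.to_string(), backend }
    }

    fn full_key(&self, key: &str) -> String {
        format!("{}{}", self.prefix, key)
    }

    pub fn write(&self, key: &str, data: &[u8]) {
        let full = self.full_key(key);
        self.backend.lock().unwrap().insert(full, data.to_vec());
    }

    pub fn read(&self, key: &str) -> Option<Vec<u8>> {
        let full = self.full_key(key);
        let found = self.backend.lock().unwrap().get(&full).cloned();
        found
    }

    // Lists only this component's keys, with the prefix stripped again.
    pub fn list(&self) -> Vec<String> {
        let names: Vec<String> = self
            .backend
            .lock()
            .unwrap()
            .keys()
            .filter_map(|k| k.strip_prefix(self.prefix.as_str()).map(str::to_string))
            .collect();
        names
    }
}

fn main() {
    let backend: Backend = Default::default();
    let lexical = ToyPrefixed::new("lexical/", backend.clone());
    let vector = ToyPrefixed::new("vector/", backend.clone());

    // Same logical key in two components, two distinct physical keys.
    lexical.write("segments/seg-001.dict", b"terms");
    vector.write("segments/seg-001.dict", b"vectors");

    println!("{:?}", backend.lock().unwrap().keys().collect::<Vec<_>>());
}
```

Each component sees only its own namespace, while the backend holds a single flat set of keys.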

ColumnStorage

In addition to the primary storage backends, Laurus provides a ColumnStorage layer for fast field-level access. This is used internally for operations like faceting, sorting, and aggregation, where accessing individual field values without deserializing entire documents is important.

ColumnValue

ColumnValue represents a single stored column value:

| Variant | Description |
|---|---|
| String(String) | UTF-8 text |
| I32(i32) | 32-bit signed integer |
| I64(i64) | 64-bit signed integer |
| U32(u32) | 32-bit unsigned integer |
| U64(u64) | 64-bit unsigned integer |
| F32(f32) | 32-bit floating point |
| F64(f64) | 64-bit floating point |
| Bool(bool) | Boolean |
| DateTime(i64) | Unix timestamp (seconds) |
| Null | Absent value |

ColumnStorage is managed internally by the Engine; you do not need to interact with it directly.
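
To show why per-field columns help with sorting, the sketch below mirrors the variant table in a standalone enum and orders document ids by one column without touching the documents themselves. ToyColumnValue, compare, and sort_doc_ids_by_column are hypothetical names. The ordering rules (Null sorts first, values of different variants are treated as unordered) are assumptions for illustration, not documented Laurus behavior.

```rust
use std::cmp::Ordering;

// Standalone mirror of the variant table above (not the laurus type).
#[derive(Debug, Clone, PartialEq)]
pub enum ToyColumnValue {
    String(String),
    I32(i32),
    I64(i64),
    U32(u32),
    U64(u64),
    F32(f32),
    F64(f64),
    Bool(bool),
    DateTime(i64),
    Null,
}

impl ToyColumnValue {
    // Assumed ordering: Null first, same-variant values by natural order,
    // mixed variants (and NaN floats) yield None.
    pub fn compare(&self, other: &Self) -> Option<Ordering> {
        match (self, other) {
            (Self::Null, Self::Null) => Some(Ordering::Equal),
            (Self::Null, _) => Some(Ordering::Less),
            (_, Self::Null) => Some(Ordering::Greater),
            (Self::String(a), Self::String(b)) => Some(a.cmp(b)),
            (Self::I32(a), Self::I32(b)) => Some(a.cmp(b)),
            (Self::I64(a), Self::I64(b)) => Some(a.cmp(b)),
            (Self::U32(a), Self::U32(b)) => Some(a.cmp(b)),
            (Self::U64(a), Self::U64(b)) => Some(a.cmp(b)),
            (Self::F32(a), Self::F32(b)) => a.partial_cmp(b),
            (Self::F64(a), Self::F64(b)) => a.partial_cmp(b),
            (Self::Bool(a), Self::Bool(b)) => Some(a.cmp(b)),
            (Self::DateTime(a), Self::DateTime(b)) => Some(a.cmp(b)),
            _ => None,
        }
    }
}

// The column is indexed by document id; only this one field is read.
pub fn sort_doc_ids_by_column(column: &[ToyColumnValue]) -> Vec<usize> {
    let mut ids: Vec<usize> = (0..column.len()).collect();
    // Unordered pairs are treated as equal; the stable sort keeps their id order.
    ids.sort_by(|&a, &b| column[a].compare(&column[b]).unwrap_or(Ordering::Equal));
    ids
}

fn main() {
    let price = vec![
        ToyColumnValue::I64(30),
        ToyColumnValue::Null,
        ToyColumnValue::I64(10),
        ToyColumnValue::I64(20),
    ];
    println!("doc ids by ascending price: {:?}", sort_doc_ids_by_column(&price));
}
```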

Choosing a Backend

| Factor | MemoryStorage | FileStorage | FileStorage (mmap) |
|---|---|---|---|
| Durability | None | Full | Full |
| Read speed | Fastest | Moderate | Fast |
| Write speed | Fastest | Moderate | Moderate |
| Memory usage | Proportional to data size | Low | OS-managed |
| Max data size | Limited by RAM | Limited by disk | Limited by disk + address space |
| Best for | Tests, small datasets | General use | Large read-heavy datasets |

Recommendations

  • Development / Testing: Use MemoryStorage for fast iteration without file cleanup
  • Production (general): Use FileStorage for reliable persistence
  • Production (large scale): Use FileStorage with use_mmap = true when you have large indexes and want to leverage OS page cache

Next Steps