Storage

Laurus uses a pluggable storage layer that abstracts how and where index data is persisted. All components — lexical index, vector index, and document log — share a single storage backend.

The Storage Trait

All backends implement the Storage trait:

pub trait Storage: Send + Sync + Debug {
    fn loading_mode(&self) -> LoadingMode;
    fn open_input(&self, name: &str) -> Result<Box<dyn StorageInput>>;
    fn create_output(&self, name: &str) -> Result<Box<dyn StorageOutput>>;
    fn file_exists(&self, name: &str) -> bool;
    fn delete_file(&self, name: &str) -> Result<()>;
    fn list_files(&self) -> Result<Vec<String>>;
    fn file_size(&self, name: &str) -> Result<u64>;
    // ... additional methods
}

This interface is file-oriented: all data (index segments, metadata, WAL entries, documents) is stored as named files accessed through streaming StorageInput / StorageOutput handles.
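
To make the file-oriented model concrete, here is a minimal, self-contained sketch of a store that keeps named files behind streaming handles. It is not Laurus code: `ToyStorage` and `ToyOutput` are hypothetical names, the handles are plain `std::io::Read` / `std::io::Write` objects standing in for `StorageInput` / `StorageOutput`, and publishing data on `flush` is a simplification chosen for this example.

```rust
use std::collections::HashMap;
use std::io::{Cursor, Error, ErrorKind, Read, Result, Write};
use std::sync::{Arc, Mutex};

/// Hypothetical in-memory store mapping file names to bytes (not a Laurus type).
#[derive(Default)]
struct ToyStorage {
    files: Arc<Mutex<HashMap<String, Vec<u8>>>>,
}

/// Hypothetical write handle: buffers bytes, publishes them under `name` on flush.
struct ToyOutput {
    name: String,
    buf: Vec<u8>,
    files: Arc<Mutex<HashMap<String, Vec<u8>>>>,
}

impl Write for ToyOutput {
    fn write(&mut self, data: &[u8]) -> Result<usize> {
        self.buf.extend_from_slice(data);
        Ok(data.len())
    }

    fn flush(&mut self) -> Result<()> {
        // Make everything written so far visible under the file name.
        let mut files = self.files.lock().unwrap();
        files.insert(self.name.clone(), self.buf.clone());
        Ok(())
    }
}

impl ToyStorage {
    fn create_output(&self, name: &str) -> Result<Box<dyn Write>> {
        Ok(Box::new(ToyOutput {
            name: name.to_string(),
            buf: Vec::new(),
            files: Arc::clone(&self.files),
        }))
    }

    fn open_input(&self, name: &str) -> Result<Box<dyn Read>> {
        let files = self.files.lock().unwrap();
        let bytes = files
            .get(name)
            .cloned()
            .ok_or_else(|| Error::new(ErrorKind::NotFound, name.to_string()))?;
        Ok(Box::new(Cursor::new(bytes)))
    }

    fn file_exists(&self, name: &str) -> bool {
        self.files.lock().unwrap().contains_key(name)
    }

    fn file_size(&self, name: &str) -> Result<u64> {
        let files = self.files.lock().unwrap();
        files
            .get(name)
            .map(|bytes| bytes.len() as u64)
            .ok_or_else(|| Error::new(ErrorKind::NotFound, name.to_string()))
    }
}

fn main() -> Result<()> {
    let storage = ToyStorage::default();

    // Write a named file through a streaming output handle.
    let mut out = storage.create_output("segments/seg-001.dict")?;
    out.write_all(b"hello")?;
    out.flush()?;

    // Read it back through a streaming input handle.
    let mut text = String::new();
    storage
        .open_input("segments/seg-001.dict")?
        .read_to_string(&mut text)?;

    assert!(storage.file_exists("segments/seg-001.dict"));
    assert_eq!(text, "hello");
    assert_eq!(storage.file_size("segments/seg-001.dict")?, 5);
    Ok(())
}
```

The point of the sketch is the shape of the interface: callers only ever see a file name and a stream, so the same calling code could sit on top of memory, disk, or any other backend.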

Storage Backends

MemoryStorage

All data lives in memory. Fast and simple, but not durable.

use std::sync::Arc;
use laurus::Storage;
use laurus::storage::memory::MemoryStorage;

let storage: Arc<dyn Storage> = Arc::new(
    MemoryStorage::new(Default::default())
);

| Property | Value |
| --- | --- |
| Durability | None (data lost on process exit) |
| Speed | Fastest |
| Use case | Testing, prototyping, ephemeral data |

FileStorage

Standard file-system-based persistence. Each key (file name) maps to a file on disk.

use std::sync::Arc;
use laurus::Storage;
use laurus::storage::file::{FileStorage, FileStorageConfig};

let config = FileStorageConfig::new("/tmp/laurus-data");
// `?` propagates errors, so this must run inside a function that returns a Result.
let storage: Arc<dyn Storage> = Arc::new(FileStorage::new("/tmp/laurus-data", config)?);

| Property | Value |
| --- | --- |
| Durability | Full (persisted to disk) |
| Speed | Moderate (disk I/O) |
| Use case | General production use |

FileStorage with Memory Mapping

FileStorage supports memory-mapped file access via the use_mmap configuration flag. When enabled, the OS manages paging between memory and disk.

use std::sync::Arc;
use laurus::Storage;
use laurus::storage::file::{FileStorage, FileStorageConfig};

let mut config = FileStorageConfig::new("/tmp/laurus-data");
config.use_mmap = true;  // enable memory-mapped I/O
// `?` propagates errors, so this must run inside a function that returns a Result.
let storage: Arc<dyn Storage> = Arc::new(FileStorage::new("/tmp/laurus-data", config)?);

| Property | Value |
| --- | --- |
| Durability | Full (persisted to disk) |
| Speed | Fast (OS-managed memory mapping) |
| Use case | Large datasets, read-heavy workloads |

StorageFactory

You can also create storage via configuration:

use laurus::storage::{StorageConfig, StorageFactory};
use laurus::storage::memory::MemoryStorageConfig;

// `?` propagates errors, so this must run inside a function that returns a Result.
let storage = StorageFactory::create(
    StorageConfig::Memory(MemoryStorageConfig::default())
)?;

PrefixedStorage

The engine uses PrefixedStorage to isolate components within a single storage backend:

graph TB
    E["Engine"]
    E --> P1["PrefixedStorage\nprefix = 'lexical/'"]
    E --> P2["PrefixedStorage\nprefix = 'vector/'"]
    E --> P3["PrefixedStorage\nprefix = 'documents/'"]
    P1 --> S["Storage Backend"]
    P2 --> S
    P3 --> S

When the lexical store writes a key segments/seg-001.dict, it is actually stored as lexical/segments/seg-001.dict in the underlying backend. This ensures no key collisions between components.

You do not need to create PrefixedStorage yourself — the EngineBuilder handles this automatically.
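
The key mapping described above can be sketched in a few lines. This is an illustration of the idea only, not the Laurus `PrefixedStorage` implementation: `Backend` and `Prefixed` are hypothetical names, and a shared `BTreeMap` stands in for the real storage backend.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};

/// Hypothetical shared backend: a flat map from key to bytes.
type Backend = Arc<Mutex<BTreeMap<String, Vec<u8>>>>;

/// Hypothetical prefixing wrapper (not the Laurus `PrefixedStorage` type).
struct Prefixed {
    prefix: String,
    backend: Backend,
}

impl Prefixed {
    fn new(prefix: &str, backend: Backend) -> Self {
        Self { prefix: prefix.to_string(), backend }
    }

    /// Maps a component-local key to the key used in the shared backend.
    fn full_key(&self, key: &str) -> String {
        format!("{}{}", self.prefix, key)
    }

    fn put(&self, key: &str, data: &[u8]) {
        let full = self.full_key(key);
        self.backend.lock().unwrap().insert(full, data.to_vec());
    }

    fn get(&self, key: &str) -> Option<Vec<u8>> {
        let full = self.full_key(key);
        self.backend.lock().unwrap().get(&full).cloned()
    }

    /// Lists only this component's keys, with the prefix stripped.
    fn list(&self) -> Vec<String> {
        let guard = self.backend.lock().unwrap();
        let mut keys = Vec::new();
        for k in guard.keys() {
            if let Some(local) = k.strip_prefix(self.prefix.as_str()) {
                keys.push(local.to_string());
            }
        }
        keys
    }
}

fn main() {
    let backend: Backend = Arc::new(Mutex::new(BTreeMap::new()));
    let lexical = Prefixed::new("lexical/", Arc::clone(&backend));
    let vector = Prefixed::new("vector/", Arc::clone(&backend));

    // Both components use the same local key without colliding.
    lexical.put("segments/seg-001.dict", b"terms");
    vector.put("segments/seg-001.dict", b"vectors");
    assert_eq!(lexical.get("segments/seg-001.dict"), Some(b"terms".to_vec()));
    assert_eq!(vector.get("segments/seg-001.dict"), Some(b"vectors".to_vec()));

    // The backend itself only ever sees the prefixed keys.
    let keys: Vec<String> = backend.lock().unwrap().keys().cloned().collect();
    assert_eq!(
        keys,
        vec!["lexical/segments/seg-001.dict", "vector/segments/seg-001.dict"]
    );

    // Each component sees only its own keys, without the prefix.
    assert_eq!(lexical.list(), vec!["segments/seg-001.dict"]);
}
```

Because the prefix is applied in one place (`full_key` in this sketch), components can be written as if they owned the whole storage namespace.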

Choosing a Backend

| Factor | MemoryStorage | FileStorage | FileStorage (mmap) |
| --- | --- | --- | --- |
| Durability | None | Full | Full |
| Read speed | Fastest | Moderate | Fast |
| Write speed | Fastest | Moderate | Moderate |
| Memory usage | Proportional to data size | Low | OS-managed |
| Max data size | Limited by RAM | Limited by disk | Limited by disk + address space |
| Best for | Tests, small datasets | General use | Large read-heavy datasets |

Recommendations

  • Development / Testing: Use MemoryStorage for fast iteration without file cleanup
  • Production (general): Use FileStorage for reliable persistence
  • Production (large scale): Use FileStorage with use_mmap = true when you have large indexes and want to leverage the OS page cache
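
The recommendations above can be written down as a small decision function. This is only a sketch of the guidance in this section: `Workload`, `BackendChoice`, and `choose_backend` are hypothetical names, not part of the Laurus API, and what counts as a "large, read-heavy" index is left to the caller to decide.

```rust
/// Hypothetical summary of the three options discussed in this section.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum BackendChoice {
    Memory,   // MemoryStorage
    File,     // FileStorage
    FileMmap, // FileStorage with use_mmap = true
}

/// Hypothetical description of a deployment; both flags are the caller's judgment.
struct Workload {
    needs_durability: bool,
    large_read_heavy: bool,
}

/// Encodes the recommendations: no durability -> memory,
/// durable -> file, durable and large/read-heavy -> file with mmap.
fn choose_backend(w: &Workload) -> BackendChoice {
    match (w.needs_durability, w.large_read_heavy) {
        (false, _) => BackendChoice::Memory,
        (true, false) => BackendChoice::File,
        (true, true) => BackendChoice::FileMmap,
    }
}

fn main() {
    let test_run = Workload { needs_durability: false, large_read_heavy: false };
    let production = Workload { needs_durability: true, large_read_heavy: false };
    let big_index = Workload { needs_durability: true, large_read_heavy: true };

    assert_eq!(choose_backend(&test_run), BackendChoice::Memory);
    assert_eq!(choose_backend(&production), BackendChoice::File);
    assert_eq!(choose_backend(&big_index), BackendChoice::FileMmap);
}
```

Durability is checked first in this sketch because it is the one property MemoryStorage cannot provide; the remaining choice between plain and memory-mapped file access is a performance trade-off.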

Next Steps