Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Hybrid Search

Hybrid search combines lexical search (keyword matching) with vector search (semantic similarity) to deliver results that are both precise and semantically relevant. This is Laurus’s most powerful search mode.

Search TypeStrengthsWeaknesses
Lexical onlyExact keyword matching, handles rare terms wellMisses synonyms and paraphrases
Vector onlyUnderstands meaning, handles synonymsMay miss exact keywords, less precise
HybridBest of both worldsSlightly more complex to configure

How It Works

sequenceDiagram
    participant User
    participant Engine
    participant Lexical as LexicalStore
    participant Vector as VectorStore
    participant Fusion

    User->>Engine: SearchRequest\n(lexical + vector)

    par Execute in parallel
        Engine->>Lexical: BM25 keyword search
        Lexical-->>Engine: Ranked hits (by relevance)
    and
        Engine->>Vector: ANN similarity search
        Vector-->>Engine: Ranked hits (by distance)
    end

    Engine->>Fusion: Merge two result sets
    Note over Fusion: RRF or WeightedSum
    Fusion-->>Engine: Unified ranked list
    Engine-->>User: Vec of SearchResult

Basic Usage

Builder API

#![allow(unused)]
fn main() {
use laurus::{SearchRequestBuilder, LexicalSearchRequest, FusionAlgorithm};
use laurus::lexical::TermQuery;
use laurus::vector::VectorSearchRequestBuilder;

let request = SearchRequestBuilder::new()
    // Lexical component
    .lexical_search_request(
        LexicalSearchRequest::new(
            Box::new(TermQuery::new("body", "rust"))
        )
    )
    // Vector component
    .vector_search_request(
        VectorSearchRequestBuilder::new()
            .add_text("text_vec", "systems programming")
            .build()
    )
    // Fusion algorithm
    .fusion_algorithm(FusionAlgorithm::RRF { k: 60.0 })
    .limit(10)
    .build();

let results = engine.search(request).await?;
}

Query DSL

Mix lexical and vector clauses in a single query string:

#![allow(unused)]
fn main() {
use laurus::UnifiedQueryParser;
use laurus::lexical::QueryParser;
use laurus::vector::VectorQueryParser;

let unified = UnifiedQueryParser::new(
    QueryParser::new(analyzer).with_default_field("body"),
    VectorQueryParser::new(embedder),
);

// Lexical + vector in one query
let request = unified.parse(r#"body:rust text_vec:~"systems programming""#).await?;
let results = engine.search(request).await?;
}

The ~"..." syntax identifies vector clauses. Everything else is parsed as lexical.

Fusion Algorithms

When both lexical and vector results exist, they must be merged into a single ranked list. Laurus supports two fusion algorithms:

RRF (Reciprocal Rank Fusion)

The default algorithm. Combines results based on their rank positions rather than raw scores.

score(doc) = sum( 1 / (k + rank_i) )

Where rank_i is the position of the document in each result list, and k is a smoothing parameter (default 60).

#![allow(unused)]
fn main() {
use laurus::FusionAlgorithm;

let fusion = FusionAlgorithm::RRF { k: 60.0 };
}

Advantages:

  • Robust to different score distributions between lexical and vector results
  • No need to tune weights
  • Works well out of the box

WeightedSum

Linearly combines normalized lexical and vector scores:

score(doc) = lexical_weight * lexical_score + vector_weight * vector_score
#![allow(unused)]
fn main() {
use laurus::FusionAlgorithm;

let fusion = FusionAlgorithm::WeightedSum {
    lexical_weight: 0.3,
    vector_weight: 0.7,
};
}

When to use:

  • When you want explicit control over the balance between lexical and vector relevance
  • When you know one signal is more important than the other

SearchRequest Options

OptionDefaultDescription
lexical_search_requestNoneLexical query component
vector_search_requestNoneVector query component
filter_queryNonePre-filter using a lexical query (restricts both lexical and vector results)
fusion_algorithmNone (uses RRF { k: 60.0 } when both results exist)How to merge lexical and vector results
limit10Maximum number of results to return
offset0Number of results to skip (for pagination)

SearchResult

Each result contains:

FieldTypeDescription
idStringExternal document ID
scoref32Fused relevance score
documentOption<Document>Full document content (if loaded)

Apply a filter to restrict both lexical and vector results:

#![allow(unused)]
fn main() {
let request = SearchRequestBuilder::new()
    .lexical_search_request(
        LexicalSearchRequest::new(Box::new(TermQuery::new("body", "rust")))
    )
    .vector_search_request(
        VectorSearchRequestBuilder::new()
            .add_text("text_vec", "systems programming")
            .build()
    )
    // Only search within "tutorial" category
    .filter_query(Box::new(TermQuery::new("category", "tutorial")))
    .fusion_algorithm(FusionAlgorithm::RRF { k: 60.0 })
    .limit(10)
    .build();
}

How Filtering Works

  1. The filter query runs on the lexical index to produce a set of allowed document IDs
  2. For lexical search: the filter is combined with the user query as a boolean AND
  3. For vector search: the allowed IDs are passed to restrict the ANN search

Pagination

Use offset and limit for pagination:

#![allow(unused)]
fn main() {
// Page 1: results 0-9
let page1 = SearchRequestBuilder::new()
    .lexical_search_request(/* ... */)
    .vector_search_request(/* ... */)
    .offset(0)
    .limit(10)
    .build();

// Page 2: results 10-19
let page2 = SearchRequestBuilder::new()
    .lexical_search_request(/* ... */)
    .vector_search_request(/* ... */)
    .offset(10)
    .limit(10)
    .build();
}

Complete Example

use std::sync::Arc;
use laurus::{
    Document, Engine, Schema, SearchRequestBuilder,
    LexicalSearchRequest, FusionAlgorithm, PerFieldEmbedder,
};
use laurus::lexical::{TextOption, TermQuery};
use laurus::lexical::core::field::IntegerOption;
use laurus::vector::{HnswOption, VectorSearchRequestBuilder};
use laurus::storage::memory::MemoryStorage;

#[tokio::main]
async fn main() -> laurus::Result<()> {
    let storage = Arc::new(MemoryStorage::new(Default::default()));

    // Schema with both lexical and vector fields
    let schema = Schema::builder()
        .add_text_field("title", TextOption::default())
        .add_text_field("body", TextOption::default())
        .add_text_field("category", TextOption::default())
        .add_integer_field("year", IntegerOption::default())
        .add_hnsw_field("body_vec", HnswOption {
            dimension: 384,
            ..Default::default()
        })
        .build();

    // Configure analyzer and embedder (see Text Analysis and Embeddings docs)
    // let analyzer = Arc::new(StandardAnalyzer::new()?);
    // let embedder = Arc::new(CandleBertEmbedder::new("sentence-transformers/all-MiniLM-L6-v2")?);
    let engine = Engine::builder(storage, schema)
        // .analyzer(analyzer)
        // .embedder(embedder)
        .build()
        .await?;

    // Index documents with both text and vector fields
    engine.add_document("doc-1", Document::builder()
        .add_text("title", "Rust Programming Guide")
        .add_text("body", "Rust is a systems programming language.")
        .add_text("category", "programming")
        .add_integer("year", 2024)
        .add_text("body_vec", "Rust is a systems programming language.")
        .build()
    ).await?;
    engine.commit().await?;

    // Hybrid search: keyword "rust" + semantic "systems language"
    let results = engine.search(
        SearchRequestBuilder::new()
            .lexical_search_request(
                LexicalSearchRequest::new(Box::new(TermQuery::new("body", "rust")))
            )
            .vector_search_request(
                VectorSearchRequestBuilder::new()
                    .add_text("body_vec", "systems language")
                    .build()
            )
            .fusion_algorithm(FusionAlgorithm::RRF { k: 60.0 })
            .limit(10)
            .build()
    ).await?;

    for r in &results {
        println!("{}: score={:.4}", r.id, r.score);
    }

    Ok(())
}

Next Steps