API Reference

Index

The primary entry point. Wraps the Laurus search engine.

Laurus::Index.new(path: nil, schema: nil)

Constructor

| Parameter | Type | Default | Description |
|---|---|---|---|
| path: | String or nil | nil | Directory path for persistent storage. nil creates an in-memory index. |
| schema: | Schema or nil | nil | Schema definition. An empty schema is used when omitted. |

Methods

| Method | Description |
|---|---|
| put_document(id, doc) | Upsert a document. Replaces all existing versions with the same ID. |
| add_document(id, doc) | Append a document chunk without removing existing versions. |
| get_documents(id) -> Array<Hash> | Return all stored versions for the given ID. |
| delete_documents(id) | Delete all versions for the given ID. |
| commit | Flush buffered writes and make all pending changes searchable. |
| search(query, limit: 10, offset: 0) -> Array<SearchResult> | Execute a search query. |
| stats -> Hash | Return index statistics ("document_count", "vector_fields"). |
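
The difference between put_document and add_document can be modelled with a toy in-memory store. This is only a sketch of the versioning behaviour described in the table above: ToyStore and its internals are hypothetical and are not part of Laurus, and buffering via commit is deliberately left out.

```ruby
# Toy model of the upsert/append semantics. ToyStore is a hypothetical
# stand-in for illustration, not the Laurus implementation.
class ToyStore
  def initialize
    @versions = {} # id => Array of stored documents
  end

  # Upsert: every existing version under this id is replaced.
  def put_document(id, doc)
    @versions[id] = [doc]
  end

  # Append: existing versions are kept and one more is added.
  def add_document(id, doc)
    (@versions[id] ||= []) << doc
  end

  # All versions for the id, or an empty Array when there are none.
  def get_documents(id)
    @versions.fetch(id, []).dup
  end

  # Remove every version stored under the id.
  def delete_documents(id)
    @versions.delete(id)
  end
end
```

In this model, calling add_document twice with the same ID leaves two versions, and a later put_document collapses them to one.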

search query argument

The query parameter accepts any of the following:

  • A DSL string (e.g. "title:hello", "embedding:\"memory safety\"")
  • A lexical query object (TermQuery, PhraseQuery, BooleanQuery, …)
  • A vector query object (VectorQuery, VectorTextQuery)
  • A SearchRequest for full control

Schema

Defines the fields and index types for an Index.

Laurus::Schema.new

Field methods

| Method | Description |
|---|---|
| add_text_field(name, stored: true, indexed: true, term_vectors: false, analyzer: nil) | Full-text field (inverted index, BM25). |
| add_integer_field(name, stored: true, indexed: true) | 64-bit integer field. |
| add_float_field(name, stored: true, indexed: true) | 64-bit float field. |
| add_boolean_field(name, stored: true, indexed: true) | Boolean field. |
| add_bytes_field(name, stored: true) | Raw bytes field. |
| add_geo_field(name, stored: true, indexed: true) | Geographic coordinate field (lat/lon). |
| add_datetime_field(name, stored: true, indexed: true) | UTC datetime field. |
| add_hnsw_field(name, dimension, distance: "cosine", m: 16, ef_construction: 200, embedder: nil) | HNSW approximate nearest-neighbor vector field. |
| add_flat_field(name, dimension, distance: "cosine", embedder: nil) | Flat (brute-force) vector field. |
| add_ivf_field(name, dimension, distance: "cosine", n_clusters: 100, n_probe: 1, embedder: nil) | IVF approximate nearest-neighbor vector field. |

Other methods

| Method | Description |
|---|---|
| add_embedder(name, config) | Register a named embedder definition. config is a Hash with a "type" key (see below). |
| set_default_fields(fields) | Set the default fields used when no field is specified in a query. fields is an Array of Strings. |
| field_names -> Array<String> | Return the list of field names defined in this schema. |

Embedder types

| "type" | Required keys | Feature flag |
|---|---|---|
| "precomputed" | | (always available) |
| "candle_bert" | "model" | embeddings-candle |
| "candle_clip" | "model" | embeddings-multimodal |
| "openai" | "model" | embeddings-openai |

Distance metrics

| Value | Description |
|---|---|
| "cosine" | Cosine similarity (default) |
| "euclidean" | Euclidean distance |
| "dot_product" | Dot product |
| "manhattan" | Manhattan distance |
| "angular" | Angular distance |

Query classes

TermQuery

Laurus::TermQuery.new(field, term)

Matches documents containing the exact term in the given field.

PhraseQuery

Laurus::PhraseQuery.new(field, terms)

Matches documents containing the terms in order. terms is an Array of Strings.

FuzzyQuery

Laurus::FuzzyQuery.new(field, term, max_edits: 2)

Approximate match allowing up to max_edits edit-distance errors.
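
As an illustration of what an edit-distance bound means, the sketch below computes plain Levenshtein distance (insertions, deletions, substitutions) in Ruby. The helper names are hypothetical, and the source does not say whether Laurus also counts a transposition as a single edit, so treat this as an approximation of the matching rule rather than its definition.

```ruby
# Levenshtein distance using two rolling rows of the DP table.
# Hypothetical helper for illustration; not part of the Laurus API.
def edit_distance(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1,          # deletion
               curr[j - 1] + 1,      # insertion
               prev[j - 1] + cost].min # substitution or match
    end
    prev = curr
  end
  prev.last
end

# True when candidate is within max_edits of term.
def fuzzy_match?(term, candidate, max_edits: 2)
  edit_distance(term, candidate) <= max_edits
end
```

Under this definition, "helo" is one edit away from "hello", so it would match with any max_edits of 1 or more.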

WildcardQuery

Laurus::WildcardQuery.new(field, pattern)

Pattern match. * matches any sequence of characters, ? matches any single character.
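
The two wildcard characters can be understood by translating a pattern into an anchored Ruby Regexp. This is a sketch of the matching rule as stated above; wildcard_to_regexp is a hypothetical helper, and how Laurus evaluates patterns internally is not described in this reference.

```ruby
# Translate a wildcard pattern into an anchored Regexp:
#   *  => any sequence of characters (including none)
#   ?  => exactly one character
# Every other character is matched literally.
def wildcard_to_regexp(pattern)
  body = pattern.each_char.map do |ch|
    case ch
    when "*" then ".*"
    when "?" then "."
    else Regexp.escape(ch)
    end
  end.join
  Regexp.new("\\A#{body}\\z", Regexp::MULTILINE)
end
```

For example, wildcard_to_regexp("he*o").match?("hello") is true, while the pattern "h?llo" does not match "hllo" because ? requires exactly one character.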

NumericRangeQuery

Laurus::NumericRangeQuery.new(field, min: nil, max: nil)

Matches numeric values in the range [min, max]. Pass nil for an open bound. The type (integer or float) is inferred from the Ruby type of min/max.
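
The closed-interval rule with optional bounds amounts to the following check. in_range? is a hypothetical helper written only to make the semantics concrete.

```ruby
# Inclusive range test where a nil bound means "unbounded on that side".
def in_range?(value, min: nil, max: nil)
  (min.nil? || value >= min) && (max.nil? || value <= max)
end
```

Both endpoints are included, so in_range?(5, min: 5, max: 10) is true, and in_range?(-100, max: 10) is true because the lower bound is open.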

GeoQuery

# Radius search
Laurus::GeoQuery.within_radius(field, lat, lon, distance_km)

# Bounding box search
Laurus::GeoQuery.within_bounding_box(field, min_lat, min_lon, max_lat, max_lon)

within_radius returns documents whose coordinate is within distance_km of the given point. within_bounding_box returns documents within the specified bounding box.
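
For intuition, the two predicates can be sketched in plain Ruby using the haversine great-circle formula. The ToyGeo module, the mean Earth radius of 6371 km, and the choice of haversine are assumptions made for this example; the source does not state which distance formula Laurus uses, and the bounding-box check below does not handle boxes that cross the antimeridian.

```ruby
# Hypothetical helpers illustrating radius and bounding-box tests on
# points given as Hashes with "lat" and "lon" keys (degrees).
module ToyGeo
  EARTH_RADIUS_KM = 6371.0 # assumed mean Earth radius

  module_function

  # Great-circle distance in kilometres (haversine formula).
  def haversine_km(lat1, lon1, lat2, lon2)
    to_rad = Math::PI / 180.0
    dlat = (lat2 - lat1) * to_rad
    dlon = (lon2 - lon1) * to_rad
    a = Math.sin(dlat / 2)**2 +
        Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(dlon / 2)**2
    2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a))
  end

  def within_radius?(point, lat, lon, distance_km)
    haversine_km(point["lat"], point["lon"], lat, lon) <= distance_km
  end

  def within_bounding_box?(point, min_lat, min_lon, max_lat, max_lon)
    point["lat"].between?(min_lat, max_lat) &&
      point["lon"].between?(min_lon, max_lon)
  end
end
```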

BooleanQuery

bq = Laurus::BooleanQuery.new
bq.must(query)
bq.should(query)
bq.must_not(query)

Compound boolean query. Every must clause has to match, at least one should clause has to match, and no must_not clause may match.
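
The clause rules can be expressed as a small evaluator over predicates. ToyBooleanQuery is a hypothetical class for illustration only: clauses here are lambdas that take a document Hash, and the behaviour when no should clauses are present (no extra requirement) is an assumption, since the text above does not cover that case.

```ruby
# Toy evaluation of must / should / must_not clause rules.
# Not the Laurus implementation; clauses are plain lambdas.
class ToyBooleanQuery
  def initialize
    @must = []
    @should = []
    @must_not = []
  end

  def must(clause)
    @must << clause
  end

  def should(clause)
    @should << clause
  end

  def must_not(clause)
    @must_not << clause
  end

  def matches?(doc)
    return false unless @must.all? { |clause| clause.call(doc) }
    return false if @must_not.any? { |clause| clause.call(doc) }

    # Assumption: with no should clauses, nothing further is required.
    @should.empty? || @should.any? { |clause| clause.call(doc) }
  end
end
```

A document therefore fails as soon as one must clause misses or one must_not clause hits, regardless of how many should clauses it satisfies.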

SpanQuery

# Single term
Laurus::SpanQuery.term(field, term)

# Near: terms within slop positions
Laurus::SpanQuery.near(field, terms, slop: 0, ordered: true)

# Near with nested SpanQuery clauses
Laurus::SpanQuery.near_spans(field, clauses, slop: 0, ordered: true)

# Containing: big span contains little span
Laurus::SpanQuery.containing(field, big, little)

# Within: include span within exclude span at max distance
Laurus::SpanQuery.within(field, include_span, exclude_span, distance)

Positional / proximity span queries. near takes an Array of term Strings, while near_spans takes an Array of SpanQuery objects for nested expressions.

VectorQuery

Laurus::VectorQuery.new(field, vector)

Approximate nearest-neighbor search using a pre-computed embedding vector. vector is an Array of Floats.
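
Vector fields compare vectors with one of the metrics listed under Distance metrics. The sketch below gives reference formulas for those metric names and an exact (brute-force) top-k search by cosine similarity, which is what an approximate index is trying to reproduce. ToyDistance is a hypothetical module; in particular, defining angular distance as the angle divided by pi is an assumption, as this reference does not spell out the formula.

```ruby
# Reference formulas for two equal-length Arrays of Floats.
# Hypothetical helpers, not part of the Laurus API.
module ToyDistance
  module_function

  def dot_product(a, b)
    a.zip(b).sum { |x, y| x * y }
  end

  def norm(a)
    Math.sqrt(dot_product(a, a))
  end

  # Undefined (NaN) when either vector has zero length.
  def cosine_similarity(a, b)
    dot_product(a, b) / (norm(a) * norm(b))
  end

  def euclidean(a, b)
    Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
  end

  def manhattan(a, b)
    a.zip(b).sum { |x, y| (x - y).abs }
  end

  # Assumption: angle between the vectors, scaled to the range 0..1.
  def angular(a, b)
    Math.acos(cosine_similarity(a, b).clamp(-1.0, 1.0)) / Math::PI
  end

  # Exact top-k ids by cosine similarity; vectors is a Hash of id => Array.
  def nearest(query, vectors, k: 3)
    vectors.sort_by { |_id, v| -cosine_similarity(query, v) }
           .first(k)
           .map(&:first)
  end
end
```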

VectorTextQuery

Laurus::VectorTextQuery.new(field, text)

Converts text to an embedding at query time and runs vector search. Requires an embedder configured on the index.


SearchRequest

Full-featured search request for advanced control.

Laurus::SearchRequest.new(
  query: nil,
  lexical_query: nil,
  vector_query: nil,
  filter_query: nil,
  fusion: nil,
  limit: 10,
  offset: 0,
)
| Parameter | Description |
|---|---|
| query: | A DSL string or single query object. Mutually exclusive with lexical_query: / vector_query:. |
| lexical_query: | Lexical component for explicit hybrid search. |
| vector_query: | Vector component for explicit hybrid search. |
| filter_query: | Lexical filter applied after scoring. |
| fusion: | Fusion algorithm (RRF or WeightedSum). Defaults to RRF(k: 60) when both components are set. |
| limit: | Maximum number of results (default 10). |
| offset: | Pagination offset (default 0). |

SearchResult

Returned by Index#search.

result.id        # => String   -- External document identifier
result.score     # => Float    -- Relevance score
result.document  # => Hash|nil -- Retrieved field values, or nil if deleted

Fusion algorithms

RRF

Laurus::RRF.new(k: 60.0)

Reciprocal Rank Fusion. Merges lexical and vector result lists by rank position. k is a smoothing constant; higher values reduce the influence of top-ranked results.
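
A minimal sketch of the commonly used RRF formula, where each list contributes 1 / (k + rank) for every document it contains. The rrf_fuse helper is hypothetical, and the use of 1-based ranks is an assumption; the exact formula used by Laurus is not given here.

```ruby
# Fuse two ranked id lists with Reciprocal Rank Fusion.
# Returns [id, score] pairs sorted by descending fused score,
# with ties broken by id so the output is deterministic.
def rrf_fuse(lexical_ids, vector_ids, k: 60.0)
  scores = Hash.new(0.0)
  [lexical_ids, vector_ids].each do |ranking|
    ranking.each_with_index do |id, index|
      scores[id] += 1.0 / (k + index + 1) # assumed 1-based rank
    end
  end
  scores.sort_by { |id, score| [-score, id] }
end
```

A document that appears near the top of both lists accumulates two large terms and rises above documents that appear in only one list. With a larger k, the gap between rank 1 and rank 2 shrinks, which is the smoothing effect described above.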

WeightedSum

Laurus::WeightedSum.new(lexical_weight: 0.5, vector_weight: 0.5)

Normalises both score lists independently, then combines them as lexical_weight * lexical_score + vector_weight * vector_score.
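
The combination step can be sketched as follows. The normalisation method is not specified in this reference, so the example assumes min-max scaling to the range 0..1 and treats a document missing from one list as scoring 0.0 there; both choices, and the helper names, are assumptions.

```ruby
# Scale a Hash of id => raw score into 0..1 (assumed min-max scaling).
def min_max_normalize(scores)
  return {} if scores.empty?

  min, max = scores.values.minmax
  span = (max - min).to_f
  scores.transform_values { |s| span.zero? ? 1.0 : (s - min) / span }
end

# Combine two score Hashes as
#   lexical_weight * lexical_score + vector_weight * vector_score
# after normalising each list independently.
def weighted_sum(lexical, vector, lexical_weight: 0.5, vector_weight: 0.5)
  lex = min_max_normalize(lexical)
  vec = min_max_normalize(vector)
  (lex.keys | vec.keys).to_h do |id|
    [id, lexical_weight * lex.fetch(id, 0.0) + vector_weight * vec.fetch(id, 0.0)]
  end
end
```

Normalising first matters because lexical scores such as BM25 and vector similarities live on different scales; without it, one component could dominate regardless of the weights.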


Text analysis

SynonymDictionary

dict = Laurus::SynonymDictionary.new
dict.add_synonym_group(["fast", "quick", "rapid"])

A dictionary of synonym groups. All terms in a group are treated as synonyms of each other.

WhitespaceTokenizer

tokenizer = Laurus::WhitespaceTokenizer.new
tokens = tokenizer.tokenize("hello world")

Splits text on whitespace boundaries and returns an Array of Token objects.

SynonymGraphFilter

filter = Laurus::SynonymGraphFilter.new(dictionary, keep_original: true, boost: 1.0)
expanded = filter.apply(tokens)

Token filter that expands tokens with their synonyms from a SynonymDictionary.
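
The expansion step can be pictured with plain Strings standing in for Token objects. ToySynonymDictionary and expand_with_synonyms are hypothetical names for this sketch; it ignores positions, offsets, and boost, and it assumes that a token without synonyms always passes through unchanged.

```ruby
# Toy synonym lookup: every term in a group maps to the others.
class ToySynonymDictionary
  def initialize
    @synonyms = {}
  end

  def add_synonym_group(terms)
    terms.each do |term|
      @synonyms[term] = synonyms_for(term) | (terms - [term])
    end
  end

  def synonyms_for(term)
    @synonyms.fetch(term, [])
  end
end

# Expand each token with its synonyms. With keep_original: false the
# original token is dropped when it has at least one synonym.
def expand_with_synonyms(tokens, dictionary, keep_original: true)
  tokens.flat_map do |token|
    synonyms = dictionary.synonyms_for(token)
    keep_original || synonyms.empty? ? [token, *synonyms] : synonyms
  end
end
```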

Token

token.text                # => String  -- The token text
token.position            # => Integer -- Position in the token stream
token.start_offset        # => Integer -- Character start offset in the original text
token.end_offset          # => Integer -- Character end offset in the original text
token.boost               # => Float   -- Score boost factor (1.0 = no adjustment)
token.stopped             # => Boolean -- Whether removed by a stop filter
token.position_increment  # => Integer -- Difference from the previous token's position
token.position_length     # => Integer -- Number of positions spanned

Field value types

Ruby values are automatically converted to Laurus DataValue types:

| Ruby type | Laurus type | Notes |
|---|---|---|
| nil | Null | |
| true / false | Bool | |
| Integer | Int64 | |
| Float | Float64 | |
| String | Text | |
| Array of numerics | Vector | Elements coerced to f32 |
| Hash with "lat", "lon" | Geo | Two Float values |
| Time (responds to iso8601) | DateTime | Converted via iso8601 |
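
The mapping in the table can be read as a dispatch on the Ruby value. The function below is a hypothetical sketch that returns a Symbol naming the target type; the Symbol names and the :unsupported fallback are assumptions for illustration, and the real conversion happens inside the binding.

```ruby
require "time" # ensures Time#iso8601 is available on older Rubies

# Name the Laurus type a Ruby value would map to, following the table.
def laurus_type_for(value)
  case value
  when nil         then :null
  when true, false then :bool
  when Integer     then :int64
  when Float       then :float64
  when String      then :text
  when Array
    value.all?(Numeric) ? :vector : :unsupported
  when Hash
    value.key?("lat") && value.key?("lon") ? :geo : :unsupported
  else
    value.respond_to?(:iso8601) ? :datetime : :unsupported
  end
end
```

Note that Integer is checked before Float, so 1 maps to Int64 and 1.0 to Float64; a field that should hold floats needs Float values on the Ruby side.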