API Reference

Index

The primary entry point. Wraps the Laurus search engine.

class Index:
    def __init__(self, path: str | None = None, schema: Schema | None = None) -> None: ...

Constructor

Parameter	Type	Default	Description
`path`	`str | None`	`None`	Directory path for persistent storage. `None` creates an in-memory index.
`schema`	`Schema | None`	`None`	Schema definition. An empty schema is used when omitted.

Methods

Method	Description
`put_document(id, doc)`	Upsert a document. Replaces all existing versions with the same ID.
`add_document(id, doc)`	Append a document chunk without removing existing versions.
`get_documents(id) -> list[dict]`	Return all stored versions for the given ID.
`delete_documents(id)`	Delete all versions for the given ID.
`commit()`	Flush buffered writes and make all pending changes searchable.
`search(query, *, limit=10, offset=0) -> list[SearchResult]`	Execute a search query.
`stats() -> dict`	Return index statistics (`document_count`, `vector_fields`).
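The difference between `put_document` and `add_document` can be pictured with a small in-memory model. The sketch below is purely illustrative: `ToyVersionStore` is a hypothetical class, not part of Laurus, and it ignores `commit()` and searching entirely. It only mirrors the replace-versus-append behaviour described in the table.

```python
from collections import defaultdict


class ToyVersionStore:
    """Hypothetical stand-in that mimics the documented versioning rules."""

    def __init__(self):
        # Maps a document ID to the list of stored versions (chunks).
        self._versions = defaultdict(list)

    def put_document(self, doc_id, doc):
        # Upsert: every existing version with this ID is replaced.
        self._versions[doc_id] = [dict(doc)]

    def add_document(self, doc_id, doc):
        # Append: existing versions are kept and a new one is added.
        self._versions[doc_id].append(dict(doc))

    def get_documents(self, doc_id):
        # Return copies so callers cannot mutate the stored versions.
        return [dict(d) for d in self._versions.get(doc_id, [])]

    def delete_documents(self, doc_id):
        # Remove all versions; unknown IDs are ignored.
        self._versions.pop(doc_id, None)


store = ToyVersionStore()
store.add_document("doc1", {"body": "chunk one"})
store.add_document("doc1", {"body": "chunk two"})
two_chunks = store.get_documents("doc1")

store.put_document("doc1", {"body": "replacement"})
after_put = store.get_documents("doc1")
```

Under this model, `add_document` suits chunked documents that share one external ID, while `put_document` suits whole-document updates.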

`search` query argument

The query parameter accepts any of the following:

A DSL string (e.g. "title:hello", "embedding:\"memory safety\"")
A lexical query object (TermQuery, PhraseQuery, BooleanQuery, …)
A vector query object (VectorQuery, VectorTextQuery)
A SearchRequest for full control

Schema

Defines the fields and index types for an Index.

class Schema:
    def __init__(self) -> None: ...

Field methods

Method	Description
`add_text_field(name, *, stored=True, indexed=True, term_vectors=False, analyzer=None)`	Full-text field (inverted index, BM25). `analyzer` accepts a built-in name (`"standard"`, `"english"`, `"keyword"`, `"simple"`, `"noop"`, or any custom name registered via `add_analyzer`) or a dict configuring a parameterised preset such as `{"language": "japanese", "mode": "normal", "dict": "/var/lib/lindera/ipadic"}`. The bare string `"japanese"` is rejected because the preset requires a Lindera dictionary path.
`add_integer_field(name, *, stored=True, indexed=True, multi_valued=False)`	64-bit integer field. Set `multi_valued=True` to accept arrays of integers (range queries match if any value satisfies the predicate).
`add_float_field(name, *, stored=True, indexed=True, multi_valued=False)`	64-bit float field. Set `multi_valued=True` to accept arrays of floats (range queries match if any value satisfies the predicate).
`add_boolean_field(name, *, stored=True, indexed=True)`	Boolean field.
`add_bytes_field(name, *, stored=True)`	Raw bytes field.
`add_geo_field(name, *, stored=True, indexed=True)`	Geographic coordinate field (lat/lon).
`add_geo3d_field(name, *, stored=True, indexed=True)`	3D ECEF Cartesian point field (x, y, z in metres). See Geo3d concepts.
`add_datetime_field(name, *, stored=True, indexed=True)`	UTC datetime field.
`add_hnsw_field(name, dimension, *, distance="cosine", m=16, ef_construction=200, embedder=None)`	HNSW approximate nearest-neighbor vector field.
`add_flat_field(name, dimension, *, distance="cosine", embedder=None)`	Flat (brute-force) vector field.
`add_ivf_field(name, dimension, *, distance="cosine", n_clusters=100, n_probe=1, embedder=None)`	IVF approximate nearest-neighbor vector field.

Other methods

Method	Description
`add_embedder(name, config)`	Register a named embedder definition. `config` is a dict with a `"type"` key (see below).
`set_default_fields(fields)`	Set default search fields (list of strings).
`set_dynamic_field_policy(policy)`	Set how undeclared fields are handled. `policy` is `"strict"`, `"dynamic"` (default), or `"ignore"`. See notes below.
`dynamic_field_policy()`	Return the current policy as a lowercase string.
`field_names()`	Return all field names.

Dynamic field policy

Controls what happens when a document is ingested with field names that are not declared in the schema:

"strict" — Reject the document.
"dynamic" (default) — Infer a type for each undeclared field and add it to the schema. Warning: integer fields silently truncate incoming float values (3.14 → 3). Use "strict" if you need to reject such type mismatches.
"ignore" — Silently drop the undeclared fields.

See Schema & Fields for the full behaviour matrix.
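The three policies can be sketched as a small function over plain dicts. This is an illustration of the rules listed above, not Laurus's ingestion code: `ingest`, `infer_type`, and the type labels are hypothetical names, and the inference rules are an assumption.

```python
def infer_type(value):
    # bool is tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    return "text"


def ingest(policy, schema, doc):
    """Return (stored_doc, resulting_schema) for one document.

    `schema` maps declared field names to a type label.
    """
    schema = dict(schema)
    undeclared = [name for name in doc if name not in schema]

    if policy == "strict":
        if undeclared:
            raise ValueError(f"undeclared fields: {undeclared}")
        return dict(doc), schema
    if policy == "ignore":
        # Undeclared fields are dropped; the schema is left unchanged.
        kept = {name: value for name, value in doc.items() if name in schema}
        return kept, schema
    if policy == "dynamic":
        # Undeclared fields get an inferred type and join the schema.
        for name in undeclared:
            schema[name] = infer_type(doc[name])
        return dict(doc), schema
    raise ValueError(f"unknown policy: {policy!r}")


declared = {"title": "text"}
document = {"title": "hello", "views": 3}

dynamic_doc, dynamic_schema = ingest("dynamic", declared, document)
ignored_doc, ignored_schema = ingest("ignore", declared, document)
```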

Embedder types

`"type"`	Required keys	Feature flag
`"precomputed"`	–	(always available)
`"candle_bert"`	`"model"`	`embeddings-candle`
`"candle_clip"`	`"model"`	`embeddings-multimodal`
`"openai"`	`"model"`	`embeddings-openai`

Distance metrics

Value	Description
`"cosine"`	Cosine similarity (default)
`"euclidean"`	Euclidean distance
`"dot_product"`	Dot product
`"manhattan"`	Manhattan distance
`"angular"`	Angular distance
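For orientation, the five metrics can be written out in plain Python. These are the textbook definitions; how Laurus normalises or turns them into scores is not specified here, and the angular form used below (arc cosine of the cosine similarity, divided by π) is one common convention that is assumed for this sketch.

```python
import math


def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def angular(a, b):
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos = max(-1.0, min(1.0, cosine_similarity(a, b)))
    return math.acos(cos) / math.pi


u = [1.0, 0.0]
v = [0.0, 1.0]
metrics = {
    "cosine": cosine_similarity(u, v),
    "euclidean": euclidean(u, v),
    "dot_product": dot_product(u, v),
    "manhattan": manhattan(u, v),
    "angular": angular(u, v),
}
```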

Query classes

TermQuery

TermQuery(field: str, term: str)

Matches documents containing the exact term in the given field.

PhraseQuery

PhraseQuery(field: str, terms: list[str])

Matches documents containing the terms in order.

FuzzyQuery

FuzzyQuery(field: str, term: str, *, max_edits: int = 2)

Approximate match allowing up to max_edits edit-distance errors. max_edits is keyword-only.
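To make "edit distance" concrete, the sketch below computes the classic Levenshtein distance (insertions, deletions, substitutions) and uses it as a match test. It assumes plain Levenshtein distance; whether Laurus also counts transpositions as a single edit is not stated in this reference. `edit_distance` and `fuzzy_match` are hypothetical helper names.

```python
def edit_distance(a, b):
    """Levenshtein distance using a rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        cur = [i]
        for j, ch_b in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,                  # delete ch_a
                cur[j - 1] + 1,               # insert ch_b
                prev[j - 1] + (ch_a != ch_b)  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def fuzzy_match(term, candidate, max_edits=2):
    return edit_distance(term, candidate) <= max_edits


close_enough = fuzzy_match("hello", "helo")      # one deletion
too_far = fuzzy_match("kitten", "sitting")       # three edits
```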

WildcardQuery

WildcardQuery(field: str, pattern: str)

Pattern match. * matches any sequence of characters, ? matches any single character.
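The two wildcard characters map directly onto regular-expression constructs. The sketch below shows that mapping with Python's `re` module; it is an illustration of the pattern semantics only, and `wildcard_match` is a hypothetical helper rather than a Laurus function.

```python
import re


def wildcard_to_regex(pattern):
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # any sequence of characters, including none
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(pattern, term):
    # fullmatch: the pattern must cover the whole term, not a substring.
    return wildcard_to_regex(pattern).fullmatch(term) is not None


prefix_hit = wildcard_match("pro*", "programming")
single_hit = wildcard_match("te?t", "test")
single_miss = wildcard_match("te?t", "teest")
```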

NumericRangeQuery

NumericRangeQuery(field: str, *, min: int | float | None = None, max: int | float | None = None)

Matches numeric values in the range [min, max]. Pass None (or omit) for an open bound. min and max are keyword-only. The numeric type (integer or float) is inferred from the Python type of min/max.
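The range semantics (inclusive bounds, `None` for an open side, and any-value matching for multi-valued fields) can be sketched as a predicate. `in_range` and its `lower` / `upper` parameters are hypothetical names chosen to avoid shadowing Python's built-in `min` and `max`; they stand in for the query's `min` and `max` arguments.

```python
def in_range(value, *, lower=None, upper=None):
    """True if the value, or any element of a multi-valued field, is in range."""
    values = value if isinstance(value, (list, tuple)) else [value]
    return any(
        (lower is None or v >= lower) and (upper is None or v <= upper)
        for v in values
    )


on_upper_bound = in_range(5, lower=1, upper=5)   # bounds are inclusive
above_range = in_range(6, lower=1, upper=5)
open_upper = in_range(1_000_000, lower=10)       # no upper bound
multi_valued = in_range([0, 10], lower=8)        # 10 satisfies the predicate
```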

GeoDistanceQuery

GeoDistanceQuery.within_radius(
    field: str, lat: float, lon: float, distance_m: float,
)

Geo-distance (radius) search. Returns documents whose (lat, lon) coordinate is within distance_m metres of the given point.
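A radius test over (lat, lon) points is commonly implemented with the haversine great-circle formula. The sketch below assumes a spherical Earth with the mean radius 6,371,008.8 m; the distance model Laurus actually uses is not stated in this reference, and `haversine_m` / `within_radius` are hypothetical helpers.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius (assumption for this sketch)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(doc_lat, doc_lon, lat, lon, distance_m):
    return haversine_m(doc_lat, doc_lon, lat, lon) <= distance_m


# Half a degree of longitude on the equator is roughly 55.6 km.
inside = within_radius(0.0, 0.5, 0.0, 0.0, 100_000.0)
# One full degree is roughly 111.2 km, which exceeds the 100 km radius.
outside = within_radius(0.0, 1.0, 0.0, 0.0, 100_000.0)
```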

GeoBoundingBoxQuery

GeoBoundingBoxQuery.within_bounding_box(
    field: str,
    min_lat: float, min_lon: float,
    max_lat: float, max_lon: float,
)

Geo bounding-box search. Returns documents whose (lat, lon) coordinate lies inside the axis-aligned [min_lat, max_lat] × [min_lon, max_lon] rectangle.

Geo3dDistanceQuery

Geo3dDistanceQuery.within_sphere(
    field: str, x: float, y: float, z: float, distance_m: float,
)

Sphere search over a 3D ECEF point field. Returns documents whose (x, y, z) coordinate is within distance_m metres of the centre. See Geo3d concepts for ECEF theory.
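Data often arrives as latitude, longitude, and altitude rather than ECEF coordinates. The sketch below converts geodetic coordinates to ECEF using the standard WGS 84 ellipsoid constants and then applies a straight-line sphere test. That the field expects WGS 84-based ECEF values is an assumption here, and `geodetic_to_ecef` and `within_sphere` are hypothetical helpers, not Laurus functions.

```python
import math

WGS84_A = 6378137.0                    # semi-major axis in metres
WGS84_F = 1 / 298.257223563            # flattening
WGS84_E2 = WGS84_F * (2 - WGS84_F)     # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, alt_m=0.0):
    """Convert WGS 84 latitude/longitude (degrees) and altitude (m) to ECEF."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    # Prime vertical radius of curvature at this latitude.
    n = WGS84_A / math.sqrt(1 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + alt_m) * math.cos(lat) * math.cos(lon)
    y = (n + alt_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - WGS84_E2) + alt_m) * math.sin(lat)
    return x, y, z


def within_sphere(point, centre, distance_m):
    # Straight-line (chord) distance in 3D, not distance along the surface.
    return math.dist(point, centre) <= distance_m


ground = geodetic_to_ecef(0.0, 0.0, 0.0)
overhead = geodetic_to_ecef(0.0, 0.0, 400.0)   # 400 m directly above
near = within_sphere(overhead, ground, 500.0)
```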

Geo3dBoundingBoxQuery

Geo3dBoundingBoxQuery.within_box(
    field: str,
    min_x: float, min_y: float, min_z: float,
    max_x: float, max_y: float, max_z: float,
)

Axis-aligned 3D bounding-box search. Returns documents whose ECEF point lies inside [min_x, max_x] × [min_y, max_y] × [min_z, max_z].

Geo3dNearestQuery

Geo3dNearestQuery.k_nearest(
    field: str,
    x: float, y: float, z: float,
    k: int,
    *,
    initial_radius_m: float | None = None,
    max_radius_m: float | None = None,
)

k-nearest-neighbour search over a 3D ECEF point field. Returns the k documents closest to (x, y, z). The optional initial_radius_m and max_radius_m set the starting and maximum radius of the iteratively expanding search.

BooleanQuery

bq = BooleanQuery()
bq.must(query)
bq.should(query)
bq.must_not(query)

Compound boolean query. Construct with no arguments and add clauses one at a time via the must / should / must_not methods. Each method accepts any query object (including a nested BooleanQuery).

Every must clause has to match, and no must_not clause may match. should clauses contribute to scoring; when there are no must clauses, at least one should clause has to match.
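The matching rules can be restated as a predicate over the set of terms a document contains. This sketch covers matching only, not scoring, and `boolean_match` is a hypothetical helper. A query with no must and no should clauses is not covered by the rules above; the sketch lets such a query match anything that is not excluded, which is an assumption.

```python
def boolean_match(doc_terms, must=(), should=(), must_not=()):
    """Decide whether a document (a set of terms) satisfies the clauses."""
    if any(term in doc_terms for term in must_not):
        return False
    if not all(term in doc_terms for term in must):
        return False
    if not must and should:
        # Without must clauses, at least one should clause has to match.
        return any(term in doc_terms for term in should)
    return True


required_and_clean = boolean_match({"rust", "fast"}, must=["rust"], must_not=["slow"])
excluded = boolean_match({"rust", "slow"}, must=["rust"], must_not=["slow"])
should_only_hit = boolean_match({"python"}, should=["rust", "python"])
should_only_miss = boolean_match({"go"}, should=["rust", "python"])
# With a must clause present, should clauses are optional for matching.
optional_should = boolean_match({"rust"}, must=["rust"], should=["python"])
```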

SpanQuery

# Single term
SpanQuery.term(field: str, term: str)

# Near: terms appearing within `slop` positions of each other
SpanQuery.near(field: str, terms: list[str], *, slop: int = 0, ordered: bool = True)

# Near with nested SpanQuery clauses
SpanQuery.near_spans(field: str, clauses: list[SpanQuery], *, slop: int = 0, ordered: bool = True)

# Containing: big span contains little span
SpanQuery.containing(field: str, big: SpanQuery, little: SpanQuery)

# Within: include span within exclude span at max distance
SpanQuery.within(field: str, include: SpanQuery, exclude: SpanQuery, distance: int)

Positional / proximity span queries. Construct via the static factory methods. near takes a list of term strings, while near_spans takes a list of SpanQuery objects for nested expressions. slop and ordered are keyword-only.

VectorQuery

VectorQuery(field: str, vector: list[float])

Approximate nearest-neighbor search using a pre-computed embedding vector.

VectorTextQuery

VectorTextQuery(field: str, text: str)

Converts text to an embedding at query time and runs vector search. Requires an embedder configured on the index.

SearchRequest

Full-featured search request for advanced control.

class SearchRequest:
    def __init__(
        self,
        *,
        query=None,
        lexical_query=None,
        vector_query=None,
        filter_query=None,
        fusion=None,
        limit: int = 10,
        offset: int = 0,
    ) -> None: ...

Parameter	Description
`query`	A DSL string or single query object. Mutually exclusive with `lexical_query` / `vector_query`.
`lexical_query`	Lexical component for explicit hybrid search.
`vector_query`	Vector component for explicit hybrid search.
`filter_query`	Lexical filter applied after scoring.
`fusion`	Fusion algorithm (`RRF` or `WeightedSum`). Defaults to `RRF(k=60)` when both components are set.
`limit`	Maximum number of results (default 10).
`offset`	Pagination offset (default 0).

SearchResult

Returned by Index.search().

class SearchResult:
    id: str          # External document identifier
    score: float     # Relevance score
    document: dict | None  # Retrieved field values, or None if not stored

Fusion algorithms

RRF

RRF(k: float = 60.0)

Reciprocal Rank Fusion. Merges lexical and vector result lists by rank position. k is a smoothing constant; higher values reduce the influence of top-ranked results.
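As a worked illustration, the usual RRF formula gives each document a score of the sum of 1 / (k + rank) over every list it appears in. The sketch below assumes ranks start at 1, as in the standard formulation; the exact rank convention and tie-breaking inside Laurus are not documented here, and `rrf_fuse` is a hypothetical helper.

```python
def rrf_fuse(rankings, k=60.0):
    """Fuse ranked lists of document IDs; returns (id, score) pairs, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score, then by ID so ties are deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


lexical_hits = ["a", "b", "c"]
vector_hits = ["b", "c", "d"]
fused = rrf_fuse([lexical_hits, vector_hits])
fused_ids = [doc_id for doc_id, _ in fused]
```

Documents that appear in both lists ("b" and "c") outrank those that appear in only one, even though "a" was the top lexical hit. With a larger k, the gap between adjacent ranks shrinks, so a first-place result contributes little more than a second-place one.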

WeightedSum

WeightedSum(lexical_weight: float = 0.5, vector_weight: float = 0.5)

Normalises both score lists independently, then combines them as lexical_weight * lexical_score + vector_weight * vector_score.
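The combination step can be sketched as follows. The normalisation method is not specified in this reference; the sketch assumes min-max scaling to [0, 1] and treats a document missing from one list as scoring 0 in that list. Both choices are assumptions, and `weighted_sum` is a hypothetical helper.

```python
def min_max(scores):
    """Scale a {doc_id: score} dict to the range [0, 1]."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        # All scores equal: treat every document as a full match.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - low) / (high - low) for doc_id, s in scores.items()}


def weighted_sum(lexical, vector, lexical_weight=0.5, vector_weight=0.5):
    lex, vec = min_max(lexical), min_max(vector)
    fused = {
        doc_id: lexical_weight * lex.get(doc_id, 0.0)
        + vector_weight * vec.get(doc_id, 0.0)
        for doc_id in lex.keys() | vec.keys()
    }
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


lexical_scores = {"a": 10.0, "b": 5.0, "c": 0.0}   # e.g. BM25 scores
vector_scores = {"b": 0.9, "c": 0.5, "d": 0.1}     # e.g. similarities
combined = weighted_sum(lexical_scores, vector_scores)
combined_ids = [doc_id for doc_id, _ in combined]
```

Normalising each list first matters because raw lexical and vector scores live on different scales; without it, the larger-valued list would dominate regardless of the weights.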

Text analysis

SynonymDictionary

class SynonymDictionary:
    def __init__(self) -> None: ...
    def add_synonym_group(self, synonyms: list[str]) -> None: ...

WhitespaceTokenizer

class WhitespaceTokenizer:
    def __init__(self) -> None: ...
    def tokenize(self, text: str) -> list[Token]: ...

SynonymGraphFilter

class SynonymGraphFilter:
    def __init__(
        self,
        dictionary: SynonymDictionary,
        keep_original: bool = True,
        boost: float = 1.0,
    ) -> None: ...
    def apply(self, tokens: list[Token]) -> list[Token]: ...

Token

class Token:
    text: str
    position: int
    start_offset: int
    end_offset: int
    boost: float
    stopped: bool
    position_increment: int
    position_length: int

Field value types

Python values are automatically converted to Laurus DataValue types:

Python type	Laurus type	Notes
`None`	`Null`
`bool`	`Bool`	Checked before `int`
`int`	`Int64`
`float`	`Float64`
`str`	`Text`
`bytes`	`Bytes`
`list[float]`	`Vector`	Elements coerced to `f32`
`(lat, lon)` tuple	`Geo`	Two `float` values
`datetime.datetime`	`DateTime`	Converted via `isoformat()`
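The "Checked before `int`" note exists because `bool` is a subclass of `int` in Python, so an `int` check placed first would classify `True` as an integer. The sketch below mimics the table with a hypothetical `classify` function. Apart from the bool-before-int rule, the order of the checks is an assumption and not taken from Laurus.

```python
import datetime


def classify(value):
    """Return the Laurus type name the table above associates with a value."""
    if value is None:
        return "Null"
    if isinstance(value, bool):      # must precede the int check
        return "Bool"
    if isinstance(value, int):
        return "Int64"
    if isinstance(value, float):
        return "Float64"
    if isinstance(value, str):
        return "Text"
    if isinstance(value, bytes):
        return "Bytes"
    if isinstance(value, datetime.datetime):
        return "DateTime"
    if (isinstance(value, tuple) and len(value) == 2
            and all(isinstance(v, float) for v in value)):
        return "Geo"
    if isinstance(value, list) and all(isinstance(v, float) for v in value):
        return "Vector"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


flag_type = classify(True)
count_type = classify(1)
point_type = classify((35.68, 139.69))
vector_type = classify([0.1, 0.2, 0.3])
```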
