API Reference

Index

The primary entry point. Wraps the Laurus search engine.

class Index:
    def __init__(self, path: str | None = None, schema: Schema | None = None) -> None: ...

Constructor

  • path (str | None, default None): Directory path for persistent storage. None creates an in-memory index.
  • schema (Schema | None, default None): Schema definition. An empty schema is used when omitted.

Methods

Method | Description
put_document(id, doc) | Upsert a document. Replaces all existing versions with the same ID.
add_document(id, doc) | Append a document chunk without removing existing versions.
get_documents(id) -> list[dict] | Return all stored versions for the given ID.
delete_documents(id) | Delete all versions for the given ID.
commit() | Flush buffered writes and make all pending changes searchable.
search(query, *, limit=10, offset=0) -> list[SearchResult] | Execute a search query.
stats() -> dict | Return index statistics (document_count, vector_fields).
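The upsert/append distinction and the role of commit() can be modelled with a small in-memory toy. This is a pure-Python sketch, not Laurus code: the class ToyIndex is hypothetical, and the rule that get_documents only sees committed data is an assumption made for the illustration, not documented Laurus behaviour.

```python
import copy


class ToyIndex:
    """Hypothetical in-memory model of the Index write API (not Laurus code)."""

    def __init__(self) -> None:
        self._working: dict[str, list[dict]] = {}    # buffered, not yet visible
        self._committed: dict[str, list[dict]] = {}  # what readers see

    def put_document(self, doc_id: str, doc: dict) -> None:
        # Upsert: all existing versions for this ID are replaced by this one.
        self._working[doc_id] = [doc]

    def add_document(self, doc_id: str, doc: dict) -> None:
        # Append: existing versions for this ID are kept.
        self._working.setdefault(doc_id, []).append(doc)

    def delete_documents(self, doc_id: str) -> None:
        # Remove every version stored under this ID.
        self._working.pop(doc_id, None)

    def commit(self) -> None:
        # Publish a snapshot of the buffered state; later writes stay pending.
        self._committed = copy.deepcopy(self._working)

    def get_documents(self, doc_id: str) -> list[dict]:
        # Toy assumption: reads see only the last committed snapshot.
        return list(self._committed.get(doc_id, []))


idx = ToyIndex()
idx.add_document("doc-1", {"title": "part 1"})
idx.add_document("doc-1", {"title": "part 2"})
idx.commit()
print(len(idx.get_documents("doc-1")))  # 2
idx.put_document("doc-1", {"title": "rewritten"})
print(len(idx.get_documents("doc-1")))  # 2: the upsert is not committed yet
idx.commit()
print(idx.get_documents("doc-1"))  # [{'title': 'rewritten'}]
```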

search query argument

The query parameter accepts any of the following:

  • A DSL string (e.g. "title:hello", "embedding:\"memory safety\"")
  • A lexical query object (TermQuery, PhraseQuery, BooleanQuery, …)
  • A vector query object (VectorQuery, VectorTextQuery)
  • A SearchRequest for full control

Schema

Defines the fields and index types for an Index.

class Schema:
    def __init__(self) -> None: ...

Field methods

Method | Description
add_text_field(name, *, stored=True, indexed=True, term_vectors=False, analyzer=None) | Full-text field (inverted index, BM25). analyzer accepts a built-in name ("standard", "english", "keyword", "simple", "noop", or any custom name registered via add_analyzer) or a dict configuring a parameterised preset such as {"language": "japanese", "mode": "normal", "dict": "/var/lib/lindera/ipadic"}. The bare string "japanese" is rejected because the preset requires a Lindera dictionary path.
add_integer_field(name, *, stored=True, indexed=True, multi_valued=False) | 64-bit integer field. Set multi_valued=True to accept arrays of integers (range queries match if any value satisfies the predicate).
add_float_field(name, *, stored=True, indexed=True, multi_valued=False) | 64-bit float field. Set multi_valued=True to accept arrays of floats (range queries match if any value satisfies the predicate).
add_boolean_field(name, *, stored=True, indexed=True) | Boolean field.
add_bytes_field(name, *, stored=True) | Raw bytes field.
add_geo_field(name, *, stored=True, indexed=True) | Geographic coordinate field (lat/lon).
add_geo3d_field(name, *, stored=True, indexed=True) | 3D ECEF Cartesian point field (x, y, z in metres). See Geo3d concepts.
add_datetime_field(name, *, stored=True, indexed=True) | UTC datetime field.
add_hnsw_field(name, dimension, *, distance="cosine", m=16, ef_construction=200, embedder=None) | HNSW approximate nearest-neighbor vector field.
add_flat_field(name, dimension, *, distance="cosine", embedder=None) | Flat (brute-force) vector field.
add_ivf_field(name, dimension, *, distance="cosine", n_clusters=100, n_probe=1, embedder=None) | IVF approximate nearest-neighbor vector field.

Other methods

Method | Description
add_embedder(name, config) | Register a named embedder definition. config is a dict with a "type" key (see below).
set_default_fields(fields) | Set default search fields (list of strings).
set_dynamic_field_policy(policy) | Set how undeclared fields are handled. policy is "strict", "dynamic" (default), or "ignore". See notes below.
dynamic_field_policy() | Return the current policy as a lowercase string.
field_names() | Return all field names.

Dynamic field policy

Controls what happens when a document is ingested with field names that are not declared in the schema:

  • "strict": Reject the document.
  • "dynamic" (default): Infer a type for each undeclared field and add it to the schema. Warning: integer fields silently truncate incoming float values (3.14 → 3). Use "strict" if you need to reject such type mismatches.
  • "ignore": Silently drop the undeclared fields.

See Schema & Fields for the full behaviour matrix.
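A minimal sketch of the three policies, assuming a schema modelled as a plain dict from field name to Python type. apply_policy is a hypothetical helper, not a Laurus function; the truncation branch mirrors the warning above, and rejecting the mismatch under "strict" is inferred from the advice to use "strict" for such cases.

```python
def apply_policy(schema: dict[str, type], doc: dict, policy: str = "dynamic") -> dict:
    """Hypothetical model of dynamic_field_policy; mutates `schema` in "dynamic" mode."""
    if policy not in ("strict", "dynamic", "ignore"):
        raise ValueError(f"unknown policy: {policy!r}")
    accepted = {}
    for name, value in doc.items():
        declared = schema.get(name)
        if declared is None:
            if policy == "strict":
                raise ValueError(f"undeclared field: {name!r}")
            if policy == "ignore":
                continue  # silently dropped
            schema[name] = type(value)  # "dynamic": infer from the first value seen
            accepted[name] = value
            continue
        if declared is int and isinstance(value, float):
            if policy == "strict":
                raise TypeError(f"{name!r} is an integer field, got {value!r}")
            value = int(value)  # silent truncation toward zero: 3.14 -> 3
        accepted[name] = value
    return accepted


schema: dict[str, type] = {}
apply_policy(schema, {"count": 2})            # "count" is inferred as int
print(apply_policy(schema, {"count": 3.14}))  # {'count': 3}
```

The pitfall is order dependence: whichever value arrives first fixes the inferred type, so a later float is quietly truncated rather than rejected.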

Embedder types

"type" | Required keys | Feature flag
"precomputed" | (none) | (always available)
"candle_bert" | "model" | embeddings-candle
"candle_clip" | "model" | embeddings-multimodal
"openai" | "model" | embeddings-openai

Distance metrics

Value | Description
"cosine" | Cosine similarity (default)
"euclidean" | Euclidean distance
"dot_product" | Dot product
"manhattan" | Manhattan distance
"angular" | Angular distance

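For reference, the textbook definitions of the five metrics in pure Python. These are assumptions about what each name denotes: angular is given here as the angle in radians, and Laurus may scale it (for example, dividing by π) or convert similarities into distances differently when ranking.

```python
import math


def _norm(v: list[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    na, nb = _norm(a), _norm(b)
    if na == 0 or nb == 0:
        raise ValueError("cosine is undefined for zero vectors")
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def euclidean(a: list[float], b: list[float]) -> float:
    return math.dist(a, b)  # straight-line (L2) distance


def dot_product(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))  # unnormalised; favours long vectors


def manhattan(a: list[float], b: list[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))  # L1 distance


def angular(a: list[float], b: list[float]) -> float:
    # Angle between the vectors in radians; clamping guards against
    # rounding that pushes the cosine just outside [-1, 1].
    c = max(-1.0, min(1.0, cosine_similarity(a, b)))
    return math.acos(c)


a, b = [1.0, 0.0], [0.0, 1.0]
print(cosine_similarity(a, b), manhattan(a, b))  # 0.0 2.0
```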
Query classes

TermQuery

TermQuery(field: str, term: str)

Matches documents containing the exact term in the given field.

PhraseQuery

PhraseQuery(field: str, terms: list[str])

Matches documents containing the terms in order.

FuzzyQuery

FuzzyQuery(field: str, term: str, *, max_edits: int = 2)

Approximate match allowing up to max_edits edit-distance errors. max_edits is keyword-only.
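A minimal sketch of the edit-distance test behind a fuzzy match, assuming plain Levenshtein distance (insertions, deletions, substitutions). Laurus may also count a transposition as a single edit (Damerau-Levenshtein); levenshtein and fuzzy_match are hypothetical helpers, and a real engine walks the term dictionary with an automaton rather than comparing every term.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_match(term: str, candidate: str, *, max_edits: int = 2) -> bool:
    return levenshtein(term, candidate) <= max_edits


print(levenshtein("kitten", "sitting"))  # 3
print(fuzzy_match("laurus", "laurel"))   # True: two substitutions
```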

WildcardQuery

WildcardQuery(field: str, pattern: str)

Pattern match. * matches any sequence of characters, ? matches any single character.
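The two wildcards map directly onto regular-expression syntax. This sketch translates a pattern into an anchored regex and escapes every other character so it matches literally; wildcard_match is a hypothetical helper, and the sketch assumes Laurus offers no escape syntax for a literal * or ?.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")  # any sequence, including the empty one
        elif ch == "?":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(pattern: str, term: str) -> bool:
    # fullmatch: the pattern must cover the whole term, not a substring.
    return wildcard_to_regex(pattern).fullmatch(term) is not None


print(wildcard_match("lau*s", "laurus"))  # True
```

Patterns that start with * usually force a scan of the whole term dictionary, so a leading wildcard tends to be the slowest case.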

NumericRangeQuery

NumericRangeQuery(field: str, *, min: int | float | None = None, max: int | float | None = None)

Matches numeric values in the range [min, max]. Pass None (or omit) for an open bound. min and max are keyword-only. The numeric type (integer or float) is inferred from the Python type of min/max.
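A minimal sketch of the inclusive-range predicate, including the any-value rule that multi_valued fields use (see add_integer_field). The helpers are hypothetical and use lo/hi instead of min/max to avoid shadowing Python built-ins.

```python
from typing import Iterable, Optional, Union

Number = Union[int, float]


def in_closed_range(value: Number, lo: Optional[Number] = None,
                    hi: Optional[Number] = None) -> bool:
    """True if lo <= value <= hi; a None bound is open (unbounded)."""
    if lo is not None and value < lo:
        return False
    if hi is not None and value > hi:
        return False
    return True


def field_matches(values: Union[Number, Iterable[Number]],
                  lo: Optional[Number] = None, hi: Optional[Number] = None) -> bool:
    """A multi_valued field matches if ANY stored value lies in the range."""
    if isinstance(values, (int, float)):
        values = [values]
    return any(in_closed_range(v, lo, hi) for v in values)


print(in_closed_range(10, lo=10, hi=20))  # True: both bounds are inclusive
print(field_matches([3, 42, 7], lo=40))   # True: 42 satisfies min=40
```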

GeoDistanceQuery

GeoDistanceQuery.within_radius(
    field: str, lat: float, lon: float, distance_m: float,
)

Geo-distance (radius) search. Returns documents whose (lat, lon) coordinate is within distance_m metres of the given point.
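To sanity-check a distance_m value, the great-circle distance can be computed with the haversine formula. This sketch assumes a spherical Earth with the IUGG mean radius; the formula and radius Laurus uses internally are not stated here, so results near the boundary may differ slightly. haversine_m and within_radius are hypothetical helpers.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # IUGG mean Earth radius (assumed, not Laurus's constant)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(point: tuple[float, float], center: tuple[float, float],
                  distance_m: float) -> bool:
    return haversine_m(*center, *point) <= distance_m


# One degree of longitude along the equator is roughly 111.2 km.
print(round(haversine_m(0.0, 0.0, 0.0, 1.0)))  # 111195
```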

GeoBoundingBoxQuery

GeoBoundingBoxQuery.within_bounding_box(
    field: str,
    min_lat: float, min_lon: float,
    max_lat: float, max_lon: float,
)

Geo bounding-box search. Returns documents whose (lat, lon) coordinate lies inside the axis-aligned [min_lat, max_lat] × [min_lon, max_lon] rectangle.

Geo3dDistanceQuery

Geo3dDistanceQuery.within_sphere(
    field: str, x: float, y: float, z: float, distance_m: float,
)

Sphere search over a 3D ECEF point field. Returns documents whose (x, y, z) coordinate is within distance_m metres of the centre. See Geo3d concepts for ECEF theory.
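Points for a geo3d field must be supplied as ECEF coordinates. The sketch below converts geodetic latitude, longitude and height into ECEF using the standard WGS84 formulas, then applies the sphere test as a Euclidean distance check. It assumes the field uses the WGS84 ellipsoid (see Geo3d concepts for the convention Laurus actually follows); geodetic_to_ecef and within_sphere are hypothetical helpers.

```python
import math

# WGS84 ellipsoid parameters (assumption: Laurus ECEF points use WGS84).
WGS84_A = 6_378_137.0               # semi-major axis, metres
WGS84_F = 1 / 298.257223563         # flattening
WGS84_E2 = WGS84_F * (2 - WGS84_F)  # first eccentricity squared


def geodetic_to_ecef(lat_deg: float, lon_deg: float, h_m: float = 0.0) -> tuple[float, float, float]:
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    # Prime-vertical radius of curvature at this latitude.
    n = WGS84_A / math.sqrt(1 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + h_m) * math.cos(lat) * math.cos(lon)
    y = (n + h_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - WGS84_E2) + h_m) * math.sin(lat)
    return x, y, z


def within_sphere(point: tuple[float, float, float],
                  centre: tuple[float, float, float], distance_m: float) -> bool:
    # Straight-line (chord) distance through 3D space, not along the surface.
    return math.dist(point, centre) <= distance_m


print(geodetic_to_ecef(0.0, 0.0))  # (6378137.0, 0.0, 0.0)
```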

Geo3dBoundingBoxQuery

Geo3dBoundingBoxQuery.within_box(
    field: str,
    min_x: float, min_y: float, min_z: float,
    max_x: float, max_y: float, max_z: float,
)

Axis-aligned 3D bounding-box search. Returns documents whose ECEF point lies inside [min_x, max_x] × [min_y, max_y] × [min_z, max_z].

Geo3dNearestQuery

Geo3dNearestQuery.k_nearest(
    field: str,
    x: float, y: float, z: float,
    k: int,
    *,
    initial_radius_m: float | None = None,
    max_radius_m: float | None = None,
)

k-nearest-neighbour search over a 3D ECEF point field. Returns the k documents closest to (x, y, z). The optional initial_radius_m and max_radius_m set the starting radius and the upper limit of the iteratively expanding search.
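One plausible reading of the iterative expansion is sketched below: search within a radius, double it until at least k points fall inside or the cap is reached, then sort by distance. This strategy is inferred from the parameter names and is not confirmed Laurus behaviour; k_nearest and the 1 km default starting radius are hypothetical. The early stop is safe because any point outside the radius is farther away than all k points inside it.

```python
import math
from typing import Optional


def k_nearest(points: dict[str, tuple[float, float, float]],
              query: tuple[float, float, float], k: int, *,
              initial_radius_m: Optional[float] = None,
              max_radius_m: Optional[float] = None) -> list[tuple[str, float]]:
    """Hypothetical expanding-radius k-NN; returns up to k (id, distance) pairs."""
    limit = math.inf if max_radius_m is None else max_radius_m
    radius = 1_000.0 if initial_radius_m is None else initial_radius_m  # assumed default
    radius = min(radius, limit)
    if k <= 0 or not points:
        return []
    if radius <= 0:
        raise ValueError("initial_radius_m must be positive")
    while True:
        # A real engine would run a range query on a spatial index here;
        # this sketch simply scans every point.
        hits = [(pid, math.dist(p, query)) for pid, p in points.items()]
        hits = [(pid, d) for pid, d in hits if d <= radius]
        if len(hits) >= k or len(hits) == len(points) or radius >= limit:
            hits.sort(key=lambda h: h[1])
            return hits[:k]  # may be fewer than k if the cap was reached
        radius = min(radius * 2, limit)  # widen the search and try again


pts = {"a": (0.0, 0.0, 0.0), "b": (3.0, 0.0, 0.0), "c": (10.0, 0.0, 0.0), "d": (100.0, 0.0, 0.0)}
print(k_nearest(pts, (0.0, 0.0, 0.0), 2, initial_radius_m=1.0))  # [('a', 0.0), ('b', 3.0)]
```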

BooleanQuery

bq = BooleanQuery()
bq.must(query)
bq.should(query)
bq.must_not(query)

Compound boolean query. Construct with no arguments and add clauses one at a time via the must / should / must_not methods. Each method accepts any query object (including a nested BooleanQuery).

must clauses all have to match; must_not clauses must not match. should clauses contribute to scoring; at least one of them must match if there are no must clauses.
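The matching rules can be expressed with set operations when each clause is reduced to the set of document IDs it matches. This is a hypothetical model, not Laurus code; scoring is omitted, and treating a query with only must_not clauses as matching nothing follows Lucene's convention and is an assumption here.

```python
def boolean_match(must=(), should=(), must_not=()):
    """Hypothetical set-based model; each clause is the set of doc IDs it matches."""
    if must:
        matched = set.intersection(*(set(c) for c in must))
    else:
        # No must clauses: a document needs at least one matching should clause.
        matched = set().union(*should)
    for clause in must_not:
        matched -= set(clause)  # must_not clauses only ever remove documents
    # When must clauses exist, should clauses change scores but not membership.
    return matched


print(sorted(boolean_match(must=[{1, 2, 3}, {2, 3, 4}], must_not=[{3}])))  # [2]
```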

SpanQuery

# Single term
SpanQuery.term(field: str, term: str)

# Near: terms appearing within `slop` positions of each other
SpanQuery.near(field: str, terms: list[str], *, slop: int = 0, ordered: bool = True)

# Near with nested SpanQuery clauses
SpanQuery.near_spans(field: str, clauses: list[SpanQuery], *, slop: int = 0, ordered: bool = True)

# Containing: big span contains little span
SpanQuery.containing(field: str, big: SpanQuery, little: SpanQuery)

# Within: include span within exclude span at max distance
SpanQuery.within(field: str, include: SpanQuery, exclude: SpanQuery, distance: int)

Positional / proximity span queries. Construct via the static factory methods. near takes a list of term strings, while near_spans takes a list of SpanQuery objects for nested expressions. slop and ordered are keyword-only.
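To make slop and ordered concrete, the sketch below checks a near clause against the token positions of one field, assuming Lucene-style semantics where slop is the number of unmatched positions allowed inside the matched span. span_near and the positions mapping are hypothetical; the brute-force search over position combinations is for clarity only, and Lucene's unordered slop rules differ in edge cases.

```python
from itertools import product


def span_near(positions: dict[str, list[int]], terms: list[str], *,
              slop: int = 0, ordered: bool = True) -> bool:
    """positions maps each term to its token positions in one field of one doc."""
    if not terms:
        return False
    for combo in product(*(positions.get(t, []) for t in terms)):
        if ordered and any(a >= b for a, b in zip(combo, combo[1:])):
            continue  # terms must appear in the given order
        if len(set(combo)) < len(combo):
            continue  # one position cannot satisfy two terms
        width = max(combo) - min(combo) + 1
        if width - len(terms) <= slop:  # gaps inside the span
            return True
    return False


# "the quick brown fox"
pos = {"the": [0], "quick": [1], "brown": [2], "fox": [3]}
print(span_near(pos, ["quick", "fox"]))          # False: "brown" is one gap
print(span_near(pos, ["quick", "fox"], slop=1))  # True
```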

VectorQuery

VectorQuery(field: str, vector: list[float])

Approximate nearest-neighbor search using a pre-computed embedding vector.
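Conceptually, a vector query ranks documents by similarity to the query vector. The exact version of that ranking is what a flat (brute-force) field computes, and what HNSW and IVF approximate. This pure-Python sketch assumes cosine similarity and does not handle zero vectors; flat_search is a hypothetical helper, not part of Laurus.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)  # zero vectors not handled


def flat_search(vectors: dict[str, list[float]], query: list[float],
                limit: int = 10) -> list[tuple[str, float]]:
    """Exact top-`limit` documents by cosine similarity, scanning every vector."""
    scored = ((cosine(v, query), doc_id) for doc_id, v in vectors.items())
    return [(doc_id, score) for score, doc_id in heapq.nlargest(limit, scored)]


vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print([doc_id for doc_id, _ in flat_search(vectors, [1.0, 0.0], limit=2)])  # ['a', 'c']
```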

VectorTextQuery

VectorTextQuery(field: str, text: str)

Converts text to an embedding at query time and runs vector search. Requires an embedder configured on the index.


SearchRequest

Full-featured search request for advanced control.

class SearchRequest:
    def __init__(
        self,
        *,
        query=None,
        lexical_query=None,
        vector_query=None,
        filter_query=None,
        fusion=None,
        limit: int = 10,
        offset: int = 0,
    ) -> None: ...

Parameter | Description
query | A DSL string or single query object. Mutually exclusive with lexical_query / vector_query.
lexical_query | Lexical component for explicit hybrid search.
vector_query | Vector component for explicit hybrid search.
filter_query | Lexical filter applied after scoring.
fusion | Fusion algorithm (RRF or WeightedSum). Defaults to RRF(k=60) when both components are set.
limit | Maximum number of results (default 10).
offset | Pagination offset (default 0).

SearchResult

Returned by Index.search().

class SearchResult:
    id: str          # External document identifier
    score: float     # Relevance score
    document: dict | None  # Retrieved field values, or None if not stored

Fusion algorithms

RRF

RRF(k: float = 60.0)

Reciprocal Rank Fusion. Merges lexical and vector result lists by rank position. k is a smoothing constant; higher values reduce the influence of top-ranked results.
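The standard RRF formula sums 1 / (k + rank) over every list a document appears in, so only rank positions matter and raw scores are ignored. The sketch assumes 1-based ranks; whether Laurus counts from 0 or 1 is not stated here. rrf is a hypothetical helper.

```python
def rrf(result_lists: list[list[str]], k: float = 60.0) -> list[tuple[str, float]]:
    """Fuse ranked ID lists; each list is ordered best-first."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


lexical = ["a", "b"]
vector = ["b", "c"]
print([doc_id for doc_id, _ in rrf([lexical, vector])])  # ['b', 'a', 'c']
```

"b" wins because appearing in both lists outweighs being first in only one; a larger k flattens the gap between rank 1 and rank 2 within each list.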

WeightedSum

WeightedSum(lexical_weight: float = 0.5, vector_weight: float = 0.5)

Normalises both score lists independently, then combines them as lexical_weight * lexical_score + vector_weight * vector_score.
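The normalisation method is not specified above; this sketch assumes min-max scaling to [0, 1] and gives a document that is missing from one list a contribution of 0 from that list. weighted_sum and min_max are hypothetical helpers, not Laurus functions.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Scale scores into [0, 1]; identical scores all map to 1.0 (assumption)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def weighted_sum(lexical: dict[str, float], vector: dict[str, float],
                 lexical_weight: float = 0.5, vector_weight: float = 0.5) -> list[tuple[str, float]]:
    lex, vec = min_max(lexical), min_max(vector)
    combined = {
        d: lexical_weight * lex.get(d, 0.0) + vector_weight * vec.get(d, 0.0)
        for d in lex.keys() | vec.keys()
    }
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


bm25 = {"a": 12.0, "b": 8.0, "c": 4.0}  # unbounded lexical scores
cos = {"b": 0.9, "c": 0.5}              # similarity scores
print(weighted_sum(bm25, cos))  # [('b', 0.75), ('a', 0.5), ('c', 0.0)]
```

Normalising first matters because BM25 scores and cosine similarities live on very different scales; without it, the lexical side would dominate regardless of the weights.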


Text analysis

SynonymDictionary

class SynonymDictionary:
    def __init__(self) -> None: ...
    def add_synonym_group(self, synonyms: list[str]) -> None: ...

WhitespaceTokenizer

class WhitespaceTokenizer:
    def __init__(self) -> None: ...
    def tokenize(self, text: str) -> list[Token]: ...

SynonymGraphFilter

class SynonymGraphFilter:
    def __init__(
        self,
        dictionary: SynonymDictionary,
        keep_original: bool = True,
        boost: float = 1.0,
    ) -> None: ...
    def apply(self, tokens: list[Token]) -> list[Token]: ...

Token

class Token:
    text: str
    position: int
    start_offset: int
    end_offset: int
    boost: float
    stopped: bool
    position_increment: int
    position_length: int

Field value types

Python values are automatically converted to Laurus DataValue types:

Python type | Laurus type | Notes
None | Null |
bool | Bool | Checked before int
int | Int64 |
float | Float64 |
str | Text |
bytes | Bytes |
list[float] | Vector | Elements coerced to f32
(lat, lon) tuple | Geo | Two float values
datetime.datetime | DateTime | Converted via isoformat()
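The conversion order matters because bool is a subclass of int in Python, so True would otherwise become Int64 1. This sketch mirrors the table with hypothetical ("TypeName", payload) tuples standing in for Laurus DataValue; the Int64 overflow check and the exact tuple and list rules are assumptions, and f32 coercion is imitated by a round-trip through struct.

```python
import datetime
import struct

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def to_f32(x: float) -> float:
    # Round-trip through IEEE 754 binary32 to mimic f32 coercion.
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_data_value(value):
    """Hypothetical (type_name, payload) model of the conversion table."""
    if value is None:
        return ("Null", None)
    if isinstance(value, bool):  # must precede int: bool is a subclass of int
        return ("Bool", value)
    if isinstance(value, int):
        if not INT64_MIN <= value <= INT64_MAX:
            raise OverflowError(f"{value} does not fit in Int64")
        return ("Int64", value)
    if isinstance(value, float):
        return ("Float64", value)
    if isinstance(value, str):
        return ("Text", value)
    if isinstance(value, bytes):
        return ("Bytes", value)
    if isinstance(value, datetime.datetime):
        return ("DateTime", value.isoformat())
    if isinstance(value, tuple) and len(value) == 2 and all(isinstance(v, float) for v in value):
        return ("Geo", value)
    if isinstance(value, list) and all(isinstance(v, float) for v in value):
        return ("Vector", [to_f32(v) for v in value])
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(to_data_value(True))  # ('Bool', True)
```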