API Reference
Index
The primary entry point. Wraps the Laurus search engine.
```python
class Index:
    def __init__(self, path: str | None = None, schema: Schema | None = None) -> None: ...
```
Constructor
| Parameter | Type | Default | Description |
|---|---|---|---|
| path | str \| None | None | Directory path for persistent storage. None creates an in-memory index. |
| schema | Schema \| None | None | Schema definition. An empty schema is used when omitted. |
Methods
| Method | Description |
|---|---|
| put_document(id, doc) | Upsert a document. Replaces all existing versions with the same ID. |
| add_document(id, doc) | Append a document chunk without removing existing versions. |
| get_documents(id) -> list[dict] | Return all stored versions for the given ID. |
| delete_documents(id) | Delete all versions for the given ID. |
| commit() | Flush buffered writes and make all pending changes searchable. |
| search(query, *, limit=10, offset=0) -> list[SearchResult] | Execute a search query. |
| stats() -> dict | Return index statistics (document_count, vector_fields). |
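The methods above combine into a short write-then-read cycle. A minimal sketch, assuming the package is importable as `laurus` (the import path and field name are illustrative):

```python
from laurus import Index, Schema

schema = Schema()
schema.add_text_field("title")

index = Index(schema=schema)           # path=None -> in-memory index
index.put_document("doc-1", {"title": "hello world"})
index.commit()                         # writes become searchable only after commit()

for hit in index.search("title:hello", limit=5):
    print(hit.id, hit.score, hit.document)
```

Note that search() only sees documents written before the most recent commit().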
search query argument
The query parameter accepts any of the following:
- A DSL string (e.g. "title:hello", "embedding:\"memory safety\"")
- A lexical query object (TermQuery, PhraseQuery, BooleanQuery, …)
- A vector query object (VectorQuery, VectorTextQuery)
- A SearchRequest for full control
Schema
Defines the fields and index types for an Index.
```python
class Schema:
    def __init__(self) -> None: ...
```
Field methods
| Method | Description |
|---|---|
| add_text_field(name, *, stored=True, indexed=True, term_vectors=False, analyzer=None) | Full-text field (inverted index, BM25). analyzer accepts a built-in name ("standard", "english", "keyword", "simple", "noop", or any custom name registered via add_analyzer) or a dict configuring a parameterised preset such as {"language": "japanese", "mode": "normal", "dict": "/var/lib/lindera/ipadic"}. The bare string "japanese" is rejected because the preset requires a Lindera dictionary path. |
| add_integer_field(name, *, stored=True, indexed=True, multi_valued=False) | 64-bit integer field. Set multi_valued=True to accept arrays of integers (range queries match if any value satisfies the predicate). |
| add_float_field(name, *, stored=True, indexed=True, multi_valued=False) | 64-bit float field. Set multi_valued=True to accept arrays of floats (range queries match if any value satisfies the predicate). |
| add_boolean_field(name, *, stored=True, indexed=True) | Boolean field. |
| add_bytes_field(name, *, stored=True) | Raw bytes field. |
| add_geo_field(name, *, stored=True, indexed=True) | Geographic coordinate field (lat/lon). |
| add_geo3d_field(name, *, stored=True, indexed=True) | 3D ECEF Cartesian point field (x, y, z in metres). See Geo3d concepts. |
| add_datetime_field(name, *, stored=True, indexed=True) | UTC datetime field. |
| add_hnsw_field(name, dimension, *, distance="cosine", m=16, ef_construction=200, embedder=None) | HNSW approximate nearest-neighbour vector field. |
| add_flat_field(name, dimension, *, distance="cosine", embedder=None) | Flat (brute-force) vector field. |
| add_ivf_field(name, dimension, *, distance="cosine", n_clusters=100, n_probe=1, embedder=None) | IVF approximate nearest-neighbour vector field. |
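A schema combining several of these field types might be built as follows. This is a sketch: the import path `laurus`, the field names, and the 384-element vector dimension are all illustrative assumptions.

```python
from laurus import Schema

schema = Schema()
schema.add_text_field("title", analyzer="english")
schema.add_text_field("body", term_vectors=True)
schema.add_integer_field("year")
schema.add_float_field("ratings", multi_valued=True)    # accepts arrays of floats
schema.add_geo_field("location")
schema.add_hnsw_field("embedding", 384, distance="cosine")
schema.set_default_fields(["title", "body"])
```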
Other methods
| Method | Description |
|---|---|
| add_embedder(name, config) | Register a named embedder definition. config is a dict with a "type" key (see below). |
| set_default_fields(fields) | Set default search fields (list of strings). |
| set_dynamic_field_policy(policy) | Set how undeclared fields are handled. policy is "strict", "dynamic" (default), or "ignore". See notes below. |
| dynamic_field_policy() | Return the current policy as a lowercase string. |
| field_names() | Return all field names. |
Dynamic field policy
Controls what happens when a document is ingested with field names that are not declared in the schema:
- "strict" — Reject the document.
- "dynamic" (default) — Infer a type for each undeclared field and add it to the schema. Warning: integer fields silently truncate incoming float values (3.14 → 3). Use "strict" if you need to reject such type mismatches.
- "ignore" — Silently drop the undeclared fields.
See Schema & Fields for the full behaviour matrix.
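A sketch of the "strict" policy in action (import path and field names are assumptions; exactly how the rejection surfaces, e.g. as an exception, is not specified here):

```python
from laurus import Index, Schema

schema = Schema()
schema.add_text_field("title")
schema.set_dynamic_field_policy("strict")

index = Index(schema=schema)
# "price" is not declared in the schema, so under "strict" this ingest
# is rejected, rather than an inferred field being added as the default
# "dynamic" policy would do.
index.put_document("p1", {"title": "widget", "price": 9.99})
```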
Embedder types
| "type" | Required keys | Feature flag |
|---|---|---|
| "precomputed" | – | (always available) |
| "candle_bert" | "model" | embeddings-candle |
| "candle_clip" | "model" | embeddings-multimodal |
| "openai" | "model" | embeddings-openai |
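Registering an embedder and attaching it to a vector field might look like the sketch below. The import path, field names, and the model identifier are illustrative assumptions, and the "candle_bert" type requires a build with the embeddings-candle feature flag.

```python
from laurus import Schema

schema = Schema()
# Register a named embedder; "candle_bert" requires a "model" key.
schema.add_embedder("bert", {
    "type": "candle_bert",
    "model": "sentence-transformers/all-MiniLM-L6-v2",  # hypothetical model id
})
# Attach it so VectorTextQuery can embed query text against this field.
schema.add_hnsw_field("embedding", 384, embedder="bert")
```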
Distance metrics
| Value | Description |
|---|---|
| "cosine" | Cosine similarity (default) |
| "euclidean" | Euclidean distance |
| "dot_product" | Dot product |
| "manhattan" | Manhattan distance |
| "angular" | Angular distance |
Query classes
TermQuery
```python
TermQuery(field: str, term: str)
```
Matches documents containing the exact term in the given field.
PhraseQuery
```python
PhraseQuery(field: str, terms: list[str])
```
Matches documents containing the terms in order.
FuzzyQuery
```python
FuzzyQuery(field: str, term: str, *, max_edits: int = 2)
```
Approximate match allowing up to max_edits edit-distance errors. max_edits is keyword-only.
WildcardQuery
```python
WildcardQuery(field: str, pattern: str)
```
Pattern match. * matches any sequence of characters, ? matches any single character.
NumericRangeQuery
```python
NumericRangeQuery(field: str, *, min: int | float | None = None, max: int | float | None = None)
```
Matches numeric values in the range [min, max]. Pass None (or omit) for
an open bound. min and max are keyword-only. The numeric type (integer or
float) is inferred from the Python type of min/max.
GeoDistanceQuery
```python
GeoDistanceQuery.within_radius(
    field: str, lat: float, lon: float, distance_m: float,
)
```
Geo-distance (radius) search. Returns documents whose (lat, lon) coordinate
is within distance_m metres of the given point.
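For example, a radius search around a point (field name and coordinates are illustrative assumptions):

```python
from laurus import GeoDistanceQuery

# Documents whose "location" geo field lies within 2 km of central Tokyo.
query = GeoDistanceQuery.within_radius("location", 35.6812, 139.7671, 2000.0)
```

The resulting object can be passed directly to Index.search().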
GeoBoundingBoxQuery
```python
GeoBoundingBoxQuery.within_bounding_box(
    field: str,
    min_lat: float, min_lon: float,
    max_lat: float, max_lon: float,
)
```
Geo bounding-box search. Returns documents whose (lat, lon) coordinate lies
inside the axis-aligned [min_lat, max_lat] × [min_lon, max_lon] rectangle.
Geo3dDistanceQuery
```python
Geo3dDistanceQuery.within_sphere(
    field: str, x: float, y: float, z: float, distance_m: float,
)
```
Sphere search over a 3D ECEF point field. Returns documents whose (x, y, z)
coordinate is within distance_m metres of the centre. See
Geo3d concepts for ECEF theory.
Geo3dBoundingBoxQuery
```python
Geo3dBoundingBoxQuery.within_box(
    field: str,
    min_x: float, min_y: float, min_z: float,
    max_x: float, max_y: float, max_z: float,
)
```
Axis-aligned 3D bounding-box search. Returns documents whose ECEF point lies
inside [min_x, max_x] × [min_y, max_y] × [min_z, max_z].
Geo3dNearestQuery
```python
Geo3dNearestQuery.k_nearest(
    field: str,
    x: float, y: float, z: float,
    k: int,
    *,
    initial_radius_m: float | None = None,
    max_radius_m: float | None = None,
)
```
k-nearest-neighbour search over a 3D ECEF point field. Returns the k
documents closest to (x, y, z). The optional initial_radius_m and
max_radius_m tune the iterative-expansion search cone.
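The 3D query constructors can be sketched together as follows. The field name "pos" is an assumption; the centre point is roughly a point on the equator at the prime meridian in ECEF coordinates (Earth's mean radius is about 6,371,000 m).

```python
from laurus import Geo3dDistanceQuery, Geo3dNearestQuery

# All documents with an ECEF point within 5 km of the centre.
within = Geo3dDistanceQuery.within_sphere("pos", 6_371_000.0, 0.0, 0.0, 5000.0)

# The 10 documents nearest to the centre, starting the iterative
# expansion at 1 km and giving up beyond 50 km.
nearest = Geo3dNearestQuery.k_nearest(
    "pos", 6_371_000.0, 0.0, 0.0, 10,
    initial_radius_m=1000.0, max_radius_m=50_000.0,
)
```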
BooleanQuery
```python
bq = BooleanQuery()
bq.must(query)
bq.should(query)
bq.must_not(query)
```
Compound boolean query. Construct with no arguments and add clauses one at a
time via the must / should / must_not methods. Each method accepts any
query object (including a nested BooleanQuery).
must clauses all have to match; must_not clauses must not match.
should clauses contribute to scoring; at least one of them must match if
there are no must clauses.
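Putting the clause types together (import path and field names are illustrative assumptions):

```python
from laurus import BooleanQuery, NumericRangeQuery, TermQuery

# title must contain "rust"; docs also mentioning "async" score higher;
# anything with year <= 2019 is excluded.
bq = BooleanQuery()
bq.must(TermQuery("title", "rust"))
bq.should(TermQuery("body", "async"))
bq.must_not(NumericRangeQuery("year", max=2019))
```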
SpanQuery
```python
# Single term
SpanQuery.term(field: str, term: str)
# Near: terms appearing within `slop` positions of each other
SpanQuery.near(field: str, terms: list[str], *, slop: int = 0, ordered: bool = True)
# Near with nested SpanQuery clauses
SpanQuery.near_spans(field: str, clauses: list[SpanQuery], *, slop: int = 0, ordered: bool = True)
# Containing: big span contains little span
SpanQuery.containing(field: str, big: SpanQuery, little: SpanQuery)
# Within: include span within exclude span at max distance
SpanQuery.within(field: str, include: SpanQuery, exclude: SpanQuery, distance: int)
```
Positional / proximity span queries. Construct via the static factory
methods. near takes a list of term strings, while near_spans takes a
list of SpanQuery objects for nested expressions. slop and ordered
are keyword-only.
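A sketch of the factories in use (the field name "body" and the terms are illustrative):

```python
from laurus import SpanQuery

# "memory" followed by "safety" within 2 intervening positions.
near = SpanQuery.near("body", ["memory", "safety"], slop=2, ordered=True)

# Nested form: matches of the looser span that contain the term "safety".
big = SpanQuery.near("body", ["memory", "safety"], slop=5, ordered=False)
little = SpanQuery.term("body", "safety")
containing = SpanQuery.containing("body", big, little)
```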
VectorQuery
```python
VectorQuery(field: str, vector: list[float])
```
Approximate nearest-neighbour search using a pre-computed embedding vector.
VectorTextQuery
```python
VectorTextQuery(field: str, text: str)
```
Converts text to an embedding at query time and runs vector search. Requires an embedder configured on the index.
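Both forms target a declared vector field. A sketch (field name and vector values are placeholders; the list length must match the field's declared dimension):

```python
from laurus import VectorQuery, VectorTextQuery

# With a pre-computed embedding:
by_vector = VectorQuery("embedding", [0.12, -0.07, 0.33])

# With query-time embedding (requires an embedder attached to "embedding"):
by_text = VectorTextQuery("embedding", "memory safety in systems languages")
```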
SearchRequest
Full-featured search request for advanced control.
```python
class SearchRequest:
    def __init__(
        self,
        *,
        query=None,
        lexical_query=None,
        vector_query=None,
        filter_query=None,
        fusion=None,
        limit: int = 10,
        offset: int = 0,
    ) -> None: ...
```
| Parameter | Description |
|---|---|
| query | A DSL string or single query object. Mutually exclusive with lexical_query / vector_query. |
| lexical_query | Lexical component for explicit hybrid search. |
| vector_query | Vector component for explicit hybrid search. |
| filter_query | Lexical filter applied after scoring. |
| fusion | Fusion algorithm (RRF or WeightedSum). Defaults to RRF(k=60) when both components are set. |
| limit | Maximum number of results (default 10). |
| offset | Pagination offset (default 0). |
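An explicit hybrid request might be assembled like this (import path and field names are illustrative assumptions):

```python
from laurus import SearchRequest, TermQuery, VectorTextQuery, WeightedSum

# Lexical and vector components fused with a 30/70 weighting
# instead of the default RRF(k=60).
request = SearchRequest(
    lexical_query=TermQuery("title", "rust"),
    vector_query=VectorTextQuery("embedding", "memory safety"),
    fusion=WeightedSum(lexical_weight=0.3, vector_weight=0.7),
    limit=20,
)
```

The request is then passed to Index.search() in place of a plain query.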
SearchResult
Returned by Index.search().
```python
class SearchResult:
    id: str                 # External document identifier
    score: float            # Relevance score
    document: dict | None   # Retrieved field values, or None if not stored
```
Fusion algorithms
RRF
```python
RRF(k: float = 60.0)
```
Reciprocal Rank Fusion. Merges lexical and vector result lists by rank position. k is a smoothing constant; higher values reduce the influence of top-ranked results.
WeightedSum
```python
WeightedSum(lexical_weight: float = 0.5, vector_weight: float = 0.5)
```
Normalises both score lists independently, then combines them as lexical_weight * lexical_score + vector_weight * vector_score.
Text analysis
SynonymDictionary
```python
class SynonymDictionary:
    def __init__(self) -> None: ...
    def add_synonym_group(self, synonyms: list[str]) -> None: ...
```
WhitespaceTokenizer
```python
class WhitespaceTokenizer:
    def __init__(self) -> None: ...
    def tokenize(self, text: str) -> list[Token]: ...
```
SynonymGraphFilter
```python
class SynonymGraphFilter:
    def __init__(
        self,
        dictionary: SynonymDictionary,
        keep_original: bool = True,
        boost: float = 1.0,
    ) -> None: ...
    def apply(self, tokens: list[Token]) -> list[Token]: ...
```
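The three classes chain into a small analysis pipeline. A sketch, assuming the package is importable as `laurus`:

```python
from laurus import SynonymDictionary, SynonymGraphFilter, WhitespaceTokenizer

dictionary = SynonymDictionary()
dictionary.add_synonym_group(["car", "automobile", "vehicle"])

tokens = WhitespaceTokenizer().tokenize("my car broke down")
expanded = SynonymGraphFilter(dictionary, keep_original=True).apply(tokens)
# With keep_original=True, `expanded` keeps the "car" token and adds
# synonym tokens at the same position.
```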
Token
```python
class Token:
    text: str
    position: int
    start_offset: int
    end_offset: int
    boost: float
    stopped: bool
    position_increment: int
    position_length: int
```
Field value types
Python values are automatically converted to Laurus DataValue types:
| Python type | Laurus type | Notes |
|---|---|---|
| None | Null | |
| bool | Bool | Checked before int |
| int | Int64 | |
| float | Float64 | |
| str | Text | |
| bytes | Bytes | |
| list[float] | Vector | Elements coerced to f32 |
| (lat, lon) tuple | Geo | Two float values |
| datetime.datetime | DateTime | Converted via isoformat() |
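As a worked example of the conversion table, a single document can mix all of these value types (field names are illustrative; the dict is what you would pass unchanged to put_document or add_document):

```python
import datetime

doc = {
    "title": "Harbour survey",          # str -> Text
    "published": True,                  # bool -> Bool (checked before int)
    "year": 2024,                       # int -> Int64
    "score": 4.5,                       # float -> Float64
    "raw": b"\x00\x01",                 # bytes -> Bytes
    "embedding": [0.1, 0.2, 0.3],       # list[float] -> Vector (coerced to f32)
    "location": (35.6812, 139.7671),    # (lat, lon) tuple -> Geo
    "updated": datetime.datetime(       # datetime -> DateTime (via isoformat())
        2024, 5, 1, tzinfo=datetime.timezone.utc
    ),
}
```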