API Reference

Index

The primary entry point. Wraps the Laurus search engine.

new \Laurus\Index(?string $path = null, ?Schema $schema = null)

Constructor

| Parameter | Type | Default | Description |
|---|---|---|---|
| $path | ?string | null | Directory path for persistent storage. null creates an in-memory index. |
| $schema | ?Schema | null | Schema definition. An empty schema is used when omitted. |

Methods

| Method | Description |
|---|---|
| putDocument(string $id, array $doc): void | Upsert a document. Replaces all existing versions with the same ID. |
| addDocument(string $id, array $doc): void | Append a document chunk without removing existing versions. |
| getDocuments(string $id): array | Return all stored versions for the given ID. |
| deleteDocuments(string $id): void | Delete all versions for the given ID. |
| commit(): void | Flush buffered writes and make all pending changes searchable. |
| search(mixed $query, int $limit = 10, int $offset = 0): array | Execute a search query. Returns an array of SearchResult. |
| searchBatch(array $queries, int $limit = 10, int $offset = 0): array | Execute multiple independent searches in one call. Each query is dispatched in parallel on the underlying tokio runtime, and results[i] corresponds to queries[i]. Returns an array of arrays of SearchResult; empty input returns []. |
| stats(): array | Return index statistics ("documentCount", "vectorFields"). |
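A toy model makes the upsert/append distinction between putDocument and addDocument easy to see. The sketch below is plain PHP 8 and does not use the Laurus extension. ToyIndex is a hypothetical class that only mirrors the documented version semantics. It does not model buffering, so there is no commit() step.

```php
<?php
// Hypothetical in-memory model of the documented versioning rules.
// ToyIndex is NOT part of Laurus; buffering/commit() is not modelled.
final class ToyIndex
{
    /** @var array<string, array<int, array>> versions keyed by external ID */
    private array $docs = [];

    // putDocument: upsert, i.e. drop every existing version and store one.
    public function putDocument(string $id, array $doc): void
    {
        $this->docs[$id] = [$doc];
    }

    // addDocument: append another version/chunk, keeping existing ones.
    public function addDocument(string $id, array $doc): void
    {
        $this->docs[$id][] = $doc;
    }

    // getDocuments: all stored versions for the ID (empty if unknown).
    public function getDocuments(string $id): array
    {
        return $this->docs[$id] ?? [];
    }

    // deleteDocuments: remove every version for the ID.
    public function deleteDocuments(string $id): void
    {
        unset($this->docs[$id]);
    }
}

$idx = new ToyIndex();
$idx->addDocument("doc1", ["title" => "part 1"]);
$idx->addDocument("doc1", ["title" => "part 2"]);
echo count($idx->getDocuments("doc1")), "\n"; // two versions
$idx->putDocument("doc1", ["title" => "replaced"]);
echo count($idx->getDocuments("doc1")), "\n"; // one version
```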

search query argument

The $query parameter accepts any of the following:

  • A DSL string (e.g. "title:hello", "embedding:\"memory safety\"")
  • A lexical query object (TermQuery, PhraseQuery, BooleanQuery, …)
  • A vector query object (VectorQuery, VectorTextQuery)
  • A SearchRequest for full control

The same value kinds are accepted as the elements of searchBatch’s $queries array — DSL strings, query objects, and SearchRequest instances may be mixed within a single batch.


Schema

Defines the fields and index types for an Index.

new \Laurus\Schema()

Field methods

| Method | Description |
|---|---|
| addTextField(string $name, bool $stored = true, bool $indexed = true, bool $termVectors = false, ?string $analyzer = null): void | Full-text field (inverted index, BM25). $analyzer is either the name of a parameter-less built-in ("standard", "english", "keyword", "simple", "noop") or a custom name registered via addAnalyzer. The Japanese preset requires a Lindera dictionary path, so register it as a custom analyzer with a lindera tokenizer and reference it by name. |
| addIntegerField(string $name, bool $stored = true, bool $indexed = true, bool $multiValued = false): void | 64-bit integer field. Pass $multiValued = true to accept arrays of integers (range queries match if any value satisfies the predicate). |
| addFloatField(string $name, bool $stored = true, bool $indexed = true, bool $multiValued = false): void | 64-bit float field. Pass $multiValued = true to accept arrays of floats (range queries match if any value satisfies the predicate). |
| addBooleanField(string $name, bool $stored = true, bool $indexed = true): void | Boolean field. |
| addBytesField(string $name, bool $stored = true): void | Raw bytes field. |
| addGeoField(string $name, bool $stored = true, bool $indexed = true): void | Geographic coordinate field (lat/lon). |
| addGeo3dField(string $name, bool $stored = true, bool $indexed = true): void | 3D ECEF Cartesian point field (x, y, z in metres). See Geo3d concepts. |
| addDatetimeField(string $name, bool $stored = true, bool $indexed = true): void | UTC datetime field. |
| addHnswField(string $name, int $dimension, ?string $distance = "cosine", int $m = 16, int $efConstruction = 200, ?string $embedder = null): void | HNSW approximate nearest-neighbor vector field. |
| addFlatField(string $name, int $dimension, ?string $distance = "cosine", ?string $embedder = null): void | Flat (brute-force) vector field. |
| addIvfField(string $name, int $dimension, ?string $distance = "cosine", int $nClusters = 100, int $nProbe = 1, ?string $embedder = null): void | IVF approximate nearest-neighbor vector field. |

Other methods

| Method | Description |
|---|---|
| addEmbedder(string $name, array $config): void | Register a named embedder definition. $config is an associative array with a "type" key (see below). |
| setDefaultFields(array $fields): void | Set the default fields used when no field is specified in a query. $fields is an array of strings. |
| setDynamicFieldPolicy(string $policy): void | Set how undeclared fields are handled. $policy is "strict", "dynamic" (default), or "ignore". See notes below. |
| dynamicFieldPolicy(): string | Return the current policy as a lowercase string. |
| fieldNames(): array | Return the list of field names defined in this schema. |

Dynamic field policy

Controls what happens when a document is ingested with field names that are not declared in the schema:

  • "strict": Reject the document.
  • "dynamic" (default): Infer a type for each undeclared field and add it to the schema. Warning: integer fields silently truncate incoming float values (3.14 → 3). Use "strict" if you need to reject such type mismatches.
  • "ignore": Silently drop the undeclared fields.
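Here is a toy model of the three policies in plain PHP 8. applyFieldPolicy() is a hypothetical helper, not Laurus API, and its type inference (PHP's get_debug_type) is a simplifying assumption. It reproduces the documented pitfall. Under "dynamic", the first value seen for a field decides its type, so a later float sent to a field inferred as int is truncated.

```php
<?php
// Hypothetical sketch of the dynamic field policies; NOT Laurus API.
// Type inference via get_debug_type() ("int", "float", ...) is a simplification.
function applyFieldPolicy(array &$schema, array $doc, string $policy): ?array
{
    $out = [];
    foreach ($doc as $field => $value) {
        if (!isset($schema[$field])) {
            switch ($policy) {
                case "strict":
                    return null;                          // reject the whole document
                case "ignore":
                    continue 2;                           // drop this field silently
                case "dynamic":
                    $schema[$field] = get_debug_type($value); // infer and register
                    break;
                default:
                    throw new InvalidArgumentException("unknown policy: $policy");
            }
        }
        // An integer field truncates floats: 3.14 becomes 3.
        $out[$field] = ($schema[$field] === "int" && is_float($value))
            ? (int) $value
            : $value;
    }
    return $out;
}

$schema = [];
applyFieldPolicy($schema, ["count" => 1], "dynamic");        // "count" inferred as int
var_dump(applyFieldPolicy($schema, ["count" => 3.14], "dynamic"));
```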

See Schema & Fields for the full behaviour matrix.

Embedder types

| "type" | Required keys | Feature flag |
|---|---|---|
| "precomputed" | | (always available) |
| "candle_bert" | "model" | embeddings-candle |
| "candle_clip" | "model" | embeddings-multimodal |
| "openai" | "model" | embeddings-openai |

Distance metrics

| Value | Description |
|---|---|
| "cosine" | Cosine similarity (default) |
| "euclidean" | Euclidean distance |
| "dot_product" | Dot product |
| "manhattan" | Manhattan distance |
| "angular" | Angular distance |
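The metric names follow standard textbook formulas. The plain-PHP versions below show what each one computes on two vectors. Some things are assumptions and are not stated in this reference: whether Laurus turns similarities into distances internally, and its exact "angular" convention. The sketch uses the common arccos(cosine) / π form. Zero vectors are not handled.

```php
<?php
// Textbook definitions of the listed metrics (plain PHP, not Laurus code).
function dot(array $a, array $b): float
{
    $s = 0.0;
    foreach ($a as $i => $v) {
        $s += $v * $b[$i];
    }
    return $s;
}

function cosineSimilarity(array $a, array $b): float
{
    return dot($a, $b) / (sqrt(dot($a, $a)) * sqrt(dot($b, $b)));
}

function euclidean(array $a, array $b): float
{
    $s = 0.0;
    foreach ($a as $i => $v) {
        $s += ($v - $b[$i]) ** 2;
    }
    return sqrt($s);
}

function manhattan(array $a, array $b): float
{
    $s = 0.0;
    foreach ($a as $i => $v) {
        $s += abs($v - $b[$i]);
    }
    return $s;
}

// Angle between the vectors scaled to [0, 1] (assumed convention).
function angular(array $a, array $b): float
{
    $c = max(-1.0, min(1.0, cosineSimilarity($a, $b))); // clamp rounding error
    return acos($c) / M_PI;
}
```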

Query classes

TermQuery

new \Laurus\TermQuery(string $field, string $term)

Matches documents containing the exact term in the given field.

PhraseQuery

new \Laurus\PhraseQuery(string $field, array $terms)

Matches documents containing the terms in order. $terms is an array of strings.

FuzzyQuery

new \Laurus\FuzzyQuery(string $field, string $term, int $maxEdits = 2)

Approximate match allowing up to $maxEdits edit-distance errors.
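The edit-distance test behind a fuzzy match can be sketched with PHP's built-in levenshtein(), which counts insertions, deletions and substitutions. This reference does not say whether Laurus also counts a transposition as a single edit (Damerau-Levenshtein), so treat the exact distance as an assumption. fuzzyMatches() is a hypothetical helper, and levenshtein() works on bytes, not multibyte characters.

```php
<?php
// Sketch of the fuzzy-match predicate; fuzzyMatches() is illustrative only.
function fuzzyMatches(string $term, string $candidate, int $maxEdits = 2): bool
{
    return levenshtein($term, $candidate) <= $maxEdits;
}

// Which indexed terms would a FuzzyQuery("...", "search", 1) accept?
$indexedTerms = ["search", "starch", "research", "seatch"];
$hits = array_values(array_filter(
    $indexedTerms,
    fn (string $t): bool => fuzzyMatches("search", $t, 1)
));
print_r($hits);
```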

WildcardQuery

new \Laurus\WildcardQuery(string $field, string $pattern)

Pattern match. * matches any sequence of characters, ? matches any single character.
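One way to pin down the wildcard semantics is to translate a pattern into an anchored regular expression. This is only an illustration in plain PHP. It is not how Laurus evaluates wildcards internally, and wildcardMatches() is a hypothetical helper. Because the regex is anchored, the pattern must cover the whole term.

```php
<?php
// Illustrative wildcard matcher: * = any sequence, ? = exactly one character.
function wildcardToRegex(string $pattern): string
{
    // preg_quote escapes every metacharacter, including * and ?;
    // turn the escaped wildcards back into their regex equivalents.
    $quoted = preg_quote($pattern, '/');
    $regex = strtr($quoted, ['\*' => '.*', '\?' => '.']);
    return '/^' . $regex . '$/su'; // anchored; s lets . match newlines, u = UTF-8
}

function wildcardMatches(string $pattern, string $term): bool
{
    return preg_match(wildcardToRegex($pattern), $term) === 1;
}
```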

NumericRangeQuery

new \Laurus\NumericRangeQuery(string $field, mixed $min, mixed $max, ?string $numericType = "integer")

Matches numeric values in the range [$min, $max]. Pass null for an open bound. Set $numericType to "integer" or "float".
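The range predicate has three parts: both bounds are inclusive, null leaves a side open, and a multi-valued field matches if any of its values falls in the range (see addIntegerField/addFloatField). The plain PHP 8 sketch below captures these rules. inRange() and rangeMatches() are hypothetical helpers, not Laurus API.

```php
<?php
// Inclusive [$min, $max] with null as an open bound.
function inRange(int|float $v, int|float|null $min, int|float|null $max): bool
{
    return ($min === null || $v >= $min) && ($max === null || $v <= $max);
}

// A multi-valued field (array) matches if ANY value is in range.
function rangeMatches(int|float|array $fieldValue, int|float|null $min, int|float|null $max): bool
{
    $values = is_array($fieldValue) ? $fieldValue : [$fieldValue];
    foreach ($values as $v) {
        if (inRange($v, $min, $max)) {
            return true;
        }
    }
    return false;
}
```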

GeoDistanceQuery

\Laurus\GeoDistanceQuery::withinRadius(
    string $field, float $lat, float $lon, float $distanceM,
): GeoDistanceQuery

Geo-distance (radius) search. Returns documents whose (lat, lon) coordinate is within $distanceM metres of the given point.
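A radius test needs a distance on the globe. The usual choice is the haversine great-circle distance, shown below in plain PHP. Both the haversine formula and the 6,371,008.8 m mean Earth radius are assumptions: Laurus may use a different Earth model, so results near the radius boundary could differ slightly. withinRadius() is a hypothetical helper that takes a point in the ["lat" => ..., "lon" => ...] shape used for Geo field values.

```php
<?php
// Great-circle distance in metres on a spherical Earth (assumed model).
const EARTH_RADIUS_M = 6371008.8;

function haversineM(float $lat1, float $lon1, float $lat2, float $lon2): float
{
    $phi1 = deg2rad($lat1);
    $phi2 = deg2rad($lat2);
    $dPhi = deg2rad($lat2 - $lat1);
    $dLambda = deg2rad($lon2 - $lon1);
    $h = sin($dPhi / 2) ** 2 + cos($phi1) * cos($phi2) * sin($dLambda / 2) ** 2;
    return 2 * EARTH_RADIUS_M * asin(min(1.0, sqrt($h)));
}

// Keep a point if it lies within $distanceM metres of ($lat, $lon).
function withinRadius(array $point, float $lat, float $lon, float $distanceM): bool
{
    return haversineM($point["lat"], $point["lon"], $lat, $lon) <= $distanceM;
}
```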

GeoBoundingBoxQuery

\Laurus\GeoBoundingBoxQuery::withinBoundingBox(
    string $field,
    float $minLat, float $minLon,
    float $maxLat, float $maxLon,
): GeoBoundingBoxQuery

Geo bounding-box search. Returns documents whose (lat, lon) coordinate lies inside the axis-aligned [$minLat, $maxLat] × [$minLon, $maxLon] rectangle.

Geo3dDistanceQuery

\Laurus\Geo3dDistanceQuery::withinSphere(
    string $field,
    float $x, float $y, float $z,
    float $distanceM,
): Geo3dDistanceQuery

Sphere search over a 3D ECEF point field. Returns documents whose (x, y, z) coordinate is within $distanceM metres of the centre. See Geo3d concepts for ECEF theory.
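ECEF coordinates usually come from converting geodetic latitude, longitude and height. Once points are in ECEF, a sphere test is plain Euclidean distance in metres. The sketch below assumes the WGS84 ellipsoid for that conversion. This reference does not say which datum produced the coordinates. geodeticToEcef() and withinSphere() are hypothetical helpers.

```php
<?php
// WGS84 geodetic -> ECEF conversion (assumed datum), all lengths in metres.
const WGS84_A = 6378137.0;                // semi-major axis
const WGS84_F = 1 / 298.257223563;        // flattening
const WGS84_E2 = WGS84_F * (2 - WGS84_F); // first eccentricity squared

function geodeticToEcef(float $latDeg, float $lonDeg, float $hM = 0.0): array
{
    $phi = deg2rad($latDeg);
    $lambda = deg2rad($lonDeg);
    $n = WGS84_A / sqrt(1 - WGS84_E2 * sin($phi) ** 2); // prime vertical radius
    return [
        "x" => ($n + $hM) * cos($phi) * cos($lambda),
        "y" => ($n + $hM) * cos($phi) * sin($lambda),
        "z" => ($n * (1 - WGS84_E2) + $hM) * sin($phi),
    ];
}

// Sphere membership: straight-line distance to the centre (x, y, z).
function withinSphere(array $p, float $x, float $y, float $z, float $distanceM): bool
{
    return sqrt(($p["x"] - $x) ** 2 + ($p["y"] - $y) ** 2 + ($p["z"] - $z) ** 2) <= $distanceM;
}
```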

Geo3dBoundingBoxQuery

\Laurus\Geo3dBoundingBoxQuery::withinBox(
    string $field,
    float $minX, float $minY, float $minZ,
    float $maxX, float $maxY, float $maxZ,
): Geo3dBoundingBoxQuery

Axis-aligned 3D bounding-box search.

Geo3dNearestQuery

\Laurus\Geo3dNearestQuery::kNearest(
    string $field,
    float $x, float $y, float $z,
    int $k,
    ?float $initialRadiusM = null,
    ?float $maxRadiusM = null,
): Geo3dNearestQuery

k-nearest-neighbour search over a 3D ECEF point field. The optional $initialRadiusM and $maxRadiusM parameters tune the iterative radius-expansion search: the radius the search starts from and the largest radius it may grow to.
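An iterative-expansion kNN search can work like this. Search a sphere, and if it holds fewer than k points, enlarge it and try again. Once at least k points lie inside radius r, the true k nearest are all among them, because every point outside r is farther away. The plain-PHP sketch below illustrates the idea under stated assumptions. The doubling strategy and the default radii are illustrative choices, not documented Laurus behaviour, and kNearest() is a hypothetical helper.

```php
<?php
// Iterative-expansion kNN over ECEF points (illustrative, not Laurus code).
// Returns [id => distance] for up to $k nearest points, nearest first.
function kNearest(array $points, array $q, int $k, float $initialRadiusM = 1000.0, float $maxRadiusM = 1.0e7): array
{
    if ($initialRadiusM <= 0.0) {
        throw new InvalidArgumentException("initial radius must be positive");
    }
    $dist = fn (array $p): float => sqrt(
        ($p["x"] - $q["x"]) ** 2 + ($p["y"] - $q["y"]) ** 2 + ($p["z"] - $q["z"]) ** 2
    );
    $radius = $initialRadiusM;
    while (true) {
        // Collect every point inside the current search sphere.
        $found = [];
        foreach ($points as $id => $p) {
            $d = $dist($p);
            if ($d <= $radius) {
                $found[$id] = $d;
            }
        }
        // k hits inside the radius: the k nearest are among them.
        // Otherwise grow the sphere (doubling, assumed) until the cap.
        if (count($found) >= $k || $radius >= $maxRadiusM) {
            asort($found);
            return array_slice($found, 0, $k, true);
        }
        $radius = min($radius * 2, $maxRadiusM);
    }
}
```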

BooleanQuery

$bq = new \Laurus\BooleanQuery();
$bq->must($query);
$bq->should($query);
$bq->mustNot($query);

Compound boolean query. must clauses all have to match; mustNot clauses must not match. should clauses contribute to scoring; at least one of them must match if there are no must clauses.
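The matching rules above can be checked at the level of document sets, ignoring scores. Here each clause is represented by the list of doc IDs it matches. booleanMatch() is a hypothetical helper in plain PHP 8, not Laurus API. The reference does not say what a query with only mustNot clauses returns, so the sketch assumes it matches nothing.

```php
<?php
// Set-level sketch of BooleanQuery matching (scoring ignored).
function booleanMatch(array $must, array $should, array $mustNot): array
{
    if ($must !== []) {
        // Every must clause has to match: intersect their doc sets.
        // (should clauses would only affect scoring here.)
        $result = array_intersect(...$must);
    } elseif ($should !== []) {
        // No must clauses: at least one should clause has to match.
        $result = array_unique(array_merge(...$should));
    } else {
        // Assumption: a query with only mustNot clauses matches nothing.
        return [];
    }
    foreach ($mustNot as $excluded) {
        $result = array_diff($result, $excluded); // drop excluded docs
    }
    sort($result);
    return $result;
}
```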

SpanQuery

// Single term
\Laurus\SpanQuery::term(string $field, string $term): SpanQuery

// Near: terms within slop positions
\Laurus\SpanQuery::near(string $field, array $terms, int $slop = 0, bool $ordered = true): SpanQuery

// NearSpans: nested SpanQuery clauses within slop positions
\Laurus\SpanQuery::nearSpans(string $field, array $clauses, int $slop = 0, bool $ordered = true): SpanQuery

// Containing: big span contains little span
\Laurus\SpanQuery::containing(string $field, SpanQuery $big, SpanQuery $little): SpanQuery

// Within: include span within exclude span at max distance
\Laurus\SpanQuery::within(string $field, SpanQuery $include, SpanQuery $exclude, int $distance): SpanQuery

Positional / proximity span queries. near takes an array of term strings, while nearSpans takes an array of SpanQuery objects for nested expressions (each clause’s field is re-rooted to the outer $field).
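For an ordered near query, the slop can be read as the number of extra positions allowed between the terms once they appear in sequence. The plain-PHP sketch below uses that definition, which follows Lucene's SpanNearQuery. Treat it as an assumption about Laurus. It covers only the ordered case, and orderedNear() is a hypothetical helper that works on an array of token texts.

```php
<?php
// Ordered "near" check: terms in sequence with at most $slop gap positions.
function orderedNear(array $tokens, array $terms, int $slop): bool
{
    if ($terms === []) {
        return false;
    }
    // Ascending positions of each token text in the stream.
    $positions = [];
    foreach ($tokens as $pos => $text) {
        $positions[$text][] = $pos;
    }
    $n = count($terms);
    foreach ($positions[$terms[0]] ?? [] as $start) {
        $cur = $start;
        // Take the earliest next occurrence of each following term:
        // this gives the narrowest ordered span for this start.
        for ($i = 1; $i < $n; $i++) {
            $next = null;
            foreach ($positions[$terms[$i]] ?? [] as $p) {
                if ($p > $cur) {
                    $next = $p;
                    break;
                }
            }
            if ($next === null) {
                continue 2; // no complete ordered match from this start
            }
            $cur = $next;
        }
        // Gap = span width minus the positions the terms themselves occupy.
        if ($cur - $start - ($n - 1) <= $slop) {
            return true;
        }
    }
    return false;
}
```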

VectorQuery

new \Laurus\VectorQuery(string $field, array $vector)

Approximate nearest-neighbor search using a pre-computed embedding vector. $vector is an array of floats.
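To see what a vector query ranks by, here is the exact (brute-force) version: score every stored vector against the query and keep the best $limit. A flat field does this conceptually, while HNSW and IVF approximate the same ranking faster. The sketch is plain PHP, assumes the default cosine metric, and uses hypothetical helper names.

```php
<?php
// Exact nearest-neighbour search by cosine similarity (illustrative only).
function cosine(array $a, array $b): float
{
    $dot = $na = $nb = 0.0;
    foreach ($a as $i => $v) {
        $dot += $v * $b[$i];
        $na += $v * $v;
        $nb += $b[$i] * $b[$i];
    }
    // Treat zero vectors as dissimilar instead of dividing by zero.
    return ($na > 0.0 && $nb > 0.0) ? $dot / (sqrt($na) * sqrt($nb)) : 0.0;
}

// Returns [id => similarity] for the top $limit vectors, best first.
function flatSearch(array $vectors, array $query, int $limit): array
{
    $scores = [];
    foreach ($vectors as $id => $v) {
        $scores[$id] = cosine($v, $query);
    }
    arsort($scores); // highest similarity first
    return array_slice($scores, 0, $limit, true);
}
```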

VectorTextQuery

new \Laurus\VectorTextQuery(string $field, string $text)

Converts $text to an embedding at query time and runs vector search. Requires an embedder configured on the index.


SearchRequest

Full-featured search request for advanced control.

new \Laurus\SearchRequest(
    mixed $query = null,
    mixed $lexicalQuery = null,
    mixed $vectorQuery = null,
    mixed $filterQuery = null,
    mixed $fusion = null,
    int $limit = 10,
    int $offset = 0,
)
| Parameter | Description |
|---|---|
| $query | A DSL string or single query object. Mutually exclusive with $lexicalQuery / $vectorQuery. |
| $lexicalQuery | Lexical component for explicit hybrid search. |
| $vectorQuery | Vector component for explicit hybrid search. |
| $filterQuery | Lexical filter applied after scoring. |
| $fusion | Fusion algorithm (RRF or WeightedSum). Defaults to RRF(k: 60) when both components are set. |
| $limit | Maximum number of results (default 10). |
| $offset | Pagination offset (default 0). |

SearchResult

Returned by Index->search() (and, one array per query, by Index->searchBatch()).

$result->getId()        // string   -- External document identifier
$result->getScore()     // float    -- Relevance score
$result->getDocument()  // array|null -- Retrieved field values, or null if not stored

Fusion algorithms

RRF

new \Laurus\RRF(float $k = 60.0)

Reciprocal Rank Fusion. Merges lexical and vector result lists by rank position. $k is a smoothing constant; higher values reduce the influence of top-ranked results.
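RRF can be worked through by hand. Every list a document appears in adds 1 / ($k + rank), and the sum decides the fused order. The sketch uses 1-based ranks as in the usual RRF formulation. Whether Laurus counts ranks from 0 or 1 is not stated here. rrf() is a hypothetical plain-PHP helper that takes lists of doc IDs, best first.

```php
<?php
// Reciprocal Rank Fusion over ranked lists of doc IDs (best first).
function rrf(array $rankedLists, float $k = 60.0): array
{
    $scores = [];
    foreach ($rankedLists as $list) {
        foreach (array_values($list) as $i => $docId) {
            // rank = $i + 1 (1-based, assumed)
            $scores[$docId] = ($scores[$docId] ?? 0.0) + 1.0 / ($k + $i + 1);
        }
    }
    arsort($scores); // highest fused score first
    return $scores;
}

// Lexical list vs vector list: d1 and d3 appear in both.
$fused = rrf([["d1", "d2", "d3"], ["d3", "d1", "d4"]]);
print_r($fused);
```

With $k = 60, d1 scores 1/61 + 1/62 and d3 scores 1/63 + 1/61. Both beat documents that appear in only one list.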

WeightedSum

new \Laurus\WeightedSum(float $lexicalWeight = 0.5, float $vectorWeight = 0.5)

Normalises both score lists independently, then combines them as $lexicalWeight * lexical_score + $vectorWeight * vector_score.
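The reference does not say which normalisation WeightedSum uses. The sketch below assumes min-max scaling to [0, 1]. It also assumes that a document missing from one list scores 0 in that list. Both are illustrative choices, and minMax() and weightedSum() are hypothetical plain-PHP helpers that take [docId => score] maps.

```php
<?php
// Min-max normalisation to [0, 1] (assumed); a constant list maps to 1.0.
function minMax(array $scores): array
{
    if ($scores === []) {
        return [];
    }
    $min = min($scores);
    $range = max($scores) - $min;
    $out = [];
    foreach ($scores as $id => $s) {
        $out[$id] = $range > 0 ? ($s - $min) / $range : 1.0;
    }
    return $out;
}

function weightedSum(array $lexical, array $vector, float $lexicalWeight = 0.5, float $vectorWeight = 0.5): array
{
    $lex = minMax($lexical);
    $vec = minMax($vector);
    $fused = [];
    foreach (array_keys($lex + $vec) as $id) {
        // Missing from a list => contributes 0 for that list (assumption).
        $fused[$id] = $lexicalWeight * ($lex[$id] ?? 0.0) + $vectorWeight * ($vec[$id] ?? 0.0);
    }
    arsort($fused);
    return $fused;
}
```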


Text analysis

SynonymDictionary

$dict = new \Laurus\SynonymDictionary();
$dict->addSynonymGroup(["fast", "quick", "rapid"]);

A dictionary of synonym groups. All terms in a group are treated as synonyms of each other.

WhitespaceTokenizer

$tokenizer = new \Laurus\WhitespaceTokenizer();
$tokens = $tokenizer->tokenize("hello world");

Splits text on whitespace boundaries and returns an array of Token objects.
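Here is a plain-PHP sketch of whitespace tokenization that fills in a text, position and offsets for each token, matching the Token fields listed further down. This is not the Laurus implementation. The offsets come from PREG_SPLIT_OFFSET_CAPTURE and are byte offsets, so they match character offsets only for ASCII text. The end offset is exclusive here, which is an assumption.

```php
<?php
// Whitespace tokenizer returning text, position and [start, end) byte offsets.
function whitespaceTokenize(string $text): array
{
    $parts = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_OFFSET_CAPTURE);
    $tokens = [];
    foreach ($parts as $position => [$word, $start]) {
        $tokens[] = [
            "text" => $word,
            "position" => $position,
            "startOffset" => $start,
            "endOffset" => $start + strlen($word), // exclusive (assumed)
        ];
    }
    return $tokens;
}

print_r(whitespaceTokenize("  hello   world"));
```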

SynonymGraphFilter

new \Laurus\SynonymGraphFilter(SynonymDictionary $dictionary, bool $keepOriginal = true, float $boost = 1.0)
| Parameter | Description |
|---|---|
| $dictionary | Source synonym groups. |
| $keepOriginal | When true (default), keep the original token alongside the synonyms. |
| $boost | Score boost applied to the inserted synonym tokens (default 1.0). |
$filter = new \Laurus\SynonymGraphFilter($dictionary, true, 1.0);
$expanded = $filter->apply($tokens);

Token filter that expands tokens with their synonyms from a SynonymDictionary.
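The effect of $keepOriginal and $boost can be illustrated with a simplified expansion in plain PHP. Each token found in a group gets the other group members injected at the same position. Two things are assumptions: stacking synonyms at one position (position increment 0) and the token shape used here. Multi-word synonym graphs, which SynonymGraphFilter is named for, are not modelled. expandSynonyms() is a hypothetical helper.

```php
<?php
// Single-word synonym expansion over tokens shaped like ["text" => ..., "position" => ...].
function expandSynonyms(array $tokens, array $groups, bool $keepOriginal = true, float $boost = 1.0): array
{
    // Map every term to the group it belongs to.
    $lookup = [];
    foreach ($groups as $group) {
        foreach ($group as $term) {
            $lookup[$term] = $group;
        }
    }
    $out = [];
    foreach ($tokens as $tok) {
        $group = $lookup[$tok["text"]] ?? null;
        // Tokens without synonyms always survive; others only if $keepOriginal.
        if ($group === null || $keepOriginal) {
            $out[] = $tok;
        }
        // Inject the other group members at the same position, with $boost.
        foreach ($group ?? [] as $syn) {
            if ($syn !== $tok["text"]) {
                $out[] = ["text" => $syn, "position" => $tok["position"], "boost" => $boost];
            }
        }
    }
    return $out;
}
```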

Token

$token->getText()               // string  -- The token text
$token->getPosition()           // int     -- Position in the token stream
$token->getStartOffset()        // int     -- Character start offset in the original text
$token->getEndOffset()          // int     -- Character end offset in the original text
$token->getBoost()              // float   -- Score boost factor (1.0 = no adjustment)
$token->isStopped()             // bool    -- Whether removed by a stop filter
$token->getPositionIncrement()  // int     -- Difference from the previous token's position
$token->getPositionLength()     // int     -- Number of positions spanned

Field value types

PHP values are automatically converted to Laurus DataValue types:

| PHP type | Laurus type | Notes |
|---|---|---|
| null | Null | |
| true / false | Bool | |
| int | Int64 | |
| float | Float64 | |
| string | Text | |
| array of numerics | Vector | Elements coerced to f32 |
| array with "lat", "lon" | Geo | Two float values |
| array with "x", "y", "z" | GeoEcef | Three float values, metres (3D ECEF Cartesian) |
| string (ISO 8601) | DateTime | Parsed from ISO 8601 format |
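The order in which the table's rules are checked matters. An array with "lat"/"lon" keys is also an array of numbers, so the keyed shapes must be tested before the generic vector case. The plain PHP 8 sketch below makes that order explicit and shows the precision lost in f32 coercion. laurusTypeOf() and toF32() are hypothetical helpers. Recognising ISO 8601 strings with a regex is an assumption: Laurus may instead decide by the declared field type.

```php
<?php
// Illustrative mapping from PHP values to the Laurus type names in the table.
function laurusTypeOf(mixed $v): string
{
    $isNum = fn ($x): bool => is_int($x) || is_float($x);
    return match (true) {
        $v === null => "Null",
        is_bool($v) => "Bool",
        is_int($v) => "Int64",
        is_float($v) => "Float64",
        is_string($v) && preg_match('/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/', $v) === 1 => "DateTime",
        is_string($v) => "Text",
        // Keyed shapes first: they are also arrays of numerics.
        is_array($v) && isset($v["lat"], $v["lon"]) => "Geo",
        is_array($v) && isset($v["x"], $v["y"], $v["z"]) => "GeoEcef",
        is_array($v) && $v !== [] && array_values($v) === $v
            && count(array_filter($v, $isNum)) === count($v) => "Vector",
        default => "Unsupported",
    };
}

// Round-trip a PHP float (64-bit) through a 32-bit float, as vector coercion does.
function toF32(float $x): float
{
    return unpack('g', pack('g', $x))[1];
}
```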