ID Management
Iris uses a dual-tiered ID management strategy to ensure efficient document retrieval, updates, and aggregation in distributed environments.
1. External ID (String)
The External ID is a logical identifier used by users and applications to uniquely identify a document.
- Type: String
- Role: You can use any unique value, such as UUIDs, URLs, or database primary keys.
- Storage: Persisted transparently in the Lexical Index under the reserved system field `_id`.
- Uniqueness: Expected to be unique across the entire system.
- Updates: Indexing a document with an existing `external_id` triggers an automatic “Delete-then-Insert” (upsert) operation that replaces the old version with the new one.
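The upsert rule can be modeled with a map from External ID to Internal ID plus a set of deleted Internal IDs. This is a minimal in-memory sketch, not Iris's implementation: `IdMap`, `upsert`, and the field names are hypothetical, and it assumes that re-indexing issues a fresh Internal ID for the new version (consistent with Internal IDs being immutable) while the old one goes to the deletion set.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical model of the External -> Internal ID mapping.
/// Not Iris's actual types; names are illustrative only.
#[derive(Default)]
struct IdMap {
    next_internal: u64,
    by_external: HashMap<String, u64>,
    deleted: HashSet<u64>,
}

impl IdMap {
    /// Delete-then-insert: if `external_id` is already mapped, its old internal
    /// ID is recorded as deleted, then a new internal ID is issued.
    fn upsert(&mut self, external_id: &str) -> u64 {
        // Step 1 (delete): drop the old mapping and remember the old ID as deleted.
        if let Some(old_id) = self.by_external.remove(external_id) {
            self.deleted.insert(old_id);
        }
        // Step 2 (insert): issue a new internal ID for the new version.
        let new_id = self.next_internal;
        self.next_internal += 1;
        self.by_external.insert(external_id.to_string(), new_id);
        new_id
    }
}

fn main() {
    let mut ids = IdMap::default();
    let v1 = ids.upsert("https://example.com/page");
    let v2 = ids.upsert("https://example.com/page"); // same External ID: replaces v1
    assert_ne!(v1, v2);
    assert!(ids.deleted.contains(&v1));
    assert_eq!(ids.by_external["https://example.com/page"], v2);
}
```

Because the External ID maps to exactly one live Internal ID at any time, searches never see two versions of the same document; the old version lingers only as an entry in the deletion set until it is physically removed.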
2. Internal ID (u64 / Stable ID)
The Internal ID is a physical handle used internally by Iris’s engines (Lexical and Vector) for high-performance operations.
- Type: Unsigned 64-bit integer (`u64`)
- Role: Used for bitmap operations, point references, and routing between distributed nodes.
- Immutability (Stable): Once assigned, an Internal ID never changes, even across index merges (segment compaction) or restarts. This prevents inconsistencies in deletion logs and caches.
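Why stability matters for deletion logs and caches can be shown with a toy compaction. This is a hedged sketch under simplifying assumptions: a `Segment` is just a vector of `(stable_id, body)` pairs, and `compact` and `find` are hypothetical helpers, not Iris functions. Compaction drops deleted documents and shifts storage positions, but every surviving document keeps its `u64`.

```rust
use std::collections::HashSet;

/// Hypothetical segment: documents in storage order, each tagged with its stable u64 ID.
struct Segment {
    docs: Vec<(u64, String)>,
}

/// Merge segments into one, physically dropping documents listed in the deletion log.
/// Positions change; stable IDs do not.
fn compact(segments: &[Segment], deleted: &HashSet<u64>) -> Segment {
    let docs = segments
        .iter()
        .flat_map(|s| s.docs.iter())
        .filter(|(id, _)| !deleted.contains(id))
        .cloned()
        .collect();
    Segment { docs }
}

/// Look a document up by stable ID instead of by position.
fn find(seg: &Segment, id: u64) -> Option<&str> {
    seg.docs.iter().find(|(d, _)| *d == id).map(|(_, body)| body.as_str())
}

fn main() {
    let s1 = Segment { docs: vec![(1, "alpha".to_string()), (2, "beta".to_string())] };
    let s2 = Segment { docs: vec![(3, "gamma".to_string())] };
    // The deletion log is keyed by stable ID, not by position.
    let deleted: HashSet<u64> = HashSet::from([1]);
    let merged = compact(&[s1, s2], &deleted);
    // "gamma" moved from position 0 of s2 to position 1 of the merged segment,
    // but it is still reachable under stable ID 3.
    assert_eq!(find(&merged, 3), Some("gamma"));
    assert_eq!(find(&merged, 1), None);
}
```

Had the deletion log stored positions instead, entry "position 0 of s2" would point at the wrong document (or nothing) after the merge; keying it by stable ID avoids that class of bug.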
ID Structure (Shard-Prefixed)
Iris employs a Shard-Prefixed Stable ID scheme designed for multi-node distributed environments.
| Bits | Field | Description |
|---|---|---|
| 48-63 (16 bits) | Shard ID | Prefix identifying the node or partition (IDs 0-65,535, i.e. up to 65,536 shards). |
| 0-47 (48 bits) | Local ID | Monotonically increasing document number within a shard (up to 2^48, about 281 trillion, documents). |
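Packing and unpacking this layout takes only a shift and a mask. The sketch below follows the bit ranges in the table; the function names are hypothetical, and returning `None` when the Local ID exceeds 48 bits is an assumption about how overflow might be handled, not documented Iris behavior.

```rust
/// Bits 48-63 hold the Shard ID; bits 0-47 hold the Local ID.
const LOCAL_BITS: u32 = 48;
const LOCAL_MASK: u64 = (1u64 << LOCAL_BITS) - 1;

/// Pack a shard ID and local ID into one u64.
/// Returns None if the local ID does not fit in 48 bits.
fn make_internal_id(shard_id: u16, local_id: u64) -> Option<u64> {
    if local_id > LOCAL_MASK {
        return None;
    }
    Some((u64::from(shard_id) << LOCAL_BITS) | local_id)
}

/// Upper 16 bits: the shard that owns the document.
fn shard_id_of(internal_id: u64) -> u16 {
    (internal_id >> LOCAL_BITS) as u16
}

/// Lower 48 bits: the document number within that shard.
fn local_id_of(internal_id: u64) -> u64 {
    internal_id & LOCAL_MASK
}

fn main() {
    let id = make_internal_id(3, 42).expect("local ID fits in 48 bits");
    println!("{id:#018x}"); // prints 0x000300000000002a
    assert_eq!(shard_id_of(id), 3);
    assert_eq!(local_id_of(id), 42);
    assert!(make_internal_id(0, 1u64 << 48).is_none());
}
```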
Why this structure?
- Zero-Cost Aggregation: Since `u64` IDs are globally unique, the aggregator can perform fast sorting and deduplication without worrying about ID collisions between nodes.
- Fast Routing: The aggregator can immediately identify the physical node responsible for a document just by looking at the upper bits, avoiding expensive hash lookups.
- High-Performance Fetching: Internal IDs map directly to physical data structures. This allows Iris to skip the “External-to-Internal ID” conversion step during retrieval, achieving O(1) access speed.
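The aggregation and routing points can be sketched together. This is an illustration under assumptions, not Iris's aggregator: `Hit`, `aggregate`, and `route` are hypothetical, scores are assumed to be `f32` with higher meaning more relevant, and top-k merging by full sort is chosen for brevity. Deduplication keys directly on the `u64`, and routing reads only the top 16 bits.

```rust
use std::collections::{BTreeMap, HashSet};

const LOCAL_BITS: u32 = 48;

/// Hypothetical search hit returned by a shard.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Hit {
    id: u64,
    score: f32,
}

/// Merge per-shard results into a global top-k. Because internal IDs are
/// globally unique, equal IDs always mean the same document, so dedup is exact.
fn aggregate(shard_results: &[Vec<Hit>], k: usize) -> Vec<Hit> {
    let mut all: Vec<Hit> = shard_results.iter().flatten().copied().collect();
    // Highest score first; ties broken by ID for a deterministic order.
    all.sort_by(|x, y| y.score.total_cmp(&x.score).then(x.id.cmp(&y.id)));
    let mut seen = HashSet::new();
    all.into_iter().filter(|h| seen.insert(h.id)).take(k).collect()
}

/// Group IDs by owning shard using only the shard prefix, with no lookup table.
fn route(ids: &[u64]) -> BTreeMap<u16, Vec<u64>> {
    let mut by_shard: BTreeMap<u16, Vec<u64>> = BTreeMap::new();
    for &id in ids {
        by_shard.entry((id >> LOCAL_BITS) as u16).or_default().push(id);
    }
    by_shard
}

fn main() {
    let a = (1u64 << LOCAL_BITS) | 10; // shard 1, local ID 10
    let b = (2u64 << LOCAL_BITS) | 10; // shard 2, same local ID, yet a distinct u64
    let per_shard = vec![
        vec![Hit { id: a, score: 0.9 }],
        vec![Hit { id: b, score: 0.7 }],
        vec![Hit { id: a, score: 0.9 }], // the same document reported twice
    ];
    let top = aggregate(&per_shard, 10);
    assert_eq!(top.iter().map(|h| h.id).collect::<Vec<_>>(), vec![a, b]);
    let routed = route(&[a, b]);
    assert_eq!(routed[&1], vec![a]);
    assert_eq!(routed[&2], vec![b]);
}
```

Note that both shards use Local ID 10; without the shard prefix, the aggregator would wrongly collapse two different documents into one.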
ID Lifecycle
- Registration (`engine.index()`): The user provides a document with an External ID.
- ID Assignment: The `Engine` combines the current `shard_id` with a new Local ID to issue a Shard-Prefixed Internal ID.
- Mapping: The engine maintains the relationship between the External ID and the new Internal ID.
- Search: Search results return the `u64` Internal ID for efficiency.
- Retrieval/Deletion: While the user-facing API accepts External IDs for convenience, the engine internally converts them to Internal IDs for near-instant processing.
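The lifecycle can be traced end to end with a single-shard toy engine. This is a hypothetical sketch: `Engine::new`, `index`, `delete`, and `external_id` here are illustrative stand-ins, not Iris's API, and the two `HashMap`s are an assumed representation of the External/Internal mapping. It shows assignment with a shard prefix, a bidirectional mapping, and a single External-to-Internal conversion at the API boundary.

```rust
use std::collections::HashMap;

const LOCAL_BITS: u32 = 48;
const LOCAL_MAX: u64 = (1u64 << LOCAL_BITS) - 1;

/// Hypothetical single-shard engine illustrating the ID lifecycle.
struct Engine {
    shard_id: u16,
    next_local: u64,
    ext_to_int: HashMap<String, u64>,
    int_to_ext: HashMap<u64, String>,
}

impl Engine {
    fn new(shard_id: u16) -> Self {
        Engine { shard_id, next_local: 0, ext_to_int: HashMap::new(), int_to_ext: HashMap::new() }
    }

    /// Registration, ID assignment, and mapping in one step.
    fn index(&mut self, external_id: &str) -> u64 {
        assert!(self.next_local <= LOCAL_MAX, "local ID space exhausted");
        // Shard prefix in the upper 16 bits, monotonically increasing Local ID below.
        let internal = (u64::from(self.shard_id) << LOCAL_BITS) | self.next_local;
        self.next_local += 1;
        // Re-indexing an existing External ID drops the old reverse mapping.
        if let Some(old) = self.ext_to_int.insert(external_id.to_string(), internal) {
            self.int_to_ext.remove(&old);
        }
        self.int_to_ext.insert(internal, external_id.to_string());
        internal
    }

    /// Deletion by External ID: convert once, then work with the u64
    /// (a real engine would also mark it in a deletion bitmap).
    fn delete(&mut self, external_id: &str) -> Option<u64> {
        let internal = self.ext_to_int.remove(external_id)?;
        self.int_to_ext.remove(&internal);
        Some(internal)
    }

    /// Resolve a u64 search hit back to the user-facing ID when needed.
    fn external_id(&self, internal: u64) -> Option<&str> {
        self.int_to_ext.get(&internal).map(String::as_str)
    }
}

fn main() {
    let mut engine = Engine::new(5);
    let id = engine.index("doc-42");
    assert_eq!(id >> LOCAL_BITS, 5); // shard prefix
    // A search would return `id`; resolve it to the External ID only for display.
    assert_eq!(engine.external_id(id), Some("doc-42"));
    assert_eq!(engine.delete("doc-42"), Some(id));
    assert_eq!(engine.external_id(id), None);
}
```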