
Model File Format

Litsea models are stored as simple plain-text files.

Format Specification

<feature_name>\t<weight>
<feature_name>\t<weight>
...
<bias>
  • Each line (except the last) contains a feature name and its weight, separated by a tab character
  • Zero-weight features are omitted to keep the file compact
  • The last line contains the bias term as a single number
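The rules above can be illustrated with a minimal Rust writer. This is a sketch, not Litsea's implementation. It assumes the weights are held in a `BTreeMap<String, f64>` and that the bias is passed in separately. The function name `write_model` is hypothetical.

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

/// Hypothetical helper (not Litsea's API): renders feature weights and a
/// bias term in the plain-text model format.
fn write_model(weights: &BTreeMap<String, f64>, bias: f64) -> String {
    let mut out = String::new();
    // BTreeMap yields keys in sorted order, so the output is deterministic.
    for (name, &weight) in weights {
        // Zero-weight features are omitted to keep the file compact.
        if weight != 0.0 {
            // Writing into a String cannot fail, so unwrap is safe here.
            writeln!(out, "{}\t{}", name, weight).unwrap();
        }
    }
    // The bias goes on the last line, on its own.
    writeln!(out, "{}", bias).unwrap();
    out
}

fn main() {
    let mut weights = BTreeMap::new();
    weights.insert("UW4:は".to_string(), 0.5678);
    weights.insert("BC1:IK".to_string(), 0.3456);
    weights.insert("UC4:I".to_string(), 0.0); // zero weight: not written
    print!("{}", write_model(&weights, -0.0891));
}
```

Running it prints the two non-zero features in sorted order, followed by the bias line `-0.0891`.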

Example

BC1:IK	0.3456
BC2:KI	-0.1234
UW4:は	0.5678
UC4:I	0.2345
...
-0.0891
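Reading a file like this back reverses the writing step. The Rust sketch below parses a model from a string. `parse_model` is a hypothetical name. Two behaviours are assumptions based on the specification above, not a description of Litsea's own loader: a line without a tab is treated as the bias, and any content after that line is rejected. The sample input omits the `...` line because it is only an elision.

```rust
use std::collections::BTreeMap;

/// Hypothetical parser (not Litsea's loader): returns the feature weights and
/// the bias stored on the final line.
fn parse_model(text: &str) -> Result<(BTreeMap<String, f64>, f64), String> {
    let mut weights = BTreeMap::new();
    let mut bias: Option<f64> = None;
    for (i, line) in text.lines().enumerate() {
        if line.trim().is_empty() {
            continue; // tolerate blank lines, e.g. a trailing newline
        }
        if bias.is_some() {
            return Err(format!("line {}: content after the bias line", i + 1));
        }
        match line.split_once('\t') {
            // "<feature_name>\t<weight>"
            Some((name, weight)) => {
                let w: f64 = weight
                    .trim()
                    .parse()
                    .map_err(|e| format!("line {}: {}", i + 1, e))?;
                weights.insert(name.to_string(), w);
            }
            // A line without a tab is the bias term.
            None => {
                let b: f64 = line
                    .trim()
                    .parse()
                    .map_err(|e| format!("line {}: {}", i + 1, e))?;
                bias = Some(b);
            }
        }
    }
    bias.map(|b| (weights, b))
        .ok_or_else(|| "missing bias line".to_string())
}

fn main() {
    let text = "BC1:IK\t0.3456\nBC2:KI\t-0.1234\nUW4:は\t0.5678\nUC4:I\t0.2345\n-0.0891\n";
    let (weights, bias) = parse_model(text).expect("well-formed model");
    // Prints "4 features, bias = -0.0891"
    println!("{} features, bias = {}", weights.len(), bias);
}
```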

Bias Reconstruction

When loading a model, the weight of the internal bias bucket is reconstructed from the stored bias value and the sum of all feature weights:

bias_bucket_weight = -bias_value * 2 - sum(all_feature_weights)
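In Rust, the reconstruction formula is a single expression over the parsed weights. The sketch below assumes the feature weights are stored in a `BTreeMap<String, f64>`. The name `bias_bucket_weight` is hypothetical, and the numbers in `main` are illustrative rather than taken from a real model.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: applies
/// bias_bucket_weight = -bias_value * 2 - sum(all_feature_weights).
fn bias_bucket_weight(bias_value: f64, features: &BTreeMap<String, f64>) -> f64 {
    let feature_sum: f64 = features.values().sum();
    -bias_value * 2.0 - feature_sum
}

fn main() {
    let mut features = BTreeMap::new();
    features.insert("BC1:IK".to_string(), 0.25);
    features.insert("UW4:は".to_string(), 0.5);
    // -(-0.125) * 2 - (0.25 + 0.5) = 0.25 - 0.75 = -0.5
    println!("{}", bias_bucket_weight(-0.125, &features));
}
```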

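During prediction, the bias is recomputed from the sum of all model weights (the feature weights plus the bias bucket weight). Each input feature found in the model then adds its weight to the score, and features absent from the model contribute nothing:

bias = -sum(all_model_weights) / 2.0
score = bias + sum(model.get(feature, 0.0) for feature in input_attributes)

Since sum(all_model_weights) = bias_bucket_weight + sum(all_feature_weights) = -bias_value * 2, the prediction-time bias equals the bias_value stored on the last line of the file (up to floating-point rounding).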
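The two prediction formulas can be sketched in Rust as follows. The sketch assumes that the reconstructed bias bucket is stored in the same map as the features, under a hypothetical key `BIAS_KEY` that no input attribute uses. Litsea may store it differently. Attributes that are not in the model are skipped, so unknown input features add nothing to the score.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical key for the reconstructed bias bucket (not Litsea's).
const BIAS_KEY: &str = "__bias__";

/// Scores one candidate position from its input attributes.
fn score(model: &BTreeMap<String, f64>, input_attributes: &HashSet<String>) -> f64 {
    // bias = -sum(all_model_weights) / 2.0, bias bucket included.
    let bias = -model.values().sum::<f64>() / 2.0;
    // Attributes missing from the model contribute nothing.
    let evidence: f64 = input_attributes
        .iter()
        .filter_map(|attr| model.get(attr))
        .sum();
    bias + evidence
}

fn main() {
    let mut model = BTreeMap::new();
    model.insert("BC1:IK".to_string(), 0.25);
    model.insert("UW4:は".to_string(), 0.5);
    // Stored bias_value = -0.125, so the bucket is 0.25 - 0.75 = -0.5.
    model.insert(BIAS_KEY.to_string(), -0.5);

    let input: HashSet<String> = ["BC1:IK", "XX:unseen"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    // bias (-0.125) + BC1:IK (0.25) = 0.125; "XX:unseen" is ignored.
    println!("{}", score(&model, &input));
}
```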

File Size

Model files are very compact:

Model                               Size      Source
japanese.model                      ~2.9 KB   Wikipedia-trained
korean.model                        ~1.8 KB   Wikipedia-trained
chinese.model                       ~1.3 KB   Wikipedia-trained
RWCP.model                          ~22 KB    Original TinySegmenter
JEITA_Genpaku_ChaSen_IPAdic.model   ~17 KB    JEITA corpus

The compact size is a key advantage of Litsea – models can be embedded directly in applications or served over HTTP with minimal overhead.

Compatibility

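  • Model files are language-agnostic: feature names are stored as-is, so the same format serves the Japanese, Korean, and Chinese models
  • The format is deterministic: features are held in a BTreeMap and written in sorted key order, so saving the same model twice produces identical files
  • Models are forward-compatible: an input feature that does not appear in the model is ignored during prediction and contributes nothing to the score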
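The determinism point can be checked with a short Rust sketch. Two maps are built from the same entries in different insertion orders, and both serialize identically. This works because `BTreeMap` iterates in sorted key order, which is byte-wise order for `String` keys. The `serialize` helper is hypothetical and writes only the feature lines.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: one "<feature>\t<weight>" line per entry, in key order.
fn serialize(weights: &BTreeMap<String, f64>) -> String {
    weights
        .iter()
        .map(|(name, w)| format!("{}\t{}\n", name, w))
        .collect()
}

fn main() {
    let pairs = [("UW4:は", 0.5678), ("BC1:IK", 0.3456), ("UC4:I", 0.2345)];
    // Build the same map twice, inserting in opposite orders.
    let forward: BTreeMap<String, f64> =
        pairs.iter().map(|&(k, v)| (k.to_string(), v)).collect();
    let reverse: BTreeMap<String, f64> =
        pairs.iter().rev().map(|&(k, v)| (k.to_string(), v)).collect();
    assert_eq!(serialize(&forward), serialize(&reverse));
    // Keys come out sorted: BC1:IK, UC4:I, UW4:は
    print!("{}", serialize(&forward));
}
```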