Library API Overview
The litsea crate provides a Rust API for word segmentation, model training, and feature extraction.
Installation
[dependencies]
litsea = "0.5.0"
Loading models from local files is synchronous and needs no async runtime. An async runtime such as tokio is only required when loading models over HTTP/HTTPS with the async load_model method.
Module Map
graph LR
A["litsea::segmenter"] --- B["Segmenter"]
C["litsea::adaboost"] --- D["AdaBoost"]
E["litsea::language"] --- F["Language"]
G["litsea::extractor"] --- H["Extractor"]
I["litsea::trainer"] --- J["Trainer, PosTrainer"]
K["litsea::error"] --- L["LitseaError, Result"]
M["litsea::perceptron"] --- N["AveragedPerceptron"]
O["litsea::upos"] --- P["Upos, SegmentLabel"]
Q["litsea::metrics"] --- R["BinaryMetrics, MulticlassMetrics"]
| Module | Primary Types | Purpose |
|---|---|---|
litsea::segmenter | Segmenter | Word segmentation, joint segmentation with POS tagging |
litsea::adaboost | AdaBoost | Binary classification, model I/O |
litsea::perceptron | AveragedPerceptron | Multiclass classification (POS tagging), model I/O |
litsea::upos | Upos, SegmentLabel | UPOS POS tags, segment labels |
litsea::language | Language | Language definitions, character classification |
litsea::extractor | Extractor | Feature extraction from corpus |
litsea::trainer | Trainer, PosTrainer | Training orchestration |
litsea::error | LitseaError, Result | Error type and result alias |
litsea::metrics | BinaryMetrics, MulticlassMetrics | Evaluation metrics |
All primary types are also re-exported at the crate root, so use litsea::Segmenter; works as a shorthand for use litsea::segmenter::Segmenter;.
Quick Example
use std::path::Path;
use litsea::adaboost::AdaBoost;
use litsea::language::Language;
use litsea::segmenter::Segmenter;
fn main() -> litsea::Result<()> {
let mut learner = AdaBoost::new(0.01, 100);
learner.load_model_from_path(Path::new("./models/japanese.model"))?;
let segmenter = Segmenter::new(Language::Japanese, Some(learner));
let tokens = segmenter.segment("これはテストです。");
assert_eq!(tokens, vec!["これ", "は", "テスト", "です", "。"]);
Ok(())
}
Quick Example (POS Tagging)
use std::path::Path;
use litsea::language::Language;
use litsea::perceptron::AveragedPerceptron;
use litsea::segmenter::Segmenter;
fn main() -> litsea::Result<()> {
let mut pos_learner = AveragedPerceptron::new();
pos_learner.load_model_from_path(Path::new("./models/japanese_pos.model"))?;
let segmenter = Segmenter::with_pos_learner(Language::Japanese, pos_learner);
let tokens = segmenter.segment_with_pos("これはテストです。");
for (word, pos) in &tokens {
print!("{}/{} ", word, pos);
}
println!();
Ok(())
}
API Documentation
Full API documentation is available on docs.rs/litsea.