Library API Overview

The litsea crate provides a Rust API for word segmentation, model training, and feature extraction.

Installation

[dependencies]
litsea = "0.5.0"

Loading models from local files is synchronous and needs no async runtime. An async runtime such as tokio is only required when loading models over HTTP/HTTPS with the async load_model method.
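For projects that do load models over HTTP/HTTPS, the dependency section would also need an async runtime. The fragment below is a minimal sketch that assumes tokio is the chosen runtime; the tokio version and feature list are assumptions rather than requirements stated by litsea, and any runtime able to drive the async load_model future should serve the same purpose.

```toml
[dependencies]
litsea = "0.5.0"
# Assumption: tokio as the async runtime; only needed for remote model loading.
# "macros" enables #[tokio::main], "rt-multi-thread" provides the default scheduler.
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
```

Projects that only read models from local files can leave the tokio line out.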

Module Map

graph LR
    A["litsea::segmenter"] --- B["Segmenter"]
    C["litsea::adaboost"] --- D["AdaBoost"]
    E["litsea::language"] --- F["Language"]
    G["litsea::extractor"] --- H["Extractor"]
    I["litsea::trainer"] --- J["Trainer, PosTrainer"]
    K["litsea::error"] --- L["LitseaError, Result"]
    M["litsea::perceptron"] --- N["AveragedPerceptron"]
    O["litsea::upos"] --- P["Upos, SegmentLabel"]
    Q["litsea::metrics"] --- R["BinaryMetrics, MulticlassMetrics"]
| Module | Primary Types | Purpose |
| --- | --- | --- |
| litsea::segmenter | Segmenter | Word segmentation, joint segmentation with POS tagging |
| litsea::adaboost | AdaBoost | Binary classification, model I/O |
| litsea::perceptron | AveragedPerceptron | Multiclass classification (POS tagging), model I/O |
| litsea::upos | Upos, SegmentLabel | UPOS POS tags, segment labels |
| litsea::language | Language | Language definitions, character classification |
| litsea::extractor | Extractor | Feature extraction from corpus |
| litsea::trainer | Trainer, PosTrainer | Training orchestration |
| litsea::error | LitseaError, Result | Error type and result alias |
| litsea::metrics | BinaryMetrics, MulticlassMetrics | Evaluation metrics |

All primary types are also re-exported at the crate root, so litsea::Segmenter names the same type as litsea::segmenter::Segmenter and either path can be used in a use declaration.
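How a crate-root re-export behaves can be shown without litsea at all. The sketch below uses a hypothetical miniature module named mylib (not litsea's actual source) to show that a pub use at the root gives a second path to the same type rather than a copy of it.

```rust
use std::any::TypeId;

// Hypothetical stand-in for a crate; the layout is illustrative only.
mod mylib {
    pub mod segmenter {
        #[derive(Debug, Default, PartialEq)]
        pub struct Segmenter;
    }

    // Re-export at the (pretend) crate root, so `mylib::Segmenter` resolves.
    pub use self::segmenter::Segmenter;
}

/// Returns true when the short path and the full path name the same type.
fn same_type() -> bool {
    TypeId::of::<mylib::Segmenter>() == TypeId::of::<mylib::segmenter::Segmenter>()
}

fn main() {
    // A value built through one path is accepted wherever the other is expected.
    let short: mylib::Segmenter = mylib::segmenter::Segmenter::default();
    assert_eq!(short, mylib::Segmenter);
    assert!(same_type());
}
```

Because both paths resolve to one type, mixing the short and the fully qualified form within a project is harmless; picking one is a matter of style.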

Quick Example

use std::path::Path;

use litsea::adaboost::AdaBoost;
use litsea::language::Language;
use litsea::segmenter::Segmenter;

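fn main() -> litsea::Result<()> {
    // Create a learner, then load a trained model from a local file.
    // This call is synchronous, so no async runtime is needed.
    let mut learner = AdaBoost::new(0.01, 100);
    learner.load_model_from_path(Path::new("./models/japanese.model"))?;

    // Build a Japanese segmenter backed by the loaded model.
    let segmenter = Segmenter::new(Language::Japanese, Some(learner));
    let tokens = segmenter.segment("これはテストです。");

    // Expected segmentation of the sentence with this model.
    assert_eq!(tokens, vec!["これ", "は", "テスト", "です", "。"]);
    Ok(())
}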

Quick Example (POS Tagging)

use std::path::Path;

use litsea::language::Language;
use litsea::perceptron::AveragedPerceptron;
use litsea::segmenter::Segmenter;

fn main() -> litsea::Result<()> {
    // Load a trained POS model from a local file (synchronous).
    let mut pos_learner = AveragedPerceptron::new();
    pos_learner.load_model_from_path(Path::new("./models/japanese_pos.model"))?;

    // Build a segmenter that performs joint segmentation and POS tagging.
    let segmenter = Segmenter::with_pos_learner(Language::Japanese, pos_learner);
    let tokens = segmenter.segment_with_pos("これはテストです。");

    // Print each token as word/POS, all on one line.
    for (word, pos) in &tokens {
        print!("{}/{} ", word, pos);
    }
    println!();

    Ok(())
}
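When the tagged output needs to go somewhere other than stdout (a log line, a test assertion, a file), it can be convenient to build the word/POS string first. The following is a minimal sketch using only the standard library; format_tagged is a hypothetical helper, not part of litsea, and the token pairs are hand-written stand-ins rather than model output. It assumes only that both elements of each pair implement Display, as the print! call in the example above implies.

```rust
use std::fmt::Display;

/// Hypothetical helper: renders (word, tag) pairs as "word/TAG" joined by spaces.
fn format_tagged<W: Display, P: Display>(tokens: &[(W, P)]) -> String {
    tokens
        .iter()
        .map(|(word, pos)| format!("{}/{}", word, pos))
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    // Illustrative stand-in data; real pairs would come from segment_with_pos.
    let tokens = vec![("これ", "PRON"), ("は", "ADP"), ("テスト", "NOUN")];
    let line = format_tagged(&tokens);
    println!("{}", line);
}
```

Unlike the print! loop, this version leaves no trailing space and keeps formatting separate from I/O.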

API Documentation

Full API documentation is available on docs.rs/litsea.