Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Workspace Structure

Litsea is organized as a Cargo workspace with two crates and supporting directories.

Directory Layout

litsea/
├── Cargo.toml              # Workspace manifest
├── Cargo.lock              # Dependency lock file
├── Makefile                # Build convenience targets
├── rustfmt.toml            # Rust formatting configuration
├── LICENSE                 # MIT
├── README.md               # Project overview
├── litsea/                 # Core library crate
│   ├── Cargo.toml
│   ├── src/
│   │   ├── lib.rs          # Module declarations and version
│   │   ├── adaboost.rs     # AdaBoost algorithm
│   │   ├── segmenter.rs    # Word segmentation
│   │   ├── extractor.rs    # Feature extraction from corpus
│   │   ├── trainer.rs      # Training orchestration
│   │   ├── language.rs     # Language definitions and char patterns
│   │   └── util.rs         # URI scheme utilities
│   └── benches/
│       └── bench.rs        # Criterion benchmarks
├── litsea-cli/             # CLI binary crate
│   ├── Cargo.toml
│   └── src/
│       └── main.rs         # CLI entry point
├── models/                 # Pre-trained models
│   ├── japanese.model
│   ├── chinese.model
│   ├── korean.model
│   ├── RWCP.model
│   └── JEITA_Genpaku_ChaSen_IPAdic.model
├── resources/              # Sample data and test fixtures
│   └── bocchan.txt         # Sample corpus
├── scripts/                # Corpus preparation utilities
│   ├── download_udtreebank.sh      # Download UD Treebanks (prints CoNLL-U file path)
│   ├── corpus_udtreebank.sh           # Convert CoNLL-U to Litsea corpus format
│   └── wikitexts.sh        # Download and prepare Wikipedia text data
├── docs/                   # mdbook documentation (this book)
└── .github/
    └── workflows/          # CI/CD pipelines
        ├── regression.yml  # Test on push/PR
        ├── release.yml     # Release builds and publishing
        └── periodic.yml    # Weekly stability tests

Crate Details

litsea (Core Library)

The core library provides all segmentation, training, and model I/O functionality.

DependencyVersionPurpose
thiserror2.0Error type derivation
reqwest0.13HTTP/HTTPS model loading (rustls)
tokio1.49Async runtime for remote model loading
criterion0.8Benchmarking (dev dependency)
tempfile3.25Temporary files for tests (dev dependency)

litsea-cli (CLI Binary)

The CLI provides a command-line interface to Litsea’s functionality.

DependencyVersionPurpose
clap4.5Command-line argument parsing
ctrlc3.5Graceful Ctrl+C handling during training
tokio1.49Async runtime
litsea0.4Core library (workspace member)

Workspace Configuration

The workspace uses Cargo resolver version 3 (Rust Edition 2024):

[workspace]
resolver = "3"
members = ["litsea", "litsea-cli"]

[workspace.package]
version = "0.4.0"
edition = "2024"
rust-version = "1.87"

Shared dependencies are defined at the workspace level in [workspace.dependencies] and referenced by each crate with { workspace = true }.