The litsea CLI provides commands for word segmentation, model training, and text processing.
litsea <COMMAND> [OPTIONS] [ARGS]
| Command | Description |
| --- | --- |
| extract | Extract features from a corpus for training |
| train | Train a word segmentation model |
| segment | Segment text into words using a trained model |
| Option | Description |
| --- | --- |
| -h, --help | Show help information |
| -V, --version | Show version number |
flowchart LR
A["1. scripts/download_udtreebank.sh"] --> B["2. scripts/corpus_udtreebank.sh"]
B --> C["3. litsea extract"]
C --> D["4. litsea train"]
D --> E["5. litsea segment"]
- Download a UD Treebank:
conllu_file=$(bash scripts/download_udtreebank.sh -l ja -o /tmp)
- Convert to corpus format:
bash scripts/corpus_udtreebank.sh "$conllu_file" corpus.txt
- Extract features:
litsea extract -l japanese corpus.txt features.txt
- Train a model:
litsea train -t 0.005 -i 1000 features.txt model.model
- Segment text:
echo "text" | litsea segment -l japanese model.model
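The five steps above can be chained into one fail-fast function. The following is a minimal sketch, not part of litsea: the `DRY_RUN` variable, the `run_step` and `segment_pipeline` names, and the placeholder `.conllu` path used in dry-run mode are all assumptions made for illustration. Only the commands and flags shown in the steps above are used.

```shell
# Sketch only: wraps the documented steps in one function.
# DRY_RUN, run_step, segment_pipeline and the placeholder path are
# illustrative assumptions, not part of litsea.

run_step() {
  # In dry-run mode print the command (quoting is not preserved);
  # otherwise execute it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

segment_pipeline() (
  # The body runs in a subshell, so `set -eu` does not leak into the caller.
  set -eu
  text=${1:?usage: segment_pipeline TEXT}

  # Step 1: the download script prints the path of the downloaded file.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' 'bash scripts/download_udtreebank.sh -l ja -o /tmp'
    conllu_file=/tmp/example.conllu   # hypothetical placeholder path
  else
    conllu_file=$(bash scripts/download_udtreebank.sh -l ja -o /tmp)
  fi

  # Steps 2-4: convert, extract features, train.
  run_step bash scripts/corpus_udtreebank.sh "$conllu_file" corpus.txt
  run_step litsea extract -l japanese corpus.txt features.txt
  run_step litsea train -t 0.005 -i 1000 features.txt model.model

  # Step 5: segment the text passed as the first argument.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' 'litsea segment -l japanese model.model'
  else
    printf '%s\n' "$text" | litsea segment -l japanese model.model
  fi
)
```

Calling `DRY_RUN=1 segment_pipeline "text"` prints the command sequence without running anything, which is a cheap way to review the plan before a long training run. Without `DRY_RUN`, the function stops at the first step that fails.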
flowchart LR
A["1. scripts/download_udtreebank.sh"] --> B["2. scripts/corpus_udtreebank.sh -p"]
B --> C["3. litsea extract --pos"]
C --> D["4. litsea train --pos"]
D --> E["5. litsea segment --pos"]
- Download a UD Treebank:
conllu_file=$(bash scripts/download_udtreebank.sh -l ja -o /tmp)
- Convert to POS corpus format:
bash scripts/corpus_udtreebank.sh -p "$conllu_file" pos_corpus.txt
- Extract POS features:
litsea extract --pos -l japanese pos_corpus.txt features_pos.txt
- Train a POS model:
litsea train --pos --num-epochs 10 features_pos.txt model_pos.model
- Segment with POS tags:
echo "text" | litsea segment --pos -l japanese model_pos.model
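Because each POS step consumes the file written by the previous one, it can help to check that a file exists and is non-empty before moving on. The helper below is a hedged sketch: `require_nonempty` is a hypothetical name, and it assumes each command writes its output to the path given as its last argument, as the steps above suggest.

```shell
# Sketch only: require_nonempty is a hypothetical helper, not part of litsea.
# It reports an error when a step did not produce a usable output file.
require_nonempty() {
  if [ ! -s "$1" ]; then
    printf 'error: %s is missing or empty\n' "$1" >&2
    return 1
  fi
}

# Intended use between the documented POS steps:
#   litsea extract --pos -l japanese pos_corpus.txt features_pos.txt
#   require_nonempty features_pos.txt || exit 1
#   litsea train --pos --num-epochs 10 features_pos.txt model_pos.model
#   require_nonempty model_pos.model || exit 1
```

The check relies only on the standard `[ -s FILE ]` test, so it makes no assumption about the contents of the feature or model files.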