segment

Segment text into words using a trained model.

Usage

echo "text" | litsea segment [OPTIONS] <MODEL_URI>

Arguments

| Argument | Description |
| --- | --- |
| MODEL_URI | Path or URL to the trained model file. Supports: local file paths, file://, http://, https:// |
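The accepted MODEL_URI forms can be told apart with a simple prefix check. The following is a minimal sketch, not part of litsea: `model_uri_kind` and `check_local_model` are hypothetical helper names, and the existence check only covers plain local paths.

```shell
# Hypothetical helper: report which documented MODEL_URI form a string uses.
model_uri_kind() {
  case "$1" in
    http://*|https://*) echo "remote" ;;
    file://*)           echo "file-url" ;;
    *)                  echo "local-path" ;;
  esac
}

# Hypothetical helper: fail early when a plain local path does not exist.
# URLs are passed through unchecked; litsea resolves those itself.
check_local_model() {
  if [ "$(model_uri_kind "$1")" = "local-path" ] && [ ! -f "$1" ]; then
    echo "model file not found: $1" >&2
    return 1
  fi
}

# Usage (assumes litsea is installed and the model path exists):
#   check_local_model ./models/japanese.model \
#     && echo "text" | litsea segment ./models/japanese.model
```

Checking the path before piping text in keeps a typo in the model path from being confused with a segmentation problem.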

Options

| Option | Default | Description |
| --- | --- | --- |
| -l, --language <LANGUAGE> | japanese | Language for character type classification. Accepts: japanese / ja, chinese / zh, korean / ko |
| --pos | off | Enable POS-tagged segmentation output. Requires a POS model trained with train --pos |

Input / Output

  • Input: Reads from stdin, one sentence per line. Empty lines are skipped.
  • Output: Writes to stdout, space-separated tokens, one line per input line.
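Because empty input lines are skipped, it can help to remove them before segmenting so that line N of the output lines up with line N of the input. The sketch below assumes standard POSIX tools; `strip_empty` and `tokens_per_line` are hypothetical helper names, and the file names are placeholders.

```shell
# Drop empty lines from stdin so input and output can be compared line by line.
strip_empty() {
  grep -v '^$'
}

# Print the number of space-separated tokens on each line of segmenter output.
tokens_per_line() {
  awk '{ print NF }'
}

# Usage (assumes litsea and the model path used in the examples on this page):
#   strip_empty < input.txt > filtered.txt
#   litsea segment -l japanese ./models/japanese.model < filtered.txt > segmented.txt
#   paste filtered.txt segmented.txt    # source sentence next to its tokens
#   tokens_per_line < segmented.txt     # token count per sentence
```

The token count uses awk's field splitting rather than `wc -w`, since some `wc` implementations miscount words made only of non-ASCII characters in the C locale.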

Examples

Japanese:

echo "LitseaはTinySegmenterを参考に開発された。" \
  | litsea segment -l japanese ./models/japanese.model
Litsea は TinySegmenter を 参考 に 開発 さ れ た 。

Chinese:

echo "中文分词测试。" | litsea segment -l chinese ./models/chinese.model

Korean:

echo "한국어 단어 분할 테스트입니다." \
  | litsea segment -l korean ./models/korean.model

Processing a file:

cat input.txt | litsea segment -l japanese ./models/japanese.model > output.txt

Loading a model from a URL:

echo "テスト文です。" \
  | litsea segment -l japanese https://example.com/models/japanese.model

POS-Tagged Segmentation (--pos)

When the --pos flag is specified, segmentation and POS tagging are performed simultaneously using an Averaged Perceptron model.

Usage

echo "text" | litsea segment --pos [OPTIONS] <MODEL_URI>

Output Format

Each token is written as word/POS. POS tags follow the Universal POS (UPOS) tag set from Universal Dependencies.

echo "今日はいい天気ですね。" \
  | litsea segment --pos -l japanese ./models/japanese_pos.model
今日/X は/ADP いい/ADJ 天気/NOUN です/AUX ね/PART 。/PUNCT
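For downstream processing, the word/POS tokens can be split into two tab-separated columns. This is a minimal sketch using awk; `pos_to_tsv` is a hypothetical helper name, and splitting on the last slash is an assumption made so that a word containing "/" stays intact.

```shell
# Convert "word/POS word/POS ..." lines on stdin into one "word<TAB>POS" row per token.
pos_to_tsv() {
  awk '{
    for (k = 1; k <= NF; k++) {
      # Find the last "/" in the token: everything before it is the word.
      i = match($k, /\/[^\/]*$/)
      if (i) print substr($k, 1, i - 1) "\t" substr($k, i + 1)
    }
  }'
}

# Usage (assumes litsea and a POS model as in the example above):
#   echo "今日はいい天気ですね。" \
#     | litsea segment --pos -l japanese ./models/japanese_pos.model \
#     | pos_to_tsv
```

Tokens without a slash are dropped silently in this sketch; a stricter version might report them instead.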

Processing a File

cat input.txt | litsea segment --pos -l japanese ./models/japanese_pos.model > output.txt

Notes

  • The --language flag must match the language the model was trained for.
  • Model loading is asynchronous and supports HTTP/HTTPS with TLS (rustls).
  • The model URI is not restricted to local file paths: file://, http://, and https:// URLs are also accepted.
  • When using --pos, the model must be a POS model trained with train --pos.
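One way to keep the --language flag and the model in step is to derive the model path from the language value. The wrapper below is a sketch only: `model_for_lang` and `segment_lang` are hypothetical names, and the model paths simply mirror the examples on this page.

```shell
# Hypothetical mapping from a --language value (or its alias) to a model path.
# The paths follow the examples on this page and are assumptions about your layout.
model_for_lang() {
  case "$1" in
    japanese|ja) echo "./models/japanese.model" ;;
    chinese|zh)  echo "./models/chinese.model" ;;
    korean|ko)   echo "./models/korean.model" ;;
    *) echo "unsupported language: $1" >&2; return 1 ;;
  esac
}

# Segment stdin, always pairing the language flag with its matching model.
segment_lang() {
  model=$(model_for_lang "$1") || return 1
  litsea segment -l "$1" "$model"
}

# Usage (assumes litsea is installed and the models exist at these paths):
#   echo "中文分词测试。" | segment_lang zh
```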