Evaluating Models

Understanding model quality is essential for producing good segmentation results.

Metrics

The train command outputs three key metrics after training:

Accuracy

Accuracy = (TP + TN) / Total Instances

The percentage of all character positions that were correctly classified (both boundaries and non-boundaries). This is the broadest measure of model quality.

Precision

Precision = TP / (TP + FP)

Of the boundaries the model predicted, the fraction that were correct. High precision means few false boundaries, that is, little over-segmentation.

Recall

Recall = TP / (TP + FN)

Of the actual boundaries, the fraction that the model found. High recall means few missed boundaries, that is, little under-segmentation.
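
The three formulas can be checked by hand with a small script. The following is a minimal sketch in Python and is not part of Litsea: the function name `segmentation_metrics` and the example counts are hypothetical, chosen only for illustration. Returning 0.0 when a denominator is zero is likewise an assumption of this sketch.

```python
def segmentation_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts.

    Hypothetical helper for illustration; not part of Litsea.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision is undefined when no boundaries were predicted; report 0.0 here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall is undefined when there are no actual boundaries; report 0.0 here.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Made-up counts: 80 correct boundaries, 20 false boundaries,
# 10 missed boundaries, 890 correct non-boundaries.
accuracy, precision, recall = segmentation_metrics(tp=80, fp=20, fn=10, tn=890)
print(f"accuracy={accuracy:.2%} precision={precision:.2%} recall={recall:.2%}")
# prints: accuracy=97.00% precision=80.00% recall=88.89%
```

With these counts accuracy is 97% even though one predicted boundary in five is wrong. When non-boundary positions greatly outnumber boundaries, accuracy alone can look better than precision and recall suggest, so it helps to read all three together.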

Confusion Matrix

	Predicted Boundary (+1)	Predicted Non-boundary (-1)
Actual Boundary	True Positive (TP)	False Negative (FN)
Actual Non-boundary	False Positive (FP)	True Negative (TN)
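
To make the four cells concrete, the sketch below tallies them from two label sequences, using +1 for a boundary and -1 for a non-boundary as in the table. It is an illustrative Python snippet, not Litsea's implementation: `confusion_counts` and the sample labels are made up for this example.

```python
def confusion_counts(actual, predicted):
    """Count TP, FN, FP, TN for labels +1 (boundary) and -1 (non-boundary).

    Hypothetical helper for illustration; not part of Litsea.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1   # boundary found
        elif a == 1 and p == -1:
            counts["FN"] += 1   # boundary missed
        elif a == -1 and p == 1:
            counts["FP"] += 1   # false boundary
        elif a == -1 and p == -1:
            counts["TN"] += 1   # non-boundary correctly left alone
        else:
            raise ValueError(f"unexpected label pair: {(a, p)}")
    return counts


# Made-up labels for eight character positions.
actual    = [1, -1, -1, 1, -1, 1, -1, -1]
predicted = [1, -1, 1, 1, -1, -1, -1, -1]
print(confusion_counts(actual, predicted))
# prints: {'TP': 2, 'FN': 1, 'FP': 1, 'TN': 4}
```

In this toy example the model finds two of the three actual boundaries, misses one, and adds one false boundary.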

Pre-trained Model Benchmarks

Model	Accuracy	Precision	Recall	Training Corpus
japanese.model	94.15%	95.57%	94.36%	UD Japanese-GSD
korean.model	85.08%	–	–	UD Korean-GSD
chinese.model	80.72%	–	–	UD Chinese-GSD

Improving Model Quality

If accuracy is unsatisfactory, consider:

More training data – A larger and more diverse corpus
Lower threshold – Try -t 0.001 to allow more boosting iterations
More iterations – Try -i 5000 or higher
Better corpus quality – Ensure consistent tokenization and clean text
Retraining – Start from an existing model and train with additional data (see Retraining Models)

Litsea Documentation