Evaluating Models

Understanding model quality is essential for producing good segmentation results.

Metrics

The train command outputs three key metrics after training:

Accuracy

Accuracy = (TP + TN) / Total Instances

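The percentage of all character positions that were correctly classified (both boundaries and non-boundaries). This is the broadest measure of model quality, but it can be dominated by whichever class is more frequent, so read it alongside precision and recall.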

Precision

Precision = TP / (TP + FP)

Of the boundaries the model predicted, what fraction were correct. High precision means few false boundaries, that is, little over-segmentation.

Recall

Recall = TP / (TP + FN)

Of the actual boundaries, what fraction the model found. High recall means few missed boundaries, that is, little under-segmentation.

Confusion Matrix

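|                     | Predicted Boundary (+1) | Predicted Non-boundary (-1) |
|---------------------|-------------------------|-----------------------------|
| Actual Boundary     | True Positive (TP)      | False Negative (FN)         |
| Actual Non-boundary | False Positive (FP)     | True Negative (TN)          |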
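
To make the link between the confusion matrix and the three formulas concrete, here is a minimal sketch in Python. It is not part of the train command and does not reproduce its implementation. The function names `confusion_counts` and `metrics` and the sample labels are hypothetical, chosen only for illustration. The one thing taken from the table above is the label convention: +1 for a boundary, -1 for a non-boundary.

```python
from typing import Sequence


def confusion_counts(actual: Sequence[int], predicted: Sequence[int]) -> dict[str, int]:
    """Count TP, FP, FN, TN for labels where +1 = boundary, -1 = non-boundary."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1  # boundary correctly predicted
        elif a == -1 and p == 1:
            counts["FP"] += 1  # false boundary (over-segmentation)
        elif a == 1 and p == -1:
            counts["FN"] += 1  # missed boundary (under-segmentation)
        elif a == -1 and p == -1:
            counts["TN"] += 1  # non-boundary correctly predicted
        else:
            raise ValueError(f"labels must be +1 or -1, got {a!r} and {p!r}")
    return counts


def metrics(counts: dict[str, int]) -> dict[str, float]:
    """Apply the accuracy, precision, and recall formulas to the counts."""
    tp, fp, fn, tn = counts["TP"], counts["FP"], counts["FN"], counts["TN"]
    total = tp + fp + fn + tn
    return {
        # Guard each ratio against an empty denominator.
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical labels for eight character positions.
    actual = [1, -1, -1, 1, -1, 1, -1, -1]
    predicted = [1, -1, 1, 1, -1, -1, -1, -1]
    counts = confusion_counts(actual, predicted)
    print(counts)
    print(metrics(counts))
```

In this made-up sample, one false boundary and one missed boundary yield TP = 2, FP = 1, FN = 1, TN = 4, so accuracy is 6/8 = 0.75 while precision and recall are both 2/3.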

Pre-trained Model Benchmarks

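| Model          | Accuracy | Precision | Recall | Training Corpus |
|----------------|----------|-----------|--------|-----------------|
| japanese.model | 94.15%   | 95.57%    | 94.36% | UD Japanese-GSD |
| korean.model   | 85.08%   |           |        | UD Korean-GSD   |
| chinese.model  | 80.72%   |           |        | UD Chinese-GSD  |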

Improving Model Quality

If accuracy is unsatisfactory, consider:

  1. More training data – A larger and more diverse corpus
  2. Lower threshold – Try -t 0.001 to allow more boosting iterations
  3. More iterations – Try -i 5000 or higher
  4. Better corpus quality – Ensure consistent tokenization and clean text
  5. Retraining – Start from an existing model and train with additional data (see Retraining Models)