Retraining Models

You can improve an existing model by resuming training with new data.

Command

litsea train -t 0.005 -i 1000 -m <EXISTING_MODEL> <NEW_FEATURES_FILE> <OUTPUT_MODEL>

Example

# Extract features from the new corpus
litsea extract -l japanese ./new_corpus.txt ./new_features.txt

# Retrain from existing model
litsea train -t 0.005 -i 1000 \
    -m ./models/japanese.model \
    ./new_features.txt \
    ./models/japanese_v2.model

How It Works

flowchart LR
    A["Existing model<br/>(weights)"] --> C["Trainer"]
    B["New features"] --> C
    C --> D["Retrained model<br/>(updated weights)"]

The trainer initializes features and instances from the new features file
It loads the weights of the existing model given with -m
Training continues with the loaded weights as the starting point
The resulting model starts from what the existing model learned and adjusts those weights to fit the new data
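
From the command line, the only difference between a fresh run and a retraining run is whether -m is passed. The helper below is a minimal sketch of that contrast; the function name train_model is hypothetical, and it assumes that omitting -m makes the trainer start without pre-loaded weights.

```shell
# Hypothetical helper: train from scratch, or resume from a base model when one is given.
train_model() {
    features=$1   # features file produced by `litsea extract`
    output=$2     # path for the trained model
    base=${3:-}   # optional: existing model to start from

    if [ -n "$base" ]; then
        # Retraining: weights are loaded from $base before training continues.
        litsea train -t 0.005 -i 1000 -m "$base" "$features" "$output"
    else
        # Fresh run: no -m (assumed to mean no existing weights are loaded).
        litsea train -t 0.005 -i 1000 "$features" "$output"
    fi
}
```

Calling train_model ./new_features.txt ./models/japanese_v2.model ./models/japanese.model reproduces the retraining command from the example above.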

Use Cases

Domain adaptation – Fine-tune a general model on domain-specific text (e.g., medical, legal)
Incremental improvement – Add more training data without retraining from scratch
Error correction – Train on examples where the current model makes mistakes
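
For domain adaptation, one possible workflow keeps the general model untouched and writes a separate model per domain. The sketch below only reuses the extract and train commands shown earlier; the function name adapt_domain and the ./corpora and ./features directories are hypothetical and would need to match your own layout.

```shell
# Hypothetical helper: derive a domain-specific model from the general one.
adapt_domain() {
    domain=$1                                  # e.g. medical, legal
    base=./models/japanese.model               # general model from the example above
    corpus=./corpora/${domain}.txt             # hypothetical corpus location
    features=./features/${domain}.txt          # hypothetical features location

    # Extract features from the domain corpus, then retrain from the general model.
    litsea extract -l japanese "$corpus" "$features" &&
    litsea train -t 0.005 -i 1000 -m "$base" "$features" \
        "./models/japanese_${domain}.model"
}
```

Because each run starts from the same base model, the domain models stay independent of one another: adapt_domain medical and adapt_domain legal can be run in either order.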

Notes

The output path may be the same as the input model path, in which case the existing model file is overwritten
The -m flag accepts file paths, file://, http://, and https:// URIs
Retraining starts from the existing weights, so fewer iterations may be needed than when training from scratch
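
Since retraining in place overwrites the existing model, it may be worth keeping a copy first. The following is a minimal sketch of that precaution, not a litsea feature; the function name retrain_in_place and the .bak suffix are hypothetical.

```shell
# Hypothetical helper: retrain a model in place, keeping a backup of the old weights.
retrain_in_place() {
    model=$1                 # existing model; also used as the output path
    features=$2              # features extracted from the new corpus
    backup="${model}.bak"    # hypothetical backup name

    # Keep a copy of the current model before it is overwritten.
    cp -- "$model" "$backup" || return 1

    if litsea train -t 0.005 -i 1000 -m "$model" "$features" "$model"; then
        echo "retrained $model (previous weights kept in $backup)"
    else
        # Training failed: put the previous model back.
        cp -- "$backup" "$model"
        echo "training failed; restored $model from $backup" >&2
        return 1
    fi
}
```

For example, retrain_in_place ./models/japanese.model ./new_features.txt updates the model from the example above and leaves the previous version in ./models/japanese.model.bak.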