Averaged Perceptron
Litsea uses the Averaged Perceptron algorithm for multiclass classification to perform joint word segmentation and POS tagging. This chapter explains the algorithm as implemented in Litsea.
Overview
While AdaBoost performs binary classification (boundary vs. non-boundary), the Averaged Perceptron performs multiclass classification – predicting one of 18 segment labels for each character position:
- 17 boundary labels:
B-ADJ,B-ADP,B-ADV,B-AUX,B-CCONJ,B-DET,B-INTJ,B-NOUN,B-NUM,B-PART,B-PRON,B-PROPN,B-PUNCT,B-SCONJ,B-SYM,B-VERB,B-X - 1 non-boundary label:
O(continuation of the current word)
These labels correspond to the 17 Universal POS (UPOS) tags from the Universal Dependencies project, prefixed with B- to indicate a word boundary. This enables simultaneous word boundary detection and POS estimation in a single classification step.
Algorithm
Weight Representation
The perceptron maintains a weight vector per class. Weights are stored as a sparse map:
weights: HashMap<Feature, HashMap<Class, f64>>
For example:
weights["UW4:猫"]["B-NOUN"] = 2.5
weights["UC4:H"]["B-NOUN"] = 1.8
weights["UW4:猫"]["O"] = -0.3
...
For a given feature set, the score for each class is the sum of its feature weights:
score(class) = sum(weights[feature][class] for each feature in input)
prediction = argmax(score(class) for all classes)
Update Rule
When the perceptron makes a misclassification:
For each training instance (features, truth):
guess = predict(features)
if guess != truth:
For each feature f in features:
weights[f][truth] += 1.0 # increase weight for correct class
weights[f][guess] -= 1.0 # decrease weight for predicted class
This increases the weights for the correct class and decreases them for the incorrectly predicted class, making the correct prediction more likely for similar inputs in the future.
Averaging
A key improvement over the basic perceptron is weight averaging. Rather than using the final weights (which can be unstable and tend to overfit to the tail of the training data), the model averages all weight vectors seen during training. This improves generalization to unseen data.
The implementation uses a cumulative sum approach for efficiency:
cumulative[feature][class] += weights[feature][class] * elapsed_steps
At the end of training:
averaged[feature][class] = cumulative[feature][class] / total_steps
This avoids storing all intermediate weight vectors while producing the same result. The averaging reduces dependence on the order of training data and improves generalization performance.
Training with Epochs
Training iterates over the data multiple times (epochs). Each epoch processes all training instances in order:
For each epoch (1 to num_epochs):
For each instance in training data:
features = extract_features(instance)
predicted = argmax(score(class) for all classes)
if predicted != correct_label:
update weights
accumulate weights for averaging
Training supports graceful interruption via AtomicBool – a Ctrl+C signal stops training and saves the model at its current state.
#![allow(unused)]
fn main() {
use std::sync::Arc;
use std::sync::atomic::AtomicBool;
use litsea::perceptron::AveragedPerceptron;
let mut perceptron = AveragedPerceptron::new();
// ... add instances ...
let running = Arc::new(AtomicBool::new(true));
perceptron.train(10, running); // 10 epochs
}
Model File Format
The Averaged Perceptron model is saved as a text file with the following structure:
18
O
B-ADJ
B-ADP
...
B-X
feature1\tclass1\tweight1
feature2\tclass2\tweight2
...
- Line 1: Number of classes (18)
- Lines 2 to N+1: Class names, one per line
- Remaining lines: Feature weights, tab-separated as
feature\tclass\tweight - Zero-weight entries are omitted
Comparison with AdaBoost
| Aspect | AdaBoost | Averaged Perceptron |
|---|---|---|
| Classification | Binary (+1/-1) | Multiclass (18 classes) |
| Output | Word boundaries only | Word boundaries + POS tags |
| Weak learner | Decision stumps per feature | None (linear classifier) |
| Weight management | One weight per feature | Class x feature weight matrix |
| Generalization | Ensemble | Weight averaging |
| Training | Iterative boosting with sample reweighting | Online learning with weight averaging |
| Model size | A few KB | ~11 MB (with POS features) |
| Hyperparameters | threshold, num_iterations | num_epochs |
Hyperparameters
| Parameter | Default | Description |
|---|---|---|
num_epochs | 10 | Number of training passes over the data. More epochs can improve accuracy but may overfit |