Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Averaged Perceptron

Litsea uses the Averaged Perceptron algorithm for multiclass classification to perform joint word segmentation and POS tagging. This chapter explains the algorithm as implemented in Litsea.

Overview

While AdaBoost performs binary classification (boundary vs. non-boundary), the Averaged Perceptron performs multiclass classification – predicting one of 18 segment labels for each character position:

  • 17 boundary labels: B-ADJ, B-ADP, B-ADV, B-AUX, B-CCONJ, B-DET, B-INTJ, B-NOUN, B-NUM, B-PART, B-PRON, B-PROPN, B-PUNCT, B-SCONJ, B-SYM, B-VERB, B-X
  • 1 non-boundary label: O (continuation of the current word)

These labels correspond to the 17 Universal POS (UPOS) tags from the Universal Dependencies project, prefixed with B- to indicate a word boundary. This enables simultaneous word boundary detection and POS estimation in a single classification step.

Algorithm

Weight Representation

The perceptron maintains a weight vector per class. Weights are stored as a sparse map:

weights: HashMap<Feature, HashMap<Class, f64>>

For example:

weights["UW4:猫"]["B-NOUN"] = 2.5
weights["UC4:H"]["B-NOUN"]  = 1.8
weights["UW4:猫"]["O"]      = -0.3
...

For a given feature set, the score for each class is the sum of its feature weights:

score(class) = sum(weights[feature][class] for each feature in input)
prediction = argmax(score(class) for all classes)

Update Rule

When the perceptron makes a misclassification:

For each training instance (features, truth):
    guess = predict(features)

    if guess != truth:
        For each feature f in features:
            weights[f][truth] += 1.0   # increase weight for correct class
            weights[f][guess] -= 1.0   # decrease weight for predicted class

This increases the weights for the correct class and decreases them for the incorrectly predicted class, making the correct prediction more likely for similar inputs in the future.

Averaging

A key improvement over the basic perceptron is weight averaging. Rather than using the final weights (which can be unstable and tend to overfit to the tail of the training data), the model averages all weight vectors seen during training. This improves generalization to unseen data.

The implementation uses a cumulative sum approach for efficiency:

cumulative[feature][class] += weights[feature][class] * elapsed_steps

At the end of training:
    averaged[feature][class] = cumulative[feature][class] / total_steps

This avoids storing all intermediate weight vectors while producing the same result. The averaging reduces dependence on the order of training data and improves generalization performance.

Training with Epochs

Training iterates over the data multiple times (epochs). Each epoch processes all training instances in order:

For each epoch (1 to num_epochs):
    For each instance in training data:
        features = extract_features(instance)
        predicted = argmax(score(class) for all classes)
        if predicted != correct_label:
            update weights
        accumulate weights for averaging

Training supports graceful interruption via AtomicBool – a Ctrl+C signal stops training and saves the model at its current state.

#![allow(unused)]
fn main() {
use std::sync::Arc;
use std::sync::atomic::AtomicBool;
use litsea::perceptron::AveragedPerceptron;

let mut perceptron = AveragedPerceptron::new();
// ... add instances ...
let running = Arc::new(AtomicBool::new(true));
perceptron.train(10, running);  // 10 epochs
}

Model File Format

The Averaged Perceptron model is saved as a text file with the following structure:

18
O
B-ADJ
B-ADP
...
B-X
feature1\tclass1\tweight1
feature2\tclass2\tweight2
...
  • Line 1: Number of classes (18)
  • Lines 2 to N+1: Class names, one per line
  • Remaining lines: Feature weights, tab-separated as feature\tclass\tweight
  • Zero-weight entries are omitted

Comparison with AdaBoost

AspectAdaBoostAveraged Perceptron
ClassificationBinary (+1/-1)Multiclass (18 classes)
OutputWord boundaries onlyWord boundaries + POS tags
Weak learnerDecision stumps per featureNone (linear classifier)
Weight managementOne weight per featureClass x feature weight matrix
GeneralizationEnsembleWeight averaging
TrainingIterative boosting with sample reweightingOnline learning with weight averaging
Model sizeA few KB~11 MB (with POS features)
Hyperparametersthreshold, num_iterationsnum_epochs

Hyperparameters

ParameterDefaultDescription
num_epochs10Number of training passes over the data. More epochs can improve accuracy but may overfit