CLI Examples

Basic Extraction

Extract text from a Wikipedia dump into the default text/ directory:

wicket simplewiki-latest-pages-articles.xml.bz2

Custom Output Directory

Write the extracted text to output/ instead of the default text/ directory:

wicket dump.xml.bz2 -o output/

Write to stdout

Pipe output directly to another command:

wicket dump.xml.bz2 -o - -q | wc -l

JSON Output with Compression

Emit JSON and compress the output files:

wicket dump.xml.bz2 -o output/ --json -c

Extract Talk Pages

Extract namespace 1 (talk pages) with 8 workers:

wicket dump.xml.bz2 -o output/ --namespaces 1 --processes 8

Multiple Namespaces

Extract main articles (namespace 0) and user pages (namespace 2):

wicket dump.xml.bz2 -o output/ --namespaces 0,2

Small Output Files

Split output into 500 KB files:

wicket dump.xml.bz2 -o output/ -b 500K

One Article per File

Set the file size limit to 0 to write each article to its own file:

wicket dump.xml.bz2 -o output/ -b 0

Output Directory Structure

After extraction, the output directory looks like:

output/
  AA/
    wiki_00
    wiki_01
    ...
    wiki_99
  AB/
    wiki_00
    ...
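
The naming pattern in the listing above can be sketched as a small shell helper. This is only an illustration inferred from the listing, not wicket's actual implementation: it assumes 100 files per directory (wiki_00 through wiki_99) and two-letter directory names that run AA, AB, ..., AZ, BA, and so on. The function name wiki_path is hypothetical.

```shell
# Map a zero-based output file index to its relative path.
# Assumption: 100 files per directory, directories named AA, AB, ..., AZ, BA, ...
# (inferred from the directory listing; not confirmed against wicket itself).
wiki_path() {
  n=$1
  letters=ABCDEFGHIJKLMNOPQRSTUVWXYZ
  dir_index=$((n / 100))          # which directory the file falls into
  file_index=$((n % 100))         # position of the file inside that directory
  first=$((dir_index / 26 % 26))  # index of the first directory letter
  second=$((dir_index % 26))      # index of the second directory letter
  # cut -c is 1-based, so shift each letter index by one
  a=$(printf '%s' "$letters" | cut -c "$((first + 1))")
  b=$(printf '%s' "$letters" | cut -c "$((second + 1))")
  printf '%s%s/wiki_%02d\n' "$a" "$b" "$file_index"
}

wiki_path 0     # prints AA/wiki_00
wiki_path 100   # prints AB/wiki_00
```

Under these assumptions, a helper like this gives a rough idea of where the Nth output file would land without listing the whole tree.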

With --compress, each output file is bzip2-compressed and gains a .bz2 extension:

output/
  AA/
    wiki_00.bz2
    wiki_01.bz2
    ...
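
To read the extracted files back in order, a helper along the following lines may be useful. It is a minimal sketch that relies only on the layouts shown above (files named wiki_NN, optionally with a .bz2 suffix) and on the standard find, sort, cat, and bzip2 tools. The function name cat_extracted is hypothetical and not part of wicket.

```shell
# Stream the contents of every extracted file under a directory in sorted
# path order, decompressing .bz2 files on the fly.
# Assumption: files are named wiki_* as in the listings above.
cat_extracted() {
  dir=$1
  find "$dir" -type f -name 'wiki_*' | LC_ALL=C sort | while IFS= read -r f; do
    case "$f" in
      *.bz2) bzip2 -dc -- "$f" ;;  # compressed output (--compress)
      *)     cat -- "$f" ;;        # plain output
    esac
  done
}

# Example: count the lines across all extracted files
#   cat_extracted output/ | wc -l
```

Because the helper checks each file's suffix individually, the same call works for both compressed and uncompressed output directories.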