Huffman coding
From Just Solve the File Format Problem
Huffman coding is a kind of lossless data compression.
Symbols (such as bytes) are encoded as variable-length bit strings, optimized such that the more common symbols use shorter bit strings. No bit string is allowed to be a prefix of another bit string, which makes it possible for the decoder to deduce when a string has ended.
Discussion
A format may use Huffman coding in a number of different ways, including:
- The codebook is created automatically as the data is processed (Adaptive Huffman coding).
- The codebook is stored in the file, to be read before the encoded data. Sometimes the compressed data is split into segments, which each segment having a different codebook.
- The codebook is predefined, and known to both the encoder and decoder. Note that there is no fundamental distinction between a predefined Huffman code, and a variable-length integer code that wouldn't necessarily be called "Huffman".
Unfortunately, the widely-used terms "static Huffman" and "dynamic Huffman" are at least somewhat ambiguous. They don't have the same meanings to all authors.