Huffman coding
From Just Solve the File Format Problem
(Difference between revisions)
(→See also) |
|||
Line 19: | Line 19: | ||
* [[Adaptive Huffman coding]] | * [[Adaptive Huffman coding]] | ||
* [[Canonical Huffman code]] | * [[Canonical Huffman code]] | ||
+ | * [[LZ77 with Huffman coding]] | ||
* [[Modified Huffman]] | * [[Modified Huffman]] | ||
== External links == | == External links == | ||
* [[Wikipedia:Huffman coding|Wikipedia article]] | * [[Wikipedia:Huffman coding|Wikipedia article]] |
Revision as of 11:42, 12 September 2020
Huffman coding is a kind of lossless data compression.
Symbols (such as bytes) are encoded as variable-length bit strings, optimized such that the more common symbols use shorter bit strings. No bit string is allowed to be a prefix of another bit string, which makes it possible for the decoder to deduce when a string has ended.
Discussion
A format may use Huffman coding in a number of different ways, including:
- The codebook is created automatically as the data is processed (Adaptive Huffman coding).
- The codebook is stored in the file, to be read before the encoded data. Sometimes the compressed data is split into segments, which each segment having a different codebook.
- The codebook is predefined, and known to both the encoder and decoder. Note that there is no fundamental distinction between a predefined Huffman code, and a variable-length integer code that wouldn't necessarily be called "Huffman".
Unfortunately, the widely-used terms "static Huffman" and "dynamic Huffman" are at least somewhat ambiguous. They don't have the same meanings to all authors.