Huffman coding
From Just Solve the File Format Problem
(Difference between revisions)
(PRONOM ID) |
|||
Line 8: | Line 8: | ||
Symbols (such as bytes) are encoded as variable-length bit strings, optimized such that the more common symbols use shorter bit strings. No bit string is allowed to be a prefix of another bit string, which makes it possible for the decoder to deduce when a string has ended. | Symbols (such as bytes) are encoded as variable-length bit strings, optimized such that the more common symbols use shorter bit strings. No bit string is allowed to be a prefix of another bit string, which makes it possible for the decoder to deduce when a string has ended. | ||
− | + | == Discussion == | |
+ | A format may use Huffman coding in a number of different ways, including: | ||
+ | * The codebook is created automatically as the data is processed ([[Adaptive Huffman coding]]). | ||
+ | * The codebook is stored in the file, to be read before the encoded data. Sometimes the compressed data is split into segments, which each segment having a different codebook. | ||
+ | * The codebook is predefined, and known to both the encoder and decoder. Note that there is no fundamental distinction between a predefined Huffman code, and a variable-length integer code that wouldn't necessarily be called "Huffman". | ||
+ | |||
+ | Unfortunately, the widely-used terms "static Huffman" and "dynamic Huffman" are at least somewhat ambiguous. They don't have the same meanings to all authors. | ||
== See also == | == See also == | ||
+ | * [[Adaptive Huffman coding]] | ||
+ | * [[Canonical Huffman code]] | ||
* [[Modified Huffman]] | * [[Modified Huffman]] | ||
== External links == | == External links == | ||
* [[Wikipedia:Huffman coding|Wikipedia article]] | * [[Wikipedia:Huffman coding|Wikipedia article]] |
Revision as of 19:42, 27 August 2020
Huffman coding is a kind of lossless data compression.
Symbols (such as bytes) are encoded as variable-length bit strings, optimized such that the more common symbols use shorter bit strings. No bit string is allowed to be a prefix of another bit string, which makes it possible for the decoder to deduce when a string has ended.
Discussion
A format may use Huffman coding in a number of different ways, including:
- The codebook is created automatically as the data is processed (Adaptive Huffman coding).
- The codebook is stored in the file, to be read before the encoded data. Sometimes the compressed data is split into segments, which each segment having a different codebook.
- The codebook is predefined, and known to both the encoder and decoder. Note that there is no fundamental distinction between a predefined Huffman code, and a variable-length integer code that wouldn't necessarily be called "Huffman".
Unfortunately, the widely-used terms "static Huffman" and "dynamic Huffman" are at least somewhat ambiguous. They don't have the same meanings to all authors.