Huffman coding
(3 intermediate revisions by one user not shown) | |||
Line 2: | Line 2: | ||
|formattype=electronic | |formattype=electronic | ||
|subcat=Compression | |subcat=Compression | ||
+ | |pronom={{PRONOM|x-cmp/19}} | ||
+ | |wikidata={{wikidata|Q2647}} | ||
}} | }} | ||
'''Huffman coding''' is a kind of lossless data compression. | '''Huffman coding''' is a kind of lossless data compression. | ||
Line 7: | Line 9: | ||
Symbols (such as bytes) are encoded as variable-length bit strings, optimized such that the more common symbols use shorter bit strings. No bit string is allowed to be a prefix of another bit string, which makes it possible for the decoder to deduce when a string has ended. | Symbols (such as bytes) are encoded as variable-length bit strings, optimized such that the more common symbols use shorter bit strings. No bit string is allowed to be a prefix of another bit string, which makes it possible for the decoder to deduce when a string has ended. | ||
− | + | == Discussion == | |
+ | A format may use Huffman coding in a number of different ways, including: | ||
+ | * The codebook is created automatically as the data is processed ([[Adaptive Huffman coding]]). | ||
+ | * The codebook is stored in the file, to be read before the encoded data. Sometimes the compressed data is split into segments, which each segment having a different codebook. | ||
+ | * The codebook is predefined, and known to both the encoder and decoder. Note that there is no fundamental distinction between a predefined Huffman code, and a variable-length integer code that wouldn't necessarily be called "Huffman". | ||
+ | |||
+ | Unfortunately, the widely-used terms "static Huffman" and "dynamic Huffman" are at least somewhat ambiguous. They don't have the same meanings to all authors. | ||
+ | |||
+ | === Terminology === | ||
+ | Technically, ''Huffman coding'' is a specific algorithm that finds a bit-oriented ''prefix code''. The code it finds would then be called a '''Huffman code'''. | ||
+ | |||
+ | For formats that use [[adaptive Huffman coding]], the decompressor must use the same algorithm as the compressor. | ||
+ | |||
+ | But in most other cases, the algorithm used by the compressor is irrelevant to the decompressor. So the compressor does not have to use the Huffman coding algorithm at all, even if the format specification calls for it. It could use something similar like [[Shannon–Fano coding]], or something completely different. Everything will still work. Much format documentation, including this wiki, still uses the term ''Huffman coding'' in such cases, though it's a slight abuse of terminology. | ||
== See also == | == See also == | ||
+ | * [[Adaptive Huffman coding]] | ||
+ | * [[Canonical Huffman code]] | ||
+ | * [[LZ77 with Huffman coding]] | ||
* [[Modified Huffman]] | * [[Modified Huffman]] | ||
+ | * [[Shannon–Fano coding]] | ||
== External links == | == External links == | ||
− | * [[Wikipedia:Huffman coding | + | * [[Wikipedia: Huffman coding]] |
+ | * [[Wikipedia: Prefix code]] |
Latest revision as of 14:36, 20 January 2021
Huffman coding is a kind of lossless data compression.
Symbols (such as bytes) are encoded as variable-length bit strings, optimized such that the more common symbols use shorter bit strings. No bit string is allowed to be a prefix of another bit string, which makes it possible for the decoder to deduce when a string has ended.
Contents |
[edit] Discussion
A format may use Huffman coding in a number of different ways, including:
- The codebook is created automatically as the data is processed (Adaptive Huffman coding).
- The codebook is stored in the file, to be read before the encoded data. Sometimes the compressed data is split into segments, which each segment having a different codebook.
- The codebook is predefined, and known to both the encoder and decoder. Note that there is no fundamental distinction between a predefined Huffman code, and a variable-length integer code that wouldn't necessarily be called "Huffman".
Unfortunately, the widely-used terms "static Huffman" and "dynamic Huffman" are at least somewhat ambiguous. They don't have the same meanings to all authors.
[edit] Terminology
Technically, Huffman coding is a specific algorithm that finds a bit-oriented prefix code. The code it finds would then be called a Huffman code.
For formats that use adaptive Huffman coding, the decompressor must use the same algorithm as the compressor.
But in most other cases, the algorithm used by the compressor is irrelevant to the decompressor. So the compressor does not have to use the Huffman coding algorithm at all, even if the format specification calls for it. It could use something similar like Shannon–Fano coding, or something completely different. Everything will still work. Much format documentation, including this wiki, still uses the term Huffman coding in such cases, though it's a slight abuse of terminology.
[edit] See also
- Adaptive Huffman coding
- Canonical Huffman code
- LZ77 with Huffman coding
- Modified Huffman
- Shannon–Fano coding