Bzip2
bzip2 is a data compression algorithm and compressed file format. It was developed by Julian Seward.
Contents |
Identification
A bzip2 file starts with the byte pattern 42 5a 68 ?? 31 41 59 26 53 59
.
The first three bytes are ASCII "BZh
". (For signature "BZ0
", refer to the original bzip format.) The "h
" has been said to stand for "Huffman coding", but confirmation is needed.
The byte at offset 3 is a code for the block size. Its possible values range from 0x31
to 0x39
(ASCII "1
" to "9
").
The bytes at offset 4-9 are derived from the digits of the mathematical constant π (BCD-encoded).
The end-of-file marker uses magic number (hex) 17 72 45 38 50 90
, derived from the square root of π. However, it is not byte-aligned. The result is that one of the following byte sequences appears beginning 10 bytes from the end of the file:
b9 22 9c 28 48 dc 91 4e 14 24 ee 48 a7 0a 12 77 24 53 85 09 bb 92 29 c2 84 5d c9 14 e1 42 2e e4 8a 70 a1 17 72 45 38 50
Specifications
Software
Sample files
See also
- Burrows–Wheeler transform
- bzip (predecessor)
Links
- Wikipedia article
- bzip2 and libbzip2 website
- Chart of format details
- bzip.org changes hands (LWN article from August 9, 2018)
- ForensicsWiki entry (also includes more details on the headers)
- bzip.org