BARF

From Just Solve the File Format Problem
(Difference between revisions)
Jump to: navigation, search
Line 15: Line 15:
  
 
Thus, it cheats in two ways, by moving the bytes of arbitrary files into the filename (thus not actually achieving compression in most cases if the filenames need to be stored along with the data), and by using a specially-tailored high compression for the specific files of the benchmark test the author intended to win with this algorithm.
 
Thus, it cheats in two ways, by moving the bytes of arbitrary files into the filename (thus not actually achieving compression in most cases if the filenames need to be stored along with the data), and by using a specially-tailored high compression for the specific files of the benchmark test the author intended to win with this algorithm.
 +
 +
A later program, BARF2 (linked on the same page as BARF, below), achieves compression down to zero bytes without expanding the filename, but this requires storing the number of times the compression algorithm was applied recursively, which ends up encoding the entire file as a huge number.
  
 
BARF was created by the author of [[PAQ]].
 
BARF was created by the author of [[PAQ]].

Revision as of 14:18, 20 December 2012

File Format
Name BARF
Ontology
Extension(s) .x, .x??
Released 2003

BARF (Better Archiver with Recursive Functionality) is a sort of reductio ad absurdum of compression techniques, able to be run repeatedly on a file and always achieve greater compression, until you get a file of zero bytes in all cases. How do you decompress those files? Well, BARF kind of cheats.

Its algorithm consists of picking which one of 257 algorithms best compresses the file. The first algorithm is one which assigns each of the 14 files in the Calgary corpus (one benchmark test of compression algorithms) a single-byte compressed value from 0-13, and uses a simple LZ77 code with header byte 255 for all other files. (This can be expandable to a corpus of up to 255 files by using bytes 14-254 as well.) Thus you get maximum compression for the targeted test suite, some compression for much other data, and (as with all single compression algorithms), some files that actually expand in size on "compression" due to the pigeonhole principle showing that there just aren't a sufficient number of shorter bit sequences to encode all of a longer one.

The other 256 "algorithms" (#2-257) simply consist of removing the first byte from the file if it is equal to the algorithm number minus 2, and leaving the file alone otherwise. This is, as you can see, guaranteed to produce a file less than or equal to the size of the original, and one of them will always reduce the file size by one byte.

By picking the best compression of these algorithms at each stage, you can thus recursively compress any file to zero bytes. Exactly which algorithms are needed to reverse the compression are denoted by the file extensions, which are recursively appended to produce a potentially very long filename. The first algorithm above is noted by the extension .x, while the others use .x followed by a two-digit base-26 encoding of n-2 (where n is the algorithm number), using digits 0-9 for the first digit and letters a-z for the second.

Thus, it cheats in two ways, by moving the bytes of arbitrary files into the filename (thus not actually achieving compression in most cases if the filenames need to be stored along with the data), and by using a specially-tailored high compression for the specific files of the benchmark test the author intended to win with this algorithm.

A later program, BARF2 (linked on the same page as BARF, below), achieves compression down to zero bytes without expanding the filename, but this requires storing the number of times the compression algorithm was applied recursively, which ends up encoding the entire file as a huge number.

BARF was created by the author of PAQ.

References

Personal tools
Namespaces

Variants
Actions
Navigation
Toolbox