BARF

From Just Solve the File Format Problem
Revision as of 13:49, 20 December 2012 by Dan Tobias (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search
File Format
Name BARF
Ontology
Extension(s) .x, .x??

BARF (Better Archiver with Recursive Functionality) is a sort of reductio ad absurdum of compression techniques, able to be run repeatedly on a file and always achieve greater compression, until you get a file of zero bytes in all cases. How do you decompress those files? Well, BARF kind of cheats.

Its algorithm consists of picking which one of 257 algorithms best compresses the file. The first algorithm is one which assigns each of the 14 files in the Calgary corpus (one benchmark test of compression algorithms) a single-byte compressed value from 0-13, and uses a simple LZ77 code with header byte 255 for all other files. (This can be expandable to a corpus of up to 255 files by using bytes 14-254 as well.) Thus you get maximum compression for the targeted test suite, some compression for much other data, and (as with all single compression algorithms), some files that actually expand in size on "compression" due to the pigeonhole principle showing that there just aren't a sufficient number of shorter bit sequences to encode all of a longer one.

The other 256 "algorithms" (#2-257) simply consist of removing the first byte from the file if it is equal to the algorithm number minus 2, and leaving the file alone otherwise. This is, as you can see, guaranteed to produce a file less than or equal to the size of the original, and one of them will always reduce the file size by one byte.

By picking the best compression of these algorithms at each stage, you can thus recursively compress any file to zero bytes. Exactly which algorithms are needed to reverse the compression are denoted by the file extensions, which are recursively appended to produce a potentially very long filename. The first algorithm above is noted by the extension .x, while the others use .x followed by a two-digit base-26 encoding of n-2 (where n is the algorithm number), using digits 0-9 for the first digit and letters a-z for the second.

Thus, it cheats in two ways, by moving the bytes of arbitrary files into the filename (thus not actually achieving compression in most cases if the filenames need to be stored along with the data), and by using a specially-tailored high compression for the specific files of the benchmark test the author intended to win with this algorithm.

References

Personal tools
Namespaces

Variants
Actions
Navigation
Toolbox