BARF

From Just Solve the File Format Problem
Revision as of 19:49, 6 June 2017 by Jsummers (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search
File Format
Name BARF
Ontology
Extension(s) .x, .x??
Released 2003

BARF (Better Archiver with Recursive Functionality) is a sort of reductio ad absurdum of compression techniques, able to be run repeatedly on a file and always achieve greater compression, until you get a file of zero bytes in all cases. How do you decompress those files? Well, BARF kind of cheats.

Its algorithm consists of picking which one of 257 algorithms best compresses the file. The first algorithm is one which assigns each of the 14 files in the Calgary corpus (one benchmark test of compression algorithms) a single-byte compressed value from 0-13, and uses a simple LZ77 code with header byte 255 for all other files. (This can be expandable to a corpus of up to 255 files by using byte values 14-254 as well.) Thus you get maximum compression for the targeted test suite, some compression for much other data, and (as with all single compression algorithms), some files that actually expand in size on "compression" due to the pigeonhole principle showing that there just aren't a sufficient number of shorter bit sequences to encode all of a longer one.

The other 256 "algorithms" (#2-257) simply consist of removing the first byte from the file if it is equal to the algorithm number minus 2, and leaving the file alone otherwise. (Well, actually, nitpickers would note that this action would produce a file indistinguishable from one where a byte was removed, and hence not a fully reversable step, so some scheme of adding flag bytes might have to be done... this is irrelevant, though, since none of the algorithms other than the one that removes a byte would ever be selected due to the meta-algorithm of picking only the smallest size.) This is, as you can see, guaranteed to produce a file less than or equal to the size of the original, and one of them will always reduce the file size by one byte.

By picking the best compression of these algorithms at each stage, you can thus recursively compress any file to zero bytes. Exactly which algorithms are needed to reverse the compression are denoted by the file extensions, which are recursively appended to produce a potentially very long filename. The first algorithm above is noted by the extension .x, while the others use .x followed by a two-digit base-26 encoding of n-2 (where n is the algorithm number), using digits 0-9 for the first digit and letters a-z for the second.

Thus, it cheats in two ways, by moving the bytes of arbitrary files into the filename (thus not actually achieving compression in most cases if the filenames need to be stored along with the data), and by using a specially-tailored high compression for the specific files of the benchmark test the author intended to win with this algorithm.

A later program, BARF2 (linked on the same page as BARF, below), achieves compression down to zero bytes without expanding the filename, but this requires storing the number of times the compression algorithm was applied recursively, which ends up encoding the entire file as a huge number.

BARF was created by the author of PAQ, Matt Mahoney.

References

Personal tools
Namespaces

Variants
Actions
Navigation
Toolbox