BWTC32Key

From Just Solve the File Format Problem

Revision as of 00:44, 28 January 2020

File Format
Name BWTC32Key
Ontology
Extension(s) .b3k
Compression Always Lossless
Extended From bzip
Magic Bytes 0xFEFF4D00
Spec https://github.com/sentogiga/bwtc32key
Endianness Big-endian
TPM encryption
Released 2019

BWTC32Key is a single-file compression tool and format with optional encryption and text-armored output.

Discussion

The code is based upon specific JavaScript implementations of Base32768, AES, SHA-256, and a spiritual successor to the original bzip format. It runs as pure JS with no dependencies and is housed in the HTML frontend as a single monolithic program.

The output of the encoder is a text string. B3K files are always UTF-16 Big-endian text documents that begin with the Byte Order Mark and contain that string. The string is a version of Base32768 that uses Hangul Syllable blocks and Han ideographs, which keeps font support broad while keeping the size in bytes down. As a result, the string looks like a Korean message, though in a different style. The file starts with a header of 0xFEFF4D00 and ends with a trailer of 0x4D01. Files can be concatenated, but because of the way the original program currently works, reversing that requires using a text editor to extract the portion needed.
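The framing described above can be checked with a few lines of Python. This is a minimal sketch, not the original program's logic: the function name split_b3k_frame is hypothetical, and it assumes the trailer 0x4D01 is the final UTF-16BE code unit of the file and that everything between header and trailer is the Base32768 string.

```python
# Framing bytes as given in the article: BOM (FE FF) + header (4D 00), trailer (4D 01).
BOM_AND_HEADER = bytes([0xFE, 0xFF, 0x4D, 0x00])
TRAILER = bytes([0x4D, 0x01])


def split_b3k_frame(data: bytes) -> str:
    """Return the text between header and trailer, or raise ValueError.

    Hypothetical helper: it only checks the outer framing and does not
    validate or decode the Base32768 payload itself.
    """
    if len(data) % 2 != 0:
        raise ValueError("UTF-16 data must contain an even number of bytes")
    if not data.startswith(BOM_AND_HEADER):
        raise ValueError("missing 0xFEFF4D00 header")
    if len(data) < len(BOM_AND_HEADER) + len(TRAILER) or not data.endswith(TRAILER):
        raise ValueError("missing 0x4D01 trailer")
    body = data[len(BOM_AND_HEADER):-len(TRAILER)]
    # Strict decoding; malformed UTF-16 raises UnicodeDecodeError (a ValueError).
    return body.decode("utf-16-be")


if __name__ == "__main__":
    # Placeholder payload of two Hangul syllables, not real encoder output.
    sample = BOM_AND_HEADER + "가나".encode("utf-16-be") + TRAILER
    print(split_b3k_frame(sample))
```

The sketch rejects input early when the framing is wrong, which matches the article's point that a damaged file should fail rather than decode.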

All of the code is stream and chunk compatible, including the AES256 implementation, which uses Counter mode. The password field only accepts 8-bit ASCII to minimize headaches, but because there is no length limit, UTF-7 or Punycode can be used to allow non-Latin passwords. The encryption can also be blanked, which allows the format to be used where encryption would not be useful, such as in an image compression format.
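To illustrate the password workaround mentioned above, here is a small sketch using Python's built-in "punycode" and "utf-7" codecs. The helper names are hypothetical, and the check assumes "8-bit" means every character has a code point of 0xFF or lower; the original program may define the restriction differently.

```python
def is_8bit_password(password: str) -> bool:
    """Assumed rule: every character must fit in a single byte."""
    return all(ord(ch) <= 0xFF for ch in password)


def to_ascii_password(password: str, scheme: str = "punycode") -> str:
    """Re-encode a non-Latin password as plain ASCII text.

    Hypothetical helper; both parties must agree on the scheme, because
    the encoded text is what is actually typed into the password field.
    """
    if scheme not in ("punycode", "utf-7"):
        raise ValueError("scheme must be 'punycode' or 'utf-7'")
    return password.encode(scheme).decode("ascii")


if __name__ == "__main__":
    original = "пароль"
    for scheme in ("punycode", "utf-7"):
        armored = to_ascii_password(original, scheme)
        print(scheme, armored, is_8bit_password(armored))
```

Either scheme is reversible, so the human-readable password can be recovered from the ASCII form with the matching codec.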

The format was written in pure JS and is purely FOSS. The author started writing it at age 16 and had finished it by the time they turned 17, and this shows in the code. The compression and encryption functionality of the program coincidentally harks back to PackIt from the Classic Mac OS days, which featured similar sequential concatenation and compression of multiple files and forks into an archive, as well as encryption, all far more primitive and inefficient than BWTC32Key.

The final Base32768 step is essentially the antithesis of the original BinHex. Instead of using an algorithm that doubles the binary input size via base 16, the Base32768 step makes the AES256-CTR encrypted BWTC archive only 16/15 of its size before encoding, assuming the output is written as UTF-16BE with a BOM to a text file that uses the .B3K extension instead of the .txt extension used for normal plain text documents. Because the BWTC compressor is very simple compared to even the original bzip, and because the 256-bit AES variety used is counter mode, which needs no padding, the format is very slim.
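The 16/15 figure follows from each 16-bit UTF-16 code unit carrying 15 bits of payload. The sketch below works through that arithmetic. It assumes the encoder emits ceil(8n / 15) code units for n input bytes, as in the usual Base32768 scheme; the BWTC32Key variant may round the last unit differently, and the 6 bytes of BOM, header and trailer are counted separately.

```python
FRAMING_BYTES = 6  # BOM (2) + header (2) + trailer (2), per the article


def base32768_payload_bytes(n_input_bytes: int) -> int:
    """Bytes of UTF-16 text needed for n input bytes (assumed rounding)."""
    bits = 8 * n_input_bytes
    code_units = (bits + 14) // 15  # integer ceiling of bits / 15
    return 2 * code_units


def base16_bytes(n_input_bytes: int) -> int:
    """Base 16 in an 8-bit text encoding: two characters per input byte."""
    return 2 * n_input_bytes


if __name__ == "__main__":
    n = 15000
    b32768 = base32768_payload_bytes(n)
    print(b32768, b32768 / n)            # 16000 bytes, ratio 16/15
    print(base16_bytes(n))               # 30000 bytes, ratio 2
    print(b32768 + FRAMING_BYTES)        # total file size with framing
```

For 15000 input bytes the payload is 16000 bytes, exactly 16/15 of the input, compared with 30000 bytes for a base 16 encoding.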

As a text-based format that closely resembles human text, it can be used wherever text is required. Because it is stream compatible, it can also be broadcast over live channels as a data stream that receivers opt into. Another feature is that it never decodes corrupt input: decoding fails if the magic number is not present in the compressed data because of corruption or the wrong key, if the Base32768 text has junk thrown in or is otherwise malformed or undecodable, or if the text data itself is corrupted. This ensures that the decoder does not create corrupted files, which can help prevent damage to a system if something like a firmware blob is affected. The format does not care about file information, which is why it can be used as a chunk or stream format in cases where file information is not needed.

Links
