Commodore BASIC tokenized file

Commodore BASIC tokenized files stored programs in the versions of the BASIC programming language used on Commodore computers, including the PET, VIC-20, Commodore 64, and Commodore 128. Several versions existed, all deriving from a version that Commodore licensed perpetually from Microsoft for a one-time fee and then developed further in-house. The most common version is 2.0, found on the Commodore 64, even though earlier PET computers already had BASIC 4.0 (Commodore put an out-of-date BASIC in the 64 because it was "just a home computer" that was not expected to be used for serious work). The Commodore 128 had BASIC 7.0, and the Commodore 16 had BASIC 3.5.

Like most BASICs of its era, Commodore BASIC saved its programs in a tokenized format rather than as plain-text source code. Printable PETSCII characters (and the various control codes which could be used within literal strings to do things like change the color of text) generally stood for themselves, but other bytes had different meanings. The "high-bit" bytes from #128-#254 stood for the various BASIC commands and mathematical operators (#255 was used for the "pi" character). A null (#0) byte marked the end of a program line. Each line began with a four-byte header: first a link to the next line (a 2-byte little-endian unsigned integer, stored as the memory address of that line rather than as a relative offset, with 0 in place of a link marking the end of the program), then the line number (also a 2-byte little-endian unsigned integer).
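The line layout described above can be illustrated with a short Python sketch. It is not taken from any particular tool: the function name `split_lines` and the sample bytes are made up for illustration. It assumes the input holds only the program lines (any 2-byte load address already removed), finds each following line by scanning for the null terminator instead of trusting the stored link value, and treats a link of 0 as the end of the program.

```python
import struct

def split_lines(image: bytes):
    """Yield (line_number, body_bytes) for each stored program line.

    Hypothetical helper: `image` is assumed to contain only the program
    lines, without a leading PRG load address.
    """
    pos = 0
    while pos + 4 <= len(image):
        (link,) = struct.unpack_from("<H", image, pos)
        if link == 0:                      # zero link: no more lines
            break
        (number,) = struct.unpack_from("<H", image, pos + 2)
        end = image.index(0, pos + 4)      # null byte terminates the line
        yield number, image[pos + 4:end]
        pos = end + 1                      # next header follows the terminator

# Hand-built sample: the single line 10 PRINT"HI" followed by the end marker.
sample = bytes([0x0B, 0x08,                      # link to next line
                0x0A, 0x00,                      # line number 10
                0x99, 0x22, 0x48, 0x49, 0x22,    # PRINT token, "HI"
                0x00,                            # end of line
                0x00, 0x00])                     # end of program

for number, body in split_lines(sample):
    print(number, body.hex())
```

Ignoring the link value keeps the sketch independent of where the program was originally located in memory; truncated input will raise `ValueError` from `bytes.index`, which a real reader would want to handle.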

In BASIC 2.0, only the bytes up to #203 were actually assigned to BASIC keywords, leaving #204-#254 unassigned and available for future expansion; later Commodore BASIC versions assigned more of them, and there may be third-party extended BASICs that use some as well.
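As a rough illustration of how a lister might expand tokens, here is a minimal Python sketch. The table is deliberately partial (a handful of BASIC 2.0 entries, not the full list), the name `detokenize` and the `{$XX}` placeholder notation are assumptions of this example, and no real PETSCII-to-Unicode conversion is attempted: only bytes #32-#90 are treated as plain characters.

```python
# Partial BASIC 2.0 token table, for illustration only.
TOKENS = {
    0x81: "FOR", 0x82: "NEXT", 0x89: "GOTO", 0x8F: "REM",
    0x99: "PRINT", 0xA4: "TO", 0xAA: "+", 0xB2: "=", 0xC7: "CHR$",
}

def detokenize(body: bytes) -> str:
    """Expand one line body (header and null terminator already removed)."""
    out = []
    in_quotes = False
    for b in body:
        if b == 0x22:                       # a quote toggles string mode
            in_quotes = not in_quotes
        if b >= 0x80 and not in_quotes:
            if b == 0xFF:
                out.append("π")             # #255 is the pi character
            else:
                # Unknown or unassigned tokens become a visible placeholder.
                out.append(TOKENS.get(b, "{$%02X}" % b))
        elif 0x20 <= b <= 0x5A:
            out.append(chr(b))              # same as ASCII in this range
        else:
            out.append("{$%02X}" % b)       # control codes, graphics, etc.
    return "".join(out)

print(detokenize(bytes([0x99, 0x22, 0x48, 0x49, 0x22])))  # PRINT"HI"
print(detokenize(bytes([0xCC])))                          # {$CC}
```

Bytes inside quotes are left alone because, as noted above, characters in literal strings stand for themselves; a placeholder for unassigned values such as #204 makes extended-BASIC files easy to spot.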

Unlike some other BASICs of the time, the Commodore tokenizer didn't collapse extra whitespace on tokenization, or expand it on listing a program; all space characters entered by the programmer were stored in the file. This meant that you could save disk and memory space by eliminating all unnecessary spaces from the code, though this could make the code harder to read in places. Very few spaces were actually necessary; FORI=1TO10 worked the same as FOR I = 1 TO 10.
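The space saving can be made concrete with a toy tokenizer in Python. This is only a sketch under simplifying assumptions: `KEYWORDS` is a tiny illustrative subset of the BASIC 2.0 keyword table, the matching is a plain prefix test and not the ROM's actual algorithm, and characters are encoded with their ASCII values (which match PETSCII for the characters used here).

```python
# Tiny illustrative subset of the keyword table (assumed for this example).
KEYWORDS = {"FOR": 0x81, "TO": 0xA4, "=": 0xB2}

def tokenize(text: str) -> bytes:
    """Tokenize one line of source text, keeping every space as typed."""
    out = bytearray()
    i = 0
    in_quotes = False
    while i < len(text):
        if text[i] == '"':
            in_quotes = not in_quotes
        matched = None
        if not in_quotes:
            for word, code in KEYWORDS.items():
                if text.startswith(word, i):
                    matched = (word, code)
                    break
        if matched:
            out.append(matched[1])      # keyword collapses to one byte
            i += len(matched[0])
        else:
            out.append(ord(text[i]))    # everything else, spaces included
            i += 1
    return bytes(out)

print(len(tokenize("FOR I = 1 TO 10")))  # 12
print(len(tokenize("FORI=1TO10")))       # 7
```

The five extra bytes in the first form are exactly the five spaces, each stored as #32.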

BASIC programs were stored by Commodore DOS as file type "PRG" (program), in which the first two bytes stored the memory location the program was expected to be loaded into. This address was only used when the file was loaded with the LOAD "filename",8,1 command, where the final '1' told the computer to use the memory location in the file; LOAD "filename",8 always loaded the program into the normal BASIC program memory space. When these files are transferred to other platforms, they are often saved with .prg extensions, though this extension was not part of the original filename on the Commodore (the file type is a separate field in Commodore directory structures).
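Reading that two-byte prefix from a transferred .prg file is straightforward. The Python sketch below assumes the address is stored low byte first, as is usual on the 6502; the function name `split_prg` and the sample bytes are invented for the example.

```python
import struct

def split_prg(data: bytes):
    """Return (load_address, payload) for the raw contents of a PRG file."""
    if len(data) < 2:
        raise ValueError("too short to contain a load address")
    (load_address,) = struct.unpack_from("<H", data, 0)
    return load_address, data[2:]

# Hand-built sample: load address followed by the line 10 PRINT"HI".
prg = bytes([0x01, 0x08,
             0x0B, 0x08, 0x0A, 0x00,
             0x99, 0x22, 0x48, 0x49, 0x22, 0x00,
             0x00, 0x00])

address, payload = split_prg(prg)
print(hex(address), len(payload))
```

For a program saved from BASIC on a Commodore 64 the address is normally $0801, the start of that machine's BASIC program area; other models use different start addresses, so the value can hint at which machine a file came from.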

Format documentation

 * Commodore BASIC tokens
 * Commodore token list

Software

 * CBM BASIC Lister (for Linux and Windows)
 * DirMaster: reads C64 disk images / archives / files in Windows
 * JSMESS in-browser emulations: C-64, C-128, PET-2001, PET-2001n, C-16 (PAL)
 * detox64 detokenizer

Sample files

 * https://telparia.com/fileFormatSamples/document/cbmBasic/

Other links and references

 * Wikipedia article: Commodore BASIC
 * Commodore BASIC as a scripting language for UNIX and Windows
 * Today's kids try the Commodore 64
 * 10 PRINT CHR$(205.5+RND(1)); : GOTO 10 (Software Studies)
 * Documented disassembled code from Commodore BASIC ROM
 * 1978 source to Microsoft 6502 BASIC (ancestral to Commodore BASIC; token list embedded is similar but not quite the same)
 * Why did the Commodore 64 use two spaces in its syntax error message?