Tokenized BASIC

Tokenized BASIC is a method of storing programs in the BASIC programming language by encoding the language's keywords as "tokens" instead of as plain text. Because the tokens are shorter byte sequences than the full text of the keywords, such programs take up less space in memory and in external storage such as disks or tapes, which was a significant concern in an era when computers had far less memory and disk space than they do at present. Tokenized code can also take the interpreter less time to parse, another important concern on slower computers (and the reason some languages, such as Python, still create bytecode files from source code to help their interpreters). Now that computers are much faster and have much more memory and disk space, tokenized formats are rarely used for storing source code, though compilers may generate intermediate data that is tokenized in some way in the course of producing executable code from text-based sources.
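The keyword-to-token encoding described above can be illustrated with a short Python sketch. The token table here is hypothetical: the byte values are made up for illustration and do not match any particular BASIC dialect, and the function names `tokenize` and `detokenize` are likewise assumptions. The sketch assumes ASCII input, leaves text inside double quotes untouched, and matches keywords wherever they occur outside quotes.

```python
# Hypothetical token table: these byte values are invented for illustration
# and do not correspond to any real BASIC dialect.
TOKENS = {"PRINT": 0x80, "GOTO": 0x81, "FOR": 0x82, "NEXT": 0x83, "IF": 0x84, "THEN": 0x85}
NAMES = {value: keyword for keyword, value in TOKENS.items()}

# Try longer keywords first so a short keyword never shadows a longer one.
KEYWORDS = sorted(TOKENS, key=len, reverse=True)


def tokenize(line: str) -> bytes:
    """Replace keywords outside string literals with one-byte tokens."""
    out = bytearray()
    i = 0
    in_string = False
    while i < len(line):
        ch = line[i]
        if ch == '"':
            # Quotes toggle string mode; the quote itself is copied as-is.
            in_string = not in_string
        elif not in_string:
            match = next((kw for kw in KEYWORDS if line.startswith(kw, i)), None)
            if match is not None:
                out.append(TOKENS[match])
                i += len(match)
                continue
        out.append(ord(ch))  # assumes ASCII input (code points below 0x80)
        i += 1
    return bytes(out)


def detokenize(data: bytes) -> str:
    """Expand token bytes back to keywords; other bytes are plain ASCII."""
    return "".join(NAMES.get(b, chr(b)) for b in data)


if __name__ == "__main__":
    source = 'PRINT "GOTO"'
    packed = tokenize(source)
    print(len(source), len(packed))
    print(detokenize(packed))
```

Because every token value in this sketch is 0x80 or higher and ASCII text stays below 0x80, the two kinds of byte cannot be confused when the line is expanded again.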

In its heyday of the 1960s through 1980s, BASIC existed in many dialects, designed for specific machine platforms, and the format of tokenized programs was different in each. On systems where file types were commonly identified using extensions, .BAS was usually used for BASIC programs, while other systems had their own ways of identifying file types and often had a type code specific to their own platform's BASIC interpreter (or multiple codes for different versions of BASIC, such as Apple II DOS's 'I' for Integer BASIC and 'A' for Applesoft floating-point BASIC).

People intending to transfer BASIC programs between systems would usually export them in text form by directing the output of the LIST command to a text file. This sometimes required special tweaking to get the proper format; on the Apple II, for instance, one first needed to enter POKE 33,33 to set the screen window width narrow enough to defeat the automatic insertion of padding spaces on normal-size lines. Some BASICs made things easier by offering a "save-as-text" option in the SAVE command (appending ",A", with A for ASCII, sometimes worked). Porting between systems usually required considerable program revision as well, because BASIC dialects differed greatly from one another.
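Exporting to text amounts to what LIST does: walking the stored program and expanding each token back into its keyword. The following Python sketch shows the idea under loudly stated assumptions. The record layout (a two-byte little-endian line number, a one-byte length, then the line body) and the token values are hypothetical, chosen only for illustration; real formats differ from dialect to dialect, and `list_program` is an invented name.

```python
import struct

# Hypothetical token table and record layout, invented for illustration only.
NAMES = {0x80: "PRINT", 0x81: "GOTO"}


def list_program(data: bytes) -> str:
    """Render a tokenized program as plain text, one numbered line per record."""
    lines = []
    pos = 0
    while pos < len(data):
        # Assumed record header: 16-bit little-endian line number, 8-bit body length.
        line_no, length = struct.unpack_from("<HB", data, pos)
        pos += 3
        body = data[pos:pos + length]
        pos += length
        # Token bytes become keywords; everything else is treated as ASCII text.
        text = "".join(NAMES.get(b, chr(b)) for b in body)
        lines.append(f"{line_no} {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    body1 = bytes([0x80]) + b' "HI"'
    body2 = bytes([0x81]) + b" 10"
    program = (struct.pack("<HB", 10, len(body1)) + body1
               + struct.pack("<HB", 20, len(body2)) + body2)
    print(list_program(program))
```

A detokenizer for a real dialect would follow the same loop but would need that dialect's actual token table and line-record layout.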

Specific tokenized BASIC formats:


 * AMOS BASIC tokenized file
 * APF Imagination Machine BASIC tokenized file
 * Apple Integer BASIC tokenized file
 * Applesoft BASIC tokenized file
 * Atari BASIC tokenized file
 * BBC BASIC tokenized file
 * CCE MC-1000 BASIC tokenized file
 * Coleco ADAM SmartBASIC tokenized file
 * Commodore BASIC tokenized file
 * Compucolor BASIC tokenized file
 * Exidy Sorcerer BASIC tokenized file
 * GW-BASIC tokenized file (or BASICA) (IBM PC and compatibles)
 * Mattel Aquarius BASIC tokenized file
 * MBASIC tokenized file (Microsoft BASIC for CP/M)
 * NASCOM BASIC tokenized file
 * Ohio Scientific BASIC tokenized file
 * Sinclair BASIC tokenized file (for ZX80, ZX81 and Spectrum)
 * Sol BASIC tokenized file
 * Tandy 200 BASIC tokenized file
 * TI BASIC tokenized file (TI 99/4A)
 * Tiny BASIC tokenized file (ran on KIM-1 and some other early machines)
 * TRS-80 Color BASIC tokenized file
 * TRS-80 Level II BASIC tokenized file

As a bit of trivia, three of the companies referenced above are named after U.S. states: Texas Instruments (TI), Ohio Scientific, and Connecticut Leather Company (Coleco).

Links and references

 * Detokenizers for Microsoft BASICs
 * Microsoft 6502 BASIC archeology
 * MESS: emulates many of the platforms these BASICs run on
 * JSMESS: JavaScript version of MESS