Sinclair BASIC tokenized file

Sinclair BASIC is a dialect of the BASIC programming language created by Nine Tiles Networks Ltd and used in the 8-bit home computers from Sinclair Research and Timex Sinclair.

The original 4KB version was developed for the Sinclair ZX80, followed by an 8KB version for the ZX81 and a 16KB version for the ZX Spectrum.

Some unusual features of Sinclair BASIC:
 * There were keys on the keyboard for each BASIC keyword. For example, pressing P caused the entire command PRINT to appear. Some commands needed multiple keypresses to enter. For example, BEEP was keyed by pressing CAPS SHIFT plus SYMBOL SHIFT to access extended mode, then keeping SYMBOL SHIFT held down and pressing Z.
 * When programs were SAVEd, the file written to disk or tape contained all of BASIC's internal state information, including the values of any defined BASIC variables, as well as the BASIC tokens.

BASIC File Layout
On a ZX81, a saved BASIC file is a snapshot of the computer memory from memory location 16393 through to the end of the variable table. There is no header.
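
Because the file is a raw memory image with no header, a memory address can be turned into a file offset by subtracting the start address. The following is a minimal sketch of that arithmetic; the names `ZX81_SAVE_START` and `address_to_offset` are made up for illustration.

```python
# First memory location included in a ZX81 saved file, as stated above.
ZX81_SAVE_START = 16393


def address_to_offset(address):
    """Convert a ZX81 memory address to an offset within the saved file."""
    if address < ZX81_SAVE_START:
        raise ValueError("address lies before the saved region")
    return address - ZX81_SAVE_START


# 16509 is only an example address (an assumption, not taken from the text above).
print(address_to_offset(16509))
```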

BASIC lines
Each BASIC line is stored as:
 * 2-byte line number (big-endian)
 * 2-byte length of the text including NEWLINE (little-endian). The length excludes the line number and the length field itself, so to skip to the next line you add the length of text + 4 bytes.
 * text (BASIC tokens)
 * NEWLINE (0x76 on ZX80/81, 0x0D on Spectrum)
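
The record layout above can be walked with a short loop. This is a sketch rather than a complete loader: `iter_basic_lines` is a hypothetical helper, and it assumes the caller already knows where the program area starts and ends inside the buffer, since nothing in the record itself marks the end of the program.

```python
import struct


def iter_basic_lines(data, start=0, end=None):
    """Yield (line_number, text) pairs; text includes the trailing NEWLINE byte."""
    if end is None:
        end = len(data)
    pos = start
    while pos + 4 <= end:
        # The line number is big-endian, the text length is little-endian.
        (line_number,) = struct.unpack_from(">H", data, pos)
        (length,) = struct.unpack_from("<H", data, pos + 2)
        next_pos = pos + 4 + length
        if next_pos > end:
            break  # truncated record: stop rather than guess
        yield line_number, bytes(data[pos + 4:next_pos])
        pos = next_pos


# Two made-up lines (10 and 20) using the Spectrum NEWLINE byte 0x0D.
sample = bytes([0x00, 0x0A, 0x02, 0x00, 0xF5, 0x0D,
                0x00, 0x14, 0x01, 0x00, 0x0D])
for number, text in iter_basic_lines(sample):
    print(number, text.hex())
```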

When a numeric constant is included in the text of a BASIC line, a character string displaying the constant value is stored first, followed by a one-byte number marker and then the number in 5-byte numeric format.
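
A lister that only wants the visible text has to skip these embedded numbers. The sketch below separates the two; `split_hidden_numbers` is a hypothetical helper, and the marker byte is passed in by the caller because it depends on the machine. It also assumes the marker value never occurs in the line text for any other reason.

```python
def split_hidden_numbers(text, marker):
    """Return (visible_text, numbers), where numbers are the 5-byte values after `marker`."""
    visible = bytearray()
    numbers = []
    pos = 0
    while pos < len(text):
        byte = text[pos]
        if byte == marker and pos + 6 <= len(text):
            # Marker byte plus 5 bytes of numeric data: keep the data, hide both.
            numbers.append(bytes(text[pos + 1:pos + 6]))
            pos += 6
        else:
            visible.append(byte)
            pos += 1
    return bytes(visible), numbers


# Made-up line text: the digits "10", a marker, five number bytes, then NEWLINE.
line_text = b"10" + bytes([0x0E, 0x00, 0x00, 0x0A, 0x00, 0x00]) + b"\r"
print(split_hidden_numbers(line_text, marker=0x0E))
```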

BASIC Variables Table
Following the last BASIC line comes the variables table. Each entry in this table is of varying length and format.

The first byte in each entry is the variable name, of which the upper 3 bits indicate the variable type.
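
Splitting the first byte into its two fields is a matter of shifting and masking. In this sketch the helper name `split_variable_header` is made up, and treating the lower 5 bits as the encoded name is an assumption; mapping the type bits to concrete variable types is not covered here.

```python
def split_variable_header(first_byte):
    """Split the first byte of a variable entry into (type_bits, name_bits)."""
    type_bits = first_byte >> 5    # upper 3 bits: variable type
    name_bits = first_byte & 0x1F  # lower 5 bits: assumed to encode the name
    return type_bits, name_bits


print(split_variable_header(0x61))
```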

Most types of variables can only have a one-character name: A to Z for numerics, A$ to Z$ for strings. Numeric variables can also have multi-character names beginning with A-Z and continuing with A-Z or 0-9. Names are case-insensitive and whitespace-insensitive. Numeric variables and FOR-NEXT control variables share the same namespace, but no other types do: numeric variable A, string variable A$, numeric array A(10) and string array A$(10) can all coexist under the name "A".

5-byte numeric format
Numbers have one of two formats: integers between -65535 and +65535 (inclusive) are in an "integral" format, while all other numbers are in "floating point" format. The value 0 can't be represented by the floating point format, so is always stored as an integral. Within a BASIC line, the marker byte before the number is 0x7E on the ZX81 and 0x0E on the ZX Spectrum. The marker does not distinguish the two formats; they are told apart by the first byte of the number, which is 0 for an integral.

With "integral" format:
 * 1 byte: always 0
 * 1 byte: 0 if the number is positive or -1 (0xFF) if the number is negative
 * 2 bytes: little-endian unsigned integer from 0 to 65535. Subtract 65536 from this if the number is flagged as negative
 * 1 byte: always 0
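
The integral layout above can be decoded as follows. This is a sketch; `decode_integral` is a hypothetical name, and rejecting any sign byte other than 0x00 or 0xFF is a choice made here, not something the format description requires.

```python
def decode_integral(raw):
    """Decode a 5-byte number stored in the "integral" format."""
    if len(raw) != 5 or raw[0] != 0 or raw[4] != 0:
        raise ValueError("not an integral-format number")
    if raw[1] not in (0x00, 0xFF):
        raise ValueError("unexpected sign byte")
    value = raw[2] | (raw[3] << 8)  # little-endian 16-bit magnitude
    if raw[1] == 0xFF:
        value -= 65536              # flagged as negative
    return value


print(decode_integral(bytes([0x00, 0x00, 0x0A, 0x00, 0x00])))
print(decode_integral(bytes([0x00, 0xFF, 0xFF, 0xFF, 0x00])))
```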

With "floating point" format:
 * 1 byte: exponent + 128 (0 → e=-128, 255 → e=127)
 * 4 bytes: big-endian mantissa

The number has to be normalised so that its most significant mantissa bit is always 1. Because this bit is known to be 1, it is not stored; its position is overwritten with a sign bit, cleared to 0 for positive numbers and set to 1 for negative numbers.
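
A floating point value can be reconstructed by reading the sign bit, restoring the implied leading 1 and scaling by the exponent. The sketch below assumes the mantissa is a binary fraction in the range 0.5 to 1 (that is, the 32-bit mantissa divided by 2^32), which the layout above does not state explicitly; `decode_float` is a hypothetical name.

```python
def decode_float(raw):
    """Decode a 5-byte number stored in the "floating point" format."""
    if len(raw) != 5:
        raise ValueError("expected exactly 5 bytes")
    exponent = raw[0] - 128
    mantissa = int.from_bytes(raw[1:5], "big")
    negative = bool(mantissa & 0x80000000)  # top bit holds the sign
    mantissa |= 0x80000000                  # restore the implied leading 1
    # Assumption: the mantissa is a fraction in [0.5, 1).
    value = (mantissa / 2**32) * 2.0**exponent
    return -value if negative else value


print(decode_float(bytes([0x81, 0x00, 0x00, 0x00, 0x00])))
print(decode_float(bytes([0x82, 0x40, 0x00, 0x00, 0x00])))
```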

ZX81 BASIC Tokens

ZX Spectrum 48/128 BASIC Tokens
[1]: This token is only available in 128 BASIC.

Links

 * ZX81 BASIC Programming
 * Sinclair ZX81 review (2012)
 * Sinclair ZX81 BASIC Programming Manual