Sinclair BASIC tokenized file

From Just Solve the File Format Problem

Revision as of 10:50, 9 May 2018

File Format
Name Sinclair BASIC tokenized file
Ontology

Sinclair BASIC is a dialect of the BASIC programming language created by Nine Tiles Networks Ltd and used in the 8-bit home computers from Sinclair Research and Timex Sinclair.

The original 4KB version was developed for the Sinclair ZX80, followed by an 8KB version for the ZX81 and a 16KB version for the ZX Spectrum.

Some unusual features of Sinclair BASIC:

  • There were keys on the keyboard for each BASIC keyword. For example, pressing P caused the entire command PRINT to appear. Some commands needed multiple keypresses to enter. For example, BEEP was keyed by pressing CAPS SHIFT plus SYMBOL SHIFT to access extended mode, then keeping SYMBOL SHIFT held down and pressing Z.
  • When programs were SAVEd, the file written to disk or tape contained all of BASIC's internal state information, including the values of any defined BASIC variables, as well as the BASIC tokens.

BASIC File Layout

On a ZX81, a saved BASIC file is a snapshot of the computer memory from memory location 16393 through to the end of the variable table. There is no header.

Address Name Description
16393 VERSN Contains 0, which identifies ZX81 BASIC in saved programs.
16394 E_PPC Number of current line (with program cursor).
16396 D_FILE Pointer to the start of the 'Display file', i.e. what is being displayed on screen
16398 DF_CC Address of PRINT position in display file. Can be poked so that PRINT output is sent elsewhere.
16400 VARS Pointer to start of BASIC Variable table
16402 DEST Address of variable in assignment.
16404 E_LINE Pointer to line currently being entered
16406 CH_ADD Address of the next character to be interpreted: the character after the argument of PEEK, or the NEWLINE at the end of a POKE statement.
16408 X_PTR Address of the character preceding the marker.
16410 STKBOT pointer to start (bottom) of stack
16412 STKEND pointer to end (top) of stack
16414 BERG Calculator's b register.
16415 MEM Address of area used for calculator's memory. (Usually MEMBOT, but not always.)
16417 not used
16418 DF_SZ The number of lines (including one blank line) in the lower part of the screen.
16419 S_TOP The number of the top program line in automatic listings.
16421 LAST_K Shows which keys pressed.
16423 Debounce status of keyboard.
16424 MARGIN Number of blank lines above or below picture: 55 in Britain, 31 in America.
16425 NXTLIN Address of next program line to be executed.
16427 OLDPPC Line number to which CONT jumps.
16429 FLAGX Various flags.
16430 STRLEN Length of string type destination in assignment.
16432 T_ADDR Address of next item in syntax table (very unlikely to be useful).
16434 SEED The seed for RND. This is the variable that is set by RAND.
16436 FRAMES Counts the frames displayed on the television. Bit 15 is 1. Bits 0 to 14 are decremented for each frame sent to the television. This can be used for timing, but PAUSE also uses it. PAUSE resets bit 15 to 0, and puts the length of the pause in bits 0 to 14. When these have been counted down to zero, the pause stops. If the pause stops because of a key depression, bit 15 is set to 1 again.
16438 COORDS x-coordinate of last point PLOTted.
16439 y-coordinate of last point PLOTted.
16440 PR_CC Less significant byte of address of next position for LPRINT to print at (in PRBUFF).
16441 S_POSN Column number for PRINT position.
16442 Line number for PRINT position.
16443 CDFLAG Various flags. Bit 7 is on (1) during compute & display mode.
16444 PRBUFF Printer buffer (33rd character is NEWLINE).
16477 MEMBOT Calculator's memory area; used to store numbers that cannot conveniently be put on the calculator stack.
16507 not used
16509 First BASIC line.
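
Because the file is a headerless snapshot starting at memory location 16393, a memory address from the table maps to a file offset by subtracting 16393. The following Python sketch reads a few of the two-byte pointers on that basis. The names BASE_ADDRESS, POINTERS and read_pointers are made up for this illustration, and the little-endian byte order is an assumption (it is the usual Z80 convention, but the table does not state it).

```python
import struct

BASE_ADDRESS = 16393  # first memory location stored in the file (VERSN)

# Addresses of a few two-byte pointers, taken from the table above.
POINTERS = {"D_FILE": 16396, "VARS": 16400, "E_LINE": 16404}


def read_pointers(data: bytes) -> dict:
    """Return each pointer as a (memory_address, file_offset) pair."""
    result = {}
    for name, address in POINTERS.items():
        # Assumption: pointers are little-endian 16-bit words.
        (value,) = struct.unpack_from("<H", data, address - BASE_ADDRESS)
        result[name] = (value, value - BASE_ADDRESS)
    return result
```

With this mapping, the first BASIC line (address 16509) sits at file offset 116.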

BASIC lines

Each BASIC line is stored as:

  • 2 byte line number (in big-endian format)
  • 2 byte length of text including NEWLINE (in little-endian format). The length excludes the line number and length fields themselves, i.e. to skip to the next line you add "length of text" + 4 bytes.
  • text (BASIC tokens)
  • NEWLINE (0x76)
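
The line layout above can be walked with a short loop. This is a minimal Python sketch, not a reference implementation: iter_basic_lines and its start and end parameters are hypothetical names, and the caller is assumed to know where the program area begins and ends inside the buffer.

```python
import struct
from typing import Iterator, Tuple

NEWLINE = 0x76


def iter_basic_lines(buf: bytes, start: int, end: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (line_number, text_without_newline) for the lines in buf[start:end]."""
    pos = start
    while pos + 4 <= end:
        (number,) = struct.unpack_from(">H", buf, pos)      # big-endian line number
        (length,) = struct.unpack_from("<H", buf, pos + 2)  # little-endian text length
        text = buf[pos + 4 : pos + 4 + length]
        if length == 0 or len(text) != length or text[-1] != NEWLINE:
            raise ValueError(f"malformed line at offset {pos}")
        yield number, text[:-1]
        pos += 4 + length  # 2 bytes line number + 2 bytes length + text
```

The sketch advances by the stored length instead of scanning for 0x76, since the 5-byte numeric values embedded in a line can contain any byte value, including 0x76.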

When a numeric constant is included in the text of a BASIC line, the characters displaying the constant value are stored first (using the character codes listed in the token table, not ASCII), followed by the byte 0x7E, and the next 5 bytes are the value of the constant in floating point format.
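
As an illustration of the paragraph above, this Python sketch separates the visible text of a line from the hidden 5-byte values. The function name split_constants is hypothetical, and the sketch assumes that 0x7E never occurs in a line for any other reason.

```python
NUMBER_MARKER = 0x7E


def split_constants(text: bytes):
    """Return (visible_text, [raw 5-byte values]) for the text of one line."""
    visible = bytearray()
    values = []
    i = 0
    while i < len(text):
        if text[i] == NUMBER_MARKER:
            # The marker is followed by the 5-byte value of the constant.
            values.append(bytes(text[i + 1 : i + 6]))
            i += 6
        else:
            visible.append(text[i])
            i += 1
    return bytes(visible), values
```

For listing purposes only the visible part is needed; the raw values can be decoded separately if the stored number matters.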

BASIC Variables Table

Following the last BASIC line comes the variables table. Each entry in this table is of varying length and format.

The first byte in each entry is the variable name, of which the upper 3 bits indicate the variable type.

Most types of variables can only have a one-character name: A to Z for numerics, A$ to Z$ for strings. Numeric variables can also have multi-character names beginning with A-Z and continuing with A-Z or 0-9, e.g. F0O or BAR. Names are case-insensitive and whitespace-insensitive (hello world is the same variable as HeLlOwOrL d). Numeric variables and FOR-NEXT control variables share the same namespace, but no other types do. Numeric variable A, string variable A$, numeric array A(10) and string array A$(10) can all coexist under the name "A".[1]

First byte, format and examples for each variable type:

011nnnnn numeric variable
  Format:
  • 1 byte: variable name (0x61 a to 0x7A z)
  • 5 bytes: numeric value
  Examples:
  • 61 00 00 01 00 00 → LET a=1
  • 69 00 FF FE FF 00 → LET i=-65534
  • 6A 00 00 FE FF 00 → LET j=65534
  • 62 93 3C 61 4E 00 → LET b=12345678 (12345678 = 0xBC614E)
  • 63 93 BC 61 4E 00 → LET c=-12345678
  • 65 82 2D F8 4C AD → LET e=2.71828

101nnnnn numeric variable with multi-character name
  Format:
  • 1 byte: 0xA1 a to 0xBA z
  • n bytes: remainder of name (A-Z, a-z, 0-9 ASCII allowed, final character has high bit set)
  • 5 bytes: numeric value
  Examples:
  • B0 69 E5 82 49 4A F0 41 → LET pie=3.1452 ("p" is 0x10 ORed with 0xA0. "i" is regular ASCII (0x69). "e" is regular ASCII (0x65) ORed with 0x80 to indicate the end of the variable name)

010nnnnn string variable
  Format:
  • 1 byte: 0x41 a$ to 0x5A z$
  • 2 bytes: little-endian string length (n)
  • n bytes: string value (ASCII)
  Examples:
  • 41 04 00 54 45 53 54 → LET a$="TEST"

100nnnnn numeric array
  Format:
  • 1 byte: 0x81 a(n) to 0x9A z(n)
  • 2 bytes: little-endian size in bytes of the data that follows, so you can skip to the next variable without computing the full array size
  • 1 byte: number of dimensions in the array (1-255?)
  • 2 bytes: little-endian valid range of first dimension, e.g. 0A 00 is valid range 1-10
  • ... 2 more bytes for every other dimension
  • ... array values (each a 5-byte number) in C-style order, iterating rightmost index first up to leftmost index, e.g. DIM a(2,3,4) will have values specified in this order: a(1,1,1), a(1,1,2), a(1,1,3), a(1,1,4), a(1,2,1), a(1,2,2), a(1,2,3), a(1,2,4), a(1,3,1), a(1,3,2), a(1,3,3), a(1,3,4), a(2,1,1), ...
  Examples:
  • 81 1C 00 01 05 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 → DIM a(5)
  • 81 19 00 02 02 00 02 00 00 00 01 00 00 00 00 02 00 00 00 00 03 00 00 00 00 04 00 00 → DIM a(2,2); LET a(1,1)=1; LET a(1,2)=2; LET a(2,1)=3; LET a(2,2)=4

110nnnnn character array
  Format:
  • 1 byte: 0xC1 a$(n) to 0xDA z$(n)

111nnnnn control variable of a FOR-NEXT loop
  Format:
  • 1 byte: 0xE1 a to 0xFA z
  • 15 bytes: three 5-byte numbers (current value, loop end limit, loop step increment)
  • 2 bytes: little-endian line number where the FOR loop was started (or 0xFEFF if started from the command line)
  • 1 byte: according to the ZX Spectrum manual,[2] the statement number within that line
  Examples:
  • E9 00 00 01 00 00 00 00 05 00 00 00 00 01 00 00 FF FE 02 → FOR i=1 TO 5 (before loop has run)
  • E9 00 00 06 00 00 00 00 05 00 00 00 00 01 00 00 FF FE 02 → FOR i=1 TO 5 (after loop has run)
  • E9 00 00 01 00 00 00 00 0A 00 00 00 00 02 00 00 FF FE 02 → FOR i=1 TO 10 STEP 2
  • E9 81 0C CC CC CD 83 76 66 66 66 7D 4C CC CC CC FE FF 02 → FOR i=1.1 TO 7.7 STEP .1
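
The entry layouts above are enough to step through a variables table without decoding the values. The Python sketch below is only an illustration under several assumptions that the table does not confirm: that the table is terminated by a 0x80 byte, and that a character array carries the same 2-byte size field as a numeric array. The names END_MARKER and iter_variables are hypothetical.

```python
import struct
from typing import Iterator, Tuple

END_MARKER = 0x80  # assumption: end-of-table byte, not stated in the table above

SIZED_KINDS = {0b010: "string", 0b100: "numeric array", 0b110: "character array"}


def iter_variables(buf: bytes, pos: int = 0) -> Iterator[Tuple[str, str, bytes]]:
    """Yield (kind, name, raw_payload) for each entry starting at buf[pos]."""
    while pos < len(buf) and buf[pos] != END_MARKER:
        first = buf[pos]
        type_bits = first >> 5                # upper 3 bits: variable type
        name = chr(0x60 + (first & 0x1F))     # lower 5 bits: 1 = a ... 26 = z
        pos += 1
        if type_bits == 0b011:
            kind, size = "number", 5
        elif type_bits == 0b101:
            kind, size = "number", 5
            while True:                       # rest of name; last char has bit 7 set
                ch = buf[pos]
                pos += 1
                name += chr(ch & 0x7F)
                if ch & 0x80:
                    break
        elif type_bits == 0b111:
            kind, size = "for-next", 18
        elif type_bits in SIZED_KINDS:
            kind = SIZED_KINDS[type_bits]
            if type_bits != 0b100:
                name += "$"
            # Assumption for character arrays: same size field as numeric arrays.
            (size,) = struct.unpack_from("<H", buf, pos)
            pos += 2
        else:
            raise ValueError(f"unknown variable type in byte 0x{first:02X}")
        yield kind, name, buf[pos : pos + size]
        pos += size
```

Returning the raw payload keeps the walker independent of how numbers are decoded; a caller can apply a number decoder to the 5-byte slices afterwards.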

Numbers have one of two formats:[2] integers between -65535 and +65535 (inclusive) are in an "integral" format, while all other numbers are in "floating point" format.

With "integral" format:

  • the first byte is always 0
  • the second byte is 0 if the number is positive or -1 (0xFF) if the number is negative
  • the third byte is the least significant 8 bits of the number
  • the fourth byte is the most significant 8 bits of the number
  • the fifth byte is always 0

With "floating point" format, the number is stored as a 32-bit mantissa (m) and an 8-bit exponent (e), and its value is m × 2^e. The mantissa is normalised so that its most significant bit is always 1, i.e. 0.5 ≤ m < 1.

  • the first byte is the exponent plus 128, e.g. 0 means e=-128, 127 means e=-1, 128 means e=0, 129 means e=1, 255 means e=127
  • the remaining bytes are the mantissa in big-endian order. The MSB of the mantissa (which has been shifted so it's always 1) is replaced with a sign bit (positive numbers have MSB set to 0, negative numbers have MSB set to 1)
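
The two rules above translate into a few lines of Python. This is a sketch of the "floating point" format only: it does not special-case the "integral" format, so it should only be given values whose first byte is non-zero. The function name decode_float is hypothetical.

```python
def decode_float(raw: bytes) -> float:
    """Decode a 5-byte value stored in the "floating point" format."""
    if len(raw) != 5:
        raise ValueError("expected exactly 5 bytes")
    exponent = raw[0] - 128                    # first byte is the exponent plus 128
    mantissa = int.from_bytes(raw[1:5], "big")
    negative = bool(mantissa & 0x80000000)     # stored sign bit
    mantissa |= 0x80000000                     # restore the implied leading 1
    value = (mantissa / 2**32) * 2.0**exponent
    return -value if negative else value
```

For example, the bytes 82 2D F8 4C AD from the LET e=2.71828 entry have exponent byte 0x82 (e=2) and a mantissa of 0xADF84CAD / 2^32 ≈ 0.67957 once the leading 1 is restored, giving roughly 0.67957 × 4 ≈ 2.71828.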

ZX81 BASIC Tokens

Token (Decimal) Description
0 (space)
11 "
12 £
13 $
14 :
15 ?
16 (
17 )
18 >
19 <
20 =
21 +
22 -
23 *
24 /
25 ;
26 ,
27 .
28 0
29 1
30 2
31 3
32 4
33 5
34 6
35 7
36 8
37 9
38 A
39 B
40 C
41 D
42 E
43 F
44 G
45 H
46 I
47 J
48 K
49 L
50 M
51 N
52 O
53 P
54 Q
55 R
56 S
57 T
58 U
59 V
60 W
61 X
62 Y
63 Z
64 RND
65 INKEY$
66 PI
112 <cursor up>
113 <cursor down>
114 <cursor left>
115 <cursor right>
116 GRAPHICS
117 EDIT
118 NEWLINE
119 RUBOUT
120 /
121 FUNCTION
127 cursor
128 (inverse space)
139 "
140 £
141 $
142 :
143 ?
144 (
145 )
146 >
147 <
148 =
149 +
150 -
151 *
152 /
153 ;
154 ,
155 .
156 0
157 1
158 2
159 3
160 4
161 5
162 6
163 7
164 8
165 9
166 A
167 B
168 C
169 D
170 E
171 F
172 G
173 H
174 I
175 J
176 K
177 L
178 M
179 N
180 O
181 P
182 Q
183 R
184 S
185 T
186 U
187 V
188 W
189 X
190 Y
191 Z
192 ""
193 AT
194 TAB
195 ?
196 CODE
197 VAL
198 LEN
199 SIN
200 COS
201 TAN
202 ASN
203 ACS
204 ATN
205 LN
206 EXP
207 INT
208 SQR
209 SGN
210 ABS
211 PEEK
212 USR
213 STR$
214 CHR$
215 NOT
216 **
217 OR
218 AND
219 <=
220 >=
221 <>
222 THEN
223 TO
224 STEP
225 LPRINT
226 LLIST
227 STOP
228 SLOW
229 FAST
230 NEW
231 SCROLL
232 CONT
233 DIM
234 REM
235 FOR
236 GOTO
237 GOSUB
238 INPUT
239 LOAD
240 LIST
241 LET
242 PAUSE
243 NEXT
244 POKE
245 PRINT
246 PLOT
247 RUN
248 SAVE
249 RAND
250 IF
251 CLS
252 UNPLOT
253 CLEAR
254 RETURN
255 COPY
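
The token table can be turned into a small lookup for listing a program. The Python sketch below is a simplification under stated assumptions: codes 1 to 10 (not listed above) are shown as "?", codes 128 to 191 are shown with the same glyph as the code 128 lower, any other unlisted code is shown as a number in angle brackets, and every keyword is followed by a space. The names CHARS, KEYWORDS and detokenize are made up for this example.

```python
# Codes 0-63: space, ten unlisted codes (shown as "?"), punctuation, digits, letters.
CHARS = ' ' + '?' * 10 + '"£$:?()><=+-*/;,.' + '0123456789' + 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

# Codes 192-255, in table order.
KEYWORDS = ('"" AT TAB ? CODE VAL LEN SIN COS TAN ASN ACS ATN LN EXP INT SQR SGN '
            'ABS PEEK USR STR$ CHR$ NOT ** OR AND <= >= <> THEN TO STEP LPRINT '
            'LLIST STOP SLOW FAST NEW SCROLL CONT DIM REM FOR GOTO GOSUB INPUT '
            'LOAD LIST LET PAUSE NEXT POKE PRINT PLOT RUN SAVE RAND IF CLS UNPLOT '
            'CLEAR RETURN COPY').split()


def detokenize(text: bytes) -> str:
    """Render the token bytes of one line (without NEWLINE) as readable text."""
    out = []
    for code in text:
        if code < 64:
            out.append(CHARS[code])
        elif code <= 66:
            out.append(("RND", "INKEY$", "PI")[code - 64])
        elif 128 <= code < 192:
            out.append(CHARS[code - 128])      # same glyph as code - 128
        elif code >= 192:
            out.append(KEYWORDS[code - 192] + " ")
        else:
            out.append(f"<{code}>")            # control codes such as NEWLINE (118)
    return "".join(out).rstrip()
```

For instance, the bytes 245, 11, 45, 46, 11 render as PRINT "HI". Any embedded numeric constants (the 0x7E marker and the 5 bytes after it) would need to be removed before calling this function.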

Links and references

  1. http://www.worldofspectrum.org/ZXBasicManual/zxmanchap7.html
  2. http://www.worldofspectrum.org/ZXBasicManual/zxmanchap24.html