Commodore BASIC tokenized file

From Just Solve the File Format Problem
File Format
Name Commodore BASIC tokenized file
Extension(s) .prg
Released 1977

Commodore BASIC tokenized files stored programs written in the versions of the BASIC programming language used on Commodore computers, including the PET, VIC-20, Commodore 64, and Commodore 128. Several versions existed, all deriving from a version that Commodore licensed perpetually from Microsoft for a one-time fee and then developed further in-house. The most common version is 2.0, found on the Commodore 64, even though earlier PET computers already had BASIC 4.0 (Commodore put an out-of-date BASIC in the 64 because it was "just a home computer", not expected to be used for serious work). The Commodore 128 had BASIC 7.0, and the Commodore 16 (PAL) had BASIC 3.5.

Like most BASICs of its era, Commodore BASIC saved its programs in a tokenized format rather than as plain-text source code. Printable PETSCII characters (and the various control codes that could be used within literal strings to do things like change the color of text) generally stood for themselves, but other bytes had different meanings. The "high-bit" bytes from #128-#254 stood for the various BASIC commands and mathematical operators (#255 was used for the "pi" character). Each stored line began with four header bytes: a link to the next line (a 2-byte little-endian unsigned integer holding that line's address in memory, with 0 marking the end of the program), followed by the line number (also a 2-byte little-endian unsigned integer). A null (#0) byte marked the end of each program line.
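
The line layout described above can be walked with a few lines of Python. This is a minimal sketch rather than a reference implementation: the function name `split_lines` is made up for illustration, and it assumes the link field comes first and the line number second. It only tests the link for zero and finds line boundaries via the null terminator, so it does not depend on the stored link values being valid.

```python
import struct

def split_lines(body: bytes):
    """Yield (line_number, tokenized_bytes) for each stored program line.

    `body` is the program image without the 2-byte PRG load address.
    Assumed layout per line: 2-byte link, 2-byte line number, tokens, 0x00.
    """
    pos = 0
    while pos + 2 <= len(body):
        (link,) = struct.unpack_from("<H", body, pos)
        if link == 0:                      # a zero link marks the end of the program
            break
        (line_no,) = struct.unpack_from("<H", body, pos + 2)
        end = body.index(0, pos + 4)       # the null byte terminates the line
        yield line_no, body[pos + 4:end]
        pos = end + 1                      # the next line starts after the null
```

The search for the terminator starts after the four header bytes because the link and line number may themselves contain zero bytes (line 10 is stored as `0A 00`). Truncated input raises an exception from `struct` or `bytes.index` rather than being silently accepted.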

In BASIC 2.0, only the bytes up to #203 were actually assigned BASIC commands, leaving #204-#254 unassigned and available for future expansion; there may be third-party extended BASICs that use some of them.

Unlike some other BASICs of the time, the Commodore tokenizer didn't collapse extra whitespace on tokenization or expand it when listing a program; every space character entered by the programmer was stored in the file. This meant that you could save disk and memory space by eliminating all unnecessary spaces from the code, though this could make the code harder to read in places. Very few spaces were actually necessary; FORI=1TO10 worked the same as FOR I = 1 TO 10.
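
To make the saving concrete, the two spellings of that statement can be built by hand from the token values in the table on this page and compared. This is only an illustration of the byte counts; the variable names are arbitrary, and the line header and terminating null are left out because they are the same for both.

```python
# Token values taken from the table on this page.
FOR, TO, EQ = 0x81, 0xA4, 0xB2

# FORI=1TO10
compact = bytes([FOR]) + b"I" + bytes([EQ]) + b"1" + bytes([TO]) + b"10"

# FOR I = 1 TO 10 (each space is stored as its own byte)
spaced = bytes([FOR]) + b" I " + bytes([EQ]) + b" 1 " + bytes([TO]) + b" 10"

print(len(compact), len(spaced))  # prints: 7 12
```

The five spaces account for the whole difference, since each keyword and operator takes one byte in either spelling.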

BASIC programs were stored by Commodore DOS as file type "PRG" (program), in which the first two bytes held the memory address at which the file was expected to be loaded. This address was only used when the file was loaded with the LOAD "filename",8,1 command, where the final '1' told it to use the address stored in the file; LOAD "filename",8 always loaded it into the normal BASIC program memory space. When these files are transferred to other platforms, they are often saved with .prg extensions, though this extension was not part of the original filename on the Commodore (the file type is a separate field in Commodore directory structures).
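
Separating the load address from the rest of a PRG file is a small operation. The sketch below assumes the address is stored low byte first, as is usual on these 6502-based machines; `split_prg` is a hypothetical helper name, not part of any existing tool.

```python
import struct

def split_prg(data: bytes):
    """Return (load_address, body) for the raw contents of a PRG file."""
    if len(data) < 2:
        raise ValueError("a PRG file needs at least a 2-byte load address")
    # Assumption: little-endian 16-bit address in the first two bytes.
    (load_address,) = struct.unpack_from("<H", data, 0)
    return load_address, data[2:]
```

For example, a file beginning with the bytes `01 08` yields the address 0x0801, which is where BASIC programs commonly start on the Commodore 64.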



Hex Dec Token meaning
80 128 END
81 129 FOR
82 130 NEXT
83 131 DATA
84 132 INPUT#
85 133 INPUT
86 134 DIM
87 135 READ
88 136 LET
89 137 GOTO
8A 138 RUN
8B 139 IF
8C 140 RESTORE
8D 141 GOSUB
8E 142 RETURN
8F 143 REM
90 144 STOP
91 145 ON
92 146 WAIT
93 147 LOAD
94 148 SAVE
95 149 VERIFY
96 150 DEF
97 151 POKE
98 152 PRINT#
99 153 PRINT
9A 154 CONT
9B 155 LIST
9C 156 CLR
9D 157 CMD
9E 158 SYS
9F 159 OPEN
A0 160 CLOSE
A1 161 GET
A2 162 NEW
A3 163 TAB(
A4 164 TO
A5 165 FN
A6 166 SPC(
A7 167 THEN
A8 168 NOT
A9 169 STEP
AA 170 +
AB 171 -
AC 172 *
AD 173 /
AE 174 ^
AF 175 AND
B0 176 OR
B1 177 >
B2 178 =
B3 179 <
B4 180 SGN
B5 181 INT
B6 182 ABS
B7 183 USR
B8 184 FRE
B9 185 POS
BA 186 SQR
BB 187 RND
BC 188 LOG
BD 189 EXP
BE 190 COS
BF 191 SIN
C0 192 TAN
C1 193 ATN
C2 194 PEEK
C3 195 LEN
C4 196 STR$
C5 197 VAL
C6 198 ASC
C7 199 CHR$
C8 200 LEFT$
C9 201 RIGHT$
CA 202 MID$
CB 203 GO
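
As a rough illustration of how the table is applied, the following Python sketch expands the tokens in one stored line (without its header bytes or terminating null). The keyword list follows the BASIC 2.0 table, with $8C and $8E taken to be RESTORE and RETURN. The names `KEYWORDS`, `TOKENS` and `detokenize` are made up for this example, and mapping the remaining bytes through `chr()` is a simplification, not a real PETSCII conversion.

```python
# Keywords in token order, starting at 0x80 (END) and ending at 0xCB (GO).
KEYWORDS = (
    "END FOR NEXT DATA INPUT# INPUT DIM READ LET GOTO RUN IF RESTORE GOSUB "
    "RETURN REM STOP ON WAIT LOAD SAVE VERIFY DEF POKE PRINT# PRINT CONT LIST "
    "CLR CMD SYS OPEN CLOSE GET NEW TAB( TO FN SPC( THEN NOT STEP + - * / ^ "
    "AND OR > = < SGN INT ABS USR FRE POS SQR RND LOG EXP COS SIN TAN ATN PEEK "
    "LEN STR$ VAL ASC CHR$ LEFT$ RIGHT$ MID$ GO"
).split()
TOKENS = {0x80 + i: kw for i, kw in enumerate(KEYWORDS)}

def detokenize(line: bytes) -> str:
    """Expand the tokens in one tokenized line into keyword text."""
    out = []
    in_quotes = False
    for b in line:
        if b == 0x22:                      # a double quote toggles string mode
            in_quotes = not in_quotes
        if not in_quotes and b in TOKENS:
            out.append(TOKENS[b])          # keyword or operator token
        elif not in_quotes and b == 0xFF:
            out.append("π")                # #255 is the pi character
        else:
            out.append(chr(b))             # simplification: no PETSCII mapping
    return "".join(out)
```

For instance, `detokenize(b'\x99"HI"')` returns `PRINT"HI"`. Bytes inside quotes are passed through untouched, and unassigned values from #204 to #254 fall through to `chr()` rather than being reported as errors.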

Format documentation


Other links and references
