Commodore BASIC tokenized file
Latest revision as of 04:24, 28 December 2023
Commodore BASIC tokenized files stored programs in the versions of the BASIC programming language used on Commodore computers, including the PET, VIC-20, Commodore 64, and Commodore 128. A number of versions were used, all deriving from a version that Commodore licensed perpetually from Microsoft for a one-time fee and then developed further in-house. The most common version is 2.0, which was found on the Commodore 64, even though earlier PET computers had BASIC 4.0 (Commodore put an out-of-date BASIC in the 64 because it was "just a home computer" not expected to be used for serious work). The Commodore 128 had BASIC 7.0, and the Commodore 16 had BASIC 3.5.
Like most BASICs of its era, Commodore BASIC used a tokenized format to save its programs, rather than plain-text source code. Printable PETSCII characters (and the various control codes which could be used within literal strings to do things like change the color of text) generally stood for themselves, but other bytes had different meanings. The "high-bit" bytes from #128-#254 stood for the various BASIC commands and mathematical operators (#255 was used for the "pi" character). Each program line began with four header bytes: a 2-byte pointer to the next line, followed by the 2-byte line number, both stored as little-endian unsigned integers. The pointer held the memory address of the next line rather than a relative offset, and a pointer of 0 marked the end of the program. A null (#0) byte marked the end of each program line.
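The line layout described above can be walked with a few lines of Python. This is a minimal sketch, not a reference implementation: the function name `iter_lines` is hypothetical, the input is assumed to be a PRG-style file whose first two bytes are a load address, and each line is assumed to start with a 2-byte next-line field followed by a 2-byte line number. The sketch finds the end of each line by scanning for the null terminator, so it does not depend on the value stored in the next-line field.

```python
import struct


def iter_lines(data: bytes, offset: int = 2):
    """Yield (line_number, body_bytes) for each tokenized program line.

    offset=2 skips the two-byte load address assumed to lead the file.
    """
    pos = offset
    while pos + 4 <= len(data):
        (link,) = struct.unpack_from("<H", data, pos)
        if link == 0:
            # A zero next-line field marks the end of the program.
            break
        (number,) = struct.unpack_from("<H", data, pos + 2)
        # Search for the terminator after the header, because the
        # line number itself may contain a zero byte.
        end = data.find(0, pos + 4)
        if end == -1:
            # Truncated file: no terminator for this line.
            break
        yield number, data[pos + 4:end]
        pos = end + 1


# Hand-built example: load address $0801, then the single line
# 10 PRINT"HI" (token $99 followed by the quoted text).
sample = bytes([
    0x01, 0x08,                    # load address
    0x0B, 0x08,                    # next-line field
    0x0A, 0x00,                    # line number 10
    0x99, 0x22, 0x48, 0x49, 0x22,  # PRINT "HI"
    0x00,                          # end of line
    0x00, 0x00,                    # end of program
])
for number, body in iter_lines(sample):
    print(number, body.hex())
```

Scanning for the terminator keeps the sketch usable even when the file was saved from a different memory location than the one it is inspected at.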
In BASIC 2.0, only the bytes up to #203 were actually assigned BASIC commands, leaving #204-#254 unassigned and available for future expansion; later Commodore BASIC versions assigned some of them, and there may be third-party extended BASICs that use them as well.
Unlike some other BASICs of the time, the Commodore tokenizer didn't collapse extra whitespace on tokenization, or expand it on listing a program; all space characters entered by the programmer were stored in the file. This meant that you could save disk and memory space by eliminating all unnecessary spaces from the code, though this might make the code harder to read in places. Very few spaces were actually necessary; FORI=1TO10 worked the same as FOR I = 1 TO 10.
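To make the space saving concrete, here is a toy tokenizer in Python. It is only a sketch: `TOKENS` and `tokenize` are hypothetical names, the table covers just the three tokens needed for this statement (values taken from the token table), and a real tokenizer would also have to handle quoted strings, REM, DATA, and the rest of the table. It also assumes the remaining characters (digits, upper-case letters, spaces) have the same byte values in PETSCII as in ASCII.

```python
# Subset of the token table, just enough for the FOR example.
TOKENS = {"FOR": 0x81, "TO": 0xA4, "=": 0xB2}


def tokenize(text: str) -> bytes:
    """Replace known keywords with token bytes; copy everything else."""
    out = bytearray()
    i = 0
    while i < len(text):
        for word, code in TOKENS.items():
            if text.startswith(word, i):
                out.append(code)
                i += len(word)
                break
        else:
            # Not a keyword: store the character itself, spaces included.
            out.append(ord(text[i]))
            i += 1
    return bytes(out)


compact = tokenize("FORI=1TO10")
spaced = tokenize("FOR I = 1 TO 10")
print(len(compact), len(spaced))
```

Under these assumptions the compact form takes 7 bytes and the spaced form 12, because each of the five spaces is stored as its own byte.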
BASIC programs were stored by Commodore DOS as file type "PRG" (program), in which the first two bytes stored the memory location the file was expected to be loaded into. This address was only used when the file was loaded with the LOAD "filename",8,1 command, where the final '1' told the computer to use the memory location in the file; LOAD "filename",8 always loaded the file into the normal BASIC program memory space. When these files are transferred to other platforms, they are often saved with .prg extensions, though this extension was not part of the original filename on the Commodore (the file type is a separate field in Commodore directory structures).
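Reading the load address from a transferred .prg file is a two-byte lookup. The sketch below assumes the address is stored low byte first, as is usual on the 6502-based Commodore machines; `load_address` is a hypothetical helper name, and the example value $0801 is the usual start of BASIC program memory on a Commodore 64.

```python
import struct


def load_address(prg: bytes) -> int:
    """Return the 16-bit load address stored in the first two bytes."""
    if len(prg) < 2:
        raise ValueError("a PRG file needs at least a two-byte load address")
    (address,) = struct.unpack_from("<H", prg, 0)
    return address


# The two bytes 01 08 decode to $0801.
print(hex(load_address(bytes([0x01, 0x08, 0x00, 0x00]))))
```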
Tokens
| Hex | Dec | Token meaning | 
|---|---|---|
| 80 | 128 | END | 
| 81 | 129 | FOR | 
| 82 | 130 | NEXT | 
| 83 | 131 | DATA | 
| 84 | 132 | INPUT# | 
| 85 | 133 | INPUT | 
| 86 | 134 | DIM | 
| 87 | 135 | READ | 
| 88 | 136 | LET | 
| 89 | 137 | GOTO | 
| 8A | 138 | RUN | 
| 8B | 139 | IF | 
| 8C | 140 | RESTORE | 
| 8D | 141 | GOSUB | 
| 8E | 142 | RETURN | 
| 8F | 143 | REM | 
| 90 | 144 | STOP | 
| 91 | 145 | ON | 
| 92 | 146 | WAIT | 
| 93 | 147 | LOAD | 
| 94 | 148 | SAVE | 
| 95 | 149 | VERIFY | 
| 96 | 150 | DEF | 
| 97 | 151 | POKE | 
| 98 | 152 | PRINT# | 
| 99 | 153 | PRINT |
| 9A | 154 | CONT | 
| 9B | 155 | LIST | 
| 9C | 156 | CLR | 
| 9D | 157 | CMD | 
| 9E | 158 | SYS | 
| 9F | 159 | OPEN | 
| A0 | 160 | CLOSE | 
| A1 | 161 | GET | 
| A2 | 162 | NEW | 
| A3 | 163 | TAB( | 
| A4 | 164 | TO | 
| A5 | 165 | FN | 
| A6 | 166 | SPC( | 
| A7 | 167 | THEN | 
| A8 | 168 | NOT | 
| A9 | 169 | STEP | 
| AA | 170 | + | 
| AB | 171 | - | 
| AC | 172 | * | 
| AD | 173 | / | 
| AE | 174 | ^ | 
| AF | 175 | AND | 
| B0 | 176 | OR | 
| B1 | 177 | > | 
| B2 | 178 | = | 
| B3 | 179 | < | 
| B4 | 180 | SGN | 
| B5 | 181 | INT | 
| B6 | 182 | ABS | 
| B7 | 183 | USR | 
| B8 | 184 | FRE | 
| B9 | 185 | POS | 
| BA | 186 | SQR | 
| BB | 187 | RND | 
| BC | 188 | LOG | 
| BD | 189 | EXP | 
| BE | 190 | COS | 
| BF | 191 | SIN | 
| C0 | 192 | TAN | 
| C1 | 193 | ATN | 
| C2 | 194 | PEEK | 
| C3 | 195 | LEN | 
| C4 | 196 | STR$ | 
| C5 | 197 | VAL | 
| C6 | 198 | ASC | 
| C7 | 199 | CHR$ | 
| C8 | 200 | LEFT$ | 
| C9 | 201 | RIGHT$ | 
| CA | 202 | MID$ | 
| CB | 203 | GO | 
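Because the token values in the table are consecutive, the table can be stored as a plain list indexed by `byte - 0x80`. The following Python sketch expands the body of one tokenized line back into text. The names `KEYWORDS` and `detokenize` are hypothetical. Two assumptions are built in: bytes inside double quotes are left unexpanded, and bytes in the range #32-#126 are shown as the ASCII character with the same value, which is only an approximation of PETSCII. Anything else is printed as a `{$XX}` placeholder.

```python
# Keywords in token order, starting at 0x80 (END) and ending at 0xCB (GO).
KEYWORDS = (
    "END FOR NEXT DATA INPUT# INPUT DIM READ LET GOTO RUN IF RESTORE GOSUB "
    "RETURN REM STOP ON WAIT LOAD SAVE VERIFY DEF POKE PRINT# PRINT CONT "
    "LIST CLR CMD SYS OPEN CLOSE GET NEW TAB( TO FN SPC( THEN NOT STEP "
    "+ - * / ^ AND OR > = < SGN INT ABS USR FRE POS SQR RND LOG EXP COS "
    "SIN TAN ATN PEEK LEN STR$ VAL ASC CHR$ LEFT$ RIGHT$ MID$ GO"
).split()


def detokenize(body: bytes) -> str:
    """Expand token bytes in one line body into keyword text."""
    out = []
    in_quotes = False
    for b in body:
        if b == 0x22:
            # A double quote toggles string mode (assumed behaviour).
            in_quotes = not in_quotes
        if not in_quotes and 0x80 <= b <= 0xCB:
            out.append(KEYWORDS[b - 0x80])
        elif 0x20 <= b < 0x7F:
            out.append(chr(b))  # approximate PETSCII-to-ASCII mapping
        else:
            out.append("{$%02X}" % b)
    return "".join(out)


print(detokenize(bytes([0x81, 0x49, 0xB2, 0x31, 0xA4, 0x31, 0x30])))
```

The list holds 76 entries, one for each value from #128 to #203.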
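Format documentation
- Commodore BASIC tokens: http://www.c64-wiki.com/index.php/BASIC_token
- Commodore token list: ftp://www.zimmers.net/pub/cbm/programming/cbm-basic-tokens.txt
Software
- CBM BASIC Lister (for Linux and Windows): http://www.luigidifraia.com/c64/index.htm#BL
- DirMaster (reads C64 disk images, archives, and files in Windows): http://style64.org/dirmaster
- JSMESS in-browser emulations: C-64 (http://jsmess.textfiles.com/messloader.html?module=c64), C-128 (http://jsmess.textfiles.com/messloader.html?module=c128), PET-2001 (http://jsmess.textfiles.com/messloader.html?module=pet2001), PET-2001n (http://jsmess.textfiles.com/messloader.html?module=pet2001n), C-16 (PAL) (http://jsmess.textfiles.com/messloader.html?module=c16)
- detox64 detokenizer: http://code.google.com/p/detox64/
Sample files
Other links and references
- Wikipedia article: Commodore BASIC
- Commodore BASIC as a scripting language for UNIX and Windows: http://www.pagetable.com/?p=48
- Today's kids try the Commodore 64: https://www.youtube.com/watch?feature=player_embedded&v=0FLlxq5LSlo#at=127
- 10 PRINT CHR$(205.5+RND(1)); : GOTO 10 (Software Studies): http://www.amazon.com/gp/product/0262018462/ref=as_li_ss_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=0262018462&linkCode=as2&tag=bogost-20
- Documented disassembled code from Commodore BASIC ROM: http://www.pagetable.com/c64rom/
- 1978 source to Microsoft 6502 BASIC (ancestral to Commodore BASIC; the embedded token list is similar but not quite the same): http://www.pagetable.com/?p=774
- Why did the Commodore 64 use two spaces in its syntax error message?: http://www.reddit.com/r/c64/comments/335r5k/heres_a_thing_thats_been_bugging_me_for_over_25/cqhw0p4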

