Atari BASIC tokenized file

From Just Solve the File Format Problem
 
'''Atari BASIC''' was used on the Atari 400 and 800 computers, among the many systems competing for the home computer market in the late '70s and early '80s. While Atari considered using an adapted Microsoft BASIC like some other manufacturers, they ultimately used an independently-developed BASIC instead, meaning that many characteristics of this BASIC (including its manner of tokenization) differ greatly from the other BASICs of the time.
  
To understand the differences in the file format, a quick review of other BASICs is helpful. Some BASICs, notably Tiny BASIC, performed very little conversion of the program code typed in by the user. The interpreter only read the line number, if present, and converted it to an 8-bit integer value, which made the line easier to find when executing GOTO and similar statements. The rest of the line was stored exactly as it had been typed, in ASCII with a trailing CR. Microsoft-like BASICs went slightly further: they converted the line number to a 16-bit value and replaced keywords, beginning with the first word on the line, with 8-bit values, the tokens. A token makes it easier to look up the runtime code associated with an instruction, because the interpreter does not have to convert from text form. The rest of the line, such as numeric constants and variable names, was left as the original text, as in Tiny BASIC.
  
In contrast, Atari BASIC tokenized every item in the line to an internal format, thereby eliminating any runtime parsing. For instance, any number in the code was converted to its six-byte floating-point format and stored in that form, preceded by a lead token marking it as a numeric constant, so no conversion was needed at runtime. So while one can read the tokenized form of Tiny BASIC or Microsoft BASIC and still make out what is going on, Atari BASIC files are essentially binary. The only things that can be read as-is are string literals. String constants are marked by the byte 0F (hex), followed by a byte giving the string length (0-255), then the characters of the string itself. Numeric constants are marked by 0E (hex), followed by six bytes holding a floating-point value.
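
The constant encoding described above can be sketched in Python. This is a minimal illustration, not a full detokenizer: it assumes the input is only the operand portion of a tokenized line (no line header or statement token), and the function names are hypothetical. The layout of the six-byte value (a sign/exponent byte in excess-64 powers of 100, followed by five packed-BCD bytes) is not stated on this page and is an assumption here.

```python
from typing import List, Union

NUM_CONST = 0x0E  # lead byte for a numeric constant (from the text above)
STR_CONST = 0x0F  # lead byte for a string constant (from the text above)


def decode_bcd_float(raw: bytes) -> float:
    """Decode six bytes, ASSUMING: byte 0 = sign bit + excess-64 exponent
    (power of 100), bytes 1-5 = packed BCD, two decimal digits per byte."""
    if len(raw) != 6:
        raise ValueError("expected exactly six bytes")
    sign = -1.0 if raw[0] & 0x80 else 1.0
    exponent = (raw[0] & 0x7F) - 64
    value = 0.0
    for i, byte in enumerate(raw[1:]):
        digits = (byte >> 4) * 10 + (byte & 0x0F)
        value += digits * 100.0 ** (exponent - i)
    return sign * value


def extract_constants(operands: bytes) -> List[Union[str, float]]:
    """Walk operand bytes and collect string and numeric constants."""
    found: List[Union[str, float]] = []
    i = 0
    while i < len(operands):
        token = operands[i]
        if token == NUM_CONST:
            found.append(decode_bcd_float(operands[i + 1:i + 7]))
            i += 7  # lead byte + six value bytes
        elif token == STR_CONST:
            length = operands[i + 1]
            text = operands[i + 2:i + 2 + length]
            # ATASCII is not translated here; bytes are mapped one-to-one.
            found.append(text.decode("latin-1"))
            i += 2 + length  # lead byte + length byte + characters
        else:
            i += 1  # any other token is skipped in this sketch
    return found


sample = bytes([0x0F, 2, 0x48, 0x49, 0x0E, 0x40, 0x12, 0x50, 0, 0, 0])
print(extract_constants(sample))
```

Because the string length is stored explicitly, the scanner jumps over the characters of a string rather than inspecting them, so a 0E or 0F byte inside a string is not mistaken for a lead token.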
 
Additionally, Atari BASIC split its tokens into two groups. ''Statements'' were the first item on any line (or of any sub-statement following a colon) and had their own table of up to 256 tokens, the Statement Name Tokens, while other items on the line were taken from a second 256-entry table for functions, operators, and variables. The variables in the program were itemized in a variable table stored at the beginning of the program, so that a reference to a variable in the program used only a single-byte token representing the name. There were 128 positions in the token list for variables (comprising the high-bit values), meaning that only 128 different variables could be used in a program.
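
The context-dependent lookup can be illustrated with a short Python sketch. The two tables below are placeholders with made-up entries, not the real Atari BASIC token assignments, and <code>describe_token</code> is a hypothetical helper. The sketch also assumes that a variable token's low seven bits give its position in the variable table, which this page does not state explicitly.

```python
from typing import Dict, List

# Placeholder tables: illustrative entries only, NOT the real token values.
STATEMENT_NAMES: Dict[int, str] = {0x00: "STMT_A", 0x01: "STMT_B"}
OPERAND_NAMES: Dict[int, str] = {0x10: "OP_X", 0x11: "FUNC_Y"}


def describe_token(value: int, first_in_statement: bool,
                   variables: List[str]) -> str:
    """Interpret one token byte according to its position in the statement."""
    if not 0 <= value <= 0xFF:
        raise ValueError("token must fit in one byte")
    if first_in_statement:
        # Statement position: use the statement name table.
        name = STATEMENT_NAMES.get(value, f"unknown(0x{value:02X})")
        return "statement:" + name
    if value & 0x80:
        # High bit set: one of the 128 variable slots (index is ASSUMED
        # to be the low seven bits, in variable-table order).
        index = value & 0x7F
        if index < len(variables):
            return "variable:" + variables[index]
        return f"variable:#{index}"
    # Otherwise: function or operator from the second table.
    name = OPERAND_NAMES.get(value, f"unknown(0x{value:02X})")
    return "operand:" + name


print(describe_token(0x00, True, []))
print(describe_token(0x81, False, ["A", "B$"]))
```

The same byte value resolves differently depending on <code>first_in_statement</code>, which is the point of the two-table design: the statement table and the operand table can each use the full byte range without colliding.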
 
Literal characters are in [[ATASCII]], Atari's not-quite-ASCII character set.
  
 
== Software ==

* [http://joyfulcoder.com/memopad/ Atari Memopad: detokenizes BASIC programs among other functions]
* [https://github.com/Sembiance/gfalist gfalist: De-tokenizes GFA-BASIC programs from Atari]

== Sample files ==

* {{DexvertSamples|other/gfaBASICAtari}}

== References ==

* [http://www.atariarchives.org/dere/chapt10.php Detailed description of tokenization including token list]
* [http://users.telenet.be/kim1-6502/6502/absb.html Atari BASIC Source Book] - has assembly source and some descriptions of what it does, from which a complete token table and file format spec could be puzzled out, though the required info is spread out somewhat.
* [http://archive.org/details/ataribooks Books on Atari programming] in Internet Archive
* [http://archive.org/details/bitsavers_atari40080mputerTechnicalReferenceNotes1982_20170986 Atari technical manual] - Lots of tech details about Atari 400/800
* [https://archive.org/details/Kids_Working_With_Computers_The_Atari_Basic_Manual Kids Working With Computers The Atari Basic Manual]

[[Category:Atari computers]]

Latest revision as of 04:08, 28 December 2023

File Format
Name Atari BASIC tokenized file
Ontology
Released 1979
