Softdisk Publishing UDF files

From Just Solve the File Format Problem

Revision as of 16:08, 20 April 2013

File Format
Name Softdisk Publishing UDF files
Ontology
Extension(s) .udf, various other extensions

Softdisk was a publisher of diskmagazines and other software from the 1980s through the early 2000s, as well as a dial-up Internet Service Provider (in the Shreveport, LA area) and web host in the late 1990s and early 2000s. It's perhaps best known as the place where John Romero, John Carmack, and other founders of id Software were working when they started their own game-making company as moonlighters.

Softdisk programs used a wide variety of file formats (text-based and binary) for loading and saving data, but at one point in the 1990s some of its developers decided to standardize the in-house file formats for future programs. The result was a format specification they called "UDF" (Universal Data Format, or Uniform Data Format, or Uniform Data File?). Computer geeks can be pretty arrogant in regarding their own quirky data formats as "universal" or "uniform", hence the many uses of the letter "U" in such acronyms. There was an internal spec document, which I've unfortunately not been able to dig up yet, though I'm sure I had a copy at some point.

This is more of a "meta-format": it lets each program define its own data file format while sharing some common structural conventions. Files of this sort can be found on various issues of Softdisk publications such as Softdisk PC, Softdisk for Windows, and Softdisk for Mac. Some of them have a .UDF file extension, but various program-specific extensions were also used.

Programs using files of this format include:

  • Criss Cross
  • Crypto Sleuth
  • Paragon
  • Sokoban
  • Super Crossword
  • Trivia Now
  • Word Finder

Structure

A UDF file is made up of a series of "chunks", each consisting of one or more lines of data enclosed between a header line and a terminator line that contain particular strings of text. Lines can be separated with CR, LF, or CR+LF so that such files can be created and used on a variety of platforms; programs processing them are expected to handle any of these conventions.
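
Since the spec itself is lost, here's only a rough Python illustration of that line-ending rule: it splits text on CR+LF, a lone LF, or a lone CR, and a file mixing all three still comes out right. The function name `split_udf_lines` is hypothetical, not anything from Softdisk's own code.

```python
import re

# CR+LF is listed first so it is consumed as a single line break;
# otherwise a lone LF or a lone CR ends the line.
_LINE_BREAK = re.compile(r"\r\n|\n|\r")


def split_udf_lines(text: str) -> list:
    """Split UDF text into lines, accepting CR, LF, or CR+LF (hypothetical helper)."""
    lines = _LINE_BREAK.split(text)
    # A terminator at the very end leaves an empty final element; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


sample = "$DOC\r\nSOKOBAN\n1\r1\r\n$EOC\n"
print(split_udf_lines(sample))
```

Python's built-in `str.splitlines()` would also cope with all three conventions, but it additionally breaks on characters such as form feed and vertical tab, which is why this sketch uses an explicit pattern.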

A chunk begins with a line starting with the "$" sign, followed by a chunk-type name (a character string; there are a few standard chunk names as well as program-specific ones) and, in some cases, a space and an identifier (name or number) for a specific data item, since some types of data can exist in multiple instances. The chunk ends with a line consisting of "$EOC" (End Of Chunk), so this string can't occur within the data of a chunk (or at least not at the beginning of a line).
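
A minimal Python sketch of a reader for this chunk structure follows. It is a reconstruction from the description above, not the original spec: the names `Chunk` and `parse_chunks` are hypothetical, and it assumes that lines outside any chunk are ignored and that trailing whitespace on the "$EOC" line is tolerated.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Chunk:
    name: str                    # chunk-type name, e.g. "DOC" or "PUZZLE"
    ident: Optional[str] = None  # optional instance identifier, e.g. "1"
    lines: List[str] = field(default_factory=list)  # raw data lines


def parse_chunks(lines: List[str]) -> List[Chunk]:
    """Group already-split lines into chunks (hypothetical helper)."""
    chunks: List[Chunk] = []
    current: Optional[Chunk] = None
    for line in lines:
        is_eoc = line.rstrip() == "$EOC"  # assumption: trailing blanks allowed
        if current is None:
            # Outside a chunk only a "$NAME [ident]" header matters;
            # any other line is skipped (an assumption).
            if line.startswith("$") and not is_eoc:
                name, _, ident = line[1:].strip().partition(" ")
                current = Chunk(name, ident.strip() or None)
        elif is_eoc:
            chunks.append(current)
            current = None
        else:
            # Inside a chunk every other line is data, even one starting with "$".
            current.lines.append(line)
    if current is not None:
        raise ValueError(f"chunk ${current.name} has no $EOC terminator")
    return chunks


for c in parse_chunks(["$DOC", "SILLYPROG", "1", "1", "$EOC", "$PUZZLE 2", "data", "$EOC"]):
    print(c.name, c.ident, c.lines)
```

Treating only a line that is exactly "$EOC" as the terminator matches the description of the terminator as a line *consisting of* that string; a stricter reader might also reject data lines that merely begin with it.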

Anything following a semicolon (and any whitespace preceding the semicolon) is considered a comment.
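
In Python, that comment rule might be applied per line with a regular expression, as in this sketch. It assumes there was no way to escape a literal semicolon inside data; if the real spec had one, this would strip too much. `strip_comment` is a hypothetical name.

```python
import re

# A semicolon, any whitespace immediately before it, and the rest of the
# line form a comment. Assumption: no escape for a literal ";" in data.
_COMMENT = re.compile(r"\s*;.*$")


def strip_comment(line: str) -> str:
    """Return the line with any trailing comment removed (hypothetical helper)."""
    return _COMMENT.sub("", line, count=1)


print(repr(strip_comment("SILLYPROG               ; driver tag")))  # 'SILLYPROG'
print(repr(strip_comment("Copywrong 1492 No Publisher")))  # unchanged: no semicolon
```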

Sample chunks:

$DOC
SILLYPROG               ; driver tag
1                       ; major version
1                       ; minor version
$EOC
$ABOUT
Silly Program Levels    ; data set title
John Q. Doe             ; data set author
1.0                     ; data set version
Copywrong 1492 No Publisher
$EOC

The "DOC" chunk is a standard chunk that identifies which program the file is intended for, with a "driver tag" uniquely identifying the program, and major and minor versions of the file format. The "ABOUT" chunk has more information (intended to be displayed in an "About" box within the program), consisting (in order) of the title, author, version, and copyright notice of the particular data set (not the program it loads into).
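
As an illustration, the two standard chunks could be decoded like this in Python, assuming their data lines have already been collected and that the DOC version numbers are plain integers. The record and function names (`DocInfo`, `AboutInfo`, `parse_doc`, `parse_about`) are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

# Comment rule from the format: ";" plus any whitespace before it.
_COMMENT = re.compile(r"\s*;.*$")


def _values(lines: List[str]) -> List[str]:
    """Strip comments and surrounding blanks from each data line."""
    return [_COMMENT.sub("", line, count=1).strip() for line in lines]


@dataclass
class DocInfo:
    driver_tag: str  # identifies the target program, e.g. "SOKOBAN"
    major: int       # file-format major version (assumed to be an integer)
    minor: int       # file-format minor version (assumed to be an integer)


@dataclass
class AboutInfo:
    title: str
    author: str
    version: str     # kept as text: "1.0" is a data-set version label
    copyright: str


def parse_doc(lines: List[str]) -> DocInfo:
    # Unpacking raises ValueError if fewer than three lines are present.
    tag, major, minor = _values(lines)[:3]
    return DocInfo(tag, int(major), int(minor))


def parse_about(lines: List[str]) -> AboutInfo:
    title, author, version, notice = _values(lines)[:4]
    return AboutInfo(title, author, version, notice)


doc = parse_doc(["SILLYPROG   ; driver tag", "1   ; major version", "1   ; minor version"])
about = parse_about(["Silly Program Levels    ; data set title",
                     "John Q. Doe             ; data set author",
                     "1.0                     ; data set version",
                     "Copywrong 1492 No Publisher"])
print(doc)
print(about.title, "by", about.author)
```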

Some of the driver tags that have been used:

  • PRGN: Paragon
  • SOKOBAN: Sokoban
  • TRIVNOW: Trivia Now
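
For file identification, a tool could look up the driver tag on the first data line of the DOC chunk. This Python sketch knows only the three tags listed above (the real list was certainly longer) and reports anything else as unknown; `identify_program` and `KNOWN_DRIVER_TAGS` are hypothetical names.

```python
from typing import Optional, Tuple

# Driver tags mentioned in this article; the real list was longer.
KNOWN_DRIVER_TAGS = {
    "PRGN": "Paragon",
    "SOKOBAN": "Sokoban",
    "TRIVNOW": "Trivia Now",
}


def identify_program(text: str) -> Optional[Tuple[str, str]]:
    """Return (driver_tag, program) from the line after "$DOC", or None."""
    lines = text.splitlines()  # copes with CR, LF, and CR+LF
    for i, line in enumerate(lines):
        if line.strip() == "$DOC" and i + 1 < len(lines):
            tag = lines[i + 1].split(";", 1)[0].strip()  # drop any comment
            return tag, KNOWN_DRIVER_TAGS.get(tag, "unknown")
    return None


print(identify_program("$DOC\r\nSOKOBAN ; driver tag\r\n1\r\n0\r\n$EOC\r\n"))
```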

There was also a "FILESPEC" chunk which had some sort of cryptic identifier that apparently had something to do with identifying what sort of file it was (but wasn't that what the "DOC" chunk did? Gee, I really have to dig up that spec sheet...).

Other chunks have program-specific data.

One commonly recurring chunk was "SECRETWORD", which contained a secret word (sometimes in plaintext, sometimes encrypted) that the program displayed when the user solved a puzzle or completed a game successfully, so that the user could send it in to qualify for a contest or prize of some kind. Obviously, the plaintext secret words made cheating pretty easy, which is why encryption was eventually introduced. Actually, now that I think about it, it's possible the "SECRETWORD" chunk (whether plaintext or encrypted) was just a "decoy" secret word to confuse potential cheaters, and the real secret word was elsewhere, perhaps in that cryptic "FILESPEC" chunk. Pretty devious, if true... need to research this a bit more.

Game files sometimes had a "SCORES" chunk padded with enough zeroes and/or dots that high scores could later be written back into the file in place, overwriting the dummy characters while leaving the rest of the data intact (provided the program was well-written enough to position its writes properly).
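
Here's a rough Python sketch of that in-place update. It assumes the header line is "$SCORES" (optionally followed by an identifier), that each score takes one padded dummy line, that the text is ASCII, and that leftover space is filled with blanks; the real programs' score layout and fill characters aren't documented here, and `write_scores_in_place` is a hypothetical name. It works on bytes so that the original line endings, whichever convention they use, stay untouched and the file length never changes.

```python
import io
import re
from typing import BinaryIO, List

# "$SCORES" at the start of a line, optional " identifier", then a line break.
_HEADER = re.compile(rb"(?:^|[\r\n])\$SCORES(?:[ \t][^\r\n]*)?(?:\r\n|\r|\n)")
# One line's content plus its terminator (or end of file).
_LINE = re.compile(rb"([^\r\n]*)(?:\r\n|\r|\n|$)")


def write_scores_in_place(f: BinaryIO, scores: List[str], pad: bytes = b" ") -> None:
    """Overwrite dummy lines of a $SCORES chunk without changing the file size.

    f must be opened "r+b" (an io.BytesIO works too). Hypothetical helper.
    """
    f.seek(0)
    data = f.read()
    header = _HEADER.search(data)
    if header is None:
        raise ValueError("no $SCORES chunk found")
    pos = header.end()
    writes = []
    # Plan every write first, so a bad entry leaves the file untouched.
    for text in scores:
        line = _LINE.match(data, pos)
        slot = line.group(1)
        if slot.rstrip() == b"$EOC":
            raise ValueError("more scores than reserved lines")
        new = text.encode("ascii")
        if len(new) > len(slot):
            raise ValueError(f"{text!r} does not fit in {len(slot)} bytes")
        writes.append((pos, new.ljust(len(slot), pad)))
        pos = line.end()
    for offset, payload in writes:
        f.seek(offset)
        f.write(payload)


buf = io.BytesIO(b"$SCORES\r\n0000000000\r\n0000000000\r\n$EOC\r\n")
write_scores_in_place(buf, ["JQD 1200"])
print(buf.getvalue())
```

Validating all entries before writing anything means a score that doesn't fit can't leave the chunk half-updated.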

Chunks with multiple instances can be numbered:

$PUZZLE 1
Some data for the first puzzle...
$EOC
$PUZZLE 2
Some data for the second puzzle...
$EOC
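
Going the other way, a program writing numbered chunks could generate them as in this Python sketch. It assumes numbering starts at 1 (as in the sample) and refuses any data line beginning with "$EOC", since that would end the chunk early; `write_numbered_chunks` is a hypothetical name, and CR+LF is just a default choice.

```python
from typing import List


def write_numbered_chunks(name: str, blocks: List[str], newline: str = "\r\n") -> str:
    """Serialize data blocks as "$NAME 1", "$NAME 2", ... chunks (hypothetical)."""
    out: List[str] = []
    for number, block in enumerate(blocks, start=1):  # numbering from 1, as in the sample
        lines = block.splitlines()
        if any(line.startswith("$EOC") for line in lines):
            # Such a line would be read as the end of the chunk.
            raise ValueError(f"{name} {number}: data line starts with $EOC")
        out.append(f"${name} {number}")
        out.extend(lines)
        out.append("$EOC")
    return "".join(line + newline for line in out)


print(write_numbered_chunks("PUZZLE", ["Some data for the first puzzle...",
                                       "Some data for the second puzzle..."], newline="\n"), end="")
```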