Lingo bytecode

This is a partial, work-in-progress examination of the bytecode created when Lingo code is compiled in Macromedia Director 4.0. It describes instructions for a stack-based virtual machine. This virtual machine is sometimes known as the IML, or Idealized Machine Layer.

Each instruction is one, two or three bytes long, and the first byte determines the length:
 * If the first byte is in the range 0x00-0x3F, then the full instruction is one byte.
 * If the first byte is in the range 0x40-0x7F, then the full instruction is two bytes.
 * If the first byte is in the range 0x80-0xFF, then the full instruction is three bytes.
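The length rule above can be sketched in Python as follows. The function names `instruction_length` and `split_instructions` are made up for this illustration; only the three byte ranges come from the description above.

```python
def instruction_length(first_byte):
    """Return the full instruction length in bytes, given its first byte."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("first_byte must be in the range 0x00-0xFF")
    if first_byte < 0x40:
        return 1  # 0x00-0x3F: one-byte instruction
    if first_byte < 0x80:
        return 2  # 0x40-0x7F: two-byte instruction
    return 3      # 0x80-0xFF: three-byte instruction


def split_instructions(bytecode):
    """Split a run of bytecode into one bytes object per instruction."""
    instructions = []
    pos = 0
    while pos < len(bytecode):
        size = instruction_length(bytecode[pos])
        if pos + size > len(bytecode):
            raise ValueError("truncated instruction at offset %d" % pos)
        instructions.append(bytecode[pos:pos + size])
        pos += size
    return instructions
```

This only finds instruction boundaries; it does not interpret the opcodes or their operands.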

Constant blobs such as string literals are stored after the bytecode and are referred to through fixed-size records. Each record is six bytes long (Director 7 uses eight bytes), regardless of the actual length of the data. A constant is referenced by the byte offset of its record, so the first constant is referred to as 0x00, the second as 0x06, the third as 0x0C, and so on. Integer literals over 32767 and floating-point literals are also stored as constants.
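As a small worked example, a constant reference can be turned back into a zero-based constant number by dividing by the record size. The helper name `constant_index` is hypothetical, and the assumption that eight-byte records are referenced in multiples of eight is extrapolated from the six-byte case rather than confirmed.

```python
def constant_index(reference, record_size=6):
    """Convert a constant reference (a byte offset) to a zero-based index.

    record_size is 6 for the format described here; passing 8 assumes
    that eight-byte records are referenced the same way (unconfirmed).
    """
    if reference < 0 or reference % record_size != 0:
        raise ValueError("reference is not a multiple of the record size")
    return reference // record_size
```

For instance, `constant_index(0x0C)` gives 2, which is the third constant.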

There is also a namelist for referring to external identifiers, stored separately from the bytecode. This is a simple array of strings.

= One-Byte Instructions =

= Two-Byte Instructions =

= Three-Byte Instructions =

= Syntactic Sugar =

Some functions get special syntax when written out in source code, but under the hood, the compiler just transforms it into more regular syntax. Here is a mapping that shows the equivalent in plain, generalized Lingo that gets used for the bytecode.

= Bytecode-Container Chunk Layout =

Function Record
Each function record is 42 bytes long.

Bytecode Trailer
The following values come after the bytecode section for a function (located using the offset and length fields from the function record), preceded by one padding byte if the bytecode has an odd number of bytes:
 * For each argument: uint16 namelist index for the argument's name
 * For each local variable: uint16 namelist index for the variable's name
 * Count (C) * uint16
 * Count (D) * uint8
 * A padding byte if Count (D) is an odd number
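The first two trailer entries can be read with a sketch like the one below. The function name `read_trailer_names` is hypothetical. The argument and local variable counts are assumed to be known already (presumably from the function record), and the byte order is an assumption: big-endian is used as the default here, so pass `"<"` if the file turns out to be little-endian.

```python
import struct


def read_trailer_names(chunk, bytecode_offset, bytecode_length,
                       arg_count, local_count, byte_order=">"):
    """Read the namelist indices for arguments and local variables.

    Returns (argument_indices, local_indices, position_after_them).
    byte_order is an assumption: ">" is big-endian, "<" is little-endian.
    """
    pos = bytecode_offset + bytecode_length
    if bytecode_length % 2:
        pos += 1  # skip the padding byte after odd-length bytecode

    args = list(struct.unpack_from("%s%dH" % (byte_order, arg_count), chunk, pos))
    pos += 2 * arg_count

    local_vars = list(struct.unpack_from("%s%dH" % (byte_order, local_count), chunk, pos))
    pos += 2 * local_count

    return args, local_vars, pos
```

The returned position is where the Count (C) and Count (D) arrays would begin.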

Constants
Each constant record is six bytes long and has this format:
 * uint16: Value type ID
 * uint32: Data address, relative to the base address given in the header

In Director 5 files, eight-byte records have been found, with the following format:
 * uint32: Value type ID
 * uint32: Data address, relative to the base address given in the header
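Both record layouts can be decoded with Python's `struct` module, as in this minimal sketch. The function name `parse_constant_record` is hypothetical, and the byte order is an assumption (big-endian by default, switchable through the `byte_order` parameter).

```python
import struct


def parse_constant_record(data, offset=0, record_size=6, byte_order=">"):
    """Decode one constant record and return (type_id, data_address).

    record_size 6: uint16 type ID followed by uint32 address.
    record_size 8: uint32 type ID followed by uint32 address.
    byte_order is an assumption: ">" is big-endian, "<" is little-endian.
    """
    if record_size == 6:
        fmt = byte_order + "HI"
    elif record_size == 8:
        fmt = byte_order + "II"
    else:
        raise ValueError("record_size must be 6 or 8")
    type_id, address = struct.unpack_from(fmt, data, offset)
    return type_id, address
```

The returned address is still relative to the base address given in the header, so that base must be added before reading the constant's data.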

Here is how the value type IDs correspond to the data found at the given address:

= Projector File (Windows) =

Director 3.0
At the very end of the projector executable is a 32-bit little-endian file address.

At this location is found:
 * 7 bytes: Unknown (more research needed)
 * uint32: Length of the RIFF block
 * uint8: Length of the original RIFF file's name
 * ASCII: Original RIFF file's name
 * uint8: Length of the original RIFF file's parent folder
 * ASCII: Original RIFF file's parent folder
 * RIFF block
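The Director 3.0 layout above could be read with a sketch like this one. The function name `parse_d3_projector` is hypothetical. Two things are assumptions rather than facts from the description: that the RIFF length field is little-endian like the trailing file address, and that it counts exactly the bytes of the RIFF block that follows.

```python
import struct


def parse_d3_projector(exe):
    """Extract the embedded RIFF block from Director 3.0 projector bytes."""
    # The last four bytes hold a little-endian address of the header.
    (pos,) = struct.unpack_from("<I", exe, len(exe) - 4)

    unknown = exe[pos:pos + 7]  # purpose not yet known
    pos += 7

    # Assumption: the length is little-endian, like the address above.
    (riff_length,) = struct.unpack_from("<I", exe, pos)
    pos += 4

    name_length = exe[pos]
    pos += 1
    name = exe[pos:pos + name_length].decode("ascii")
    pos += name_length

    folder_length = exe[pos]
    pos += 1
    folder = exe[pos:pos + folder_length].decode("ascii")
    pos += folder_length

    riff = exe[pos:pos + riff_length]
    return {"unknown": unknown, "name": name, "folder": folder, "riff": riff}
```

A caller would typically pass the whole executable, for example `parse_d3_projector(open(path, "rb").read())`, and write the `riff` value out under the recovered file name.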

Director 4.0
At the very end of the projector executable is a 32-bit little-endian file address.

At this location is found:
 * ASCII "PJ93"
 * The file address of the main RIFF data file
 * Six further addresses for other embedded data (more research required to know more about these)
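A minimal sketch for locating the main RIFF data in a Director 4.0 projector follows. The function name `parse_d4_projector` is hypothetical, and it assumes that all seven addresses after the "PJ93" tag are 32-bit little-endian values, matching the trailing file address.

```python
import struct


def parse_d4_projector(exe):
    """Return (main_riff_address, other_addresses) for Director 4.0 bytes."""
    # The last four bytes hold a little-endian address of the header.
    (pos,) = struct.unpack_from("<I", exe, len(exe) - 4)

    if exe[pos:pos + 4] != b"PJ93":
        raise ValueError("PJ93 tag not found; not a Director 4.0 projector?")

    # Assumption: seven consecutive 32-bit little-endian addresses.
    addresses = struct.unpack_from("<7I", exe, pos + 4)
    return addresses[0], list(addresses[1:])
```

The meaning of the six other addresses is still unknown, so the sketch returns them untouched.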

Lingo script decompile
A proof-of-concept Lingo script decompiler (still a work in progress) can be found in the "drxtract" Python script (a DRI and DRX file data extractor) on GitHub (https://github.com/System25/drxtract). The script tries to extract images as BMP files, sounds as WAV files and Lingo scripts as Lscr files.

The relationship between the file names and the cast members must be worked out with the help of the KEY file and the CAS file. For example, suppose the KEY file contains the following line (file name, CAS index):

18.STXT, 0000000e

Look for the CAS index (0000000e in this example) inside the CAS file. If that number is on line 5 of the CAS file, then the text content of the 18.STXT file is element 5 of the cast.
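The lookup described above might be automated roughly as follows. This is only a sketch under several assumptions that the description does not confirm: that each KEY line has the form "file name, hexadecimal CAS index", that each CAS line holds a single hexadecimal number, and that CAS lines are counted from 1. The function name `cast_positions` is hypothetical and is not part of drxtract.

```python
def cast_positions(key_lines, cas_lines):
    """Map each file name in the KEY listing to its cast element number."""
    # Remember the first (1-based) CAS line on which each index appears.
    line_of_index = {}
    for line_number, line in enumerate(cas_lines, start=1):
        text = line.strip()
        if text:
            line_of_index.setdefault(int(text, 16), line_number)

    positions = {}
    for line in key_lines:
        if "," not in line:
            continue  # skip lines that are not "name, index" pairs
        file_name, index_text = line.split(",", 1)
        index = int(index_text.strip(), 16)
        if index in line_of_index:
            positions[file_name.strip()] = line_of_index[index]
    return positions
```

With the example line above and a CAS listing whose fifth line is 0000000e, the result maps "18.STXT" to 5.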