PostScript binary object format

PostScript binary object format is a binary format for serializing data structures. It is produced by the printobject and writeobject operators in PostScript, and PostScript can also read it.

The description below omits some details that are not applicable outside of PostScript (PostScript does not write them out either, although it is capable of reading them).

The first byte is always in the range 128 to 131 and indicates the endianness and the floating-point format:
 * 128: big-endian integers with native floating-point numbers.
 * 129: little-endian integers with native floating-point numbers.
 * 130: big-endian integers with big-endian IEEE floating-point numbers.
 * 131: little-endian integers with little-endian IEEE floating-point numbers.

However, due to a bug in Adobe's software (this bug is also emulated in Ghostscript for compatibility), 128 and 129 actually act like 130 and 131 if the native floating-point format is IEEE.
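The mapping from the first byte to byte order and floating-point format can be sketched as follows. This is only an illustration: the psbo_format type and the psbo_decode_token function are hypothetical names, not part of PostScript or Ghostscript. It relies on the observation that, among the values 128 to 131, bit 0 selects little-endian and bit 1 selects explicit IEEE reals.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type, introduced only for this sketch. */
typedef struct {
    bool little_endian;  /* byte order of the integers (and of IEEE reals) */
    bool ieee_float;     /* true if the header explicitly selects IEEE reals */
} psbo_format;

/* Returns true and fills *out if token is a valid first byte (128..131). */
static bool psbo_decode_token(uint8_t token, psbo_format *out)
{
    if (token < 128 || token > 131)
        return false;
    out->little_endian = (token & 1) != 0;  /* 129 and 131 */
    out->ieee_float    = (token & 2) != 0;  /* 130 and 131 */
    return true;
}
```

On a host whose native format is IEEE, a reader would treat the reals as IEEE even when ieee_float is false, in line with the bug described above.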

After that is the rest of the header, which can be either a short or long header:
 * Short header (4 bytes in total, counting the first byte): 8-bit number of objects in the top-level array (which must be nonzero), followed by the 16-bit overall length in bytes (including the header).
 * Long header (8 bytes in total, counting the first byte): zero (one byte), followed by the 16-bit number of objects in the top-level array, followed by the 32-bit overall length in bytes (including the header).
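A minimal sketch of telling the two header forms apart is shown below. The names psbo_header, psbo_get_uint and psbo_parse_header are hypothetical. The sketch assumes that the multi-byte header fields use the byte order selected by the first byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parsed-header type, introduced only for this sketch. */
typedef struct {
    unsigned count;      /* number of objects in the top-level array */
    uint32_t total_len;  /* overall length in bytes, header included */
    size_t   header_len; /* 4 for a short header, 8 for a long one */
} psbo_header;

/* Reads an unsigned integer of n bytes (n <= 4) in the given byte order. */
static uint32_t psbo_get_uint(const uint8_t *p, int n, int little_endian)
{
    uint32_t v = 0;
    for (int i = 0; i < n; i++) {
        int k = little_endian ? n - 1 - i : i;  /* most significant byte first */
        v = (v << 8) | p[k];
    }
    return v;
}

/* Returns 0 on success, -1 if the buffer is too short or the first byte is invalid. */
static int psbo_parse_header(const uint8_t *buf, size_t size, psbo_header *h)
{
    if (size < 4 || buf[0] < 128 || buf[0] > 131)
        return -1;
    int le = buf[0] & 1;  /* assumption: header fields follow this byte order */
    if (buf[1] != 0) {    /* short header: nonzero 8-bit object count */
        h->count = buf[1];
        h->total_len = psbo_get_uint(buf + 2, 2, le);
        h->header_len = 4;
    } else {              /* long header: zero byte, 16-bit count, 32-bit length */
        if (size < 8)
            return -1;
        h->count = psbo_get_uint(buf + 2, 2, le);
        h->total_len = psbo_get_uint(buf + 4, 4, le);
        h->header_len = 8;
    }
    return 0;
}
```

The first object then starts at buf + header_len.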

This is then followed by the number of objects specified in the header. Each object consists of eight bytes:
 * Type (1 byte): See list below.
 * Tag (1 byte): Normally zero. However, nonzero numbers can be used for PostScript programs to transmit commands to a program which isn't written in PostScript.
 * Length (2 bytes): A 16-bit integer; the meaning depends on the type.
 * Value (4 bytes): Depending on the type, this may be unused (in which case it should be zero), or a 32-bit integer, or a 32-bit floating-point number.
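The eight-byte layout above could be unpacked into a struct roughly as follows. The psbo_object type and the psbo_read_object function are hypothetical names chosen for this sketch. The value field is kept as raw bits because its interpretation depends on the type.

```c
#include <stdint.h>

/* Hypothetical in-memory form of one 8-byte object record. */
typedef struct {
    uint8_t  type;    /* type code as stored, possibly with 128 added */
    uint8_t  tag;     /* normally zero */
    uint16_t length;  /* meaning depends on the type */
    uint32_t value;   /* raw 32 bits; meaning depends on the type */
} psbo_object;

/* Decodes the 8 bytes at p using the given byte order. */
static psbo_object psbo_read_object(const uint8_t *p, int little_endian)
{
    psbo_object o;
    o.type = p[0];
    o.tag  = p[1];
    if (little_endian) {
        o.length = (uint16_t)(p[2] | (p[3] << 8));
        o.value  = (uint32_t)p[4] | ((uint32_t)p[5] << 8) |
                   ((uint32_t)p[6] << 16) | ((uint32_t)p[7] << 24);
    } else {
        o.length = (uint16_t)((p[2] << 8) | p[3]);
        o.value  = ((uint32_t)p[4] << 24) | ((uint32_t)p[5] << 16) |
                   ((uint32_t)p[6] << 8) | (uint32_t)p[7];
    }
    return o;
}
```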

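Types (as numbered in the C implementation below):
 * 0: null. The length and value are unused.
 * 1: integer. The value is a 32-bit signed integer.
 * 2: real. The length is the scale factor and the value holds the number.
 * 3: name. The length is the length of the name and the value is its offset.
 * 4: boolean. The value is 0 for false or 1 for true.
 * 5: string. The length is the length of the string and the value is its offset.
 * 9: array. The length is the number of elements and the value is the offset of the first element.
 * 10: mark. The length and value are unused.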

Adding 128 to any of the above type numbers results in the same type, but it is now executable (this distinction is not relevant outside of PostScript).
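Separating the executable flag from the type number is a matter of masking the top bit, as the following sketch shows. The function names psbo_base_type and psbo_is_executable are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Type number with the executable flag removed (low 7 bits). */
static unsigned psbo_base_type(uint8_t type_byte)
{
    return type_byte & 127u;
}

/* True if 128 was added to the type number, i.e. the object is executable. */
static bool psbo_is_executable(uint8_t type_byte)
{
    return (type_byte & 128u) != 0;
}
```

For example, a type byte of 131 is an executable name, since 131 = 128 + 3.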

All offsets are relative to the start of the data after the header; the header is not counted. For arrays and dictionaries, this offset points to another object (which may be followed by further objects); for names and strings, this points to the first character of the name or string. Objects must be aligned, and all of them must come before the name/string data. Names/strings need not be aligned.

Lengths of names/strings are in bytes; lengths of arrays/dictionaries are the number of objects. The length of a name must be positive; strings may have zero length.
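The offset and length rules above might be applied as in the following sketch, which bounds-checks every access. Here data is assumed to point to the first byte after the header, and data_size is the number of bytes available from there. The function names psbo_copy_text and psbo_array_element are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies a name or string into dst and NUL-terminates it.
   Returns 0 on success, -1 if the range lies outside the data
   or dst is too small. */
static int psbo_copy_text(const uint8_t *data, size_t data_size,
                          uint32_t offset, uint16_t length,
                          char *dst, size_t dst_size)
{
    if (offset > data_size || length > data_size - offset)
        return -1;
    if ((size_t)length + 1 > dst_size)
        return -1;
    memcpy(dst, data + offset, length);
    dst[length] = '\0';
    return 0;
}

/* Returns a pointer to the 8-byte record of element i of an array whose
   value field is offset and whose length field is length, or NULL if the
   index or the record lies out of range. */
static const uint8_t *psbo_array_element(const uint8_t *data, size_t data_size,
                                         uint32_t offset, uint16_t length,
                                         unsigned i)
{
    if (i >= length)
        return NULL;
    uint64_t pos = (uint64_t)offset + (uint64_t)i * 8;  /* 8 bytes per object */
    if (pos + 8 > data_size)
        return NULL;
    return data + pos;
}
```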

Null and mark are types that each have only a single value.

For floating-point numbers, the scale factor is normally zero; this means the value is encoded as a floating-point number in the IEEE or native format (according to the header). If written out by PostScript, this is always the case. A nonzero scale factor means the value is encoded as a fixed-point integer, and the scale factor is the number of fraction bits. The use of nonzero scale factors is not recommended for interchange, since PostScript doesn't write them out, and the implementation in other programming languages can be simplified if they are not used.
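A reader that does want to accept nonzero scale factors could convert the value roughly as follows. This is a sketch under two assumptions: the host float is a 32-bit IEEE 754 number, and the fixed-point integer is a signed two's complement value. The function name psbo_real_value is hypothetical; raw is the value field already assembled according to the header's byte order.

```c
#include <stdint.h>
#include <string.h>

/* Converts the value field of a real object to a double.
   scale is the object's length field (the scale factor). */
static double psbo_real_value(uint32_t raw, unsigned scale)
{
    if (scale == 0) {
        float f;
        memcpy(&f, &raw, sizeof f);  /* assumes 32-bit IEEE 754 host float */
        return f;
    }
    /* Fixed point: assumed signed, with `scale` fraction bits. */
    double v = (double)(int32_t)raw;
    while (scale-- > 0)
        v /= 2.0;
    return v;
}
```

For example, a raw value of 3 with a scale factor of 1 represents 3/2 = 1.5.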

The ability to store dictionaries in PostScript binary object format is a nonstandard extension which was implemented in Ghostscript (although it is disabled by default, and has recently been removed). It is encoded like an array, with alternating keys and values. The keys must all be distinct, and keys may not be nulls or strings.
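The key restrictions above can be partly checked with a short loop over the alternating key and value records. The sketch below assumes that the dictionary's length field counts keys and values individually, so it must be even, and it uses the type numbers 0 for null and 5 for string. It does not check that the keys are distinct. The function name psbo_dict_keys_ok is hypothetical.

```c
#include <stdint.h>

/* first points to the first 8-byte record of the dictionary body;
   length is the dictionary's length field (assumed to count objects).
   Returns 1 if the length is even and no key is a null or a string. */
static int psbo_dict_keys_ok(const uint8_t *first, unsigned length)
{
    if (length % 2 != 0)
        return 0;
    for (unsigned i = 0; i < length; i += 2) {  /* keys sit at even indices */
        unsigned t = first[i * 8] & 127u;       /* drop the executable flag */
        if (t == 0 || t == 5)                   /* null or string key */
            return 0;
    }
    return 1;
}
```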

Implementation in C
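The following implementation in C is public domain. It only reads the data and does not write it. The variable called object is expected to point to the type byte of the first object, and the data is expected to be in big-endian format. Each macro argument x is the byte offset of an object from object. The macros use statement expressions, which are a GNU C extension supported by GCC and Clang.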


#define TY_NULL 0
#define TY_INT 1
#define TY_REAL 2
#define TY_NAME 3
#define TY_BOOL 4
#define TY_STRING 5
#define TY_ARRAY 9
#define TY_MARK 10


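// numeric value of object x as a float (0 if x is not a number)
#define obj_float(x) ({ int x__=(x); int t_=obj_type(x__); t_==TY_INT?(float)obj_rawvalue(x__):t_==TY_REAL?obj_ufloat(x__):0; })
// byte offset of element y of array x
#define obj_index(x,y) (obj_rawvalue(x)+(y)*8)
// numeric value of object x as an int (0 if x is not an integer, boolean or real)
#define obj_int(x) ({ int x__=(x); int t_=obj_type(x__); t_==TY_INT||t_==TY_BOOL?obj_rawvalue(x__):t_==TY_REAL?(int)obj_ufloat(x__):0; })
#define obj_isnum(x) ({ int x__=(x); obj_type(x__)==TY_INT || obj_type(x__)==TY_REAL; })
// 16-bit big-endian length field
#define obj_length(x) ({ int x__=(x); (object[x__+2]<<8)|object[x__+3]; })
// pointer to the first character of name or string x
#define obj_ptr(x) (object+obj_rawvalue(x))
// 32-bit big-endian value field; the top byte is shifted as unsigned to avoid signed overflow
#define obj_rawvalue(x) ({ int y__=(x); (int)(((unsigned)object[y__+4]<<24)|(object[y__+5]<<16)|(object[y__+6]<<8)|object[y__+7]); })
// tag byte of the first object
#define obj_tag (object[1])
// type number without the executable flag
#define obj_type(x) (object[(x)]&127)
// value field reinterpreted as a float; see psi/ibnum.h in Ghostscript for an explanation of this bug
#define obj_ufloat(x) ({ union { int i; float f; } f__; f__.i=obj_rawvalue(x); f__.f; })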