Glossary

Some of the terms that might be encountered in descriptions of file formats:

Big-Endian: The system of storing numeric values that take up more than one byte with the high-order byte first. If a number takes more than 8 bits to store (e.g., an integer larger than 255), it must be divided among bytes, and it becomes an issue in file format definitions whether the "larger-valued" part of the number comes in the first byte or the last one. See Endianness.
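
For illustration, here is a minimal C sketch (not from the article; the helper name put_u32_be is made up) that stores a 32-bit value with its high-order byte first:

    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative helper: store a 32-bit value high-order byte first. */
    static void put_u32_be(uint8_t *buf, uint32_t value)
    {
        buf[0] = (uint8_t)(value >> 24);  /* high-order byte comes first */
        buf[1] = (uint8_t)(value >> 16);
        buf[2] = (uint8_t)(value >> 8);
        buf[3] = (uint8_t)(value);        /* low-order byte comes last */
    }

    int main(void)
    {
        uint8_t buf[4];
        put_u32_be(buf, 0x12345678);
        printf("%02X %02X %02X %02X\n", buf[0], buf[1], buf[2], buf[3]);
        return 0;  /* prints: 12 34 56 78 */
    }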

Binary: Base 2 numbers, consisting entirely of the digits 1 and 0. These are fundamental to computing: everything on a digital computer is stored as a series of binary digits, or bits. While all data formats are "binary" in this sense, "binary file format" usually refers to a method of storing data as something other than plain text; such a file consists of raw numbers that don't look like anything meaningful when opened in a text editor. Even raw binary data is rarely displayed as actual 1s and 0s; developers usually use more compact notations such as hexadecimal or octal.
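
As a small illustrative sketch (not from the article), a C loop can display a byte's actual 1s and 0s by testing each bit from high to low:

    #include <stdio.h>

    int main(void)
    {
        unsigned char value = 0xA5;  /* 10100101 in binary */
        /* Walk from the high-order bit (7) down to the low-order bit (0). */
        for (int i = 7; i >= 0; i--)
            putchar(((value >> i) & 1) ? '1' : '0');
        putchar('\n');  /* prints: 10100101 */
        return 0;
    }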

Bit: A single binary digit. In a computer it is stored in something analogous to a light switch, which can be turned on or off, representing digits 1 and 0 respectively. (Of course, a bit in computer memory is much smaller than a light switch; millions of them fit on a computer chip or optical or magnetic storage medium.)

Byte: A group of eight bits, sometimes also referred to as an "octet". This is how computer memory is traditionally organized. Usually a byte is treated as a unit, representing a number from 0 to 255 or else a text character in an encoding such as ASCII, but some file formats delve into the individual bits. The 8 bits that make up a byte are arranged from the "high-order bit" to the "low-order bit" based on where they fall in the binary number they represent; as with conventional decimal numbers, the leftmost one has the highest value and is the "high-order bit". (How they're physically arranged on the storage medium depends on the characteristics of the specific device. This is usually of no concern to programmers of anything above low-level device drivers and processor microcode; normal developers see only the abstract logical structure of the bits and bytes.) In raw memory dumps, a byte is often displayed as two hexadecimal digits. (Byte is also a computer magazine published since the 1970s.)
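
As an illustrative sketch (not from the article), this C snippet picks out a byte's high-order and low-order bits and shows its usual two-hex-digit display:

    #include <stdio.h>

    int main(void)
    {
        unsigned char byte = 0xC3;           /* 11000011 in binary */
        int high_bit = (byte >> 7) & 1;      /* leftmost, highest-valued bit */
        int low_bit  = byte & 1;             /* rightmost, lowest-valued bit */
        printf("as two hex digits: %02X\n", byte);  /* prints: C3 */
        printf("high-order bit: %d, low-order bit: %d\n", high_bit, low_bit);
        return 0;
    }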

Hexadecimal (or "hex"): Numbers expressed in base 16. This works like the base-10 decimal system humans usually use (probably because we have ten fingers, if you include the thumbs), but with six extra digits, for 16 in all: the numbers 0 through 9 plus the letters A through F (representing values 10 through 15). Programmers use hexadecimal often: since 16 is a power of 2, it is easier to translate between binary and hexadecimal than to convert to and from decimal, and each hexadecimal digit represents exactly four binary digits (bits). A byte can be expressed with two hex digits, and a single hex digit (half a byte) is called a "nybble". Several notations have been used to distinguish hexadecimal numbers from other bases, including the C notation of preceding the number with 0x (e.g., 0xABCD), the convention on some early personal computers of preceding the number with a dollar sign ($), and the notation of following the number with "h".
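
For illustration (a sketch of our own, not from the article), this C snippet uses the 0x notation and splits a byte into its two nybbles:

    #include <stdio.h>

    int main(void)
    {
        unsigned int n = 0xABCD;        /* C notation: 0x prefix marks hex */
        unsigned char b = 0x5F;
        unsigned char high_nybble = (b >> 4) & 0x0F;  /* one hex digit = 4 bits */
        unsigned char low_nybble  = b & 0x0F;
        printf("%u in hex is %X\n", n, n);  /* prints: 43981 in hex is ABCD */
        printf("nybbles of 0x%02X: %X and %X\n", b, high_nybble, low_nybble);
        return 0;
    }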

Little-Endian: The system of storing numeric values that take up more than one byte with the low-order byte first. If a number takes more than 8 bits to store (e.g., an integer larger than 255), it must be divided among bytes, and it becomes an issue in file format definitions whether the "smaller-valued" part of the number comes in the first byte or the last one. See Endianness.
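
For symmetry with the big-endian sketch above (again illustrative only; get_u32_le is a made-up name), here is how a little-endian 32-bit value might be read back from a byte buffer in C:

    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative helper: read a 32-bit value stored low-order byte first. */
    static uint32_t get_u32_le(const uint8_t *buf)
    {
        return (uint32_t)buf[0]
             | ((uint32_t)buf[1] << 8)
             | ((uint32_t)buf[2] << 16)
             | ((uint32_t)buf[3] << 24);  /* high-order byte comes last */
    }

    int main(void)
    {
        const uint8_t buf[4] = { 0x78, 0x56, 0x34, 0x12 };
        printf("0x%08X\n", (unsigned int)get_u32_le(buf));  /* prints: 0x12345678 */
        return 0;
    }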

Octal: Numbers expressed in base 8, using only the digits 0 through 7. Along with hexadecimal, this is a base often used by programmers: 8 is a power of 2, so octal is easy to convert from or to binary, with each octal digit representing exactly three bits. Some things, including Unix file permission levels, are commonly expressed in octal digits, but hexadecimal is better suited to many other uses because two hex digits fit evenly within an 8-bit byte, whereas octal digits do not. In standard C notation, octal numbers are written with a leading zero.
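
As an illustrative sketch (not from the article), this C snippet uses a leading-zero octal literal to take apart a Unix-style permission value, one octal digit (three bits) at a time:

    #include <stdio.h>

    int main(void)
    {
        int mode = 0644;  /* leading zero marks octal in C (rw-r--r--) */
        /* Each octal digit holds three bits: owner, group, other. */
        printf("owner: %o, group: %o, other: %o\n",
               (mode >> 6) & 07, (mode >> 3) & 07, mode & 07);
        return 0;  /* prints: owner: 6, group: 4, other: 4 */
    }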

Trinary (or ternary): Numbers expressed in base 3, using only the digits 0 through 2. This base is not commonly used in computing, since 3 is not a power of 2, but an experimental means of encoding data in DNA makes use of it, calling the individual digits "trits" by analogy with "bits" as binary digits.
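
As a final illustrative sketch (our own, not from the article), this C function prints a number's base-3 digits, or "trits", most significant first:

    #include <stdio.h>

    /* Print a non-negative number in base 3, most significant trit first. */
    static void print_trits(unsigned int n)
    {
        if (n >= 3)
            print_trits(n / 3);
        putchar('0' + (n % 3));
    }

    int main(void)
    {
        print_trits(100);  /* prints: 10201 (100 = 81 + 2*9 + 1) */
        putchar('\n');
        return 0;
    }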
