Glossary

From Just Solve the File Format Problem

Some of the terms that might be encountered in descriptions of file formats:

Analog
A continuously-variable signal, as opposed to a digital representation which divides the data into little pieces that can be represented numerically, such as pixels in an image. An analog signal (for instance, the music encoded in the groove of a record or the image on photographic film), has no "sampling rate" and can sometimes be analyzed down to higher resolutions than the original playback devices may have supported, though you eventually reach the limitations of the physical media involved. (Analog is also a science fiction magazine, formerly named Astounding.)
Batch File
A script (on PC/MS-DOS or Windows systems) that contains commands and arguments to automate routine tasks. These commands and arguments can be typed into the terminal one at a time, or saved in a file and run whenever needed, saving time and effort. The file can be edited if necessary to accommodate changes in the work environment it is used in.
These scripts are small programs that perform narrowly defined tasks and are written in a text editor. They generally deal with file and disk management, but they can do many other things as well. They can be written to ask for user input and respond accordingly. Many network administration tasks can be automated with batch files. They can also be used to control the behaviour of other programs.
The commands and arguments in a batch file are executed in sequence.
Batch files [1] are run by the command interpreter (command.com [2] on DOS and older versions of Windows, cmd.exe on later versions of Windows) and use the file extension .bat. They can only be run on Windows or PC/MS-DOS machines. (Full article)
(see Shell Scripts for similar files for Unix, Linux and Apple computers)
Baud
Sometimes treated as a synonym for "bits per second" in transfer protocols, but not actually synonymous; it refers to the number of signal changes per second, e.g., in a modem. Early modems transferred one bit per signal change, so a 300-baud modem got 300 bits per second. Later modems used more sophisticated protocols that transferred multiple bits per signal change by using more distinct types of signals, so "bps" (or "kbps", "Mbps", etc.) is the more appropriate term to use when discussing the amount of data a device, network, or protocol can transfer.
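The relationship between baud and bits per second can be sketched in a few lines of Python. The function name and the second example (2400 signal changes per second carrying 4 bits each) are illustrative assumptions, not taken from any particular modem specification.

```python
def bits_per_second(baud, bits_per_signal_change):
    """Data rate = signal changes per second * bits carried by each change."""
    return baud * bits_per_signal_change

# An early modem: one bit per signal change.
print(bits_per_second(300, 1))   # 300

# Illustrative later modem: 4 bits per signal change at 2400 baud.
print(bits_per_second(2400, 4))  # 9600
```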
Big-Endian
The system of storing numeric values which take up more than one byte in a manner in which the high-order byte comes first. If a number takes up more than 8 bits to store (e.g., an integer larger than 255), it must be divided between bytes, and it becomes an issue in file format definitions whether the "larger-valued" parts of the number come in the first byte or the last one. See Endianness.
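As a small illustration, Python's built-in int.to_bytes can write the same two-byte value in either byte order. The value 0x1234 is an arbitrary example.

```python
value = 0x1234  # 4660 in decimal; too large for a single byte

# Big-endian: the high-order byte (0x12) comes first.
big = value.to_bytes(2, byteorder="big")

# Little-endian: the low-order byte (0x34) comes first.
little = value.to_bytes(2, byteorder="little")

print(big.hex())     # 1234
print(little.hex())  # 3412
```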
Binary
Base 2 numbers, consisting entirely of the digits 1 and 0. These are very important in computing, where everything is stored on a digital computer in the form of a series of binary digits, or bits. While all data formats are "binary" in this sense, usually "binary file format" is used to refer to a method of storing data that is something other than plain text; it consists of raw numbers which don't look like anything meaningful when brought up in a text editor. Even raw binary data is rarely displayed as actual 1s and 0s; developers usually use more compact notations such as hexadecimal or octal.
Bit
A single binary digit. In a computer it is stored in something analogous to a light switch, which can be turned on or off, representing digits 1 and 0 respectively. (Of course, a bit in computer memory is much smaller than a light switch; millions of them fit on a computer chip or optical or magnetic storage medium.)
Bitmap
An image stored as a set of groups of bits representing the color value of each pixel of the image, which can be mapped onto a display medium (screen or printer) for output. A black-and-white bitmap needs only one bit per pixel, while color images need more bits depending on the number of different colors supported. The values used for each pixel might be numeric indices within some enumerated palette of colors, or direct renditions of the color (e.g., by "RGB" values specifying the amount of red, green, and blue making up the color).
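A minimal sketch of the one-bit-per-pixel case follows. It assumes the leftmost pixel goes into the high-order bit of each byte and that a row is padded to a whole byte; real bitmap formats differ on both points, and pack_row is a hypothetical helper name.

```python
def pack_row(pixels):
    """Pack a row of 0/1 pixel values into bytes, leftmost pixel in the high-order bit."""
    out = bytearray((len(pixels) + 7) // 8)  # round up to whole bytes
    for i, pixel in enumerate(pixels):
        if pixel:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)

# Ten black-and-white pixels fit in two bytes (the last six bits are padding).
row = [1, 0, 1, 0, 1, 0, 1, 0, 1, 1]
print(pack_row(row).hex())  # aac0
```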
Byte
A group of eight bits, sometimes also referred to as an "octet". This is how computer memory is traditionally organized. Usually a byte is treated as a unit, representing a number from 0 to 255 or else a text character in an encoding such as ASCII, but some file formats delve into the individual bits. The 8 bits which make up a byte are arranged from the "high-order bit" to the "low-order bit" (or vice versa; see Bit order) based on where they fall in the binary number represented by them; as with conventional decimal numbers, the leftmost one has the highest value and is the "high-order bit". (How they're physically arranged on the storage medium depends on the characteristics of the specific device. This is usually not of concern to programmers of anything above low-level device drivers and processor microcode; normal developers see only the abstract logical structure of the bits and bytes.) In raw memory dumps, a byte will often be displayed as two hexadecimal digits. (Byte is also a computer magazine that was published in print from 1975 to 1998.)
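For formats that delve into individual bits, shifting and masking is the usual approach. The sketch below lists the bits of one byte from high-order to low-order; the helper name and the sample value 0xA5 are arbitrary choices for illustration.

```python
def bits_high_to_low(byte):
    """Return the 8 bits of a byte as a list, high-order bit first."""
    return [(byte >> shift) & 1 for shift in range(7, -1, -1)]

b = 0xA5
print(bits_high_to_low(b))  # [1, 0, 1, 0, 0, 1, 0, 1]
print(f"{b:02X}")           # A5 (the two-hex-digit form seen in memory dumps)
```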
Digital
The opposite of "analog", meaning a set of data which has been digitized (if it originated in an analog medium), or else created natively on a digital device such as a computer. There are no continuously-variable quantities in a digital data set, only a set of discrete elements which can be converted into a series of bits for storage. Images, for instance, are broken up into pixels at some stated resolution, each of which can have one of a finite set of color values. Sounds are sampled at some sampling rate, with each sample capturing the state of the sound wave at that moment.
Exabyte
1024 (or 1000) petabytes, or about a quintillion bytes. The world's total capacity to store information was estimated at 295 exabytes in 2007 (up from 2.6 exabytes in 1986).
Floating point
A type of numeric storage that allows fractional values. The number is stored as a set of significant digits (the mantissa) together with an exponent indicating where the decimal (or, in most computer formats, binary) point is placed.
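To make the mantissa-and-exponent idea concrete, here is a short Python sketch. It assumes the common IEEE 754 single-precision layout for the packed bytes, which is one widely used floating-point format and not the only one found in file formats.

```python
import math
import struct

# Split 6.5 into a mantissa and a power-of-two exponent: 6.5 == 0.8125 * 2**3
mantissa, exponent = math.frexp(6.5)
print(mantissa, exponent)  # 0.8125 3

# The same number stored as a 4-byte big-endian IEEE 754 float.
raw = struct.pack(">f", 6.5)
print(raw.hex())  # 40d00000
```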
Forensics
The practice of recovering and examining information for use in an investigation or legal case. Examples include extracting photos from an iPhone for a murder case, or seizing a suspected terrorist's laptop hard disk and examining its browsing history for state security purposes. The knowledge and experience used in this field serve the same goal as this project: to decode data.
Gigabyte
1024 (or 1000) megabytes. Once an exotic term in the days when computer memory and disk space were typically measured in kilobytes, it is now commonplace even in measuring the size of small and cheap thumb drives.
Hard sector
On a disk or other storage medium, indicates that the position of sectors is marked physically, such as by punched holes, so the drive can find a particular sector without requiring software-based formatting to designate sector positions.
Hexadecimal (or "hex")
Numbers expressed in base 16. This works similarly to the base-10 decimal system usually used by humans (probably because we have ten fingers, if you include the thumbs), but with some extra digits to make up 16 digits in all, which comprise the numbers 0 through 9, plus the letters A through F (representing values of 10 through 15). Programmers use hexadecimal often, since 16 is a power of 2 and hence it is easier to translate between binary and hexadecimal than it is to get to and from decimal; each hexadecimal digit represents four binary digits (bits). A byte can be expressed with two hex digits, and a single hex digit (half a byte) is called a "nybble". Several notations have been used to express hexadecimal numbers and distinguish them from other bases, including the C notation of preceding the number with 0x (e.g., 0xABCD), the notation common on some early personal computers of using the dollar sign ($) before a hex number, and yet another notation of following the number with "h".
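The correspondence between hex digits and groups of four bits can be checked directly in Python, which uses the C-style 0x prefix. The value 0xABCD is the same example used above.

```python
value = 0xABCD

print(value)            # 43981 (decimal)
print(f"{value:b}")     # 1010101111001101 (binary)
print(int("ABCD", 16))  # 43981 (parsing a hex string)

# Each hex digit maps to exactly four bits.
groups = [f"{int(digit, 16):04b}" for digit in "ABCD"]
print(groups)  # ['1010', '1011', '1100', '1101']
```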
Integer
A number with no fractional part. An unsigned integer has a positive value (or zero); a signed integer can have a negative, zero, or positive value. The number of bytes used to store integers in binary form varies by platform and format. On older 16-bit systems it was common for a normal integer to be two bytes (16 bits) and a long integer four bytes (32 bits), while many current platforms use four bytes for a normal integer.
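The same stored bytes can mean different numbers depending on whether they are read as signed or unsigned. This sketch assumes a 16-bit big-endian field and two's-complement representation for the signed case, which is the usual convention but is an assumption here.

```python
raw = b"\xff\xfe"  # two bytes as they might appear in a file

unsigned = int.from_bytes(raw, byteorder="big", signed=False)
signed = int.from_bytes(raw, byteorder="big", signed=True)

print(unsigned)  # 65534
print(signed)    # -2
```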
Kilobaud
1000 (or 1024) baud. Not actually the same as "kilobit per second", as explained under Baud. (Also the name of a computer magazine back in the '70s and '80s.)
Kilobit
1000 (or 1024) bits, or 1/8 of a kilobyte. Transfer protocols usually measure their speed in bits per second (or multiples thereof) rather than bytes, so you have to divide by 8 to get the number of bytes (or kilobytes, etc.) transferred in a second.
Kilobyte
Either 1000 bytes (the literal meaning of the metric prefix "kilo") or, more often, 1024 bytes (a power of two, which makes it a "round number" to a computer). Attempts to resolve the ambiguity by introducing a new term "kibibyte" for 1024 bytes, leaving "kilobyte" to mean 1000, have seen only limited adoption. The decimal use of SI prefixes for bytes is more common for measurement of the capacity of hard drives.
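The practical effect of the two definitions is easy to see with a small calculation. The function below is a hypothetical helper written for this illustration.

```python
def kilobytes(n_bytes, binary=True):
    """Convert a byte count to kilobytes using 1024 (binary) or 1000 (decimal)."""
    divisor = 1024 if binary else 1000
    return n_bytes / divisor

print(kilobytes(65536))                # 64.0
print(kilobytes(65536, binary=False))  # 65.536
```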
Little-Endian
The system of storing numeric values which take up more than one byte in a manner in which the low-order byte comes first. If a number takes up more than 8 bits to store (e.g., an integer larger than 255), it must be divided between bytes, and it becomes an issue in file format definitions whether the "smaller-valued" parts of the number come in the first byte or the last one. See Endianness.
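When reading a multi-byte field from a file, the byte order has to be stated explicitly. The sketch below uses Python's struct module, where "<" selects little-endian and ">" selects big-endian; the four sample bytes are made up for illustration.

```python
import struct

data = bytes([0x78, 0x56, 0x34, 0x12])  # four bytes as stored on disk

# Interpreted as a little-endian unsigned 32-bit integer (low-order byte first).
(as_little,) = struct.unpack("<I", data)

# The same bytes interpreted as big-endian give a different number.
(as_big,) = struct.unpack(">I", data)

print(hex(as_little))  # 0x12345678
print(hex(as_big))     # 0x78563412
```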
Megabyte
1024 (or sometimes 1000) kilobytes, or 1,048,576 (or 1,000,000) bytes. The ambiguity is between the use of strict decimal multiples (in keeping with the normal meaning of the metric prefixes), which comes more naturally to humans, or the powers-of-2-based multiples which come more naturally to computers.
Nybble
Half a byte (4 bits), representable by a single hexadecimal digit.
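Splitting a byte into its two nybbles takes one shift and one mask, as in this short sketch (the function name is an illustrative choice):

```python
def split_nybbles(byte):
    """Return (high nybble, low nybble) of an 8-bit value."""
    return byte >> 4, byte & 0x0F

high, low = split_nybbles(0x3C)
print(high, low)           # 3 12
print(f"{high:X}{low:X}")  # 3C
```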
Octal
Numbers expressed in base 8, using only the digits 0 through 7. Along with hexadecimal, this is a base often used by programmers, being a power of 2 and hence easy to convert from or to binary; each octal digit represents three bits. Some things, including Unix file permission levels, are commonly expressed in octal digits, but hexadecimal is better suited to many other applications because two hex digits fit evenly within an 8-bit byte. In standard C notation, octal numbers are preceded by a leading zero.
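Unix permission modes show why three bits per digit is convenient: each octal digit holds the read, write, and execute bits for one class of user. The sketch below is a simplified illustration that ignores special bits such as setuid. Note that Python writes octal literals with a 0o prefix, unlike the leading zero used in C.

```python
def digit_to_rwx(digit):
    """Translate one octal digit (three bits) into an rwx string."""
    return (
        ("r" if digit & 4 else "-")
        + ("w" if digit & 2 else "-")
        + ("x" if digit & 1 else "-")
    )

def mode_string(mode):
    """Render owner, group, and other permissions from a three-digit octal mode."""
    return "".join(digit_to_rwx((mode >> shift) & 0o7) for shift in (6, 3, 0))

print(mode_string(0o754))  # rwxr-xr--
```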
Petabyte
1024 (or 1000) terabytes. For now, this much storage still requires a large array of disk drives or other storage units, but if things keep going the way they've been for the last few decades, it wouldn't be surprising if you could carry this much storage in your pocket soon.
Pixel
One "picture element", a part of a graphic image as stored on a computer. If the image is 640 x 480, it consists of a matrix of pixels 640 wide and 480 high, for a total of 307,200 pixels. If each pixel can be one of 256 colors, this image can be stored in that number of bytes (or fewer if compression is applied), but most computer image formats these days have a larger color palette, thus requiring more bytes of storage.
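The storage arithmetic above can be written as a one-line calculation. This sketch counts only the raw, uncompressed pixel data and ignores headers, palettes, and any row padding a real format might add; the 24-bit case is an added example of a larger color depth.

```python
def raw_image_bytes(width, height, bits_per_pixel):
    """Bytes needed for uncompressed pixel data, rounded up to a whole byte."""
    return (width * height * bits_per_pixel + 7) // 8

print(raw_image_bytes(640, 480, 8))   # 307200 (256 colors, one byte per pixel)
print(raw_image_bytes(640, 480, 24))  # 921600 (three bytes per pixel)
```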
Qubit
A bit on a quantum computer, which can hold a value that is a superposition of different states, rather than being only one value of 1 or 0 as in normal bits.
Raster
Another term for a bitmap image, consisting of a set of pixels in a rectangular grid. The name derives from old cathode-ray tube (CRT) monitors, which scanned from top to bottom in what was known as a "raster scan" from a Latin term for "rake".
Shell Scripts
Linux, Unix and Apple systems have scripting schemes that do the same jobs that batch files (.bat) do on Windows machines, and are run from their terminals. They are called Shell Scripts [3] and in Linux and Unix typically use the file extension .sh, although others are sometimes used. For Apple computers, scripts can also be written in AppleScript [4], which uses the file extension .scpt. These scripts can have variables and flow control statements (e.g., if-then-else, while, for).
Like their counterparts in Windows, these scripts are executable, and can be saved to automate tasks and be edited as needed. The file extension in Linux and Unix is not really required; which interpreter runs the script is determined by the first line of the script itself (the "#!" line), and the file needs to be made executable by its owner. (see Batch Files for Windows equivalent)
Soft sector
A disk or other storage medium that does not have hard sectors (marked physically such as with punched holes), so the sector positions need to be established by the software when a disk is formatted.
Sprite
A stored image or shape intended to be drawn on command at a specified position on a graphic screen. For instance, movable game elements may be stored as sprites to allow them to be rapidly shown and moved around. (Also a soft drink, and one of the several superhero names Kitty Pryde of the X-Men has gone by.)
Terabyte
1024 (or 1000) gigabytes. A growing number of inexpensive storage units now support this much storage.
Trinary
Numbers expressed in base 3, using only the digits 0 through 2. This is not very commonly used in computing, since 3 is not a power of 2, but an experimental means of encoding data in DNA makes use of it, calling the individual digits "trits" analogously to "bits" being binary digits.
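Converting a number to base 3 works by repeated division, as in this sketch for non-negative integers. It illustrates the number base only and is not the DNA encoding scheme mentioned above.

```python
def to_base3(n):
    """Return the base-3 digits of a non-negative integer as a string."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, 3)
        digits.append(str(remainder))
    return "".join(reversed(digits))

print(to_base3(11))   # 102  (1*9 + 0*3 + 2)
print(int("102", 3))  # 11   (Python's int() can parse base 3 directly)
```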
Vector graphics
Graphics expressed as a group of vectors specifying coordinates of a starting point (or specifying that the vector starts at the point the previous vector ended), the vector direction and length (by an angle and length, or by specifying an endpoint), and the drawing status for the vector (e.g., move without drawing, draw a line of specified color, draw a stored shape, etc.). This is as opposed to raster graphics which specify each pixel's content. Vector graphics are rendered by following the instructions in the file as to what lines and shapes to draw.
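The difference between the two approaches can be sketched with a toy renderer that follows "move" and "line" instructions and marks the resulting pixels in a small character grid. The instruction format here is invented for illustration and does not correspond to any real vector file format.

```python
def render(commands, width, height):
    """Follow move/line instructions and return a raster grid as text rows."""
    grid = [["." for _ in range(width)] for _ in range(height)]
    x, y = 0, 0  # current pen position
    for op, nx, ny in commands:
        if op == "line":
            # Step along the line, marking the nearest pixel at each step.
            steps = max(abs(nx - x), abs(ny - y))
            for i in range(steps + 1):
                t = i / steps if steps else 0
                px = round(x + (nx - x) * t)
                py = round(y + (ny - y) * t)
                grid[py][px] = "#"
        x, y = nx, ny  # "move" only updates the position
    return ["".join(row) for row in grid]

# A rectangle described by five instructions instead of thirty pixels.
square = [("move", 1, 1), ("line", 4, 1), ("line", 4, 3), ("line", 1, 3), ("line", 1, 1)]
for row in render(square, 6, 5):
    print(row)
```

The instruction list stays the same size however large the output grid is, whereas a raster version of the same picture grows with the number of pixels.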
Yottabyte
1024 (or 1000) zettabytes. No system has come near containing this much information yet, although the US government data snoops at the NSA are supposedly gearing up to store data on that scale. Then again, this might be the average cellphone storage in 50 years.
Zettabyte
1024 (or 1000) exabytes. The entire amount of information on the World Wide Web has been estimated at half a zettabyte.
