Glossary

Some of the terms that might be encountered in descriptions of file formats:

Analog: A continuously-variable signal, as opposed to a digital representation, which divides the data into small pieces that can be represented numerically, such as the pixels in an image. An analog signal (for instance, the music encoded in the groove of a record or the image on photographic film) has no "sampling rate" and can sometimes be analyzed at higher resolutions than the original playback devices supported, though you eventually reach the limitations of the physical medium involved. (Analog is also a science fiction magazine, formerly named Astounding.)

Baud: Sometimes treated as a synonym for "bits per second" in transfer protocols, but not actually synonymous; it refers to the number of signal changes per second, e.g., in a modem. Early modems transferred one bit per signal change, so a 300-baud modem achieved 300 bits per second, but later modems used more sophisticated encodings that carried multiple bits per signal change by using a larger set of distinct signal states, so "bps" (or "kbps", "Mbps", etc.) is the more appropriate term when discussing the amount of data a device, network, or protocol can transfer.
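
A minimal Python sketch of that relationship, using illustrative numbers (the 2400-baud, 4-bits-per-symbol figures below are an assumption based on typical 9600 bps modems, not something stated in this entry):

    # bps = symbols per second (baud) times bits carried per symbol
    def bits_per_second(baud, bits_per_symbol):
        return baud * bits_per_symbol

    print(bits_per_second(300, 1))    # early modem: 300 baud, 1 bit/symbol = 300 bps
    print(bits_per_second(2400, 4))   # 2400 baud, 16 signal states (4 bits) = 9600 bps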

Big-Endian: The system of storing a numeric value that takes up more than one byte so that the high-order byte comes first. If a number needs more than 8 bits to store (e.g., an integer larger than 255), it must be split across multiple bytes, and it becomes an issue in file format definitions whether the "larger-valued" parts of the number come in the first byte or the last one. See Endianness.
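
A quick sketch using Python's struct module (a hedged illustration, not anything prescribed by this glossary): the '>' format prefix requests big-endian packing.

    import struct

    data = struct.pack('>I', 0x12345678)   # 4-byte unsigned integer, big-endian
    print(data.hex())                      # '12345678': high-order byte 0x12 comes first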

Binary: Numbers expressed in base 2, consisting entirely of the digits 1 and 0. These are fundamental to computing, since everything stored on a digital computer takes the form of a series of binary digits, or bits. While all data formats are "binary" in this sense, "binary file format" usually refers to a method of storing data as something other than plain text; such a file consists of raw numbers that don't look like anything meaningful when brought up in a text editor. Even raw binary data is rarely displayed as actual 1s and 0s; developers usually use more compact notations such as hexadecimal or octal.
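
To make the bases concrete, here is a small Python sketch (using Python's own notation, offered purely as an illustration) showing one value in each of the notations this glossary mentions:

    n = 255
    print(format(n, '08b'))         # '11111111': eight bits, exactly one byte
    print(bin(n), oct(n), hex(n))   # '0b11111111' '0o377' '0xff'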

Bit: A single binary digit. In a computer it is stored in something analogous to a light switch, which can be turned on or off, representing digits 1 and 0 respectively. (Of course, a bit in computer memory is much smaller than a light switch; millions of them fit on a computer chip or optical or magnetic storage medium.)

Byte: A group of eight bits, sometimes also referred to as an "octet". This is how computer memory is traditionally organized. Usually a byte is treated as a unit, representing a number from 0 to 255 or else a text character in an encoding such as ASCII, but some file formats delve into the individual bits. The 8 bits that make up a byte are ranked from the "high-order bit" to the "low-order bit" based on where they fall in the binary number they represent; as with conventional decimal numbers, the leftmost one has the highest value and is the "high-order bit". (How they're physically arranged on the storage medium depends on the characteristics of the specific device. This is usually of no concern to programmers of anything above low-level device drivers and processor microcode; normal developers see only the abstract logical structure of the bits and bytes.) In raw memory dumps, a byte is often displayed as two hexadecimal digits. (Byte is also a computer magazine published since the 1970s.)
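
The following Python sketch (an illustration, with an arbitrary example byte) pulls a byte apart in the way this entry and the "Bit" entry describe, from high-order bit down to low-order bit:

    b = 0b10110010                  # an example byte, 0xB2
    high_bit = (b >> 7) & 1         # 1: the leftmost, highest-valued bit
    low_bit = b & 1                 # 0: the rightmost, lowest-valued bit
    bits = [(b >> i) & 1 for i in range(7, -1, -1)]
    print(bits)                     # [1, 0, 1, 1, 0, 0, 1, 0]
    print(format(b, '02X'))         # 'B2': the two-hex-digit memory-dump form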

Digital: The opposite of "analog", meaning a set of data which has been digitized (if it originated in an analog medium) or else created natively on a digital device such as a computer. There are no continuously-variable quantities in a digital data set, only a set of discrete elements which can be converted into a series of bits for storage. Images, for instance, are broken up into pixels at some stated resolution, each of which can have one of a finite set of color values. Sounds are sampled at some sampling rate to capture the state of the sound wave at each point in time.
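
A toy Python sketch of that sampling process (the 8 Hz rate and 8-bit depth are arbitrary choices for illustration): a continuous sine wave is reduced to a short list of discrete 8-bit values.

    import math

    sample_rate = 8                             # samples per second (arbitrary)
    samples = []
    for i in range(sample_rate):
        t = i / sample_rate                     # time of this sample
        analog = math.sin(2 * math.pi * t)      # continuous value in [-1, 1]
        samples.append(round((analog + 1) / 2 * 255))   # quantize to 0..255
    print(samples)   # e.g. [128, 218, 255, 218, 128, 37, 0, 37]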

Hexadecimal (or "hex"): Numbers expressed in base 16. This works similarly to the base-10 decimal system usually used by humans (probably because we have ten fingers, if you include the thumbs), but with extra digits to make sixteen in all: 0 through 9, plus the letters A through F (representing values of 10 through 15). Programmers use hexadecimal often, since 16 is a power of 2 and it is therefore easier to translate between binary and hexadecimal than between binary and decimal; each hexadecimal digit represents exactly four binary digits (bits). A byte can be expressed with two hex digits, and a single hex digit (half a byte) is called a "nybble". Several notations have been used to express hexadecimal numbers and distinguish them from other bases, including the C notation of preceding the number with 0x (e.g., 0xABCD), the convention common on some early personal computers of using a dollar sign ($) before a hex number, and yet another notation of following the number with "h".
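
A short Python sketch (illustrative only) of the four-bits-per-hex-digit correspondence and the 0x notation, which Python shares with C:

    n = 0xABCD
    print(format(n, '016b'))   # '1010101111001101': A=1010, B=1011, C=1100, D=1101
    print(int('ABCD', 16))     # 43981: the same value parsed from a bare hex string
    print(hex(43981))          # '0xabcd'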

Kilobit: 1000 (or 1024) bits, or 1/8 of a kilobyte. Transfer protocols usually measure their speed in bits per second (or multiples thereof) rather than bytes, so you have to divide by 8 to get the number of bytes (or kilobytes, etc.) transferred in a second.

Kilobyte: Either 1000 bytes (the literal meaning of the metric prefix "kilo") or, more often, 1024 bytes (a power of two, which makes it a "round number" to a computer). Attempts to resolve the ambiguity by introducing the term "kibibyte" for 1024 bytes, leaving "kilobyte" to mean exactly 1000, haven't gone anywhere.
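
The unit arithmetic from this entry and the "Kilobit" entry, as a minimal Python sketch (the 256 kbps figure is an arbitrary example):

    kbps = 256              # a 256 kbps link, decimal kilobits per second
    print(kbps * 1000 / 8)  # 32000.0 bytes/s, i.e. 32 decimal kilobytes/s

    print(10 * 1000)        # 10 metric kilobytes: 10000 bytes
    print(10 * 1024)        # 10 binary kilobytes (kibibytes): 10240 bytes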

Little-Endian: The system of storing a numeric value that takes up more than one byte so that the low-order byte comes first. If a number needs more than 8 bits to store (e.g., an integer larger than 255), it must be split across multiple bytes, and it becomes an issue in file format definitions whether the "smaller-valued" parts of the number come in the first byte or the last one. See Endianness.
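
The counterpart to the big-endian sketch above (again an illustration using Python's struct module): '<' requests little-endian packing, and misreading those bytes as big-endian yields a completely different number.

    import struct

    data = struct.pack('<I', 0x12345678)
    print(data.hex())                          # '78563412': low-order byte 0x78 first
    print(hex(struct.unpack('>I', data)[0]))   # '0x78563412': wrong-endian misread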

Megabyte: 1024 (or sometimes 1000) kilobytes, or 1,048,576 (or 1,000,000) bytes. The ambiguity is between the use of strict decimal multiples (in keeping with the normal meaning of the metric prefixes), which come more naturally to humans, and the powers-of-2-based multiples which come more naturally to computers.

Octal: Numbers expressed in base 8, using only the digits 0 through 7. Along with hexadecimal, this is a base often used by programmers, 8 being a power of 2 and hence easy to convert from or to binary; each octal digit represents three bits. Some things, including Unix file permission levels, are commonly expressed in octal digits, but hexadecimal is better suited to many other applications because two hex digits fit exactly into an 8-bit byte, while octal's three-bit groups do not divide 8 evenly. In standard C notation, octal numbers are preceded by a leading zero.
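
A small Python sketch of the Unix-permissions use mentioned above (the file name in the commented-out line is hypothetical): each octal digit maps onto one read/write/execute triplet.

    import os

    mode = 0o754                 # rwx for owner, r-x for group, r-- for others
    print(format(mode, '09b'))   # '111101100': three bits per octal digit
    # os.chmod('example_file', mode)   # hypothetical file; uncomment to apply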

Pixel: One "picture element", a part of a graphic image as stored on a computer. If the image is 640 x 480, it consists of a matrix of pixels 640 wide and 480 high, for a total of 307,200 pixels. If each pixel can be one of 256 colors, the image can be stored using one byte per pixel, i.e., 307,200 bytes (or fewer if compression is applied), but most computer image formats these days allow far more colors per pixel, thus requiring more bytes of storage.
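
That storage arithmetic as a minimal Python sketch (uncompressed sizes; the 3-bytes-per-pixel case is a common 24-bit "true color" assumption not stated in the entry):

    width, height = 640, 480
    pixels = width * height
    print(pixels)        # 307200
    print(pixels * 1)    # 307200 bytes at one byte per pixel (256 colors)
    print(pixels * 3)    # 921600 bytes at three bytes per pixel (24-bit color)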

Trinary (also called "ternary"): Numbers expressed in base 3, using only the digits 0 through 2. This base is not very commonly used in computing, since 3 is not a power of 2, but an experimental means of encoding data in DNA makes use of it, calling the individual digits "trits" analogously to "bits" being binary digits.
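
A brief Python sketch of extracting base-3 "trits" by repeated division, the same procedure that yields bits when dividing by 2:

    def to_trits(n):
        """Base-3 digits of a non-negative integer, most significant first."""
        if n == 0:
            return [0]
        trits = []
        while n:
            trits.append(n % 3)
            n //= 3
        return trits[::-1]

    print(to_trits(11))   # [1, 0, 2]: 1*9 + 0*3 + 2*1 = 11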
