Glossary
Dan Tobias (Talk | contribs) |
Revision as of 05:38, 25 January 2013
Some of the terms that might be encountered in descriptions of file formats:
Batch File: A script that contains commands and arguments to automate routine tasks. The same commands could be typed into the terminal one at a time; saving them in a file lets them be run whenever needed, saving time and effort, and the file can be edited to accommodate changes in the work environment. These scripts are small programs that do narrowly defined tasks and are written in a text editor. They generally deal with file and disk management, but they can do many other things as well. They can be written to ask for user input and respond accordingly. Many network administration tasks can be automated with batch files. They can also be used to control the behaviour of other programs. The commands and arguments in a batch file are executed in sequence. Batch files are run by the command interpreter (command.com in DOS and older versions of Windows, cmd.exe in Windows NT and later) and use the file extension .bat (or .cmd with cmd.exe). They are generally run only on DOS and Windows machines. (see Shell Scripts for similar files for Unix, Linux and Apple computers)
Analog: A continuously-variable signal, as opposed to a digital representation which divides the data into little pieces that can be represented numerically, such as pixels in an image. An analog signal (for instance, the music encoded in the groove of a record or the image on photographic film), has no "sampling rate" and can sometimes be analyzed down to higher resolutions than the original playback devices may have supported, though you eventually reach the limitations of the physical media involved. (Analog is also a science fiction magazine, formerly named Astounding.)
Baud: Sometimes treated as a synonym for "bits per second" in transfer protocols, but not actually synonymous; it refers to the number of signal changes per second, e.g., in a modem. Early modems transferred one bit per signal change, so a 300-baud modem got 300 bits per second, but later modems used more sophisticated protocols that transferred multiple bits per signal change by using more distinct types of signals, so "bps" (or "kbps", "Mbps", etc.) is the more appropriate term to use when discussing the amount of data a device, network, or protocol can transfer.
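The relationship between baud and bits per second can be sketched in a few lines of Python. This is only an illustration: the function name is made up for this example, and it assumes the number of distinct signal states is a power of two, so that each signal change carries a whole number of bits.

```python
def bits_per_second(baud: int, distinct_signals: int) -> int:
    """Data rate of a line that makes `baud` signal changes per second,
    where each change selects one of `distinct_signals` states.

    Assumes `distinct_signals` is a power of two (2, 4, 8, 16, ...).
    """
    # A power of two 2**n has a bit length of n + 1, so this yields n,
    # the number of bits one signal change can carry.
    bits_per_change = distinct_signals.bit_length() - 1
    return baud * bits_per_change


print(bits_per_second(300, 2))   # 300: two states carry one bit per change
print(bits_per_second(600, 16))  # 2400: sixteen states carry four bits per change
```

With only two signal states the baud rate and the bit rate are the same number, which is why the two terms were once used interchangeably.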
Big-Endian: The system of storing numeric values which take up more than one byte in a manner in which the high-order byte comes first. If a number takes up more than 8 bits to store (e.g., an integer larger than 255), it must be divided between bytes, and it becomes an issue in file format definitions whether the "larger-valued" parts of the number come in the first byte or the last one. See Endianness.
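As a small illustration of the two byte orders, the following Python sketch stores the same two-byte number both ways using the built-in int.to_bytes and int.from_bytes methods. The value 0x1234 is an arbitrary example.

```python
value = 0x1234  # 4660 in decimal; too large for one byte

# Big-endian: the high-order byte (0x12) is stored first.
big = value.to_bytes(2, byteorder="big")
# Little-endian: the low-order byte (0x34) is stored first.
little = value.to_bytes(2, byteorder="little")

print(big.hex())     # 1234
print(little.hex())  # 3412

# Reading the bytes back with the matching byte order recovers the value.
assert int.from_bytes(big, byteorder="big") == value
assert int.from_bytes(little, byteorder="little") == value

# Reading big-endian bytes as if they were little-endian gives a wrong number.
misread = int.from_bytes(big, byteorder="little")
print(misread)  # 13330, which is 0x3412
```

The last line shows why a file format definition has to state its byte order: the same two bytes decode to different numbers depending on the assumption.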
Binary: Base 2 numbers, consisting entirely of the digits 1 and 0. These are very important in computing, where everything is stored on a digital computer in the form of a series of binary digits, or bits. While all data formats are "binary" in this sense, usually "binary file format" is used to refer to a method of storing data that is something other than plain text; it consists of raw numbers which don't look like anything meaningful when brought up in a text editor. Even raw binary data is rarely displayed as actual 1s and 0s; developers usually use more compact notations such as hexadecimal or octal.
Bit: A single binary digit. In a computer it is stored in something analogous to a light switch, which can be turned on or off, representing digits 1 and 0 respectively. (Of course, a bit in computer memory is much smaller than a light switch; millions of them fit on a computer chip or optical or magnetic storage medium.)
Byte: A group of eight bits, sometimes also referred to as an "octet". This is how computer memory is traditionally organized. Usually a byte is treated as a unit, representing a number from 0 to 255 or else a text character in an encoding such as ASCII, but some file formats delve into the individual bits; the 8 bits which make up a byte are arranged from the "high-order bit" to the "low-order bit" based on where they fall in the binary number represented by them; as with conventional decimal numbers, the leftmost one has the highest value and is the "high-order bit". (How they're physically arranged on the storage medium depends on the characteristics of the specific device. This is usually not of concern to programmers of anything above low-level device drivers and processor microcode; normal developers see only the abstract logical structure of the bits and bytes.) In raw memory dumps, a byte will often be displayed as two hexadecimal digits. (Byte is also a computer magazine first published in the 1970s.)
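For formats that do delve into individual bits, the usual tools are shifts and masks. The Python sketch below picks apart one arbitrary example byte; the variable names are chosen for this illustration only.

```python
byte = 0b10110010  # 178 in decimal

# The high-order bit is the leftmost one (worth 128); the low-order bit
# is the rightmost one (worth 1).
high_order_bit = (byte >> 7) & 1
low_order_bit = byte & 1

# All eight bits, listed from high-order to low-order.
bits = [(byte >> position) & 1 for position in range(7, -1, -1)]

print(high_order_bit, low_order_bit)  # 1 0
print(bits)                           # [1, 0, 1, 1, 0, 0, 1, 0]

# The same byte as it would typically appear in a memory dump.
as_hex = f"{byte:02X}"
print(as_hex)  # B2
```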
Digital: The inverse of "analog", meaning a set of data which has been digitized (if it originated in an analog medium), or else created natively on a digital device such as a computer. There are no continuously-variable quantities in a digital data set, only a set of discrete elements which can be converted into a series of bits for storage. Images, for instance, are broken up into pixels at some stated resolution, each of which can have one of a finite set of color values. Sounds are sampled at some sampling rate to capture the state of the sound wave at that point.
Gigabyte: 1024 (or 1000) megabytes. Once an exotic term in the days when computer memory and disk space were typically measured in kilobytes, it is now commonplace even in measuring the size of small and cheap thumb drives.
Hexadecimal (or "hex"): Numbers expressed in base 16. This works similarly to the base-10 decimal system usually used by humans (probably because we have ten fingers, if you include the thumbs), but with some extra digits to make up 16 digits in all, which comprise the numbers 0 through 9, plus the letters A through F (representing values of 10 through 15). Programmers use hexadecimal often, since 16 is a power of 2 and hence it is easier to translate between binary and hexadecimal than it is to get to and from decimal; each hexadecimal digit represents four binary digits (bits). A byte can be expressed with two hex digits, and a single hex digit (half a byte) is called a "nybble". Several notations have been used to express hexadecimal numbers and distinguish them from other bases, including the C notation of preceding the number with 0x (e.g., 0xABCD), the notation common on some early personal computers of using the dollar sign ($) before a hex number, and yet another notation of following the number with "h".
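The three notations mentioned above, and the four-bits-per-digit relationship, can be illustrated with a short Python sketch. The helper parse_hex is hypothetical, written for this example rather than taken from any library; it simply strips whichever marker is present and hands the digits to the built-in int.

```python
def parse_hex(text: str) -> int:
    """Parse a hex number written as 0xABCD, $ABCD, or ABCDh (illustrative)."""
    text = text.strip()
    if text.lower().startswith("0x"):
        digits = text[2:]
    elif text.startswith("$"):
        digits = text[1:]
    elif text.lower().endswith("h"):
        digits = text[:-1]
    else:
        digits = text
    return int(digits, 16)


for notation in ("0xABCD", "$ABCD", "ABCDh"):
    print(notation, "=", parse_hex(notation))  # each prints 43981

# Each hex digit corresponds to exactly four bits: A=1010 B=1011 C=1100 D=1101.
print(f"{0xABCD:016b}")  # 1010101111001101

# Splitting one byte into its two nybbles.
byte = 0xAB
high_nybble, low_nybble = byte >> 4, byte & 0x0F
print(high_nybble, low_nybble)  # 10 11
```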
Kilobit: 1000 (or 1024) bits, or 1/8 of a kilobyte. Transfer protocols usually measure their speed in bits per second (or multiples thereof) rather than bytes, so you have to divide by 8 to get the number of bytes (or kilobytes, etc.) transferred in a second.
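The divide-by-8 rule looks like this in Python. The function names and the 56 kbps figure are just examples, and the sketch assumes "kilo" means the same multiple for both bits and bytes.

```python
def kilobytes_per_second(kilobits_per_second: float) -> float:
    """Convert a link speed in kilobits per second to kilobytes per second."""
    return kilobits_per_second / 8


def seconds_to_transfer(size_kilobytes: float, kilobits_per_second: float) -> float:
    """Idealised transfer time, ignoring protocol overhead and compression."""
    return size_kilobytes / kilobytes_per_second(kilobits_per_second)


print(kilobytes_per_second(56))      # 7.0
print(seconds_to_transfer(700, 56))  # 100.0
```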
Kilobyte: Either 1000 bytes (the literal meaning of the metric prefix "kilo") or, more often, 1024 bytes (a power of two, which makes it a "round number" to a computer). Attempts to resolve the ambiguity by introducing a new term "kibibyte" for 1024 bytes to leave "kilobyte" meaning 1000 haven't gone anywhere.
Little-Endian: The system of storing numeric values which take up more than one byte in a manner in which the low-order byte comes first. If a number takes up more than 8 bits to store (e.g., an integer larger than 255), it must be divided between bytes, and it becomes an issue in file format definitions whether the "smaller-valued" parts of the number come in the first byte or the last one. See Endianness.
Megabyte: 1024 (or sometimes 1000) kilobytes, or 1,048,576 (or 1,000,000) bytes. The ambiguity is between the use of strict decimal multiples (in keeping with the normal meaning of the metric prefixes), which comes more naturally to humans, or the powers-of-2-based multiples which come more naturally to computers.
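To see how far apart the two interpretations are, the following Python sketch prints both byte counts for each unit in this glossary. The names in it are chosen for this illustration.

```python
UNITS = ["kilobyte", "megabyte", "gigabyte", "terabyte", "petabyte"]


def byte_counts(unit: str) -> tuple[int, int]:
    """Return (decimal, binary) byte counts for a unit name from UNITS."""
    power = UNITS.index(unit) + 1
    return 1000 ** power, 1024 ** power


for unit in UNITS:
    decimal, binary = byte_counts(unit)
    # The gap between the two meanings widens with each step up.
    gap_percent = (binary - decimal) / decimal * 100
    print(f"{unit}: {decimal:,} or {binary:,} bytes ({gap_percent:.1f}% apart)")
```

The difference is only 2.4% for a kilobyte but grows with each larger unit, so the ambiguity matters more as sizes increase.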
Octal: Numbers expressed in base 8, using only the digits 0 through 7. Along with hexadecimal, this is a base often used by programmers, being a power of 2 and hence easy to convert from or to binary; in this case, each octal digit represents three bits. Some things, including Unix file permission levels, are commonly expressed in octal digits, but hexadecimal is better-suited to many other applications because two hex digits fit evenly within an 8-bit byte, while three-bit octal digits do not. In standard C notation, octal numbers are preceded by a leading zero.
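Unix permissions are a convenient example of why three bits per digit is useful: each octal digit holds the read, write, and execute flags for one class of user. The Python sketch below decodes a mode such as 755; the function name is made up for this illustration. Note that Python writes octal literals with a 0o prefix rather than C's bare leading zero.

```python
def describe_permissions(mode: int) -> str:
    """Render the low nine permission bits of `mode` as an rwx string."""
    parts = []
    # One octal digit (three bits) each for owner, group, and others.
    for shift in (6, 3, 0):
        digit = (mode >> shift) & 0o7
        for index, flag in enumerate("rwx"):
            # Bit values within a digit: r = 4, w = 2, x = 1.
            parts.append(flag if digit & (4 >> index) else "-")
    return "".join(parts)


print(describe_permissions(0o755))  # rwxr-xr-x
print(describe_permissions(0o644))  # rw-r--r--
```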
Petabyte: 1024 (or 1000) terabytes. For now, this much storage still requires a large array of disk drives or other storage units, but if things keep going the way they've been for the last few decades, it wouldn't be surprising if you could carry this much storage in your pocket soon.
Pixel: One "picture element", a part of a graphic image as stored on a computer. If the image is 640 x 480, it consists of a matrix of pixels 640 wide and 480 high, for a total of 307,200 pixels. If each pixel can be one of 256 colors, this image can be stored in that number of bytes (or fewer if compression is applied), but most computer image formats these days have a larger color palette, thus requiring more bytes of storage.
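The storage arithmetic above can be checked with a short Python sketch. It is a simplification: the function name is invented for this example, and it counts only raw pixel data, ignoring any header, palette table, or row padding a real image format may add.

```python
def uncompressed_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes needed for raw pixel data, rounded up to a whole byte."""
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8


# 256 colors need 8 bits (one byte) per pixel.
print(uncompressed_bytes(640, 480, 8))   # 307200
# 24-bit color uses three bytes per pixel.
print(uncompressed_bytes(640, 480, 24))  # 921600
```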
Shell Scripts: Linux, Unix and Apple systems have scripting schemes that do the same jobs that batch files (.bat) do on Windows machines, and are run from their terminals. They are called shell scripts, and typically use the file extension .sh, although others are sometimes used. Apple computers can run shell scripts as well, and also have AppleScript, a separate scripting language whose compiled scripts use the file extension .scpt. Shell scripts can have variables and flow control statements (e.g., if-then-else, while, for). Like their counterparts in Windows, these scripts can be saved to automate tasks and edited as needed. The file extension in Linux and Unix is not really required: the interpreter that runs the script is named on the first line of the script itself (the "#!" line), and the file needs to be made executable, usually by its owner. (see Batch Files for Windows equivalent)
Terabyte: 1024 (or 1000) gigabytes. A growing number of inexpensive storage units now support this much storage.
Trinary: Numbers expressed in base 3, using only the digits 0 through 2. This is not very commonly used in computing, since 3 is not a power of 2, but an experimental means of encoding data in DNA makes use of it, calling the individual digits "trits" analogously to "bits" being binary digits.
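Converting a number to base 3 works the same way as for any other base: divide repeatedly by 3 and collect the remainders. The Python sketch below shows this for non-negative integers; the function name is chosen for this illustration, and it is not the DNA encoding scheme itself, only the number-base conversion.

```python
def to_trits(number: int) -> str:
    """Return the base-3 digits ("trits") of a non-negative integer."""
    if number < 0:
        raise ValueError("this sketch handles non-negative integers only")
    if number == 0:
        return "0"
    digits = []
    while number > 0:
        # The remainder is the next trit, starting from the low-order end.
        number, remainder = divmod(number, 3)
        digits.append(str(remainder))
    return "".join(reversed(digits))


print(to_trits(10))   # 101  (1*9 + 0*3 + 1*1)
print(int("101", 3))  # 10, converting back with the built-in int
```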