Byte Order Mark
Dan Tobias (Put the more detailed description here where it belongs instead of just linking to the CSV article.)
| + | == References == | ||
| + | * [http://en.wikipedia.org/wiki/Byte-order_mark Byte-order mark (Wikipedia)] | ||
| [[Category:File format details]] | [[Category:File format details]] | ||
Revision as of 16:16, 22 November 2012
Introduction
The Byte Order Mark (BOM) is a short signature that is sometimes placed at the start of textual formats such as CSV so that applications can recognize the right character encoding. It was designed to deal with the "big-endian vs. little-endian" problem of expressing multi-byte numeric data, where some systems put the highest-order byte first and others put it last. This affects encodings whose code units are wider than one byte, such as UTF-16 and UTF-32. The BOM has been allocated a code point in the Unicode character set, U+FEFF. The value obtained by reversing the two bytes of that code point, U+FFFE, is reserved as a noncharacter and is guaranteed never to be given a different meaning by Unicode. This means that if the reversed version is encountered at the start of a file, the file is known to be in the opposite byte order from the one previously assumed.
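The byte-order check described above can be sketched in Python. This is a minimal illustration, not a full encoding detector: the function name `detect_utf16_byte_order` is hypothetical, and the sketch assumes the data is UTF-16 (a UTF-32 little-endian file also begins with the bytes FF FE, so real detection would need to check for that case first).

```python
import codecs
from typing import Optional


def detect_utf16_byte_order(data: bytes) -> Optional[str]:
    """Return 'big' or 'little' if data starts with a UTF-16 BOM, else None.

    Hypothetical helper for illustration; assumes the data is UTF-16.
    """
    # U+FEFF serialized high byte first: FE FF
    if data.startswith(codecs.BOM_UTF16_BE):
        return "big"
    # The same code point serialized low byte first: FF FE
    if data.startswith(codecs.BOM_UTF16_LE):
        return "little"
    # No BOM: the byte order must be known from some other source
    return None


big_endian_sample = codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")
little_endian_sample = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
```

A reader that assumed big-endian order and found FF FE at the start would be looking at the reserved value U+FFFE, which signals that the bytes need to be swapped.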
Some UTF-8 files (including CSV files) are written with a leading BOM consisting of 3 bytes: EF BB BF.
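As a rough sketch of how a reader might cope with this three-byte signature, the Python snippet below removes it when present. The helper name `strip_utf8_bom` and the sample CSV bytes are made up for illustration. The `utf-8-sig` codec in Python's standard library does the same thing while decoding.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present.

    Hypothetical helper for illustration.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Made-up CSV content that begins with the UTF-8 signature
raw = b"\xef\xbb\xbfname,value\r\n"

without_bom = strip_utf8_bom(raw)

# The "utf-8-sig" codec skips the signature when decoding, if it is there
decoded = raw.decode("utf-8-sig")
```

Decoding the same bytes with plain `utf-8` would keep U+FEFF as the first character of the text, which is a common cause of a mangled first CSV column name.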
"Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature." [1]

