UTF-8
From Just Solve the File Format Problem
* [http://doc.cat-v.org/plan_9/4th_edition/papers/utf Hello World or Καλημέρα κόσμε or こんにちは 世界] ([http://plan9.bell-labs.com/sys/doc/utf.pdf PDF])
Revision as of 06:05, 10 November 2012
UCS Transformation Format 8-bit (UTF-8) is a Unicode character encoding. Byte values 0-127 (00-7F hexadecimal) represent the equivalent ASCII characters, and these bytes never appear in a UTF-8 stream in any other context, for example as part of a multi-byte sequence. The byte values FE and FF never appear in valid UTF-8 at all; they occur in the Byte Order Mark (BOM) of UTF-16, not of UTF-8. A UTF-8 document may optionally begin with the BOM character U+FEFF, which is encoded as the bytes 0xEF, 0xBB, 0xBF. Since UTF-8 has no "endianness," this is not actually a byte order indicator but can be treated as a signature indicating that the document is UTF-8 encoded.
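The signature and the ASCII property described above can be checked in a few lines. The following is a minimal sketch rather than a reference implementation; the names strip_utf8_signature and count_ascii_bytes are hypothetical helpers introduced only for this illustration.

```python
import codecs

# The UTF-8 signature: U+FEFF encoded as UTF-8 (same value as codecs.BOM_UTF8).
UTF8_SIGNATURE = b"\xef\xbb\xbf"


def strip_utf8_signature(data: bytes) -> bytes:
    """Return data without a leading UTF-8 signature, if one is present."""
    if data.startswith(UTF8_SIGNATURE):
        return data[len(UTF8_SIGNATURE):]
    return data


def count_ascii_bytes(data: bytes) -> int:
    """Count bytes in the range 00-7F; in UTF-8 each one is an ASCII character."""
    return sum(1 for b in data if b < 0x80)


if __name__ == "__main__":
    sample = "naïve/日本"
    encoded = sample.encode("utf-8")
    # Only the five ASCII characters (n, a, v, e, /) produce bytes below 0x80.
    print(count_ascii_bytes(encoded))
    # Neither FE nor FF occurs anywhere in the encoded bytes.
    print(0xFE in encoded, 0xFF in encoded)
    print(strip_utf8_signature(UTF8_SIGNATURE + b"hi"))
```

Python's built-in "utf-8-sig" codec performs the same signature stripping on decode, so the helper is only needed when working with raw bytes.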
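UTF-8 is most compact for text that is mostly ASCII, such as scripts that make heavy use of the Roman alphabet, where it uses one byte per character. Code points from U+0800 to U+FFFF, which include most CJK characters, take three bytes in UTF-8 but only two in UTF-16, so for such scripts UTF-16 can be the more efficient encoding. UTF-8 never needs more than four bytes per code point, so it is never larger than UTF-32.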
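The size trade-off can be measured directly. This is a small sketch, assuming Python's standard codecs; encoded_sizes is a hypothetical helper name, and the little-endian codec variants are used so that no BOM is counted in the totals.

```python
def encoded_sizes(text: str) -> dict:
    """Return the encoded length in bytes of text under each Unicode encoding."""
    # The "-le" variants emit no byte order mark, so only the text is measured.
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


if __name__ == "__main__":
    # ASCII text: 1 byte per character in UTF-8.
    print(encoded_sizes("hello"))       # {'utf-8': 5, 'utf-16': 10, 'utf-32': 20}
    # Hiragana (U+3040 block): 3 bytes each in UTF-8, 2 in UTF-16.
    print(encoded_sizes("こんにちは"))   # {'utf-8': 15, 'utf-16': 10, 'utf-32': 20}
```

In this sample the ASCII string is smallest in UTF-8, while the Japanese string is smallest in UTF-16; UTF-32 is the largest in both cases.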
Specifications
- STD 63 (RFC 3629)
- Unicode 6.0, Chapter 3 (2011) – §3.9 D92, §3.10 D95
- ISO/IEC 10646:2003 Annex D (2003)