CESU-8
From Just Solve the File Format Problem
Dan Tobias (Talk | contribs) |
{{FormatInfo
|formattype=electronic
|subcat=Character encoding
|subcat2=Unicode
|charset=CESU-8
|charsetaliases=csCESU8, csCESU-8
|mibenum=1016
}}
Latest revision as of 02:35, 21 May 2019
CESU-8 is an inefficient Unicode character encoding related to UTF-8. It is not part of the Unicode Standard, but has been documented (in Unicode Technical Report #26) in the interest of practicality.
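It's what you get if you take UTF-16 data, reinterpret it as UCS-2, then convert each 16-bit code unit to UTF-8 (while ignoring any rules forbidding the use of code points in the range U+D800 to U+DFFF). Code points up to U+FFFF come out exactly as in UTF-8, using 1, 2, or 3 bytes. A code point above U+FFFF is stored as its two UTF-16 surrogates at 3 bytes each, so it uses 6 bytes where UTF-8 would use 4. For example, U+10400 is encoded as ED A0 81 ED B0 80.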
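The conversion described above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function names `cesu8_encode` and `cesu8_decode` are made up for this example, and the code relies on Python's `surrogatepass` error handler to let surrogate code points through the UTF-8 and UTF-16 codecs.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text as CESU-8 (hypothetical helper, for illustration only)."""
    # Step 1: get the UTF-16 code units (big-endian, no byte order mark).
    utf16 = text.encode("utf-16-be", "surrogatepass")
    out = bytearray()
    for i in range(0, len(utf16), 2):
        # Step 2: treat each 16-bit unit as if it were a UCS-2 code point.
        unit = int.from_bytes(utf16[i:i + 2], "big")
        # Step 3: convert it to UTF-8, allowing U+D800 to U+DFFF.
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


def cesu8_decode(data: bytes) -> str:
    """Decode CESU-8 back to text (hypothetical helper, for illustration only).

    Assumption: this lenient version also accepts ordinary 4-byte UTF-8
    sequences, which a strict CESU-8 decoder would presumably reject.
    """
    # Decode to a string that may contain separate surrogate code points,
    # then round-trip through UTF-16 so each pair is joined into one code point.
    with_surrogates = data.decode("utf-8", "surrogatepass")
    utf16 = with_surrogates.encode("utf-16-be", "surrogatepass")
    return utf16.decode("utf-16-be")
```

With these helpers, `cesu8_encode("\U00010400").hex(" ")` gives `ed a0 81 ed b0 80`, while text limited to code points up to U+FFFF encodes to the same bytes as UTF-8.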
It is sometimes used by accident, but may be used deliberately to accommodate systems that don't support 4-byte UTF-8 sequences, or when a close correspondence between UTF-16 and a UTF-8-like encoding is deemed necessary.
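Because accidental CESU-8 tends to show up in data that is labeled as UTF-8, a quick check for encoded surrogate pairs can help spot it. The sketch below is a heuristic only, and the name `looks_like_cesu8` is invented for this example. It searches for a 3-byte high surrogate (ED A0..AF xx) immediately followed by a 3-byte low surrogate (ED B0..BF xx), a sequence that cannot occur in valid UTF-8.

```python
import re

# A high surrogate (U+D800 to U+DBFF) followed by a low surrogate
# (U+DC00 to U+DFFF), each written as a 3-byte UTF-8-style sequence.
_ENCODED_SURROGATE_PAIR = re.compile(
    rb"\xed[\xa0-\xaf][\x80-\xbf]\xed[\xb0-\xbf][\x80-\xbf]"
)


def looks_like_cesu8(data: bytes) -> bool:
    """Return True if data contains at least one CESU-8 style surrogate pair.

    Hypothetical helper: a False result does not prove the data is UTF-8,
    since text without code points above U+FFFF is identical in both encodings.
    """
    return _ENCODED_SURROGATE_PAIR.search(data) is not None
```

Arbitrary binary data can match the pattern by chance, so a positive result is only meaningful for input that is otherwise expected to be text.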