CESU-8

From Just Solve the File Format Problem

Revision as of 15:37, 19 May 2019

File Format
Name: CESU-8
Ontology: Electronic File Formats > Character encoding
IANA charset: CESU-8
IANA aliases: csCESU8, csCESU-8
IANA MIBenum: 1016

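CESU-8 (Compatibility Encoding Scheme for UTF-16: 8-Bit) is an inefficient Unicode character encoding related to UTF-8. It is not one of the Unicode Standard's encoding forms and is not recommended for open information exchange. It has nevertheless been documented, in Unicode Technical Report #26, in the interest of practicality.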

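It's what you get if you take UTF-16 data, reinterpret it as UCS-2, and then convert it to UTF-8 while ignoring the rule that forbids encoding code points in the surrogate range U+D800 to U+DFFF. Characters in the Basic Multilingual Plane are therefore encoded exactly as in UTF-8, using 1, 2, or 3 bytes. Each supplementary character (U+10000 to U+10FFFF), however, is first split into a UTF-16 surrogate pair, and each surrogate is then encoded separately as a 3-byte sequence. Such a character takes 6 bytes instead of the 4 it needs in UTF-8.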
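The recipe above can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation. The helper names `utf16_code_units`, `cesu8_encode` and `cesu8_decode` are hypothetical. The sketch relies on Python's `surrogatepass` error handler, which lets the standard `utf-8` and `utf-16` codecs pass surrogate code points through instead of rejecting them. It assumes Python 3.9 or later and well-formed input.

```python
def utf16_code_units(text: str) -> list[int]:
    """Split text into 16-bit UTF-16 code units (a surrogate pair stays as two units)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def cesu8_encode(text: str) -> bytes:
    """Treat each UTF-16 code unit as a code point (UCS-2 view) and encode it UTF-8 style."""
    out = bytearray()
    for unit in utf16_code_units(text):
        # "surrogatepass" turns a lone surrogate (U+D800..U+DFFF) into a 3-byte
        # sequence instead of raising, which is exactly the rule CESU-8 ignores.
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


def cesu8_decode(data: bytes) -> str:
    """Reverse the process: decode 3-byte surrogate sequences, then re-pair them."""
    units = data.decode("utf-8", "surrogatepass")
    if any(ord(ch) > 0xFFFF for ch in units):
        # A 4-byte UTF-8 sequence can never appear in CESU-8 data.
        raise ValueError("4-byte UTF-8 sequence is not valid CESU-8")
    # Round-trip through UTF-16 so each high/low surrogate pair becomes one code point.
    return units.encode("utf-16-be", "surrogatepass").decode("utf-16-be")


if __name__ == "__main__":
    s = "A\U0001F600"  # 'A' followed by an emoji outside the BMP
    print(cesu8_encode(s).hex(" "))  # 41 ed a0 bd ed b8 80
    print(s.encode("utf-8").hex(" "))  # 41 f0 9f 98 80
    assert cesu8_decode(cesu8_encode(s)) == s
```

In the example, U+1F600 becomes the surrogate pair D83D DE00. Each surrogate is written as its own 3-byte sequence (ED A0 BD, ED B8 80), so the emoji needs 6 bytes rather than UTF-8's 4. The letter "A" is unchanged, as it would be for any BMP character.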

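It is sometimes produced by accident, but it may also be used deliberately to accommodate systems that don't support 4-byte UTF-8 sequences. It may also be chosen when a close correspondence between UTF-16 and a UTF-8-like encoding is deemed necessary: each UTF-16 code unit maps to exactly one CESU-8 byte sequence, so CESU-8 strings sort in the same binary order as their UTF-16 equivalents.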
