CESU-8

File Format
Name	CESU-8
Ontology	Electronic File Formats > Character encoding > Unicode > CESU-8
IANA charset	CESU-8
IANA aliases	csCESU8, csCESU-8
IANA MIBenum	1016

Latest revision as of 02:35, 21 May 2019

CESU-8 is a Unicode character encoding related to UTF-8, but less efficient than UTF-8 for code points above U+FFFF. It is not an accepted standard, but has been documented (in Unicode Technical Report #26) in the interest of practicality.

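It's what you get if you take UTF-16 data, reinterpret it as UCS-2, then convert it to UTF-8 (while ignoring any rules forbidding the use of code points in the range U+D800 to U+DFFF). A code point thus uses 1, 2, 3, or 6 bytes: code points up to U+FFFF are encoded exactly as in UTF-8, while each code point above U+FFFF becomes two 3-byte sequences, one per UTF-16 surrogate.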
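The conversion described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names `cesu8_encode` and `cesu8_decode` are made up for this example, and it relies on Python's `surrogatepass` error handler to let surrogate code points through the UTF-8 codec.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text as CESU-8 (hypothetical helper, for illustration)."""
    # Step 1: get the UTF-16 code units (surrogate pairs for > U+FFFF).
    utf16 = text.encode("utf-16-le", "surrogatepass")
    out = bytearray()
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "little")
        # Step 2: treat each 16-bit unit as a code point (UCS-2 style)
        # and encode it as UTF-8, allowing U+D800..U+DFFF.
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


def cesu8_decode(data: bytes) -> str:
    """Decode CESU-8 bytes (hypothetical helper, for illustration).

    Note: this sketch is lenient and also accepts ordinary 4-byte
    UTF-8 sequences, which a strict CESU-8 decoder would reject.
    """
    # Decode to a string that may contain separate surrogate code points.
    units = data.decode("utf-8", "surrogatepass")
    # Round-trip through UTF-16 so surrogate pairs are recombined.
    return units.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


if __name__ == "__main__":
    sample = "A\u00e9\u20ac\U0001F600"  # 1-, 2-, 3-, and 6-byte cases
    encoded = cesu8_encode(sample)
    print(encoded.hex(" "))
    print(cesu8_decode(encoded) == sample)
```

For U+1F600, UTF-8 proper produces the 4 bytes `f0 9f 98 80`, whereas this sketch produces the 6 bytes `ed a0 bd ed b8 80`, one 3-byte sequence for each of the surrogates U+D83D and U+DE00.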

It is sometimes produced by accident, for example when software converts UTF-16 to UTF-8 one code unit at a time, but it may also be used deliberately to accommodate systems that don't support 4-byte UTF-8 sequences, or when a close correspondence between UTF-16 and a UTF-8-like encoding is deemed necessary.

References

Unicode Technical Report #26
