Latest revision as of 02:35, 21 May 2019
It's what you get if you take UTF-16 data, reinterpret it as UCS-2, then convert it to UTF-8 (while ignoring any rules forbidding the use of code points in the range U+D800 to U+DFFF). A code point thus uses 1, 2, 3, or 6 bytes: a supplementary code point becomes a UTF-16 surrogate pair, and each surrogate half is then encoded as its own 3-byte sequence.
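As a rough sketch of the process described above (the function name is my own, not from any standard library), one can encode to UTF-16, then treat each 16-bit code unit as if it were a standalone UCS-2 code point and UTF-8-encode it individually:

```python
def cesu8_like_encode(s: str) -> bytes:
    """Hypothetical helper: UTF-16 -> (as if UCS-2) -> UTF-8."""
    units = s.encode('utf-16-be')  # UTF-16, big-endian, no BOM
    out = bytearray()
    for i in range(0, len(units), 2):
        unit = int.from_bytes(units[i:i + 2], 'big')
        # 'surrogatepass' lifts Python's rule forbidding lone
        # surrogates (U+D800..U+DFFF), which each become 3 bytes
        out += chr(unit).encode('utf-8', 'surrogatepass')
    return bytes(out)

print(len(cesu8_like_encode('A')))           # 1 byte
print(len(cesu8_like_encode('\u00e9')))      # 2 bytes
print(len(cesu8_like_encode('\u20ac')))      # 3 bytes
print(len(cesu8_like_encode('\U0001F600')))  # 6 bytes (surrogate pair)
```

For characters up to U+FFFF the output is identical to ordinary UTF-8; only supplementary characters differ, coming out as two 3-byte surrogate sequences instead of one 4-byte sequence.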
It is sometimes used by accident, but may be used deliberately to accommodate systems that don't support 4-byte UTF-8 sequences, or when a close correspondence between UTF-16 and a UTF-8-like encoding is deemed necessary.