UTF-32

File Format
Name	UTF-32
Ontology	Electronic File Formats > Character encoding > Unicode > UTF-32
IANA charset	UTF-32
IANA aliases	csUTF32
IANA MIBenum	1017

Latest revision as of 02:37, 21 May 2019

UCS Transformation Format, 32-bit (UTF-32) is the trivial 32-bit Unicode character encoding: it encodes a sequence of Unicode code points as a sequence of 32-bit integers, one integer per code point. Because the mapping from code points to 32-bit values is one-to-one, every character requires the same number of bits. The largest code point, U+10FFFF, fits in 21 bits, so at least 11 bits of every 32-bit unit are always zero and the encoding is inherently wasteful of space; UTF-8 or UTF-16 is more compact in most cases. UTF-32 does provide computational simplicity, since the n-th code point always sits at a fixed offset, and it is more often used for in-memory storage of characters than for stored documents.
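The one-to-one mapping and the space cost can be checked with a short Python sketch. The sample string and the variable names are illustrative assumptions, not part of any specification; only the standard `str.encode` codecs are used.

```python
# Illustrative sample: one code point from each UTF-8 length class (1 to 4 bytes).
text = "a\u00e9\u20ac\U0001F600"

# Each character's Unicode code point.
code_points = [ord(ch) for ch in text]

# Encode as big-endian UTF-32 (no byte order mark is written for "utf-32-be").
utf32 = text.encode("utf-32-be")

# Split the bytes into 4-byte units and read each unit as one integer.
units = [int.from_bytes(utf32[i:i + 4], "big") for i in range(0, len(utf32), 4)]

# Every unit is exactly the code point it encodes.
assert units == code_points

# Encoded size in bytes under each encoding form.
sizes = {name: len(text.encode(name)) for name in ("utf-8", "utf-16-be", "utf-32-be")}

# The largest code point needs only 21 of the 32 bits.
max_bits = (0x10FFFF).bit_length()
```

For this four-character sample, UTF-32 always takes 16 bytes, while UTF-8 and UTF-16 each take 10.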

UTF-32 is also known as UCS-4. The two terms come from different standards (UCS-4 from ISO/IEC 10646, UTF-32 from the Unicode Standard), and UCS-4 was originally defined over a larger, 31-bit code space, but for all practical purposes they are synonyms.

As with UTF-16, this format exists in both big-endian and little-endian varieties. Since the relevant units are 32-bit chunks (not pairs of 16-bit chunks, as the longer sequences of UTF-16 are), the endianness applies to the entire 32 bits (4 bytes). The Byte Order Mark (zero-width no-break space) U+FEFF is therefore encoded as the byte sequence 00 00 FE FF in the big-endian version and FF FE 00 00 in the little-endian one, with all four bytes reversed from one version to the other. IANA has assigned the charset identifiers UTF-32BE (MIBenum 1018) and UTF-32LE (MIBenum 1019) to these variants.
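As a minimal sketch of how the byte order mark can be used, the following Python function checks the first four bytes of a buffer against the two sequences given above. The function name `detect_utf32_byte_order` is a hypothetical helper introduced here for illustration; the `codecs.BOM_UTF32_BE` and `codecs.BOM_UTF32_LE` constants come from the Python standard library.

```python
import codecs


def detect_utf32_byte_order(data: bytes):
    """Return "UTF-32BE" or "UTF-32LE" if data starts with a UTF-32 BOM, else None.

    Hypothetical helper for illustration. Note that FF FE 00 00 is also what
    a UTF-16LE BOM followed by U+0000 looks like, so a real detector would
    need more context than the first four bytes.
    """
    if data.startswith(codecs.BOM_UTF32_BE):  # 00 00 FE FF
        return "UTF-32BE"
    if data.startswith(codecs.BOM_UTF32_LE):  # FF FE 00 00
        return "UTF-32LE"
    return None


# Encoding U+FEFF directly reproduces the two byte sequences.
be_bom = "\ufeff".encode("utf-32-be")
le_bom = "\ufeff".encode("utf-32-le")
```

Here `be_bom` and `le_bom` are the same four bytes in opposite order, matching the description of endianness being applied to the whole 32-bit unit.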

Links

