Character encoding

File Format
Name	Character encoding
Ontology	Electronic File Formats Character Encodings

Revision as of 05:24, 5 December 2013

See Fonts for how these characters are rendered on screens and in printouts.

Adobe Standard Encoding
Amstrad CP/M Plus character set
ANSEL
- MARC-8
APL code page
ARMSCII
ASCII
ATASCII (used by Atari computers)
Baudot code
Braille
- BRF
- Nemeth Code
- Taylor Code
Compucolor character set
EBCDIC
- CP037
- CP285
- CP424
- CP500
- CP875
- CP1026
- CP1047
- CP1140
- CP1148
- CP1155
- CP4971
- CP9067
- CP12712
- EBCDIC 6-Bit
GB 2312
IBM PC code pages
ISO 646
- ISO 646-CA (Canada / French)
- ISO 646-CA-2 (Canada / French)
- ISO 646-CH (Switzerland)
- ISO 646-CN (China / Basic Latin)
- ISO 646-CU (Cuba / Spanish)
- ISO 646-DE (Germany)
- ISO 646-DK (Denmark)
- ISO 646-FI (Finland)
- ISO 646-FR (France)
- ISO 646-GB (Great Britain)
- ISO 646-HU (Hungary)
- ISO 646-IRV (International Reference Version)
- ISO 646-IT (Italy)
- ISO 646-JP (Japan / Romaji)
- ISO 646-JP OCR-B (Japan / Romaji)
- ISO 646-KR (Korea / Latin)
- ISO 646-MT (Malta)
- ISO 646-NL (Netherlands)
- ISO 646-NO (Norway)
- ISO 646-NO-2 (Norway)
- ISO 646-PT (Portugal)
- ISO 646-SE (Sweden)
- ISO 646-SE-2 (Sweden)
- ISO 646-US (same as ASCII)
- ISO 646-YU (Yugoslavia)
ISO 2022
ISO 8859
- ISO 8859-1 (Latin-1)
- ISO 8859-2 (Latin-2, Central/East European)
- ISO 8859-3 (Latin-3, Esperanto, Galician, Maltese, and Turkish)
- ISO 8859-4 (Latin-4, Scandinavian and Baltic)
- ISO 8859-5 (Cyrillic)
- ISO 8859-6 (Arabic)
- ISO 8859-7 (Modern Greek)
- ISO 8859-8 (Hebrew)
- ISO 8859-9 (Latin-5, Turkish)
- ISO 8859-10 (Latin-6, Lappish, Nordic, and Inuit)
- ISO 8859-11 (Thai)
- ISO 8859-13 (Latin-7, Baltic Rim)
- ISO 8859-14 (Celtic)
- ISO 8859-15 (Latin-9, Latin-1 with a Euro sign)
- ISO 8859-16 (Romanian)
JIS
- JIS X 0201
- JIS X 0208
- Shift-JIS
KOI8
- KOI8-CS (Czechoslovakia)
- KOI8-R (Russia)
- KOI8-U (Ukraine)
Macintosh encodings
- MacCE
- MacCyrillic
- MacDingbat
- MacGreek
- MacGujarati
- MacGurmukhi
- MacIceland
- MacRoman
- MacRomania
- MacSymbol
- MacThai
- MacTurkish
- MacUkraine
Morse code
MS-DOS encodings
- MS-DOS Latin US
- MS-DOS Greek
- MS-DOS Baltic Rim
- MS-DOS Latin-1
- MS-DOS Greek 1
- MS-DOS Latin-2
- MS-DOS Cyrillic
- MS-DOS Turkish
- MS-DOS Portuguese
- MS-DOS Icelandic
- MS-DOS Hebrew
- MS-DOS French Canada
- MS-DOS Arabic
- MS-DOS Nordic
- MS-DOS Cyrillic CIS 1
- MS-DOS Greek 2
PETSCII (also called PET ASCII or CBM ASCII; used by Commodore computers)
Unicode
- UTF-1
- UTF-7
- UTF-8
- CESU-8
- UTF-EBCDIC
- UTF-9
- UTF-16
- UCS-2
- UTF-18
- UTF-32 (UCS-4)
- GB18030
- Punycode
VISCII
Windows encodings
- Windows 1252 (ISO 8859-1 plus additional characters)
- Windows 1255 (Hebrew)
- Windows 1256 (Arabic, Farsi, Urdu)
- Windows 1257 (Baltic Rim)
- Windows 1258 (Vietnamese)
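Many of the encodings listed above assign different characters to the same byte values, which is why the encoding has to be known before text can be read correctly. The following is a minimal sketch in Python, assuming the codec names shipped with Python's standard library (`latin-1`, `cp1252`, `koi8_r`, `cp037`); the helper name `decode_variants` is hypothetical and only for illustration.

```python
def decode_variants(data: bytes, encodings: list[str]) -> dict[str, str]:
    """Decode the same bytes under each named codec.

    Bytes that a codec cannot decode are replaced with U+FFFD
    instead of raising an exception.
    """
    return {name: data.decode(name, errors="replace") for name in encodings}


if __name__ == "__main__":
    # One byte, several interpretations depending on the encoding.
    single = decode_variants(b"\xe9", ["latin-1", "cp1252", "koi8_r"])
    for name, text in single.items():
        print(f"{name:10} {text!r}")

    # Byte 0x80 is a C1 control in ISO 8859-1 but a printable
    # character in Windows 1252.
    print(decode_variants(b"\x80", ["latin-1", "cp1252"]))

    # EBCDIC (CP037) uses entirely different byte values for letters.
    print(decode_variants(b"\xc8\x85\x93\x93\x96", ["cp037"]))
```

The same approach can be used to guess at an unknown encoding by eye: decode a sample under a few candidates and see which result is readable.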

Format details

Byte Order Mark
C0 controls (ASCII control characters, 7 bit)
C1 controls (extended control characters, 8 bit)
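As an illustration of the byte order mark, here is a short Python sketch that checks whether data starts with one of the Unicode BOMs. It relies on the BOM constants in the standard `codecs` module; the function name `detect_bom` and the returned labels are assumptions made for this example.

```python
import codecs

# UTF-32 must be tested before UTF-16: the UTF-32 LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (encoding_label, bom_length) if data starts with a BOM, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None


if __name__ == "__main__":
    print(detect_bom(b"\xef\xbb\xbfhello"))   # UTF-8 BOM
    print(detect_bom(b"\xff\xfeh\x00i\x00"))  # UTF-16 LE BOM
    print(detect_bom(b"plain ASCII"))         # no BOM
```

A BOM is only a hint. Text without one may still be in any of these encodings, and the bytes FF FE 00 00 could in principle also be UTF-16 LE text that starts with a NUL character.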

Character escape codes

(used to enter characters in various systems and formats)

Alt codes (DOS/Windows)
Backslash escapes (used in various programming and markup languages)
HTML character references (entities and numeric values)
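The three escape mechanisms above can be compared on the same character. This is a rough Python sketch using only the standard library; the helper names are hypothetical, and the Alt code helper assumes the classic behavior where a code without a leading zero selects a byte in the OEM code page (taken here to be code page 437).

```python
import html


def from_html_reference(text: str) -> str:
    """Resolve named and numeric HTML character references."""
    return html.unescape(text)


def to_numeric_references(text: str) -> str:
    """Replace every non-ASCII character with a decimal HTML reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def to_backslash_escapes(text: str) -> str:
    """Render non-ASCII characters as Python-style backslash escapes."""
    return text.encode("unicode_escape").decode("ascii")


def alt_code_cp437(code: int) -> str:
    """Character produced by a DOS-style Alt code, assuming code page 437."""
    return bytes([code]).decode("cp437")


if __name__ == "__main__":
    print(from_html_reference("&eacute; &#233; &#xE9;"))
    print(to_numeric_references("café €"))
    print(to_backslash_escapes("é€"))
    print(alt_code_cp437(130))
```

Other languages and formats use different backslash syntaxes (for example `\u{...}` or octal escapes), so the output of `to_backslash_escapes` is specific to Python's notation.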

Tools

Kreative Recode: software to convert character encodings
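For simple cases, converting between encodings can also be scripted. The sketch below is not how Kreative Recode works; it is a minimal Python example, under the assumption that both encodings are supported by Python's built-in codecs, that decodes bytes from a source encoding and re-encodes them in a target encoding. The function name `transcode` is hypothetical.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes from one character encoding to another.

    Raises UnicodeDecodeError if data is not valid in the source encoding,
    and UnicodeEncodeError if a character has no representation in the target.
    """
    text = data.decode(source)   # bytes -> Unicode string
    return text.encode(target)   # Unicode string -> bytes


if __name__ == "__main__":
    koi8 = "Привет".encode("koi8_r")
    utf8 = transcode(koi8, "koi8_r", "utf-8")
    print(utf8.decode("utf-8"))

    # A character missing from the target encoding is reported, not dropped.
    try:
        transcode("€".encode("utf-8"), "utf-8", "latin-1")
    except UnicodeEncodeError as exc:
        print("cannot convert:", exc.reason)
```

Going through Unicode as the intermediate form keeps the conversion strict: lossy conversions fail loudly rather than silently producing wrong characters.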

Commentary and satire

Character encoding bugs are 𝒜wesome! (http://geoff.greer.fm/2012/08/12/character-encoding-bugs-are-%F0%9D%92%9Cwesome/)
xkcd: Encoding (http://xkcd.com/1209/)
8 New Punctuation Marks We Desperately Need (http://www.collegehumor.com/article/6872071/8-new-and-necessary-punctuation-marks)

References

Ken Lunde, CJKV Information Processing, O'Reilly 2008, ISBN 978-0-596-51447-1 (has lots of information on encodings and Unicode in general, not only for CJKV locales)
IBM 3270 character set reference (1987)
