Character encoding
From Just Solve the File Format Problem
Revision as of 05:24, 5 December 2013 by Dan Tobias
See Fonts for how the characters in these encodings are rendered on screens and in printouts.
- Adobe Standard Encoding
- Amstrad CP/M Plus character set
- ANSEL
- APL code page
- ARMSCII
- ASCII
- ATASCII (used by Atari computers)
- Baudot code
- Braille
- Compucolor character set
- EBCDIC
- GB 2312
- IBM PC code pages
- ISO 646
  - ISO 646-CA (Canada / French)
  - ISO 646-CA-2 (Canada / French)
  - ISO 646-CH (Switzerland)
  - ISO 646-CN (China / Basic Latin)
  - ISO 646-CU (Cuba / Spanish)
  - ISO 646-DE (Germany)
  - ISO 646-DK (Denmark)
  - ISO 646-FI (Finland)
  - ISO 646-FR (France)
  - ISO 646-GB (Great Britain)
  - ISO 646-HU (Hungary)
  - ISO 646-IRV (International Reference Version)
  - ISO 646-IT (Italy)
  - ISO 646-JP (Japan / Romaji)
  - ISO 646-JP OCR-B (Japan / Romaji)
  - ISO 646-KR (Korea / Latin)
  - ISO 646-MT (Malta)
  - ISO 646-NL (Netherlands)
  - ISO 646-NO (Norway)
  - ISO 646-NO-2 (Norway)
  - ISO 646-PT (Portugal)
  - ISO 646-SE (Sweden)
  - ISO 646-SE-2 (Sweden)
  - ISO 646-US (same as ASCII)
  - ISO 646-YU (Yugoslavia)
- ISO 2022
- ISO 8859
  - ISO 8859-1 (Latin-1)
  - ISO 8859-2 (Latin-2, Central/East European)
  - ISO 8859-3 (Latin-3, Esperanto, Galician, Maltese, and Turkish)
  - ISO 8859-4 (Latin-4, Scandinavian and Baltic)
  - ISO 8859-5 (Cyrillic)
  - ISO 8859-6 (Arabic)
  - ISO 8859-7 (Modern Greek)
  - ISO 8859-8 (Hebrew)
  - ISO 8859-9 (Latin-5, Turkish)
  - ISO 8859-10 (Latin-6, Lappish, Nordic, and Inuit)
  - ISO 8859-11 (Thai)
  - ISO 8859-13 (Latin-7, Baltic Rim)
  - ISO 8859-14 (Celtic)
  - ISO 8859-15 (Latin-9, Latin-1 with a Euro sign)
  - ISO 8859-16 (Romanian)
- JIS
- KOI8
- Macintosh encodings
- Morse code
- MS-DOS encodings
- PETSCII (or PET ASCII or CBM ASCII; used by Commodore computers)
- Unicode
- VISCII
- Windows encodings
  - Windows 1252 (ISO 8859-1 plus additional characters)
  - Windows 1255 (Hebrew)
  - Windows 1256 (Arabic, Farsi, Urdu)
  - Windows 1257 (Baltic Rim)
  - Windows 1258 (Vietnamese)
 
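The ISO 646 national variants reuse a handful of ASCII code positions for local letters, so the same 7-bit bytes read differently depending on the variant. The Python sketch below assumes the ISO 646-DE (DIN 66003) assignments for the eight positions that differ from ASCII; the table and the function name `decode_iso646_de` are hypothetical illustrations, not part of any standard library.

```python
# Hypothetical helper: decode 7-bit ISO 646-DE bytes to a Python str.
# Assumption: only the eight positions that differ from ISO 646-US
# (ASCII) are remapped; every other code is read as plain ASCII.
ISO646_DE_OVERRIDES = {
    0x40: "\u00a7",  # section sign instead of @
    0x5B: "\u00c4",  # A umlaut instead of [
    0x5C: "\u00d6",  # O umlaut instead of backslash
    0x5D: "\u00dc",  # U umlaut instead of ]
    0x7B: "\u00e4",  # a umlaut instead of {
    0x7C: "\u00f6",  # o umlaut instead of |
    0x7D: "\u00fc",  # u umlaut instead of }
    0x7E: "\u00df",  # sharp s instead of ~
}


def decode_iso646_de(data: bytes) -> str:
    """Decode ISO 646-DE bytes, rejecting anything outside 7 bits."""
    chars = []
    for b in data:
        if b > 0x7F:
            raise ValueError(f"byte 0x{b:02X} is not a 7-bit ISO 646 code")
        # Use the German character where the variant differs, else ASCII.
        chars.append(ISO646_DE_OVERRIDES.get(b, chr(b)))
    return "".join(chars)


if __name__ == "__main__":
    raw = b"Gr}~e aus M}nchen"
    print(decode_iso646_de(raw))  # German reading of the bytes
    print(raw.decode("ascii"))    # ISO 646-US (ASCII) reading
```

Read as plain ASCII, the same bytes come out as `Gr}~e aus M}nchen`, which is the kind of garbling seen when variant-encoded text reaches a US-ASCII device.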
 
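Most of the 8-bit encodings in this list agree with ASCII in the lower half and diverge in the upper half. As a sketch using codec names that Python's standard library provides (`latin_1`, `cp1252`, `iso8859_15`, `cp437`, `mac_roman`, `koi8_r`), the snippet below decodes single bytes under each one and reproduces a common mojibake case; the helper name `decode_all` is hypothetical.

```python
# Decode the same bytes under several encodings from the list above,
# using codec names from Python's standard library.
CODECS = ["latin_1", "cp1252", "iso8859_15", "cp437", "mac_roman", "koi8_r"]


def decode_all(data: bytes) -> dict[str, str]:
    """Return the text each codec produces for the same byte string."""
    # errors="replace" keeps going where a codec leaves a byte undefined
    # (Windows 1252 has a few such gaps in 0x80 to 0x9F).
    return {name: data.decode(name, errors="replace") for name in CODECS}


if __name__ == "__main__":
    for sample in (b"\xe9", b"\x80", b"\xa4"):
        for name, text in decode_all(sample).items():
            print(f"{sample!r:8} {name:11} {text!r}")

    # Mojibake: UTF-8 bytes misread as Windows 1252.
    print("caf\u00e9".encode("utf-8").decode("cp1252"))  # cafÃ©
```

For example, 0x80 is a C1 control in ISO 8859-1 but the euro sign in Windows 1252, while ISO 8859-15 puts the euro sign at 0xA4, where ISO 8859-1 has the generic currency sign ¤.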
Format details
- Byte Order Mark
- C0 controls (ASCII control characters, 7-bit)
- C1 controls (extended control characters, 8-bit)
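A byte order mark at the start of a file is often used to guess which Unicode encoding follows. A minimal sniffing sketch in Python, using the BOM constants from the standard `codecs` module; the function names `sniff_bom` and `decode_with_bom` and the UTF-8 fallback are assumptions for illustration:

```python
import codecs
from typing import Optional

# Longer marks first: the UTF-32 LE BOM (FF FE 00 00) begins with the
# UTF-16 LE BOM (FF FE), so checking UTF-16 first would misfire.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes) -> tuple[Optional[str], int]:
    """Return (codec name, BOM length), or (None, 0) when no BOM is found."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_bom(data: bytes, fallback: str = "utf-8") -> str:
    """Strip a recognised BOM and decode; otherwise use `fallback`."""
    name, skip = sniff_bom(data)
    return data[skip:].decode(name or fallback)


if __name__ == "__main__":
    sample = "h\u00e9llo".encode("utf-16")  # Python prepends a native-order BOM
    print(sniff_bom(sample))
    print(decode_with_bom(sample))
```

The check order matters because the UTF-32 LE mark starts with the UTF-16 LE mark. Even so, a UTF-16 LE file whose first character is U+0000 cannot be told apart from UTF-32 LE by its BOM alone, and many files carry no BOM at all, so this is a heuristic rather than a guarantee.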
 
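The C0 set occupies 0x00 to 0x1F and the C1 set 0x80 to 0x9F; in Unicode these ranges, plus DEL at 0x7F, form the general category Cc. A hedged Python sketch that classifies characters this way and counts C1 controls, whose presence in supposedly ISO 8859-1 text can hint that it is really Windows 1252; the function names are hypothetical, and grouping DEL with C0 is an assumption:

```python
import unicodedata
from typing import Optional


def control_set(ch: str) -> Optional[str]:
    """Return "C0", "C1", or None for a single character.

    Assumption: DEL (0x7F) is grouped with C0 because ASCII treats it as
    a control character, although it lies outside the 0x00-0x1F C0 range.
    """
    cp = ord(ch)
    if cp <= 0x1F or cp == 0x7F:
        return "C0"
    if 0x80 <= cp <= 0x9F:
        return "C1"
    return None


def count_c1(text: str) -> int:
    """Count C1 controls, which rarely appear in genuine text."""
    return sum(1 for ch in text if control_set(ch) == "C1")


if __name__ == "__main__":
    # Together, the two classes match Unicode's "Cc" category here.
    assert all(
        (control_set(chr(cp)) is not None) == (unicodedata.category(chr(cp)) == "Cc")
        for cp in range(0x100)
    )
    raw = b"\x93quoted\x94"
    print(count_c1(raw.decode("latin_1")))  # 2: bytes read as C1 controls
    print(count_c1(raw.decode("cp1252")))   # 0: bytes read as curly quotes
```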
Character escape codes
(used to enter characters in various systems and formats)
- Alt codes (DOS/Windows)
- Backslash escapes (used in various programming and markup languages)
- HTML character references (entities and numeric values)
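Alt codes map a decimal number to a byte in a code page, which is then shown as a character. The sketch below assumes the behaviour commonly described for US-English Windows: digits without a leading zero select the OEM code page (CP437) and digits with a leading zero select the ANSI code page (Windows 1252). Other locales use other code pages and some applications handle Alt input differently, so treat the hypothetical `alt_code_to_char` as an approximation.

```python
def alt_code_to_char(digits: str, oem: str = "cp437", ansi: str = "cp1252") -> str:
    """Approximate the character produced by Alt + numeric-keypad digits.

    Assumption (US-English Windows): no leading zero selects the OEM code
    page, a leading zero selects the ANSI code page.
    """
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected decimal digits typed on the numeric keypad")
    value = int(digits)
    if value > 255:
        # Behaviour above 255 varies between systems, so this sketch refuses it.
        raise ValueError("only single-byte codes 0-255 are handled here")
    codec = ansi if digits.startswith("0") else oem
    # Decoding the single byte through the chosen code page gives the character.
    return bytes([value]).decode(codec)


if __name__ == "__main__":
    for code in ("130", "0233", "233", "0128"):
        print(f"Alt+{code}: {alt_code_to_char(code)}")
```

Under these assumptions Alt+130 and Alt+0233 both give é, whereas Alt+233 gives Θ from CP437. Python's `cp437` codec maps 0x01 to 0x1F to control characters rather than the ☺-style glyphs DOS displays for them, and undefined Windows 1252 bytes such as 0x81 raise `UnicodeDecodeError`.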
 
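HTML character references and backslash escapes both spell out a character's code point in plain ASCII. A short Python sketch using the standard `html` module together with the `xmlcharrefreplace` error handler and the `unicode_escape` codec; the wrapper names are hypothetical, and the backslash forms shown are Python's own (other languages differ). The sample includes U+1D49C, the 𝒜 from the "Character encoding bugs are 𝒜wesome!" title, which needs more than four hex digits:

```python
import html


def to_html_refs(text: str) -> str:
    """Escape markup characters, then turn non-ASCII into numeric references."""
    escaped = html.escape(text, quote=True)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


def to_backslash_escapes(text: str) -> str:
    """Python-style backslash escapes (\\xNN, \\uNNNN, \\UNNNNNNNN)."""
    return text.encode("unicode_escape").decode("ascii")


if __name__ == "__main__":
    s = "Caf\u00e9 <\U0001d49c>"
    print(to_html_refs(s))                            # Caf&#233; &lt;&#119964;&gt;
    print(html.unescape("&eacute; &#233; &#xE9;"))    # é é é
    print(to_backslash_escapes(s))                    # Caf\xe9 <\U0001d49c>
```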
Tools
Commentary and satire
- The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
- The Language Double-Take: Dealing with Bidirectional Text (or: Wait, ?tahW)
- Character encoding bugs are 𝒜wesome!
- xkcd: Encoding
- 8 New Punctuation Marks We Desperately Need
 
Other external links
- Lots of character encoding charts
- The Evolution of Character Codes, 1874–1968
- Collection of character encodings
 
References
- Ken Lunde, CJKV Information Processing, O'Reilly, 2008, ISBN 978-0-596-51447-1 (covers encodings and Unicode in general, not only CJKV locales)
- IBM 3270 character set reference (1987)