TRON code
From Just Solve the File Format Problem
Revision as of 07:41, 11 December 2021
This article describes the character encoding used in TRON. Unlike Unicode, it does not use Han unification, so it can clearly distinguish Japanese text from Chinese text.
Character codes are two-byte codes, split into four zones:
- A zone: High byte and low byte are both in range 0x21 to 0x7E.
- B zone: High byte in range 0x80 to 0xFD and low byte in range 0x21 to 0x7E.
- C zone: High byte in range 0x21 to 0x7E and low byte in range 0x80 to 0xFD.
- D zone: High byte and low byte are both in range 0x80 to 0xFD.
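The zone of a code follows directly from which range each byte falls in. A minimal Python sketch of that test; the function name `tron_zone` is hypothetical and not part of any TRON specification:

```python
def tron_zone(hi: int, lo: int) -> str:
    """Return 'A', 'B', 'C' or 'D' for a two-byte code, using the ranges listed above."""
    def is_low(b: int) -> bool:
        return 0x21 <= b <= 0x7E

    def is_high(b: int) -> bool:
        return 0x80 <= b <= 0xFD

    for b in (hi, lo):
        if not (is_low(b) or is_high(b)):
            raise ValueError(f"byte 0x{b:02X} is outside every zone range")

    if is_low(hi):
        return "A" if is_low(lo) else "C"
    return "B" if is_low(lo) else "D"
```

A byte such as 0xFE is rejected here because it lies outside all four zones.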
Character codes are grouped into planes. A plane (language) is selected by the byte 0xFE followed by a second byte equal to the plane number plus 0x20; for example, plane 1 is selected by the code 0xFE21. If no plane has been selected, the default is usually plane 1.
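The plane selection rule can be sketched in Python as follows. The function names are hypothetical, and the upper limit on the plane number is only an assumption made so that the second byte fits in one byte; the text above does not state a limit.

```python
def plane_select_code(plane: int) -> bytes:
    """Build the two-byte selection code: 0xFE, then plane number + 0x20."""
    # Assumption: the only limit enforced is that the second byte fits in 8 bits.
    if not 1 <= plane <= 0xFF - 0x20:
        raise ValueError("plane number out of range for this sketch")
    return bytes([0xFE, 0x20 + plane])


def parse_plane_select(code: bytes) -> int:
    """Recover the plane number from a two-byte selection code."""
    if len(code) != 2 or code[0] != 0xFE:
        raise ValueError("not a plane selection code")
    return code[1] - 0x20
```

For example, `plane_select_code(1)` yields the bytes 0xFE 0x21, matching the example in the text.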
List of planes:
- 1 = JIS, GB2312, KS X 1001, and Braille
- 2,3 = GT
- 6 = Big5
- 8,9 = Dai-Kan-Wa-Jiten, hentaigana, etc.
- 10 = Dongba symbols
Conversion of other formats into TRON is described below.
Plane 1
JIS X 0208, and first plane of JIS X 0213:
hi = ku+0x20
lo = ten+0x20

hi = ku+0xA0
lo = ten+0x20
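The two row/cell (ku/ten) formulas above differ only in the offset added to ku. A minimal Python sketch, assuming ku and ten both run from 1 to 94 as in the JIS row/cell scheme; the function name and the `hi_offset` parameter are hypothetical:

```python
def kuten_to_tron(ku: int, ten: int, hi_offset: int = 0x20) -> tuple[int, int]:
    """Map a row/cell pair to (hi, lo) with hi = ku + hi_offset, lo = ten + 0x20.

    hi_offset=0x20 is the JIS X 0208 / JIS X 0213 plane 1 formula;
    hi_offset=0xA0 is the second formula given above.
    """
    # Assumption: rows and cells are numbered 1..94.
    if not (1 <= ku <= 94 and 1 <= ten <= 94):
        raise ValueError("ku and ten must be in 1..94")
    return ku + hi_offset, ten + 0x20
```

With the default offset the result lands in the A zone; with 0xA0 the high byte moves to 0xA1 and above, which is in the B zone.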
Second plane of JIS X 0213:
hi = 0x87 to 0xA0, assigned consecutively to the valid rows of JIS X 0213 plane 2 (1, 3-5, 8, 12-15, 78-94)
lo = ten+0x20
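Because only 26 rows of the second plane are valid, the high byte is the position of the row in that list rather than a fixed offset from ku. A minimal Python sketch of this lookup; the names `VALID_ROWS` and `jisx0213_plane2_to_tron` are hypothetical:

```python
# Valid rows of JIS X 0213 plane 2, in the order given above (26 rows).
VALID_ROWS = [1, 3, 4, 5, 8, 12, 13, 14, 15, *range(78, 95)]


def jisx0213_plane2_to_tron(ku: int, ten: int) -> tuple[int, int]:
    """Map a plane 2 row/cell pair to (hi, lo): hi counts valid rows from 0x87."""
    if ku not in VALID_ROWS:
        raise ValueError(f"row {ku} is not a valid row of JIS X 0213 plane 2")
    if not 1 <= ten <= 94:
        raise ValueError("ten must be in 1..94")
    hi = 0x87 + VALID_ROWS.index(ku)
    return hi, ten + 0x20
```

The 26 rows fill the high bytes 0x87 to 0xA0 exactly, which agrees with the range stated in the text.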
hi = ((ku-1)*94+ten-1)/126+0x21
lo = ((ku-1)*94+ten-1)%126+0x80

hi = ((ku-1)*94+ten-1)/126+0xB7
lo = ((ku-1)*94+ten-1)%126+0x80
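The last two formulas first turn ku/ten into a linear index over a 94 by 94 table, then repack that index into rows of 126 low-byte values starting at 0x80. A minimal Python sketch, assuming `/` in the formulas means integer division; the function name and the `hi_base` parameter are hypothetical:

```python
def linear_to_tron(ku: int, ten: int, hi_base: int) -> tuple[int, int]:
    """Repack a 94x94 row/cell pair into rows of 126 cells.

    hi_base is 0x21 or 0xB7, matching the two formulas above.
    """
    # Assumption: rows and cells are numbered 1..94.
    if not (1 <= ku <= 94 and 1 <= ten <= 94):
        raise ValueError("ku and ten must be in 1..94")
    index = (ku - 1) * 94 + (ten - 1)  # 0 .. 8835
    hi = index // 126 + hi_base        # integer division, as assumed
    lo = index % 126 + 0x80            # low byte in 0x80 .. 0xFD
    return hi, lo
```

With `hi_base=0xB7`, the last cell (ku 94, ten 94) maps to high byte 0xFD, the top of the D zone range.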
(unknown)
Plane 9
- Codes 0x9721 to 0x972A are the Chinese/Japanese numerals one to ten, each enclosed in a square.
- Codes 0x972B to 0x975A are katakana enclosed in parentheses.
- Codes 0x975B to 0x9766 are the lowercase Roman numerals i to xii, each enclosed in a circle.
- Codes 0x9767 to 0x977A are the numbers 1 to 20, each enclosed in a triangle.
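These ranges can be expressed as a small lookup table. A minimal Python sketch; the names and description strings are hypothetical. It returns only the position of a code within its range, since the list above does not say which katakana each code in the second range represents.

```python
# (first code, last code, description) for the plane 9 ranges listed above.
PLANE9_RANGES = [
    (0x9721, 0x972A, "Chinese/Japanese numeral in a square"),
    (0x972B, 0x975A, "katakana in parentheses"),
    (0x975B, 0x9766, "lowercase Roman numeral in a circle"),
    (0x9767, 0x977A, "number in a triangle"),
]


def describe_plane9(code: int):
    """Return (description, 1-based position in its range), or None if not listed."""
    for first, last, description in PLANE9_RANGES:
        if first <= code <= last:
            return description, code - first + 1
    return None
```

For the numeral ranges the position equals the value shown, so 0x977A is the number 20 in a triangle.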