TRON code
Revision as of 18:22, 18 April 2023
This article describes the character encoding used in TRON. Unlike Unicode, it does not apply Han unification, so it can clearly distinguish Japanese text from Chinese text.
Character codes are two-byte codes, split into four zones:
- A zone: High byte and low byte are both in range 0x21 to 0x7E.
- B zone: High byte in range 0x80 to 0xFD and low byte in range 0x21 to 0x7E.
- C zone: High byte in range 0x21 to 0x7E and low byte in range 0x80 to 0xFD.
- D zone: High byte and low byte are both in range 0x80 to 0xFD.
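The zone rules above can be sketched as a small classifier. This is only an illustration; the function name `tron_zone` is hypothetical and not part of any TRON specification.

```python
def tron_zone(hi: int, lo: int) -> str:
    """Return 'A', 'B', 'C' or 'D' for a two-byte character code.

    Raises ValueError if either byte lies outside both valid ranges.
    """
    def in_low_range(b: int) -> bool:
        return 0x21 <= b <= 0x7E

    def in_high_range(b: int) -> bool:
        return 0x80 <= b <= 0xFD

    if in_low_range(hi) and in_low_range(lo):
        return "A"
    if in_high_range(hi) and in_low_range(lo):
        return "B"
    if in_low_range(hi) and in_high_range(lo):
        return "C"
    if in_high_range(hi) and in_high_range(lo):
        return "D"
    raise ValueError(f"not a character code: {hi:#04x} {lo:#04x}")


print(tron_zone(0x30, 0x21))  # a code in the A zone
```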
Character codes are grouped into planes. A plane (language) is selected by a code whose first byte is 0xFE and whose second byte is the plane number plus 0x20; for example, plane 1 is selected by code 0xFE21. If no plane is specified, the default is usually plane 1.
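As a minimal sketch of the plane selection rule, the following builds the 16-bit selector code for a plane number. The helper name `plane_selector` is hypothetical; the valid range of the second byte (0x21 to 0xFD, except 0x7F) is taken from the Encodings section of this article.

```python
def plane_selector(plane: int) -> int:
    """Return the 16-bit selector code for a plane number (plane 1 -> 0xFE21)."""
    second_byte = plane + 0x20
    # The second byte must be 0x21..0xFD and must not be 0x7F.
    if not 0x21 <= second_byte <= 0xFD or second_byte == 0x7F:
        raise ValueError(f"plane {plane} has no valid selector byte")
    return 0xFE00 | second_byte


print(hex(plane_selector(1)))  # 0xfe21
```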
List of planes:
- 1 = JIS, GB2312, KS X 1001, and Braille
- 2,3 = GT
- 6 = Big5
- 8,9 = Dai-Kan-Wa-Jiten, hentaigana, etc
- 10 = Dongba symbols
Conversion from other formats into TRON is described below.
Plane 1
JIS X 0208, and first plane of JIS X 0213:
hi = ku+0x20 lo = ten+0x20
hi = ku+0xA0 lo = ten+0x20
Second plane of JIS X 0213:
hi = numbers 0x87 to 0xA0, contiguous by valid rows of JIS X 0213 (1,3-5,8,12-15,78-94) lo = ten+0x20
hi = ((ku-1)*94+ten-1)/126+0x21 lo = ((ku-1)*94+ten-1)%126+0x80
hi = ((ku-1)*94+ten-1)/126+0xB7 lo = ((ku-1)*94+ten-1)%126+0x80
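The formulas above fall into three shapes, which can be sketched as follows. This is an illustration under assumptions: "/" is read as integer division, the function names are hypothetical, and because the article does not say which character set each unlabeled formula belongs to, the offsets are passed in as parameters rather than tied to a set.

```python
def direct_code(ku: int, ten: int, hi_offset: int) -> tuple[int, int]:
    """hi = ku + offset, lo = ten + 0x20 (offset 0x20 for JIS X 0208)."""
    return ku + hi_offset, ten + 0x20


# Valid rows of the second plane of JIS X 0213, as listed in the article.
JISX0213_PLANE2_ROWS = [1, 3, 4, 5, 8, 12, 13, 14, 15, *range(78, 95)]


def jisx0213_plane2_code(ku: int, ten: int) -> tuple[int, int]:
    """Map the valid rows contiguously onto high bytes 0x87 to 0xA0."""
    return 0x87 + JISX0213_PLANE2_ROWS.index(ku), ten + 0x20


def packed_code(ku: int, ten: int, hi_base: int) -> tuple[int, int]:
    """Repack a 94x94 ku/ten grid into rows of 126 low bytes (0x80 to 0xFD)."""
    index = (ku - 1) * 94 + (ten - 1)
    return index // 126 + hi_base, index % 126 + 0x80


print([hex(b) for b in direct_code(16, 1, 0x20)])   # ['0x30', '0x21']
print([hex(b) for b in packed_code(94, 94, 0xB7)])  # ['0xfd', '0x8f']
```

The 26 valid rows of the second plane fill the 26 high bytes from 0x87 to 0xA0 exactly, which is consistent with the "contiguous" wording above.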
(unknown)
- Codes starting at 0x8021 are 6-dot braille, and codes starting at 0x8121 are 8-dot braille. (I do not know the specific encoding, though. Anyone who does know, please correct this)
Plane 8
- All codes are Dai-Kan-Wa-Jiten characters 1 to 48055 (presumably according to "linear2hilo" function?).
Plane 9
- Codes 0x8021 to 0x8230 are uncommon variants of hiragana/katakana, such as small letters and voice mark for letters which are not commonly used in this way.
- Codes 0x8321 to 0x846A are hentaigana.
- Codes 0x9621 to 0x967E are apparently something to do with Chinese elements (I don't know for sure if this is correct, or what the specific encoding is?)
- Codes 0x9721 to 0x972A are the Chinese/Japanese numbers one to ten in a square.
- Codes 0x972B to 0x975A are the katakana in parentheses. (They seem to be in the usual modern "grid order" of the Japanese syllabary, excluding small letters and dakuten/handakuten, but including "wi" and "we".)
- Codes 0x975B to 0x9766 are the lowercase Roman numerals i to xii in a circle.
- Codes 0x9767 to 0x977A are the numbers 1 to 20 in a triangle.
- Codes 0x9830 to 0x9839 are Baronh numbers 0 to 9.
- Codes 0x9840 to 0x985B are Baronh letters: a e i ï u ü é o c s t l n h p f m ai y ÿ œ r au eu g z d b
Planes 16 and 17
These planes are used for the Basic Multilingual Plane of Unicode 2.0 (even the ASCII control codes are mapped, for some reason), excluding the CJK characters that are subject to Han unification in Unicode; those are mapped elsewhere in TRON without Han unification.
The codes are assigned linearly, starting from the A zone and continuing through the B zone, C zone, and D zone.
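One possible reading of this linear assignment is sketched below. It rests on assumptions the article does not confirm: the index is zero-based, and within each zone the codes run in row-major order (the low byte varies fastest). The name `linear_to_hilo` is hypothetical and only echoes the "linear2hilo" function mentioned under Plane 8.

```python
# (first high byte, number of high bytes, first low byte, number of low bytes)
ZONES = [
    (0x21, 94, 0x21, 94),    # A zone
    (0x80, 126, 0x21, 94),   # B zone
    (0x21, 94, 0x80, 126),   # C zone
    (0x80, 126, 0x80, 126),  # D zone
]


def linear_to_hilo(index: int) -> tuple[int, int]:
    """Map a zero-based linear index to (hi, lo), filling zones A, B, C, D in turn.

    ASSUMPTION: row-major order inside each zone; not confirmed by the article.
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    for hi_start, hi_count, lo_start, lo_count in ZONES:
        zone_size = hi_count * lo_count
        if index < zone_size:
            return hi_start + index // lo_count, lo_start + index % lo_count
        index -= zone_size
    raise ValueError("index does not fit in one plane")


print([hex(b) for b in linear_to_hilo(0)])  # ['0x21', '0x21']
```

Under these assumptions one plane holds 94*94 + 126*94 + 94*126 + 126*126 = 48400 codes.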
Planes 22 and 23
These planes are used for GB 18030.
Encodings
The most common encodings are probably TRON-16BE (also called "TADTextBE") and TRON-16LE, which use 16-bit code units. A code unit with 0xFE as the high byte is a plane selector, with the plane given by the low byte (0x21 to 0xFD, except 0x7F). The unit 0xFEFE instead moves on to the next volume: for the second volume it is followed by 0x0021 to 0x00FD (except 0x007F); for the third volume it is followed by 0xFE21 to 0xFEFD; for the fourth volume the second unit is also 0xFEFE; and so on. Control characters are represented as 0x0000 to 0x0020 and 0x007F. Codes 0xFF21 to 0xFF7E are special codes used in some applications, and TAD files also use 0xFF80 to 0xFFFD for segments.
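The selector scheme can be sketched as a small TRON-16BE reader. This follows one reading of the paragraph above and should be treated as an assumption: each 0xFEFE unit advances two volumes, and the final selector unit has a high byte of 0x00 (even-numbered volume, counting from 1) or 0xFE (odd-numbered volume). The function name is hypothetical, volumes are numbered from zero here, and control and special codes are passed through unchanged.

```python
def decode_tron16be(data: bytes, default_plane: int = 0x21):
    """Return a list of (volume, plane_byte, code) for each non-selector unit."""
    if len(data) % 2:
        raise ValueError("TRON-16 data must have an even number of bytes")
    units = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    volume, plane = 0, default_plane
    result = []
    i = 0
    while i < len(units):
        unit = units[i]
        i += 1
        if unit >> 8 != 0xFE:
            # Ordinary code unit: report it with the current volume and plane.
            result.append((volume, plane, unit))
            continue
        # Count leading 0xFEFE units, then read the final selector unit.
        escapes = 0
        while unit == 0xFEFE:
            escapes += 1
            if i >= len(units):
                raise ValueError("truncated plane selector")
            unit = units[i]
            i += 1
        if escapes and unit >> 8 == 0x00:
            volume = 2 * escapes - 1
        elif unit >> 8 == 0xFE:
            volume = 2 * escapes
        else:
            raise ValueError("malformed plane selector")
        plane = unit & 0xFF
    return result


print(decode_tron16be(bytes([0xFE, 0x21, 0x30, 0x21])))  # [(0, 33, 12321)]
```

The sketch does not check that the plane byte lies in 0x21 to 0xFD (except 0x7F); a stricter reader would reject other values.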
A less common (also apparently unofficial) encoding is TRON-8, which is TRON-16BE encoded without leading zeros. (Due to this, it can then be used for null-terminated strings in C.)
TRON-32BE (also called "stateless-TADTextBE") and TRON-32LE work as follows: the low 16 bits are the code within the plane, the high 8 bits select the volume (zero means the first volume), and the next 8 bits select the plane within that volume (0x21 to 0xFD, except 0x7F).
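The bit layout just described can be sketched as a pair of pack and unpack helpers. The names `tron32_pack` and `tron32_unpack` are hypothetical; the layout itself is taken directly from the paragraph above.

```python
def tron32_pack(volume: int, plane_byte: int, code: int) -> int:
    """Combine volume (high 8 bits), plane byte (next 8 bits) and code (low 16 bits)."""
    if not 0 <= volume <= 0xFF:
        raise ValueError("volume must fit in 8 bits")
    if not 0x21 <= plane_byte <= 0xFD or plane_byte == 0x7F:
        raise ValueError("plane byte must be 0x21..0xFD, except 0x7F")
    if not 0 <= code <= 0xFFFF:
        raise ValueError("code must fit in 16 bits")
    return (volume << 24) | (plane_byte << 16) | code


def tron32_unpack(value: int) -> tuple[int, int, int]:
    """Split a TRON-32 value into (volume, plane_byte, code)."""
    return (value >> 24) & 0xFF, (value >> 16) & 0xFF, value & 0xFFFF


value = tron32_pack(0, 0x21, 0x3021)
print(hex(value))                   # 0x213021
print(value.to_bytes(4, "big"))     # TRON-32BE byte order
print(value.to_bytes(4, "little"))  # TRON-32LE byte order
```

Because the plane byte is at least 0x21, the smallest packed value is 0x00210000, which is above the highest Unicode code point, 0x10FFFF.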
There is also the "&T" code, which is similar to character entities in HTML and XML. It starts with &T, followed by the same hex code as in TRON-32 (usually without leading zeros), and ends with a semicolon.
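A minimal sketch of writing and reading the "&T" form follows. The helper names are hypothetical, and two details are assumptions not stated in the article: hex digits are emitted in uppercase, and both cases are accepted when parsing.

```python
import re

# "&T", then 1 to 8 hex digits, then ";" (ASSUMPTION: either letter case is accepted).
_AMP_T = re.compile(r"&T([0-9A-Fa-f]{1,8});")


def to_amp_t(tron32_value: int) -> str:
    """Format a TRON-32 value as an &T reference without leading zeros."""
    return f"&T{tron32_value:X};"


def from_amp_t(reference: str) -> int:
    """Parse a single &T reference back into its TRON-32 value."""
    match = _AMP_T.fullmatch(reference)
    if match is None:
        raise ValueError(f"not an &T reference: {reference!r}")
    return int(match.group(1), 16)


print(to_amp_t(0x00213021))     # &T213021;
print(hex(from_amp_t("&T213021;")))  # 0x213021
```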
Notes:
- Usually "TRON code" means the TRON-16 encoding, if none other is specified.
- The range of valid codepoints in TRON-32 does not overlap that of UTF-32 at all, so the two cannot be confused and can even be mixed if this is somehow desirable.
- If you have a TRON-16BE text without null characters, you can convert to TRON-8 by stripping out all null bytes. (If it is TRON-16LE, you can byte swap first and then do that.)
- Text that explicitly specifies the plane can be distinguished from (but not necessarily mixed with) Unicode with byte order marks.
- Since the bytes in the ASCII control characters range 0x00 to 0x1F and 0x7F are not used for text in TRON-8 that doesn't have control characters, it can be used in contexts that use those control codes for other purposes, without interference.
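The TRON-16 to TRON-8 conversion mentioned in the notes can be sketched as follows. The function name is hypothetical; the approach (byte swap for little-endian input, then strip null bytes) is the one described in the note, with an added check that rejects null characters because they cannot survive the conversion.

```python
def tron16_to_tron8(data: bytes, little_endian: bool = False) -> bytes:
    """Convert TRON-16 text without null characters to TRON-8."""
    if len(data) % 2:
        raise ValueError("TRON-16 data must have an even number of bytes")
    if little_endian:
        # Swap each byte pair so the data becomes TRON-16BE.
        swapped = bytearray(data)
        swapped[0::2], swapped[1::2] = data[1::2], data[0::2]
        data = bytes(swapped)
    # A 0x0000 code unit is a null character, which TRON-8 cannot represent.
    if any(data[i] == 0 and data[i + 1] == 0 for i in range(0, len(data), 2)):
        raise ValueError("text contains a null character")
    return data.replace(b"\x00", b"")


tron16 = bytes([0xFE, 0x21, 0x30, 0x21, 0x00, 0x0A])
print(tron16_to_tron8(tron16).hex())  # fe2130210a
```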
External resources
- Partially TRON plane 9 in GlyphWiki:
- PDF with GT font list