Continuing yesterday’s article about the different prototypical glyphs for Unicode code points that are common to JIS X 0212-1990 and JIS X 0213:2004, today’s article focuses on the normative references that correspond to JIS X 0213:2004, or rather the lack thereof.
According to UAX #38 (Unicode Han Database), the following three kIRG_JSource fields correspond to JIS X 0213:
J3 | J3A | J4 |
---|---|---|
JIS X 0213:2000 Level 3 | JIS X 0213:2004 Level 3 | JIS X 0213:2000 Level 4 |
However, the number of actual mappings does not correspond to the number of kanji in those sets. In fact, they don’t even come close. ☹
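As a rough sketch of how the actual mappings could be tallied, the Python below counts kIRG_JSource values by their source prefix (J3, J3A, J4, and so on). It assumes the usual tab-separated Unihan record layout (code point, field name, value) and a value of the form prefix-hex. The function name and the sample records are made up for illustration and are not actual Unihan data.

```python
from collections import Counter


def tally_jsource_prefixes(lines):
    """Count kIRG_JSource values by source prefix (the part before the hyphen).

    Assumes tab-separated records of the form:
        U+XXXX<TAB>kIRG_JSource<TAB>PREFIX-HEX
    Comment lines (starting with '#') and other fields are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 3 or fields[1] != "kIRG_JSource":
            continue
        prefix = fields[2].split("-", 1)[0]
        counts[prefix] += 1
    return counts


# Illustrative records only; the code points and values are invented.
sample = [
    "# hypothetical sample, not real Unihan data",
    "U+3400\tkIRG_JSource\tJ3-0001",
    "U+3401\tkIRG_JSource\tJ3-0002",
    "U+3402\tkIRG_JSource\tJ3A-0003",
    "U+3403\tkIRG_JSource\tJ4-0004",
    "U+3404\tkIRG_JSource\tJ1-0005",
    "U+3404\tkIRG_GSource\tG0-0006",
]

counts = tally_jsource_prefixes(sample)
for prefix in ("J3", "J3A", "J4"):
    print(prefix, counts[prefix])
```

Running the same function over the real source-reference data, instead of the sample list, would give the "Actual" figures for each field.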
The table below shows the number of expected mappings, the number of actual mappings in each field, and the number of missing mappings:
Field | Expected | Actual | Missing |
---|---|---|---|
J3 | 1,249 | 194 | 1,055 |
J3A | 10 | 8 | 2 |
J4 | 2,436 | 665 | 1,771 |
In other words, 2,828 JIS X 0213 mappings are not normative. The explanation for this omission is simple: 2,743 existing J1 (JIS X 0212-1990) and 85 existing JA (Unified Japanese IT Vendors Contemporary Ideographs) source references were established prior to JIS X 0213, and currently take precedence.
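The arithmetic behind these figures can be checked in a few lines of Python. The numbers are taken directly from the table and the paragraph above; only the variable names are mine.

```python
# (expected, actual) mappings per kIRG_JSource field, from the table above
table = {
    "J3": (1249, 194),
    "J3A": (10, 8),
    "J4": (2436, 665),
}

# Missing mappings per field are expected minus actual
missing = {field: expected - actual for field, (expected, actual) in table.items()}
total_missing = sum(missing.values())

# Source references that currently take precedence
j1_refs = 2743  # J1 (JIS X 0212-1990)
ja_refs = 85    # JA (Unified Japanese IT Vendors Contemporary Ideographs)

print(missing)
print(total_missing, j1_refs + ja_refs)
```

The per-field differences sum to 2,828, which matches the combined count of J1 and JA source references.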
However, given that Unicode is intended to reflect contemporary usage, one would think that in addition to showing the contemporary glyph for a particular region, the contemporary source reference should also be the normative one.
I assembled the relevant data this morning.
One list enumerates the 2,828 CJK Unified Ideographs that currently use J1 (JIS X 0212-1990) or JA (Unified Japanese IT Vendors Contemporary Ideographs, 1993) source references. A second list enumerates the same set of 2,828 CJK Unified Ideographs, but instead uses the J3 (1,055), J3A (2), and J4 (1,771) source references that would be necessary to make complete JIS X 0213 coverage normative.
One argument against this proposed change to normative data is that it does not reflect what was originally unified in the CJK Unified Ideographs blocks. Luckily, the Unicode Consortium makes it easy to access previous versions of Unicode, so any information of historical significance remains available. My argument in support of this proposed change is that just as JIS standards evolve to reflect contemporary usage in Japan, so must Unicode and ISO/IEC 10646. These international standards should not be treated as historical repositories for the national standards on which they’re based.
I should also point out that the provisional kJIS0213 field specifies source references for all 3,695 kanji in JIS X 0213:2004. Likewise, the provisional kJis1 field specifies source references for all 5,801 kanji in JIS X 0212-1990. In other words, the JIS X 0212-1990 source references would not be lost via this proposal, but rather would be demoted to provisional status.
Thoughts?