Exploring IICore—Part 4

In Part 1, Part 2, and Part 3 of this series, we examined the ideographs that are tagged “K” (for ROK or South Korea), “P” (for DPRK or North Korea), “J” (for Japan), and “G” (for PRC or China) in the kIICore property. In Part 4, which is today’s article, we will explore the ideographs that are tagged “T” (for ROC or Taiwan), “H” (for Hong Kong SAR), and “M” (for Macao SAR).

ROC—Taiwan

A total of 6,566 ideographs are tagged “T” in IICore. When I compared these against the two most basic ideograph sets from Taiwan, the 5,401 ideographs in CNS 11643 Plane 1 and the 4,808 ideographs in 常用國字標準字體表 (chángyòng guózì biāozhǔn zìtǐ biǎo), I discovered that only one ideograph from those sets, U+5F5E, is neither tagged “T” nor present in IICore, though its related ideograph that is included in Big Five Level 1, U+5F5D, is tagged “T” in IICore. (This ideograph pair represents the only difference between CNS 11643 Plane 1 and Big Five Level 1, both of which include 5,401 ideographs.)
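
A count like the one above can be reproduced with a short script. The following is a minimal sketch, not the tooling used for this article. It assumes the Unihan database's tab-separated line format (code point, property name, value) and that a kIICore value is one priority letter (A, B, or C) followed by region letters, which matches the values shown in the tables of this article. The function name `tagged` and the sample lines are illustrative.

```python
from typing import Iterable, Set


def tagged(lines: Iterable[str], region: str) -> Set[str]:
    """Return the code points whose kIICore value includes `region`."""
    result = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Comment lines and blank lines do not split into three fields.
        if len(fields) != 3 or fields[1] != "kIICore":
            continue
        code_point, _, value = fields
        # Assumption: first letter is the priority, the rest are regions.
        if region in value[1:]:
            result.add(code_point)
    return result


# Sample lines built from kIICore values quoted in this article.
sample = [
    "# comment lines are ignored",
    "U+4C81\tkIICore\tCT",
    "U+4C81\tkIRG_TSource\tT4-697C",
    "U+9C72\tkIICore\tBTH",
    "U+5F66\tkIICore\tAGJKMP",
    "U+5151\tkIICore\tAG",
]

print(sorted(tagged(sample, "T")))  # ['U+4C81', 'U+9C72']
```

To count against real data, the same function could be fed the lines of the Unihan file that carries kIICore (assumed here to be Unihan_IRGSources.txt) and the result passed to `len()`.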

Other than the one omission pointed out in the previous paragraph, 1,156 T-tagged ideographs remain outside the scope of what could be considered a reasonably minimal set. Predictably, most of them (1,063, to be exact) map to CNS 11643 Plane 2, which is equivalent to Big Five Level 2. Another 81 map to CNS 11643 Plane 3, and two of those, U+3577 and U+4CB3, are in Extension A.

That leaves a mere 12 T-tagged IICore ideographs outside the scope of the first three planes of CNS 11643. Six of them map to CNS 11643 Plane 4 (with half being in Extension A), one maps to Plane 5, and two map to Plane 15. The three tables below provide their details:

Ideograph  kIICore  kIRG_TSource (CNS 11643 Plane 4)
U+4C81     CT       T4-697C
U+4C85     CT       T4-697B
U+4D08     CT       T4-6C52
U+7374     CT       T4-566C
U+8025     CT       T4-462C
U+9BDD     CT       T4-625C

Ideograph  kIICore  kIRG_TSource (CNS 11643 Plane 5)
U+9C72     BTH      T5-7A53

Ideograph  kIICore  kIRG_TSource (CNS 11643 Plane 15)
U+7551     ATJKP    TF-2B7A
U+9C47     ATJKP    TF-6A3E
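
Grouping by plane, as in the tables above, comes down to reading the prefix of each kIRG_TSource value. The sketch below assumes that the part between “T” and the hyphen is the plane number written in hexadecimal, which is inferred from the fact that the “TF” values above map to Plane 15. The helper name `cns_plane` is hypothetical.

```python
from collections import Counter


def cns_plane(t_source: str) -> int:
    """Return the CNS 11643 plane number encoded in a kIRG_TSource value."""
    prefix, hyphen, _ = t_source.partition("-")
    if not hyphen or not prefix.startswith("T") or len(prefix) < 2:
        raise ValueError(f"not a kIRG_TSource value: {t_source!r}")
    # Assumption: the plane is written in hexadecimal, so "TF" is Plane 15.
    return int(prefix[1:], 16)


# The nine kIRG_TSource values from the three tables above.
values = [
    "T4-697C", "T4-697B", "T4-6C52", "T4-566C", "T4-462C", "T4-625C",
    "T5-7A53",
    "TF-2B7A", "TF-6A3E",
]

counts = Counter(cns_plane(v) for v in values)
print(sorted(counts.items()))  # [(4, 6), (5, 1), (15, 2)]
```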

The three remaining ideographs are the only somewhat suspicious ones in that they do not have a kIRG_TSource property value, but are related to ideographs that are tagged “T” in IICore and are in CNS 11643 Plane 1 or 2, per the table below:

Ideograph  kIICore  Other Source References                       Related Ideograph
U+55EC     BGTH     G0-6040, H-8F52                               U+5475
U+7934     BGT      G0-6D67, H-FEE8, J13-7932, KP1-6109, K2-4D65  U+7921
U+7E4A     ATJ      GE-3858, J0-4121, KP1-67CC, K2-5330           U+7E96
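
Finding ideographs like these three is a set difference: the code points tagged with a region in kIICore, minus the code points that have the matching source property. The sketch below is one way to express that, again assuming the Unihan tab-separated line format. The function name `tagged_without_source` is hypothetical, and the sample lines are limited to values quoted in this article.

```python
from typing import Iterable, Set


def tagged_without_source(
    lines: Iterable[str], region: str, source_property: str
) -> Set[str]:
    """Code points tagged `region` in kIICore that lack `source_property`."""
    tagged, sourced = set(), set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip comments, blank lines, and malformed lines
        code_point, prop, value = fields
        # Assumption: the first kIICore letter is the priority, not a region.
        if prop == "kIICore" and region in value[1:]:
            tagged.add(code_point)
        elif prop == source_property:
            sourced.add(code_point)
    return tagged - sourced


sample = [
    "U+4C81\tkIICore\tCT",
    "U+4C81\tkIRG_TSource\tT4-697C",
    "U+55EC\tkIICore\tBGTH",
    "U+55EC\tkIRG_GSource\tG0-6040",
    "U+55EC\tkIRG_HSource\tH-8F52",
]

print(sorted(tagged_without_source(sample, "T", "kIRG_TSource")))  # ['U+55EC']
```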

The only actions that I can suggest are to tag U+5F5E 彞 “T” in IICore, and for Taiwan to consider a horizontal extension for U+55EC, U+7934, and U+7E4A.

Hong Kong SAR

A total of 5,224 ideographs are tagged “H” in IICore. When I compared these against the 5,401 ideographs in Big Five Level 1, I discovered that 577 of the Big Five Level 1 ideographs are not tagged “H,” which means that the other 4,824 are. This leaves 400 H-tagged ideographs outside of Big Five Level 1, 171 of which map to Big Five Level 2, and the remaining 229 map to Hong Kong SCS proper (24 are in Extension A, 61 are in Extension B, and the remaining 144 are in the URO).
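
The subtraction behind these figures can be checked in a few lines. This sketch only restates the counts quoted above, under the reading that the 577 are Big Five Level 1 ideographs that lack the “H” tag. The variable names are illustrative.

```python
h_tagged = 5224        # ideographs tagged "H" in IICore
big5_level1 = 5401     # ideographs in Big Five Level 1
level1_not_h = 577     # Big Five Level 1 ideographs not tagged "H"

# Big Five Level 1 ideographs that are tagged "H".
level1_and_h = big5_level1 - level1_not_h

# H-tagged ideographs that fall outside Big Five Level 1.
outside_level1 = h_tagged - level1_and_h

big5_level2 = 171
hkscs_proper = {"Extension A": 24, "Extension B": 61, "URO": 144}

print(level1_and_h)                # 4824
print(outside_level1)              # 400
print(sum(hkscs_proper.values()))  # 229
```

The breakdown is consistent: 171 plus 229 accounts for all 400 of the remaining ideographs.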

All looks okay until we consider HKSCS-2016, which added 24 new characters, 22 of which are best described as the preferred Hong Kong SAR forms of existing Big Five ideographs. Of these 22 ideographs, 14 have corresponding Big Five versions that are tagged “H” in IICore, which strongly suggests that these 14 HKSCS-2016 ideographs should be tagged “H” if already present in IICore, or added to IICore and tagged “H.” The following table provides the details:

HKSCS-2016  kIICore  Big Five Level 1  kIICore
U+5151      AG       U+514C            ATJHKMP
U+543F      n/a      U+544A            AGTJHKMP
U+5AAA      CG       U+5ABC            ATJHKM
U+60A6      AGJ      U+6085            ATHKMP
U+6120      CG       U+614D            ATHM
U+6C32      n/a      U+6C33            ATH
U+7A0E      AGJ      U+7A05            ATHKMP
U+8131      AGJ      U+812B            ATHKMP
U+85F4      n/a      U+860A            ATJHKMP
U+8715      AG       U+86FB            ATHM
U+8AAC      AJ       U+8AAA            ATHKMP
U+9196      n/a      U+919E            ATHM
U+92ED      AJ       U+92B3            ATHKMP
U+95B2      AJ       U+95B1            ATHKMP

Macao SAR

A total of 4,955 ideographs are tagged “M” in IICore. When I compared these against the 5,401 ideographs in Big Five Level 1, I discovered that 739 of the Big Five Level 1 ideographs are not tagged “M.” This leaves 283 ideographs, 223 of which map to Big Five Level 2, and 59 of which map to HKSCS (two are in Extension A, eight are in Extension B, and the remaining 49 are in the URO). Only one ideograph, U+5F66, stands out as odd in that its source references do not suggest Macao SAR use. Its related ideograph, U+5F65, is also tagged “M” in IICore (ATHM), and its source references, particularly T1-507D, more strongly suggest Macao SAR use. The table below provides more details about these two ideographs:

Ideograph  kIICore  Source References
U+5F66     AGJKMP   G0-5165, J0-4927, KP0-F8BA, K0-6569, T3-2C50
U+5F65     ATHM     GE-2955, HB1-ABDB, KP1-41F9, T1-507D

In addition, 13 of the 14 ideographs in the first column of the table in the “Hong Kong SAR” section above (all except for U+6C32) should probably be tagged “M” in IICore, because Macao SAR has similar regional conventions, and because the corresponding ideographs in the third column are already tagged “M” in IICore.
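
The reasoning for both regions can be sketched as a small rule applied to the table in the “Hong Kong SAR” section: if the Big Five ideograph carries a region's tag and its HKSCS-2016 counterpart does not, the counterpart is a candidate, either to be tagged (if already in IICore) or to be added. The code below is an illustrative sketch of that rule. The names `PAIRS` and `suggestions` are hypothetical, and `None` stands for the “n/a” entries.

```python
# (HKSCS-2016 ideograph, its kIICore value or None, Big Five ideograph, its kIICore value)
PAIRS = [
    ("U+5151", "AG", "U+514C", "ATJHKMP"),
    ("U+543F", None, "U+544A", "AGTJHKMP"),
    ("U+5AAA", "CG", "U+5ABC", "ATJHKM"),
    ("U+60A6", "AGJ", "U+6085", "ATHKMP"),
    ("U+6120", "CG", "U+614D", "ATHM"),
    ("U+6C32", None, "U+6C33", "ATH"),
    ("U+7A0E", "AGJ", "U+7A05", "ATHKMP"),
    ("U+8131", "AGJ", "U+812B", "ATHKMP"),
    ("U+85F4", None, "U+860A", "ATJHKMP"),
    ("U+8715", "AG", "U+86FB", "ATHM"),
    ("U+8AAC", "AJ", "U+8AAA", "ATHKMP"),
    ("U+9196", None, "U+919E", "ATHM"),
    ("U+92ED", "AJ", "U+92B3", "ATHKMP"),
    ("U+95B2", "AJ", "U+95B1", "ATHKMP"),
]


def suggestions(pairs, region):
    """Map each candidate HKSCS-2016 ideograph to "tag" or "add" for `region`."""
    result = {}
    for hkscs, hkscs_value, _big5, big5_value in pairs:
        # Skip the priority letter; only region letters are compared.
        if region not in big5_value[1:]:
            continue  # the Big Five form is not tagged, so nothing to suggest
        if hkscs_value is None:
            result[hkscs] = "add"   # not in IICore at all
        elif region not in hkscs_value[1:]:
            result[hkscs] = "tag"   # in IICore, but missing this region
    return result


print(len(suggestions(PAIRS, "H")))         # 14
print(len(suggestions(PAIRS, "M")))         # 13
print("U+6C32" in suggestions(PAIRS, "M"))  # False
```

U+6C32 drops out for “M” because its Big Five counterpart, U+6C33, is tagged ATH and so carries no “M” tag.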

Interestingly, I have not mentioned the kIRG_MSource property anywhere in this section, because none of the M-tagged ideographs in IICore have such source references. Given the fairly close relationship with Big Five and HKSCS, comparing against those sets seemed appropriate and, as it turned out, was completely appropriate.

🐡
