Exploring IICore—Part 1

Today’s article is the very first one that references IICore (International Ideographs Core), which is best described as a region-agnostic subset that includes the most commonly used CJK Unified Ideographs in Unicode, and is intended for use in memory-challenged devices and environments. Included are 9,810 ideographs, the bulk of which are in the URO (9,706), with the remaining ones in Extensions A (42) and B (62).

IICore is instantiated as the kIICore property of the Unihan Database, and documented in UAX #38. The kIICore property values consist of an initial letter—A, B, or C—that indicates priority, followed by one or more letters that specify a source that more or less corresponds to a region: G, H, J, K, M, P (short for KP), and T.
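As a minimal sketch of how such a value breaks down, the following Python splits a kIICore value into its priority letter and its source letters. The function name `parse_kiicore` is hypothetical, and the permitted letters are taken only from the description above; UAX #38 remains the authority on the exact syntax.

```python
PRIORITIES = frozenset("ABC")      # initial letter: priority
SOURCES = frozenset("GHJKMPT")     # following letters: sources (P is short for KP)

def parse_kiicore(value: str) -> tuple[str, frozenset[str]]:
    """Split a kIICore value such as 'AGTJHMP' into (priority, sources)."""
    if len(value) < 2:
        raise ValueError(f"kIICore value too short: {value!r}")
    priority, sources = value[0], value[1:]
    if priority not in PRIORITIES:
        raise ValueError(f"unknown priority letter in {value!r}")
    unknown = set(sources) - SOURCES
    if unknown:
        raise ValueError(f"unknown source letters {sorted(unknown)} in {value!r}")
    return priority, frozenset(sources)

# U+585E has the value AGTJHMP: priority A, and no "K" among its sources.
priority, sources = parse_kiicore("AGTJHMP")
print(priority, "K" in sources)  # prints: A False
```

Returning the sources as a set makes the "is this ideograph tagged K?" question a simple membership test.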

In Part 1 of what may eventually become a multiple-part series about IICore, I will briefly explore the ideographs that are tagged “K” for Korean use, and point out some that should have been tagged “K”, based on their mappings to the KS X 1001 standard.

A total of 4,744 ideographs are tagged “K” in their kIICore property values. Of these, 138 are outside of KS X 1001. We’ll come back to them at the end of this article.
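A count like this can be reproduced from the Unihan data files. The sketch below assumes the usual Unihan layout of three tab-separated fields per line (code point, property name, value) with `#` comment lines; the function name `k_tagged` is hypothetical, and the sample lines reuse three kIICore values quoted in this article rather than a real file.

```python
from typing import Iterable

def k_tagged(lines: Iterable[str]) -> set[int]:
    """Return the code points whose kIICore value includes the source letter K."""
    tagged = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # not a "code point, property, value" record
        code, prop, value = fields
        # value[0] is the priority letter; the source letters follow it.
        if prop == "kIICore" and code.startswith("U+") and "K" in value[1:]:
            tagged.add(int(code[2:], 16))
    return tagged

sample = [
    "# three kIICore values quoted in this article",
    "U+596C\tkIICore\tAP",
    "U+734E\tkIICore\tATHKM",
    "U+8008\tkIICore\tCK",
]
print(sorted(f"U+{cp:04X}" for cp in k_tagged(sample)))  # ['U+734E', 'U+8008']
```

Run against the full data, the figures above imply that 4,744 − 138 = 4,606 of the K-tagged ideographs fall inside KS X 1001.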

Curiously, 14 of the 4,620 ideographs that are included in the KS X 1001 standard are included in IICore, yet are not tagged “K” in their kIICore property values. The table below lists them and their kIICore property values, along with a related ideograph, if any:

Ideograph  kIICore  Related Ideograph  Related kIICore
塞 U+585E  AGTJHMP  塞 U+F96C          n/a
奬 U+596C  AP       獎 U+734E          ATHKM
復 U+5FA9  ATJHMP   復 U+F966          n/a
慄 U+6144  ATJHMP   慄 U+F9D9          n/a
戀 U+6200  ATHMP    戀 U+F990          n/a
撚 U+649A  ATJHMP   撚 U+F991          n/a
栗 U+6817  AGTJHMP  栗 U+F9DA          n/a
渗 U+6E17  AG       滲 U+6EF2          ATJHKMP
耉 U+8009  AP       耈 U+8008          CK
胄 U+80C4  AGTJP    冑 U+5191          ATJHK
詰 U+8A70  ATJHMP   NONE               n/a
諾 U+8AFE  ATJHMP   諾 U+F95D          n/a
輦 U+8F26  ATJHP    輦 U+F998          n/a
默 U+9ED8  AGTHMP   黙 U+9ED9          AJK

Eight of the ideographs can be explained by guessing that an initial version of IICore may have included the corresponding CJK Compatibility Ideographs, which were subsequently stripped out. Another five have related ideographs (U+734E 獎, U+6EF2 滲, U+8008 耈, U+5191 冑, and U+9ED9 黙) that were apparently the preferred code points in the very popular HWP (Hangul Word Processor) app (according to Jaemin Chung), which was likely used to enter the ideographs by those who compiled the list for Korea (ROK). The only plausible explanation for U+8A70 詰 seems to be that it is the very last hanja (aka ideograph) in the KS X 1001 standard, and may have fallen victim to an inadvertent off-by-one error.
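The three-way split can be checked mechanically against the table. The following sketch transcribes the table's rows and classifies each one by its related ideograph; the category labels and the `classify` helper are hypothetical names, and the only outside fact used is that the CJK Compatibility Ideographs block spans U+F900 through U+FAFF.

```python
from collections import Counter

# (ideograph, kIICore, related ideograph or None, related kIICore or None)
ROWS = [
    (0x585E, "AGTJHMP", 0xF96C, None),
    (0x596C, "AP",      0x734E, "ATHKM"),
    (0x5FA9, "ATJHMP",  0xF966, None),
    (0x6144, "ATJHMP",  0xF9D9, None),
    (0x6200, "ATHMP",   0xF990, None),
    (0x649A, "ATJHMP",  0xF991, None),
    (0x6817, "AGTJHMP", 0xF9DA, None),
    (0x6E17, "AG",      0x6EF2, "ATJHKMP"),
    (0x8009, "AP",      0x8008, "CK"),
    (0x80C4, "AGTJP",   0x5191, "ATJHK"),
    (0x8A70, "ATJHMP",  None,   None),
    (0x8AFE, "ATJHMP",  0xF95D, None),
    (0x8F26, "ATJHP",   0xF998, None),
    (0x9ED8, "AGTHMP",  0x9ED9, "AJK"),
]

def classify(related, related_value):
    """Label a row by the kind of related ideograph it has."""
    if related is None:
        return "unexplained"
    if 0xF900 <= related <= 0xFAFF:        # CJK Compatibility Ideographs block
        return "compatibility"
    if related_value and "K" in related_value[1:]:
        return "K-tagged variant"
    return "other"

counts = Counter(classify(rel, rel_value) for _, _, rel, rel_value in ROWS)
print(counts["compatibility"], counts["K-tagged variant"], counts["unexplained"])
# prints: 8 5 1
```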

The obvious fix here is to tag the 14 characters in the left column of the table with “K” in their kIICore property values, which would make KS X 1001 support complete. The best part is that it would not change the number of ideographs in IICore.
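To illustrate what the fix would look like per value, here is a small sketch that adds a source letter to an existing kIICore value. The helper `add_source` is hypothetical, and the letter order G, T, J, H, K, M, P is an assumption inferred from the values quoted in this article (such as ATJHKMP); it is not taken from UAX #38.

```python
# Assumed ordering of source letters, inferred from values like ATJHKMP and AGTJHMP.
ORDER = "GTJHKMP"

def add_source(value: str, source: str = "K") -> str:
    """Return the kIICore value with one more source letter, in the assumed order."""
    priority, sources = value[0], set(value[1:])
    sources.add(source)
    return priority + "".join(letter for letter in ORDER if letter in sources)

print(add_source("AGTJHMP"))  # U+585E: AGTJHKMP
print(add_source("AP"))       # U+596C: AKP
print(add_source("ATJHK"))    # already tagged, unchanged: ATJHK
```

Because only existing values are edited, the set of code points that carry a kIICore value stays the same, which is why the size of IICore is unaffected.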

Going back to the 138 ideographs outside of KS X 1001 that are tagged “K” in their kIICore property values, it turns out that the following seven do not have a kIRG_KSource property value, which raises the proverbial red flag:

Ideograph  kIICore  Claimed K-Source (in IRG N1025)  Source References
媴 U+5AB4  CK       K3                               G5-4047, HB2-DD43, T2-4249
琟 U+741F  CK       K3                               G3-3F59, H-98CA, KP1-5945, T3-3D35
璤 U+74A4  CK       K3                               GE-3354, H-FC71, T3-6567
璸 U+74B8  BTK      K3                               G3-3F71, HB2-F040, KP1-59CB, T2-622D
砇 U+7807  CK       K3                               G5-577A, KP1-5FAC, T3-2E3B
穦 U+7A66  CK       K3                               GE-3642, KP1-62B1, T3-5A65
黙 U+9ED9  AJK      K0                               GE-4874, J0-4C5B, T4-5560
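A check of this kind amounts to a lookup across two properties. The sketch below assumes the kIICore and kIRG_KSource values have already been loaded into dictionaries keyed by code point; the function name is hypothetical, and the sample data is just the seven ideographs from the table, which by the article's account have no kIRG_KSource value.

```python
def k_tagged_without_ksource(kiicore: dict[int, str],
                             ksource: dict[int, str]) -> list[int]:
    """Code points tagged K in kIICore that lack a kIRG_KSource value."""
    return sorted(
        cp for cp, value in kiicore.items()
        if "K" in value[1:] and cp not in ksource
    )

# kIICore values of the seven ideographs in the table above.
kiicore = {
    0x5AB4: "CK", 0x741F: "CK", 0x74A4: "CK", 0x74B8: "BTK",
    0x7807: "CK", 0x7A66: "CK", 0x9ED9: "AJK",
}
ksource: dict[int, str] = {}  # none of the seven has a kIRG_KSource value

flagged = k_tagged_without_ksource(kiicore, ksource)
print(len(flagged))  # prints: 7
```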

Unfortunately, the people who compiled the “K” portion of IICore have either passed away or are no longer participating in the Korean National Body, and there is no document or report explaining how the “K” portion of IICore was prepared. We may therefore never know exactly why these seven ideographs, or the other 131 that are outside the scope of KS X 1001, were tagged “K” in their kIICore property values. Only U+9ED9 黙, which makes an appearance in both tables, can be explained as being the preferred code point for the HWP app.

Of the 131 K-tagged IICore ideographs that are outside the scope of KS X 1001 and do have a kIRG_KSource property value, 79 have K1 (aka KS X 1002) source prefixes, 48 have K2 (aka KS X 1027-1), and only four have K3 (aka KS X 1027-2).
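A breakdown like this is a tally of the part of each source reference before the hyphen. The sketch below shows that tally with a hypothetical `tally_prefixes` helper, demonstrated on three source references quoted earlier for U+5AB4, and then confirms that the figures quoted in this article add up.

```python
from collections import Counter
from typing import Iterable

def tally_prefixes(references: Iterable[str]) -> Counter:
    """Count source references by prefix, the part before the first hyphen."""
    return Counter(ref.split("-", 1)[0] for ref in references)

# Source references listed above for U+5AB4.
print(tally_prefixes(["G5-4047", "HB2-DD43", "T2-4249"]))

# Figures quoted in this article, checked for consistency.
QUOTED = {"K1": 79, "K2": 48, "K3": 4}
WITHOUT_KSOURCE = 7
assert sum(QUOTED.values()) == 131
assert sum(QUOTED.values()) + WITHOUT_KSOURCE == 138
```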

🐡
