Exploring IICore

In Part 1 and Part 2 of this series, we examined the ideographs that are tagged “K” (for ROK or South Korea), “P” (for DPRK or North Korea), and “J” (for Japan) in the kIICore property. In Part 3, which is today’s article, we will explore the 5,825 ideographs that are tagged “G” (for PRC or China).

The good news is that all of the ideographs in the most common sets for China, namely the first 3,500 ideographs in 通用规范汉字表 (Tōngyòng Guīfàn Hànzìbiǎo or TGH 2013) and the 3,755 ideographs of GB 2312 Level 1, are tagged “G” in IICore. Merging these two sets yields 3,874 unique ideographs, which leaves 1,951 of the 5,825 “G”-tagged ideographs not yet accounted for.

When I explored the next most important sets of ideographs for China, I found that 1,787 of the remaining 1,951 ideographs are in the second set of 通用规范汉字表 (3,000 ideographs), and 1,771 of them are among the 3,008 ideographs of GB 2312 Level 2. Merged, these two sets account for 1,847 of the remaining 1,951 ideographs, meaning that 104 are still not accounted for.
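The successive filtering described above is ordinary set arithmetic. Here is a minimal Python sketch of it, not the procedure actually used for this article. The `peel` helper and the toy character sets are hypothetical stand-ins for the real lists, which would come from the Unihan database and the relevant standards.

```python
def peel(remaining, *reference_sets):
    """Split `remaining` into (accounted, unaccounted) relative to the
    union of the given reference sets."""
    covered = set().union(*reference_sets)
    return remaining & covered, remaining - covered

# Hypothetical toy data; the real inputs would be the "G"-tagged ideographs
# and the TGH 2013 / GB 2312 lists.
g_tagged = set("一二三四五六七八")
tier1 = [set("一二三"), set("二三四")]   # stand-ins for TGH 2013 set 1, GB 2312 Level 1
tier2 = [set("五"), set("五六")]         # stand-ins for TGH 2013 set 2, GB 2312 Level 2

found1, rest1 = peel(g_tagged, *tier1)
found2, rest2 = peel(rest1, *tier2)

# Sanity check on the counts reported in the article.
assert 5825 - 3874 == 1951
assert 1951 - 1847 == 104
```

Merging before subtracting matters because the two lists in each tier overlap heavily: 1,787 and 1,771 ideographs together cover only 1,847 unique ones.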

Finally, I found that 75 of the remaining 104 ideographs are in the third set of 通用规范汉字表 (1,605 ideographs), which leaves a mere 29 unaccounted for. The tables below list these 29 remaining ideographs, grouped by kIRG_GSource source prefix:

Ideograph	kIICore	kIRG_GSource—GB/T 12345
濛 U+6FDB	AGTHM	G1-7855
矇 U+77C7	AGTHM	G1-7857
硃 U+7843	AGTHM	G1-7927
穀 U+7A40	AGTJHKMP	G1-7836

Ideograph	kIICore	kIRG_GSource—GB 7589 unsimplified forms
䣅 U+48C5	CG	G3-6F29
䣓 U+48D3	CG	G3-7B67
劻 U+52BB	AGT	G3-333F
屌 U+5C4C	BGT	G3-3B53
枓 U+6793	AGTKP	G3-4066
肏 U+808F	CG	G3-305B
蹓 U+8E53	BGT	G3-7045
鯈 U+9BC8	AGT	G3-3233

Ideograph	kIICore	kIRG_GSource—GB 7590 unsimplified forms
䢵 U+48B5	CG	G5-6F4F
伕 U+4F15	AGTHM	G5-314F
晥 U+6665	AGKP	G5-496D
珮 U+73EE	AGTJHM	G5-4231
甽 U+753D	AGT	G5-5A23
礽 U+793D	BGT	G5-574C

Ideograph	kIICore	kIRG_GSource—GB 8565.2
晳 U+6673	AGJKP	G8-2D72 *
洩 U+6D29	AGTJKMP	G8-2F6B
濬 U+6FEC	AGTHKMP	G8-2D59 *
饤 U+9964	CG	G8-2D43
饾 U+997E	CG	G8-2D48

* = The source references for U+6673 晳 and U+6FEC 濬 are problematic, because the actual GB 8565.2 standard does not include characters at code points 0x2D72 (13-82) or 0x2D59 (13-57). These ideographs are instead present in ISO-IR-165 at those code points. See Jaemin Chung’s IRG N2276 for more details.
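The row-cell values in parentheses follow from the hexadecimal code points by subtracting 0x20 from each byte, which is the usual convention for GB 2312-style 94×94 sets. Here is a small sketch of that conversion. The function name `to_row_cell` is only illustrative.

```python
def to_row_cell(hex_code: str) -> str:
    """Convert a two-byte code such as '2D72' to row-cell notation ('13-82'),
    assuming a 94x94 set whose bytes start at 0x21."""
    value = int(hex_code, 16)
    row = (value >> 8) - 0x20     # high byte -> row
    cell = (value & 0xFF) - 0x20  # low byte  -> cell
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError(f"{hex_code} is outside the 94x94 grid")
    return f"{row:02d}-{cell:02d}"

print(to_row_cell("2D72"))
print(to_row_cell("2D59"))
```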

Ideograph	kIICore	kIRG_GSource—GB/T 16500
卻 U+537B	AGTHM	GE-237B
坵 U+5775	AGTKP	GE-2554
睪 U+776A	AGT	GE-3471
蹠 U+8E60	AGTJKP	GE-3F43
閒 U+9592	AGTJHKMP	GE-4361

Ideograph	kIICore	kIRG_GSource—康熙字典
䧑 U+49D1	CG	GKX-1352.16

Below is a modified version of the fifth table, which covers the five ideographs whose source references use the “GE” prefix, and which adds their source references from other properties. GB/T 16500 is interesting in a couple of ways. First and foremost, its 3,778 ideographs are simply meant to “fill in” URO (Unified Repertoire & Ordering) code points that otherwise lacked a kIRG_GSource property value, so they are effectively GBK characters. Second, as this tweet reports, the first two hexadecimal digits of all 3,778 source references are low by exactly 0x0F, and the source references in the table below reflect the corrections.

Ideograph	kIRG_GSource	Other Source References
卻 U+537B	GE-327B	HB1-AB6F, J0-524A, KP1-38C9, K1-5730, T1-5033, V1-4D7A
坵 U+5775	GE-3454	HB2-CBFA, J14-2468, KP0-D0EB, K0-4F26, T2-257A, V0-3438
睪 U+776A	GE-4371	HB1-B841, J14-7227, KP1-5E72, K2-4B4C, T1-6548
蹠 U+8E60	GE-4E43	HB2-F0F9, J0-6D28, KP0-EDA4, K0-7432, T2-6364
閒 U+9592	GE-5261	HB1-B6A2, KP0-F2D8, K0-7959, T1-6267, V2-907C
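The 0x0F correction applied in the table above is mechanical: only the first byte of each “GE” source reference changes. Here is a minimal sketch, assuming references in the `GE-XXXX` form shown in this article. The function name `fix_ge_reference` is hypothetical.

```python
def fix_ge_reference(ref: str, offset: int = 0x0F) -> str:
    """Add `offset` to the first byte (first two hex digits) of a GE reference."""
    prefix, code = ref.split("-")
    if prefix != "GE" or len(code) != 4:
        raise ValueError(f"unexpected source reference: {ref}")
    first_byte = int(code[:2], 16) + offset
    return f"{prefix}-{first_byte:02X}{code[2:]}"

# The five uncorrected references from the earlier GB/T 16500 table.
for ref in ["GE-237B", "GE-2554", "GE-3471", "GE-3F43", "GE-4361"]:
    print(ref, "->", fix_ge_reference(ref))
```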

The fact that these five ideographs are tagged “G” in IICore is interesting. On one hand, their presence in the GB/T 16500 standard may suggest that they are not actually used in China, but on the other hand, they may be used in some specific contexts. At the very least, each of them carries at least one additional tag besides “G.”
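The observation about additional tags is easy to check against the kIICore values listed in the GB/T 16500 table. The sketch below assumes that each value is one priority letter (A, B, or C) followed by one or more source letters. The helper name `split_iicore` is illustrative only.

```python
def split_iicore(value: str):
    """Split a kIICore value into (priority letter, set of source letters).
    Assumes the form: one of A/B/C, then source letters such as G, T, J."""
    return value[0], set(value[1:])

# kIICore values of the five "GE" ideographs, copied from the table above.
ge_rows = {
    "卻": "AGTHM",
    "坵": "AGTKP",
    "睪": "AGT",
    "蹠": "AGTJKP",
    "閒": "AGTJHKMP",
}

# For each ideograph, the source tags other than "G".
others = {ch: sorted(split_iicore(v)[1] - {"G"}) for ch, v in ge_rows.items()}
for ch, tags in others.items():
    print(ch, "".join(tags))
```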

Stay tuned for Part 4 of this series…

🐡

CJK Type Blog

CJK Fonts, Character Sets & Encodings.

Exploring IICore—Part 3

By Dr. Ken Lunde
