Four of a Kind: KS X 1001 & KPS 9566

What do U+AE40 김 and U+6A02 樂 have in common? 🤔

I enjoy working with standards.

Interestingly, standards that were published by the Koreas—South Korea (aka ROK) and North Korea (aka DPRK)—include characters that appear more than once.

In the case of South Korea, it is well known that 268 of the 4,888 ideographs (aka hanja) in the KS X 1001 standard are duplicates, which affects ideographs that have more than one reading. This, of course, means that there are 4,620 unique ideographs in that standard.

In the case of North Korea, the original KPS 9566 standard, dated 1997, separately encodes the modern hangul syllables that represent the names of its previous and current leaders (as of that time).

KS X 1001

The 268 duplicate ideographs in South Korea’s KS X 1001 standard were included in Unicode as CJK Compatibility Ideographs in the range U+F900 through U+FA0B, and as of Unicode Version 6.3 there are SVSes (Standardized Variation Sequences) that correspond to them. Almost all of the duplicates represent a second reading, meaning that the ideograph in question has only one duplicate. However, there are four ideographs that have two duplicates each, and even one that has three, meaning that it actually appears four times in the standard, as shown in the table below:

Ideograph | Unicode | KS X 1001 Position | SVS | Reading
樂 | U+6A02 | 68-37 | N/A | ak
樂 | U+F914 | 49-66 | 樂︀ <U+6A02, U+FE00> | nak
樂 | U+F95C | 53-05 | 樂︁ <U+6A02, U+FE01> | rak
樂 | U+F9BF | 72-89 | 樂︂ <U+6A02, U+FE02> | yo
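The four positions above can be checked with a short Python sketch. It assumes that Python's built-in euc_kr codec follows the usual KS X 1001 to Unicode mapping, and that a row-cell position becomes an EUC-KR byte pair by adding 0xA0 to each value. The helper name ksx1001_to_char is made up for this example. It also illustrates why the SVSes are useful: normalization folds the compatibility ideographs into U+6A02, but leaves a variation sequence alone.

```python
import unicodedata


def ksx1001_to_char(row: int, cell: int) -> str:
    """Decode a KS X 1001 row-cell position via its EUC-KR byte pair.

    Hypothetical helper: assumes EUC-KR bytes are 0xA0 + row and 0xA0 + cell.
    """
    return bytes([0xA0 + row, 0xA0 + cell]).decode("euc_kr")


# The four KS X 1001 positions listed in the table.
positions = [(68, 37), (49, 66), (53, 5), (72, 89)]
chars = [ksx1001_to_char(row, cell) for row, cell in positions]

for (row, cell), ch in zip(positions, chars):
    # Each compatibility ideograph has a canonical mapping to its unified form.
    folded = unicodedata.normalize("NFC", ch)
    print(f"{row:02d}-{cell:02d}  U+{ord(ch):04X}  NFC -> U+{ord(folded):04X}")

# U+F900 through U+FA0B covers exactly the 268 duplicates.
duplicate_count = 0xFA0B - 0xF900 + 1

# A base character plus variation selector is not altered by normalization.
svs = "\u6A02\uFE00"
svs_survives = unicodedata.normalize("NFC", svs) == svs
```

The practical consequence is that the duplicate is lost as soon as text passes through any normalization form, while the SVS keeps the distinction.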

KPS 9566

The duplicate modern hangul syllables in North Korea’s KPS 9566 standard are perpetually interesting to me. What’s potentially more interesting is that I am not aware of North Korean standards being available outside of the country, except for ISO-IR-202, which represents a snapshot of the original version of the KPS 9566 standard.

With that said, the modern hangul syllables that represent the names of Kim Il-sung and Kim Jong-il, 김일성 and 김정일, respectively, are separately encoded from position 04-72 through 04-77. Based solely on the TrueType fonts that are bundled with North Korea’s Red Star OS, the modern hangul syllables that represent the name of Kim Jong-un, 김정은, were added from position 04-78 through 04-80 in an apparent update of the standard. This means that one modern hangul syllable appears four times in this particular standard:

Modern Hangul Syllable | Unicode | KPS 9566 Position
김 (gim) | U+AE40 | 17-14
김 (from 일성) | U+F113 (PUA) | 04-72
김 (from 정일) | U+F116 (PUA) | 04-75
김 (from 정은) | U+F120 (PUA) | 04-78

Of course, the likelihood of Unicode separately encoding the modern hangul syllables that represent these three DPRK leader names is somewhere between zero and none, hence the use of PUA code points in their font implementations. The last page of L2/18-004 includes some suggestions that were rejected by the UTC. I also tweeted my own suggestion.
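Because PUA code points carry no meaning outside of the fonts that define them, text that uses them is not portable. As a rough sketch, the three PUA code points from the table above could be folded back to the regular U+AE40 like this. The function name fold_leader_kim is made up for this example, and the mapping covers only the three code points listed in the table, not the remaining syllables of each name.

```python
import unicodedata

# Assumption: only the three PUA code points shown in the table are mapped.
# The other syllables of each name are not listed there, so they are omitted.
PUA_TO_HANGUL = {
    0xF113: "\uAE40",  # gim (from Kim Il-sung's name)
    0xF116: "\uAE40",  # gim (from Kim Jong-il's name)
    0xF120: "\uAE40",  # gim (from Kim Jong-un's name)
}


def fold_leader_kim(text: str) -> str:
    """Replace the three PUA code points with the regular hangul syllable."""
    return text.translate(PUA_TO_HANGUL)


for code_point in PUA_TO_HANGUL:
    # General category "Co" marks a private-use code point.
    category = unicodedata.category(chr(code_point))
    folded = fold_leader_kim(chr(code_point))
    print(f"U+{code_point:04X} ({category}) -> U+{ord(folded):04X}")
```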

