GB 12052-89: PRC Standard For Korean

While it was not uncommon for early (pre-Unicode) CJK character set standards to include characters that correspond to scripts of other languages or that were used in other countries (the Japanese kana included in standards from China and Korea are one example), it was uncommon for one of these countries to produce a standard for a seemingly different language. Enter GB 12052-89 (entitled Korean Character Coded Character Set for Information Interchange, or 信息交换用朝鲜文字编码字符集 in Chinese), a GB (PRC) standard that sort of broke this mold.

The first question that comes to mind is this: why would China go to the bother of producing a character set standard for Korean? The answer is actually quite simple: there is a sizable Korean-speaking population in China.

I briefly described GB 12052-89 in Chapter 3 (pp 150 and 151) of CJKV Information Processing, which included a table (Table 3-79) that indicated which characters were encoded in what rows, and also alluded to some errors or inconsistencies in the standard. My good friend Jaemin Chung recently took it upon himself to resolve these and other remaining issues, and compiled a complete Unicode mapping table for this standard, the first of which I am aware. Considering all of the issues, this was no small feat. Jaemin also compiled the known errors into a comprehensive errata file.

One of the more interesting issues that Jaemin discovered was that the representative glyphs for the characters in the per-row code tables in the standard proper did not always match what was shown in the fold-out 94×94 table (for those who are wondering, the actual size of this fold-out table is 57cm × 62.5cm). In at least four cases, the representative glyph for a character in the standard proper was missing, but was present and accounted for in the fold-out 94×94 table.

Back in 2010, I made my own attempt at resolving the issues surrounding the 94 ideographs that are included in this standard by submitting L2/10-362, which proposed to change the kIRG_GSource source references from pseudo GB/T 12345-90 ones to genuine GB 12052-89 ones. This resulted in 89 new kIRG_GSource "GK" source references that directly correspond to GB 12052-89. Of the five remaining ideographs, three already had kIRG_GSource "G3" or "G5" source references that I indicated in L2/10-362, and two (72-33 and 72-67) were deemed not unifiable (though I strongly suspect that the representative glyphs in GB 12052-89 were simply incorrect, and were corrected via the pseudo GB/T 12345-90 extension).
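For readers unfamiliar with row-cell notation such as 72-33, the following is a minimal sketch of how a position in a 94×94 table maps to a two-byte code. It assumes the usual ISO 2022-style convention in which row and cell values 1 through 94 are each offset by 0x20 to give bytes in the range 0x21 through 0x7E; the function names are hypothetical and are not taken from the standard or from Jaemin's mapping table.

```python
def parse_row_cell(label):
    """Split a row-cell label such as '72-33' into (row, cell) integers."""
    row_text, cell_text = label.split("-")
    return int(row_text), int(cell_text)


def row_cell_to_bytes(row, cell):
    """Convert a 94x94 row-cell position to its two-byte (GL) code.

    Assumption: the conventional ISO 2022-style offset of 0x20 applies,
    so row 1, cell 1 becomes the byte pair 0x21 0x21.
    """
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in the range 1-94")
    return bytes((0x20 + row, 0x20 + cell))


if __name__ == "__main__":
    # The two ideographs mentioned above that were deemed not unifiable.
    for label in ("72-33", "72-67"):
        row, cell = parse_row_cell(label)
        print(label, row_cell_to_bytes(row, cell).hex().upper())
```

Under that assumption, 72-33 corresponds to the bytes 0x68 0x41, and 72-67 corresponds to 0x68 0x63.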

Although GB 12052-89 is not actively being used, and probably never was in any meaningful or practical way, it does represent a historical document, and also provides a glimpse of how character set standards for minority or regional populations were compiled.

Many thanks to Jaemin for his efforts.
