Standards 102—Silent Corrections

Class is once again in session: Standards 102 picks up where my Standards 101 article left off, and today’s topic is “silent corrections.”

The ultimate focus of this article is on the first three pages of WG2 N4008 (2011), Resolution M58.03 of WG2 N4104 (2011), and the Unicode mappings for two ideographs in GB 12052-89 (1989; 信息交换用朝鲜文字编码字符集), a standard from China that defines a regional Korean character set. The two ideographs in question are at positions 72-33 and 72-67 in that standard. All of this started when I submitted L2/10-362 (2010), which proposed better source references for 94 ideographs. These ideographs were appended to the special version of the GB/T 12345-90 (1990; 信息交换用汉字编码字符集―辅助集) standard that was used to compile the URO (Unified Repertoire & Ordering) in Unicode Version 1.1, but they are not actually present in that standard proper. It turns out that they originated in the GB 12052-89 standard.

But first, let’s briefly discuss the issue of “silent corrections” in standards, particularly in GB standards…

GB 2312-80

Perhaps the best example of a silent correction in a GB standard is the ideograph at position 79-81, whose glyph is shown as 鍾 (U+937E) in the GB 2312-80 (1980; 信息交换用汉字编码字符集―基本集) standard, but which was silently corrected to 锺 (U+953A) in the GB 6345.1-86 (1986; 信息交换用汉字 32×32 点阵字模集) standard. The image below shows 鍾 (U+937E) at position 79-81 in the GB 2312-80 standard:

In contrast, the image below shows 锺 (U+953A) at the same position in the GB 6345.1-86 standard:

GB/T 12345-90

Next, the two ideographs at positions 33-05 (0x4125) and 57-76 (0x596C) in the GB/T 12345-90 standard appear there as 隷 (U+96B7) and 鳧 (U+9CE7). They were first corrected in the pseudo-extension of GB/T 12345-90, in which they appear as U+96B8 隸 and U+9CEC 鳬. The first image shows how the representative glyphs appear in the actual GB/T 12345-90 standard:

The next image below shows the two corrected representative glyphs as they appeared in the pseudo-extension of GB/T 12345-90:

The image below shows how these four ideographs appear in the latest code charts. What matters here are the highlighted “G1” source reference prefixes for U+96B8 and U+9CEC, which correspond to the GB/T 12345-90 standard:

GB 18030-2000 & GB 18030-2005

In addition, the following text and table, which span pp. 109 and 110 of CJKV Information Processing, Second Edition, provide further history of silent corrections made in GB standards, along with details about four silent corrections made to GB 18030-2000 (2000; 信息技术 信息交换用汉字编码字符集 基本集的扩充) in the GB 18030-2005 (2005; 信息技术 中文编码字符集) standard:

GB 12052-89

Now back to GB 12052-89.

The image below shows rows 71 and 72 of the GB 12052-89 standard, with the ideographs at positions 72-33 and 72-67 highlighted:

The next image shows the same set of 94 ideographs in the pseudo-extension of the GB/T 12345-90 standard with the same characters highlighted:

In my experience dealing with East Asian standards for over 25 years, there are three very good indicators that strongly support the idea that GB 12052 72-33 and 72-67 are the correct, and accurate, source references for U+58ED 壭 and U+5655 噕, respectively. These indicators are 1) the 94 ideographs in GB 12052 and in the pseudo-extension of GB/T 12345-90 are not only identical, but they are also in the same order; 2) the pseudo-extension of GB/T 12345-90 was produced after GB 12052-89 was published; and 3) as demonstrated earlier in this lesson, silent corrections have been made to GB standards, and they often manifest in a completely different standard.

As a result, the UAX #38 (Unicode Han Database) kIRG_GSource property for U+5655 and U+58ED should become GK-6863 and GK-6841, respectively. For reference, the image below shows how U+5655 and U+58ED appear in the Unicode Version 10.0 code charts:

In other words, the actual ideographs that appear in the GB 12052-89 standard at positions 72-33 and 72-67 are incorrect, and their shapes can be completely and safely ignored. The shapes that appear at the corresponding positions in the pseudo-extension of the GB/T 12345-90 standard, 93-39 and 93-73, are therefore to be considered the corrected versions, and clearly correspond to U+58ED 壭 and U+5655 噕, respectively.
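For readers who want to check the arithmetic behind the GK-6863 and GK-6841 values, here is a minimal Python sketch of how a row-cell position maps to the hexadecimal form used in a source reference. It assumes the usual 94×94 layout in which 0x20 is added to both the row and the cell, which is consistent with the 33-05 (0x4125) and 57-76 (0x596C) pairs given earlier. The function names `row_cell_to_hex` and `g_source` are made up for this illustration and are not part of any Unicode tool or library.

```python
def row_cell_to_hex(row: int, cell: int) -> str:
    """Convert a row-cell position (each 1 through 94) to four hex digits.

    Assumption: the standard uses the common 94x94 layout, so each
    byte is the row or cell number plus 0x20.
    """
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError(f"row and cell must be in 1..94, got {row}-{cell}")
    return f"{row + 0x20:02X}{cell + 0x20:02X}"


def g_source(prefix: str, row: int, cell: int) -> str:
    """Build a source reference such as 'GK-6863' (hypothetical helper)."""
    return f"{prefix}-{row_cell_to_hex(row, cell)}"


# Positions discussed in this article.
print(row_cell_to_hex(33, 5))    # GB/T 12345-90 33-05
print(row_cell_to_hex(57, 76))   # GB/T 12345-90 57-76
print(g_source("GK", 72, 67))    # GB 12052-89 72-67, proposed for U+5655
print(g_source("GK", 72, 33))    # GB 12052-89 72-33, proposed for U+58ED
```

The sketch only reproduces the position-to-hex conversion; it says nothing about which glyph is correct at a given position, which is the actual subject of this lesson.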

What we learned today is that the pseudo-extension of the GB/T 12345-90 standard needs to be treated as a silent correction of the corresponding ideographs in the older GB 12052-89 standard wherever the two differ.

