Attention, students! Class is in session.
In my experience, the following two statements about standards are seemingly conflicting yet accurate:
- Standards are incredibly useful—and required—for product development.
- Standards cannot be completely trusted.
On one hand, developing products, such as typeface designs and their fonts, depends on standards.
On the other hand, standards themselves are developed by humans, meaning that they are prone to error, especially when they happen to be character set or glyph standards that include thousands or tens of thousands of representative glyphs.
Um, okay, so if standards cannot be completely trusted, and standards are required, what is a font developer to do? The answer is surprisingly simple: Do not depend on a single standard, and instead reference multiple standards. The remainder of this article will detail some interesting tidbits found in CJK character set and glyph standards.
This May 2014 article chronicles a particular representative glyph error in a character set standard, specifically GB 18030, and demonstrates that one may need to explore earlier standards or references that served as the source of the character and its representative glyph.
In terms of errors in glyph standards, another May 2014 article explored a somewhat minor representative glyph error in a Taiwan MOE glyph standard. That was a busy month for me, because another article detailed even more representative glyph issues in the same set of glyph standards.
This takes me to my latest discovery, which is the 砉 component in U+5268 剨, U+6E71 湱, and U+9A1E 騞, along with U+7809 砉 itself. Below are scans of the glyphs in three of Taiwan MOE’s glyph standards, along with scans of the corresponding glyphs in the all-important Kāngxī Dictionary (康熙字典 Kāngxī Zìdiǎn):
My conclusion? The representative glyphs for U+6E71 (Taiwan MOE 202189) are incorrect in that their 丰 and 石 components should not fully join. Another piece of evidence that supports my conclusion that this is an error is that the latest version of Hong Kong SCS uses a consistent and non-joining component for these four characters.
The articles from April 7th, April 8th, and April 12th of 2014 detailed some interesting representative glyph issues that affected the J column, whose adjustments are now reflected in the latest (Unicode Version 9.0) code charts.
There are also genuine duplicate characters to be aware of. Two prime examples are U+363D 㘽 (K3-2321) versus U+39B3 㦳 (K3-2623) and U+3588 㖈 (K3-224C) versus U+439B 䎛 (K3-2F51), all four of which come from the same KS standard (their source references, which use the “K3” prefix that corresponds to the unicorn-like PKS C 5700-2 standard, are provided in parentheses), and for which Korea has admitted that these two character pairs in Extension A are true duplicates. Implementations that are aware of this fact, such as the Adobe-branded Source Han Sans and Google-branded Noto Sans CJK, can use a single glyph for each character pair, which conserves glyphs, but more importantly, guarantees that both characters render the same.
In closing, and in my experience developing CJK fonts, the overall best source of representative glyphs for CJK Unified Ideographs are the latest-and-greatest Unicode code charts: URO, Extension A, Extension B, Extension C, Extension D, and Extension E. It is far easier to correct a representative glyph error in the Unicode code charts than to correct them in the source national standards or references. But to repeat, the code charts cannot be completely trusted, especially when an inconsistency is found in a particular column. If you find what you believe to be an error, investigate it to the extent possible, and if it still appears to be an error, please report it. Keep in mind that in many cases, inconsistencies are intentional, such as what I detailed in the January 2014 and November 2014 articles that pertain to the G sources. The J sources include known and intentional inconsistencies, in terms of simplified versus traditional components.
The lesson to be learned from this article: When dealing with possible errors in standards, the further back one investigates, referencing multiple standards and other sources, the greater the chance one will find the correct answer.