To GB 18030, Or Not To GB 18030…

One of the questions one may ask about the Adobe-branded Source Han Sans and Google-branded Noto Sans CJK open source Pan-CJK typeface families is whether they are GB 18030–compliant. Compliant? Sort of. Certified? Not yet.

Let me explain…

There are currently two versions of GB 18030: the original from 2000, and a revision from 2005. Compliance involves including all of the characters that are printed in the code charts of GB 18030-2000, but using the ISO/IEC 10646 (aka Unicode) code points that are printed in the 2005 revision. Unfortunately, GB 18030-2005 was published shortly before the version of ISO/IEC 10646 that included non-PUA code points for 18 of the characters, which was ISO/IEC 10646:2003 plus Amendment 1 (2005; aka Unicode Version 4.1). (I am ignoring here China’s apparent allergy to using Extension B code points, which were standardized in 2001 as ISO/IEC 10646-2:2001, aka Unicode Version 3.1, and six of which are required by GB 18030.) As a result, what is printed in GB 18030-2005 proper for 24 characters are PUA (Private Use Area) code points: 18 of these characters now have non-PUA code points in the BMP, and the remaining six have them in Plane 2 within Extension B. GB 18030-2000 actually included a higher number of PUA code points, but most of them were corrected in the 2005 revision to become non-PUA code points.
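To make the PUA versus non-PUA distinction concrete, here is a minimal Python sketch that classifies a code point by range. The ranges are the Private Use Area and CJK Unified Ideographs Extension B block ranges from the Unicode Standard; the function names are my own, and the code points in the usage lines are generic illustrations, not entries taken from the 24-character table.

```python
# Unicode Private Use Area ranges (inclusive).
PUA_RANGES = [
    (0xE000, 0xF8FF),      # BMP Private Use Area
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (Plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (Plane 16)
]

# CJK Unified Ideographs Extension B block (Plane 2).
EXT_B_RANGE = (0x20000, 0x2A6DF)


def is_pua(code_point: int) -> bool:
    """Return True if the code point falls in any Private Use Area."""
    return any(lo <= code_point <= hi for lo, hi in PUA_RANGES)


def is_extension_b(code_point: int) -> bool:
    """Return True if the code point falls in the Extension B block."""
    lo, hi = EXT_B_RANGE
    return lo <= code_point <= hi


def classify(code_point: int) -> str:
    """Label a code point as PUA, Extension B, other BMP, or other."""
    if is_pua(code_point):
        return "PUA"
    if is_extension_b(code_point):
        return "Extension B"
    if code_point <= 0xFFFF:
        return "BMP (non-PUA)"
    return "other"


# Illustrative code points only.
for cp in (0xE000, 0x4E00, 0x20000):
    print(f"U+{cp:04X}: {classify(cp)}")
```

A font that maps a required character only from a non-PUA code point will report "BMP (non-PUA)" or "Extension B" for it here, which is the situation Source Han Sans is in for these 24 characters.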

Source Han Sans and Noto Sans CJK are GB 18030–compliant in the sense that all required characters are covered, but this is done by using non-PUA code points for these 24 characters. In the context of a single-region font, such as one for Simplified Chinese, it may make sense to support these PUA code points, but in the context of a Pan-CJK font that works as a system to serve multiple regions, supporting legacy PUA code points for a particular region smells like a Really Bad Idea™ to me, mainly because it creates a nasty legacy condition that the world is better without.

Unfortunately, Source Han Sans cannot be GB 18030–certified in its current form, but as soon as the next version of GB 18030 is published, which is expected to correct these 24 PUA code points to become non-PUA ones, Adobe plans to submit it for certification.

A complete table of these 24 characters is shown below (Extension B code points have been highlighted):

For those developers who need Source Han Sans or Noto Sans CJK in a form that can be GB 18030–certified now, I prepared a 24-line file that provides the 24 PUA mappings, from GB 18030–based PUA code points to Source Han Sans CIDs, and added it to the “release” branch of the Source Han Sans repository.
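As a rough illustration of how such a mapping file might be consumed, the Python sketch below parses PUA-to-CID lines and merges them into an existing code-point-to-CID table. Everything about the format is an assumption made for the example: one mapping per line, a hexadecimal code point and a decimal CID separated by a tab, and all code points in the BMP PUA. The function names are hypothetical, and the sample values are placeholders, not the real mappings.

```python
from typing import Dict


def parse_pua_mappings(text: str) -> Dict[int, int]:
    """Parse 'HEX_CODE_POINT<TAB>CID' lines (assumed format) into a dict."""
    mappings: Dict[int, int] = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 2 tab-separated fields")
        code_point = int(fields[0], 16)
        cid = int(fields[1])
        # Assumption: every mapped code point is in the BMP PUA.
        if not 0xE000 <= code_point <= 0xF8FF:
            raise ValueError(f"line {line_no}: U+{code_point:04X} is not BMP PUA")
        mappings[code_point] = cid
    return mappings


def add_pua_mappings(cmap: Dict[int, int], pua: Dict[int, int]) -> Dict[int, int]:
    """Return a copy of cmap with the PUA mappings added, refusing conflicts."""
    merged = dict(cmap)
    for code_point, cid in pua.items():
        if code_point in merged and merged[code_point] != cid:
            raise ValueError(f"U+{code_point:04X} is already mapped differently")
        merged[code_point] = cid
    return merged


# Placeholder data: NOT the real GB 18030 PUA code points or CIDs.
SAMPLE = "E000\t1\nE001\t2\n"
base_cmap = {0x4E00: 100}  # hypothetical existing mapping
merged_cmap = add_pua_mappings(base_cmap, parse_pua_mappings(SAMPLE))
print(len(merged_cmap))
```

Returning a new table rather than modifying the original keeps the non-PUA mappings untouched, so the PUA entries are purely additive and easy to drop again once they are no longer needed.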

For those who plan to endeavor to actually do this, which I do not recommend unless it is an absolute requirement, I strongly suggest that it be done only to the language-specific fonts and region-specific subset fonts for Simplified Chinese, meaning SourceHanSansSC-* and SourceHanSansCN-*, respectively, for Source Han Sans (or NotoSansCJKsc-* and NotoSansSC-*, respectively, for Noto Sans CJK). The other language-specific fonts and region-specific subset fonts cannot be GB 18030–certified anyway because the default glyphs do not follow Mainland China conventions.

Now, back to working on the Version 1.002 update…
