Source Han Sans Version 2.000 Technical Tidbits

The Japanese (日本語) version is here

(Everything that is stated in this article applies to the corresponding Google-branded Pan-CJK typeface family, Noto Sans CJK. Likewise, any reference to Source Han Serif also applies to Noto Serif CJK.)

The last time that a new version of the Source Han Sans family, along with the Google-branded version, Noto Sans CJK, was released was in June of 2015 in the form of Version 1.004. I know from personal experience that a lot of planning, preparation, and work took place during the three years that followed, and the end result is Version 2.000 of both Pan-CJK typeface families.

If you’re interested in learning more details about some of the changes, enhancements, and additions that Version 2.000 offers, please continue reading this article.

Extension G

The eye-catching animated image at the beginning of this article illustrates how two soon-to-be-encoded Extension G ideographs are supported in the five supported languages: Japanese, Korean, Simplified Chinese, Traditional Chinese for Taiwan, and Traditional Chinese for Hong Kong. Each of the seven weights is shown in one of the seven colors of the rainbow. Each ideograph requires four glyphs, but the way those glyphs are shared across the five languages differs between the two. Glyphs for two additional soon-to-be-encoded Extension G ideographs are also included in the fonts, for a total of four.

For now, because Extension G code points are not yet stable, these glyphs need to be entered via their IDSes (Ideographic Description Sequences) that are subsequently processed by the 'ccmp' (Glyph Composition/Decomposition) GSUB feature. See the “CJK Unified Ideographs Extension G” Section and its table on page 13 of the official ReadMe file for more details.
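
As a rough illustration of what entering a glyph "via its IDS" means, the Python sketch below checks that an IDS is syntactically well formed and lists the code points that would be typed. The two sequences are taken from the first response at the end of this article, with the space after the leading description character dropped on the assumption that it is a display artifact. The function names are made up for this example, and the sketch says nothing about how the 'ccmp' lookups inside the fonts are actually built.

```python
# Ideographic Description Characters (IDCs) occupy U+2FF0..U+2FFB.
# U+2FF2 and U+2FF3 take three operands; all the others take two.
IDC_ARITY = {cp: 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY[0x2FF2] = 3
IDC_ARITY[0x2FF3] = 3


def is_well_formed_ids(ids: str) -> bool:
    """Return True if every IDC in the sequence receives exactly its operands."""
    pending = 1  # number of components still expected
    for ch in ids:
        if pending == 0:
            return False  # extra characters after a complete sequence
        pending -= 1
        pending += IDC_ARITY.get(ord(ch), 0)
    return pending == 0


def code_points(ids: str) -> list[str]:
    """List the code points of the sequence in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in ids]


# IDSes quoted from the first response below (spaces removed, see above).
TAITO_IDS = "⿳雲⿲雲龍雲⿰龍龍"
BIANG_TRADITIONAL_IDS = "⿺辶⿳穴⿰月⿰⿲⿱幺長⿱言馬⿱幺長刂心"

if __name__ == "__main__":
    for label, ids in (("taito", TAITO_IDS), ("biang", BIANG_TRADITIONAL_IDS)):
        print(label, is_well_formed_ids(ids), " ".join(code_points(ids)))
```

In an app that applies 'ccmp', typing such a sequence with one of the Version 2.000 fonts selected should display the single corresponding glyph; the table in the ReadMe file remains the authoritative list of supported sequences.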

The glyphs for the four Extension G ideographs were designed by members of our design team in Tokyo. Yui Yoshitomi (吉富ゆい), a former member of our design team, designed the initial glyphs for two of the ideographs, the ones for biáng (the traditional form is the ideograph on the left in the animated image above). Taisei Yoshida (吉田大成) subsequently tweaked those glyphs, which involved expanding the traditional form to encompass four separate region-specific glyphs, and he also designed the glyphs for the two additional Extension G ideographs, such as taito (the 84-stroke ideograph on the right in the animated image above). Of course, these glyphs were checked by Ryoko Nishizuka (西塚涼子), who oversees the design of these Pan-CJK typefaces.

Hong Kong Support

The most time-consuming aspect of the Version 2.000 update was adding support for a second flavor of Traditional Chinese, for Hong Kong, which supports the HKSCS-2016 standard. This increased the total number of individual font resources by 16, from 72 to 88. The number of fonts included in the Super OTC, which continues to be my preferred deployment format, increased by nine, from 36 to 45.

Our friends at Arphic Technology (文鼎科技) did the bulk of the design work, which involved designing approximately 1,400 new HK glyphs and tweaking approximately 750 existing HK glyphs. They also designed approximately 50 new CN glyphs and approximately 150 new TW ones, along with tweaking approximately 125 CN glyphs and 400 TW ones.

While modern browsers have supported language-tagging for Hong Kong for quite some time, which could be easily tested via the open source LOCL Test font, the first Adobe app to do so is Adobe InDesign CC 2019, which was released just last month during Adobe MAX.

One of the ramifications of adding support for a fifth language is that a non-zero number of characters, specifically ideographs, will require five separate glyphs, one for each of the five supported languages. My analysis of the mappings determined that there are 66 such characters. Another ramification is that the number of characters that require four separate glyphs increased, from 318 in Version 1.004 to 739 in Version 2.000. This is mainly due to the large influx of new HK glyphs.
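
The kind of analysis described above can be sketched in a few lines of Python. The sketch assumes that, for each code point, the glyph ID selected for each of the five languages is already known. The code points (taken from the Private Use Area) and the glyph IDs are invented toy data, not values from the fonts, and the function names are hypothetical.

```python
from collections import Counter

LANGUAGES = ("JP", "KR", "CN", "TW", "HK")

# Toy data only: placeholder code points mapped to made-up glyph IDs.
toy_mappings = {
    0xE000: {"JP": 10, "KR": 10, "CN": 10, "TW": 10, "HK": 10},  # fully shared
    0xE001: {"JP": 20, "KR": 20, "CN": 21, "TW": 22, "HK": 22},  # three glyphs
    0xE002: {"JP": 30, "KR": 31, "CN": 32, "TW": 33, "HK": 33},  # four glyphs
    0xE003: {"JP": 40, "KR": 41, "CN": 42, "TW": 43, "HK": 44},  # five glyphs
}


def distinct_glyph_count(per_language: dict[str, int]) -> int:
    """Number of separate glyphs a character needs across the five languages."""
    return len({per_language[lang] for lang in LANGUAGES})


def sharing_histogram(mappings: dict[int, dict[str, int]]) -> Counter:
    """Count how many characters need 1, 2, 3, 4, or 5 separate glyphs."""
    return Counter(distinct_glyph_count(m) for m in mappings.values())


if __name__ == "__main__":
    histogram = sharing_histogram(toy_mappings)
    for glyphs in sorted(histogram):
        print(f"{glyphs} glyph(s): {histogram[glyphs]} character(s)")
```

Run against the real per-language mappings, a tally of this shape is what would yield figures such as the 66 five-glyph and 739 four-glyph characters quoted above.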

Improved Bopomofo Support

In addition to adding glyphs for a very small number of newly- or soon-to-be-encoded bopomofo characters, we used the opportunity to (massively) improve their glyphs, and to provide better support for their use, including tone marks. Ryoko Nishizuka, our extraordinarily talented typeface designer, took on the challenge of redesigning the glyphs for virtually all bopomofo, which is a new script for her. Of course, she received valuable feedback from bopomofo experts during the design process. The animated image below shows the Version 2.000 glyphs in black, along with the Version 1.004 glyphs superimposed in blue.

Improving the support for bopomofo and tone marks primarily involved adding the 'GDEF' (Glyph Definition) table, the 'mark' (Mark Positioning) GPOS feature, and the 'ruby' (Ruby Notation Forms) GSUB feature. The 'vert' (Vertical Alternates) feature, in GSUB and GPOS form, is also involved.

Parity with Source Han Serif

Source Han Serif, which was released during the first half of 2017, introduced several enhancements and improvements, and the Source Han Sans Version 2.000 update provided an opportunity to do the same:

  • The CIDFont and CMap resources do not include XUID arrays. Per this article, they are no longer necessary.
  • There are no mappings for the range U+0000 through U+001F. These were also deemed unnecessary.
  • The code points that correspond to Halfwidth Jamo variants map to glyphs that correspond to code points in the Hangul Compatibility Jamo block, meaning that the glyphs for half-width jamo have been removed.
  • The 'name' (Naming) table does not include any Macintosh (PlatformID=1) strings. This resulted in cleaner 'name' tables.
  • The Regular weight is now style-linked to the Bold weight. This means that the Bold weight may not appear in the font menu, particularly when using apps that support style-linking as a way to make text bold.
  • The 'vert' GPOS feature is included, which means that combining jamo works in vertical writing mode, at least in apps that support 'vert' as a GPOS feature. This feature also benefits a small number of additional glyphs.
  • The deprecated 'hngl' (Hangul) GSUB feature has been removed from the fonts whose default language is Korean.
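
To make the Halfwidth Jamo item in the list above more concrete, here is a small Python sketch that derives a halfwidth-to-compatibility pairing from the Unicode Character Database, using the <narrow> decomposition of each halfwidth character. It is an assumption that the mappings in the fonts follow this same pairing; the CMap resources themselves are the authority. The function name is made up for this example.

```python
import unicodedata

# Halfwidth Hangul variants occupy U+FFA0..U+FFDC (with some unassigned gaps).
HALFWIDTH_HANGUL = range(0xFFA0, 0xFFDD)
# The Hangul Compatibility Jamo block is U+3130..U+318F.
COMPAT_JAMO_BLOCK = range(0x3130, 0x3190)


def halfwidth_to_compat() -> dict[int, int]:
    """Pair each halfwidth Hangul code point with its <narrow> decomposition."""
    mapping = {}
    for cp in HALFWIDTH_HANGUL:
        decomposition = unicodedata.decomposition(chr(cp))
        # Unassigned code points return an empty string and are skipped.
        if not decomposition.startswith("<narrow>"):
            continue
        target = decomposition.split()[1]
        mapping[cp] = int(target, 16)
    return mapping


if __name__ == "__main__":
    for source, target in halfwidth_to_compat().items():
        print(f"U+{source:04X} -> U+{target:04X}")
```

Note that full NFKC normalization would go one step further and decompose the compatibility jamo as well, which is why the sketch reads the single-step decomposition instead.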

Additional Mappings

155 new mappings have been added to the CMap resources, which translates into support for additional characters and their glyphs.

The Version 1.004 fonts included 44,651 mappings, and the Version 2.000 ones therefore include 44,806. 66 of the new mappings are from BMP (Basic Multilingual Plane) code points, 22 are from Plane 1 code points, and the remaining 67 are from Plane 2 code points. Among the 67 new Plane 2 code points, 57 are from Extension B, two are from Extension C, three are from Extension E, and the remaining five are from Extension F.
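
For readers who like to check the arithmetic, the figures in the paragraph above can be tallied with a few lines of Python. All of the numbers come from this article; only the variable names are invented.

```python
V1_004_MAPPINGS = 44_651

# New mappings in Version 2.000, grouped by plane.
new_by_plane = {"BMP": 66, "Plane 1": 22, "Plane 2": 67}

# Breakdown of the new Plane 2 mappings by block.
new_plane2_by_block = {
    "Extension B": 57,
    "Extension C": 2,
    "Extension E": 3,
    "Extension F": 5,
}

total_new = sum(new_by_plane.values())
v2_000_mappings = V1_004_MAPPINGS + total_new

if __name__ == "__main__":
    print("new mappings:", total_new)
    print("Version 2.000 mappings:", v2_000_mappings)
    print("Plane 2 breakdown:", sum(new_plane2_by_block.values()))
```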

The table below is an excerpt from the “Glyph Sharing Statistics” section on page 15 of the official ReadMe file that shows how the 44,806 mappings of Version 2.000 are distributed, and the extent to which glyphs are shared across the five supported languages:

The animated image below shows the 66 ideographs that have five separate region-specific glyphs:

As soon as Extension G is considered to be stable, four mappings from Plane 3 code points will be added.
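
The plane of a code point is simply its value divided by 0x10000, as the short Python sketch below shows. The four code points are the draft values quoted in the first response at the end of this article. They are not final assignments and may well change, so treat them purely as sample input.

```python
def plane_of(code_point: int) -> int:
    """Return the Unicode plane (0 through 16) that contains the code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    return code_point >> 16


# Draft (not yet stable) code points, keyed by source reference.
draft_extension_g = {
    "UTC-01200": 0x30729,
    "UTC-01312": 0x30EDC,
    "UTC-00791": 0x30EDD,
    "UK-02960": 0x3106B,
}

if __name__ == "__main__":
    for reference, cp in draft_extension_g.items():
        print(f"{reference}: U+{cp:04X} is in Plane {plane_of(cp)}")
```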

Other Enhancements

A small number of additional enhancements were made, which will also find their way into the Source Han Serif Version 2.000 update that is still many months away:

  • The language and script declarations in the 'locl' (Localized Forms) and 'vert' GSUB features were improved.
  • URO (Unified Repertoire & Ordering) coverage is complete up through U+9FEF (Unicode Version 11.0).
  • The Traditional Chinese form of the Radical #162 (U+2ECE/U+8FB6) component, which is used for several hundred TW and HK glyphs, was improved. The image below illustrates this component in Version 1.004 (left) and Version 2.000 (right):
  • The glyphs for some of the kana were tweaked.
  • The fonts include blank placeholder glyphs for U+32FF, uni32FF and uni32FF-V, with the intention of making a subsequent dot-release on, before, or shortly after 2019-05-01 an easier process. This is the character that has been reserved for the two-ideograph square ligature form of Japan’s forthcoming new era name that takes effect on 2019-05-01.
  • The official ReadMe file was improved and expanded in various ways.
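
As a small aside on the URO item in the list above, the size of the range that is now completely covered follows directly from its end points. The Python sketch below assumes the URO begins at U+4E00, which is the start of the CJK Unified Ideographs block, and uses the U+9FEF end point stated above. The names are made up for this example.

```python
URO_FIRST = 0x4E00            # start of the CJK Unified Ideographs block
URO_LAST_UNICODE_11 = 0x9FEF  # last URO code point as of Unicode Version 11.0


def uro_size(last: int = URO_LAST_UNICODE_11) -> int:
    """Number of code points from the start of the URO through `last`."""
    return last - URO_FIRST + 1


def in_uro(code_point: int, last: int = URO_LAST_UNICODE_11) -> bool:
    """True if the code point falls inside the URO range considered here."""
    return URO_FIRST <= code_point <= last


if __name__ == "__main__":
    print("URO code points through U+9FEF:", uro_size())
```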

In closing, we put a lot of effort into the Version 2.000 update, so please enjoy! I am particularly excited that we now offer a second flavor of Traditional Chinese for Hong Kong, which involved quite a bit of time and effort to design and implement.

🐡

P.S. One of my subsequent tasks will be to update the Adobe Clean Han family, which serves as the web font for this blog, to Version 2.000.

11 Responses to Source Han Sans Version 2.000 Technical Tidbits

  1. For those who are keeping track, the Plane 3 (Extension G) code points for the four characters that include glyphs in Version 2.000 have been stable for the last two drafts, so the likelihood of them changing is relatively low: UTC-01200 (⿰ 氵恩) is at U+30729, UTC-01312 (⿺ 辶⿳穴⿰月⿰⿲⿱幺长⿱言马⿱幺长刂心) is at U+30EDC, UTC-00791 (⿺ 辶⿳穴⿰月⿰⿲⿱幺長⿱言馬⿱幺長刂心) is at U+30EDD, and UK-02960 (⿳ 雲⿲雲龍雲⿰龍龍) is at U+3106B.

  2. Khlieb says:

    This font also includes the Copyleft symbol, available at U+1F12F.

  3. Leon Vincent Sass says:

    Hi all!
    Quick question: I am wondering why most of the characters in Extension C–E are not displayed on this font-test-page: [ctext.org/font-test-page] when the Source Han fonts are activated on my Mac (but it works with the HanNom and HanaMin fonts). Do you think it is the website, the browser or my machine?

    • Keep in mind that Source Han Sans’ coverage of Extensions C through E is rather minimal, covering only 47, 34, and 111 code points, respectively. The HanNom and HanaMin fonts have much broader coverage of those extensions.

  4. Charlie says:

    You wrote:
    “(Everything that is stated in this article applies to the corresponding Google-branded Pan-CJK typeface family, Noto Sans CJK. Likewise, any reference to Source Han Serif also applies to Noto Serif CJK.)”

    Well, not quite. As this is a matter that caused me several hours of confusion, I think it’s important for users to get this straight:
    The corresponding Noto Sans fonts have dropped the “CJK” part of their names and are now called “Noto Sans HK”, “Noto Sans JP”, “Noto Sans KR”, “Noto Sans SC”, and “Noto Sans TC”. The same applies to the Noto Serif series. Noto fonts with “CJK” in their names are still available for download but contain the old glyphs and character sets of version 1.00x.

    • I am not sure where you’re getting your information, but Google updated the Noto CJK repository late last week, and the Noto Sans CJK Version 2.000 fonts are now there. They have not yet updated the main Noto page nor its Noto CJK page, but that should happen soon. What I wrote in the ReadMe file is accurate, but there is lag between when Source Han Sans Version 2.000 was released and when Google deployed their branded versions.

      • Charlie says:

        I downloaded the master branch in one large zip file from https://github.com/googlei18n/noto-cjk a few days ago and found that the “Noto Sans CJK…” OTF fonts were Version 1.004 while the other Noto Sans OTF fonts were Version 2.000. Thanks to your hint I redownloaded the fonts today, and now all Noto Sans fonts are updated. Sorry for the inconvenience.

  5. Charlie says:

    I just read Richard Ishida’s “Bopomofo on the Web” at https://r12a.github.io/scripts/bopomofo/ontheweb which made me wonder if U+02D9 DOT ABOVE representing the light tone in Bopomofo shouldn’t display in front of a syllable in horizontal layout even though it follows that syllable in the character string. (Note that Bopomofo IMEs require you to type any tone mark last. — Vertical layout has other quite complex rules for Bopomofo tone mark positioning.) Is this something that needs to be changed in the Source Han fonts?

  6. Michel MARIANI says:

    First of all, thanks for this set of beautiful, carefully crafted CJK typefaces. It’s definitely a tremendous amount of work and dedication.

    Thanks to their availability in five languages, my open-source app Unicode Plus is embedding these typefaces to provide several features: the “CJK Font Variants” utility can display simultaneously any string of CJK characters in the five languages, and the “Unihan Inspector” utility allows cycling through the five variants of a Unihan character, for a better understanding of the font-dependent radical/strokes information.

    Right now, the only freely available typeface covering most of Unicode 11.0 I know of is the Hanazono Minchō font, but it is Serif only and rather Japanese-oriented, obviously.

    Are there any plans for a full Unihan coverage extension of Source Han Sans in the near future? Also, are there any other flavors (Macao, Singapore, …) which might later be introduced, provided they are indeed relevant?

    And can it be safely assumed that all Unihan characters belonging to the IICore set are covered by the current set of typefaces?

    • To answer your first question, Singapore is covered by Simplified Chinese. Nothing suggests that separate fonts are necessary. I do have plans to add Macao SAR support as a third flavor of Traditional Chinese. I am in the process of registering a new OpenType language tag for that: ZHTM.

      To answer your last question, the 9,810 ideographs that have the kIICore property are supported by Source Han Sans.