One of the reasons why Source Han Sans (and, of course, the Google-branded Noto Sans CJK) can be considered the world’s first Pan-CJK typeface family is its support for Korean hangul. While it is common for Korean fonts to support modern hangul, support for archaic hangul is relatively uncommon. One of the more challenging aspects of developing Source Han Sans was implementing archaic hangul, which also included handling 500 high-frequency archaic hangul syllables. This article details what went into supporting archaic hangul in Source Han Sans. I’d like to once again thank our talented friends at Sandoll Communications for designing the glyphs for these characters.
Modern hangul includes 11,172 syllable-like characters that have been encoded in Unicode since Version 2.0 (July 1996). While 2,350 of them are considered higher-frequency and correspond to those specified in Korea’s KS X 1001 standard, it is common for today’s Korean fonts, especially commercial ones, to support all 11,172.
Archaic hangul is a different matter altogether, mainly because the individual graphemes are encoded separately, and three particular OpenType GSUB features—'ljmo' (Leading Jamo Forms), 'vjmo' (Vowel Jamo Forms), and 'tjmo' (Trailing Jamo Forms)—are orchestrated to combine them into a rectangle that represents a grapheme cluster. Supporting 500 high-frequency archaic hangul syllables further complicates the matter.
In terms of supporting archaic hangul via combining jamo, Source Han Sans includes six sets of leading consonants (L), two sets of vowels (V), and four sets of trailing consonants (T) that are used to support LV and LVT sequences that correspond to archaic hangul syllables. When you figure in the number of L (125: U+1100 through U+115F and U+A960 through U+A97C), V (95: U+1160 through U+11A7 and U+D7B0 through U+D7C6), and T (137: U+11A8 through U+11FF and U+D7CB through U+D7FB) components, along with the fact that both LV and LVT sequences are valid, the number of possible combinations is thus a staggering 1,638,750 (125 L × 95 V = 11,875 LV plus 125 L × 95 V × 137 T = 1,626,875 LVT). Of course, the 11,172 modern hangul syllables represent a (very small) subset of this large figure. (For those who care, modern hangul syllables are calculated as 19 L × 21 V = 399 LV plus 19 L × 21 V × 27 T = 10,773 LVT.)
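The arithmetic above is easy to check. Below is a minimal Python sketch that counts the L, V, and T code points from the ranges quoted in the text, then derives the LV and LVT totals. The helper names (count, syllable_counts) are my own for illustration and are not part of any font tooling.

```python
# Code point ranges quoted in the text (inclusive on both ends).
L_RANGES = [(0x1100, 0x115F), (0xA960, 0xA97C)]
V_RANGES = [(0x1160, 0x11A7), (0xD7B0, 0xD7C6)]
T_RANGES = [(0x11A8, 0x11FF), (0xD7CB, 0xD7FB)]


def count(ranges):
    """Number of code points covered by a list of inclusive ranges."""
    return sum(last - first + 1 for first, last in ranges)


def syllable_counts(n_l, n_v, n_t):
    """Return (LV, LVT, total) for the given component counts."""
    lv = n_l * n_v
    lvt = lv * n_t
    return lv, lvt, lv + lvt


n_l, n_v, n_t = count(L_RANGES), count(V_RANGES), count(T_RANGES)
print(n_l, n_v, n_t)                   # 125 95 137
print(syllable_counts(n_l, n_v, n_t))  # (11875, 1626875, 1638750)
print(syllable_counts(19, 21, 27))     # (399, 10773, 11172)
```

The last line reproduces the modern hangul figure from the same formula, using 19 L, 21 V, and 27 T components.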
In terms of the OpenType implementation, a total of 1,488 glyphs are used, and correspond to 6 × 125 L plus 2 × 95 V plus 4 × 137 T. This figure, of course, ignores the 357 (125 L + 95 V + 137 T) nominal (encoded) forms of these characters. Each set is handled as a separate “lookup” that is referenced in the corresponding GSUB feature. The 750 (6 × 125) L glyphs have 920-unit widths, and the 190 (2 × 95) V and 548 (4 × 137) T glyphs have zero-unit widths and are shifted 920 units to the left, which allows them to overlay the L glyphs.
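The glyph inventory and the zero-width overlay can be tallied in the same way. This is only a sketch of the bookkeeping, using the set counts and advance widths stated above; the dictionary and function names are hypothetical and do not reflect how the actual font sources are organized.

```python
# Component counts and number of positional sets, as given in the text.
COMPONENTS = {"L": 125, "V": 95, "T": 137}
SETS = {"L": 6, "V": 2, "T": 4}

# Advance widths in units, as given in the text: L glyphs advance,
# while V and T glyphs have zero width and overlay the L glyph.
ADVANCE = {"L": 920, "V": 0, "T": 0}

# Positional (unencoded) glyphs per component type, and their total.
positional = {kind: SETS[kind] * COMPONENTS[kind] for kind in COMPONENTS}
total_positional = sum(positional.values())

# Nominal (encoded) forms, which the 1,488 figure does not include.
nominal = sum(COMPONENTS.values())


def cluster_advance(kinds):
    """Total advance of a jamo cluster such as "LV" or "LVT"."""
    return sum(ADVANCE[kind] for kind in kinds)


print(positional)               # {'L': 750, 'V': 190, 'T': 548}
print(total_positional)         # 1488
print(nominal)                  # 357
print(cluster_advance("LV"))    # 920
print(cluster_advance("LVT"))   # 920
```

Because only the L glyph contributes to the advance, both LV and LVT clusters occupy a single 920-unit cell.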
High-Frequency Archaic Hangul
Things get a bit more interesting when we explore what went into implementing the 500 high-frequency archaic hangul syllables. The purpose of these glyphs, as the name suggests, is to provide hand-tuned (pre-composed) versions of archaic hangul syllables that have been deemed higher-frequency, so that those syllables look better than their combining jamo equivalents. The 'ccmp' (Glyph Composition/Decomposition) GSUB feature is used for this purpose. One discovery that I made just prior to the mid-July launch was that two-character (LV) sequences that correspond to a high-frequency archaic hangul syllable were blocking three-character (LVT) combining jamo sequences: the initial LV sequence was rendered as a pre-composed high-frequency archaic syllable, and the final T was, well, trailing, not combining. The solution was to use the “ignore substitute” construct in the 'ccmp' GSUB feature to ignore LVT sequences whose LV subsequence corresponds to one of the 500 high-frequency archaic hangul syllables.
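To make the blocking problem concrete, here is a rough Python simulation of the substitution logic, with and without a guard that plays the role of the “ignore substitute” rule. It is not the actual feature code and does not model how a real OpenType shaper processes lookups. The table contents and glyph names are hypothetical: <1140 1175 11D9> is the LVT sequence discussed in this article, while the LV entry <1140 1175> is assumed to be pre-composed purely for the sake of the example.

```python
# Hypothetical table of jamo sequences that have a pre-composed glyph.
# The glyph names are invented for illustration only.
PRECOMPOSED = {
    (0x1140, 0x1175): "hf_1140_1175",            # assumed LV entry
    (0x1140, 0x1175, 0x11D9): "hf_1140_1175_11D9",
}


def is_trailing(cp):
    """True if cp is a trailing consonant (T), per the ranges in the text."""
    return 0x11A8 <= cp <= 0x11FF or 0xD7CB <= cp <= 0xD7FB


def shape(cps, guard=True):
    """Map code points to glyph names, preferring pre-composed glyphs.

    With guard=True, an LV match is skipped when a T follows, which
    leaves the whole LVT sequence to the combining jamo features.
    """
    out, i = [], 0
    while i < len(cps):
        tri, pair = tuple(cps[i:i + 3]), tuple(cps[i:i + 2])
        followed_by_t = i + 2 < len(cps) and is_trailing(cps[i + 2])
        if len(tri) == 3 and tri in PRECOMPOSED:
            out.append(PRECOMPOSED[tri])
            i += 3
        elif len(pair) == 2 and pair in PRECOMPOSED and not (guard and followed_by_t):
            out.append(PRECOMPOSED[pair])
            i += 2
        else:
            out.append(f"jamo_{cps[i]:04X}")
            i += 1
    return out


lvt = [0x1140, 0x1175, 0x11A8]   # LVT with no pre-composed glyph
print(shape(lvt, guard=False))   # ['hf_1140_1175', 'jamo_11A8']
print(shape(lvt, guard=True))    # ['jamo_1140', 'jamo_1175', 'jamo_11A8']
print(shape([0x1140, 0x1175]))   # ['hf_1140_1175']
print(shape([0x1140, 0x1175, 0x11D9]))  # ['hf_1140_1175_11D9']
```

Without the guard, the LV pair is consumed first and the T is left stranded, which is the bug described above. With the guard, all three jamo pass through untouched so that they can be combined as a unit.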
The image below shows the <1140 1175 11D9> sequence in pre-composed and combining forms, which clearly illustrates that the pre-composed form is more visually appealing, in terms of the balance of the components, than the combining form:
Included in the Source Han Sans project are Glyph Complement PDFs for each weight that show the 500 high-frequency archaic hangul syllables, and provide the two- or three-character jamo sequences that are used to enter them. Below is an excerpt:
I had never implemented archaic hangul prior to developing the Source Han Sans and Noto Sans CJK fonts, so it was a good learning experience for me.
In closing, I should mention that the model for combining jamo is defined in the KS X 1026-1:2007 standard, whose (unofficial) English translation is available via WG2 N3422.