Okay. It is time to put some “K” into CJK…
Seriously, much of the content of this blog has been focused on Chinese and Japanese issues. This article will provide some much-deserved Korean content.
I spent the last few days coming to grips with Old Hangul (옛한글 yethangeul), specifically how to implement proper shaping using the three registered OpenType GSUB features, ‘ljmo’ (Leading Jamo Forms), ‘vjmo’ (Vowel Jamo Forms), and ‘tjmo’ (Trailing Jamo Forms).
The Malgun Gothic fonts that are bundled with Microsoft Windows 8 implement Old Hangul via these three Korean-specific GSUB features. Font developers should note that Microsoft’s website includes an article entitled Developing OpenType Fonts for Korean Hangul Script that covers the Old Hangul implementation. However, it was written in April of 2003, and some of its content is now outdated. Specifically, references to the ‘ccmp’ (Glyph Composition/Decomposition) GSUB feature can be ignored, as can the tables that make up Appendix B. Why? The 117 characters that previously needed to be formed by applying the ‘ccmp’ GSUB feature were added to Unicode in Version 5.2 (October of 2009), so they can now be encoded directly:
- U+115A through U+115E, U+11A3 through U+11A7 & U+11FA through U+11FF (16 characters; Hangul Jamo)
- U+A960 through U+A97C (29 characters; Hangul Jamo Extended-A)
- U+D7B0 through U+D7C6 & U+D7CB through U+D7FB (72 characters; Hangul Jamo Extended-B)
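As a quick sanity check on the figure of 117, the ranges listed above can be tallied in a few lines of Python. This is only an arithmetic sketch; the names `ADDED_IN_5_2` and `count` are mine, not part of any tool.

```python
# Inclusive code point ranges added in Unicode 5.2, as listed above.
ADDED_IN_5_2 = {
    "Hangul Jamo": [(0x115A, 0x115E), (0x11A3, 0x11A7), (0x11FA, 0x11FF)],
    "Hangul Jamo Extended-A": [(0xA960, 0xA97C)],
    "Hangul Jamo Extended-B": [(0xD7B0, 0xD7C6), (0xD7CB, 0xD7FB)],
}


def count(ranges):
    """Return the number of code points covered by inclusive ranges."""
    return sum(last - first + 1 for first, last in ranges)


per_block = {block: count(ranges) for block, ranges in ADDED_IN_5_2.items()}
total = sum(per_block.values())

for block, n in per_block.items():
    print(f"{n:3d}  {block}")
print(f"{total:3d}  total")
```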
Anyway, Microsoft’s Old Hangul implementation requires 1,309 glyphs. 357 of these glyphs are encoded as follows:
- U+1100 through U+11FF (256 characters; Hangul Jamo)
- U+A960 through U+A97C (29 characters; Hangul Jamo Extended-A)
- U+D7B0 through U+D7C6 & U+D7CB through U+D7FB (72 characters; Hangul Jamo Extended-B)
The remaining 952 glyphs are unencoded, and are accessed via the three GSUB features as follows:
- 625 glyphs that are made accessible via the ‘ljmo’ GSUB feature; these glyphs specify the desired advance of the character, such as 1000 units for a font with 1000 units per em.
- 190 glyphs that are made accessible via the ‘vjmo’ GSUB feature; these glyphs have a zero-unit advance, and are meant to be combined with the previous glyph.
- 137 glyphs that are made accessible via the ‘tjmo’ GSUB feature; like the 190 glyphs for the ‘vjmo’ GSUB feature, these glyphs also have a zero-unit advance.
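The glyph budget above is easy to get wrong when planning a glyph set, so here is a small Python check of the arithmetic. The encoded counts are derived from the ranges listed above, and the unencoded counts are simply the figures quoted in the text; the variable names are my own.

```python
# Encoded glyphs per Unicode block, derived from the inclusive ranges above.
encoded = {
    "Hangul Jamo": 0x11FF - 0x1100 + 1,
    "Hangul Jamo Extended-A": 0xA97C - 0xA960 + 1,
    "Hangul Jamo Extended-B": (0xD7C6 - 0xD7B0 + 1) + (0xD7FB - 0xD7CB + 1),
}

# Unencoded glyphs per GSUB feature (figures quoted in the text).
unencoded = {"ljmo": 625, "vjmo": 190, "tjmo": 137}

encoded_total = sum(encoded.values())
unencoded_total = sum(unencoded.values())
glyph_total = encoded_total + unencoded_total

print("encoded:  ", encoded_total)
print("unencoded:", unencoded_total)
print("total:    ", glyph_total)
```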
Let’s look at an example that is composed of the following three characters: U+A97C (ꥼ; HANGUL CHOSEONG SSANGYEORINHIEUH) + U+D7BE (ힾ; HANGUL JUNGSEONG I-YAE) + U+D7EE (ퟮ; HANGUL JONGSEONG SIOS-PANSIOS) or simply <A97C,D7BE,D7EE>. The graphic below shows the three characters in isolation, with no GSUB features being applied:
The next graphic shows proper shaping by applying these three GSUB features:
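The Unicode ranges alone are enough to tell which of the three features is relevant to each character in a sequence such as <A97C,D7BE,D7EE>. The following Python sketch classifies a code point as leading (L), vowel (V), or trailing (T) jamo. It is only an illustration of the ranges, not how any particular shaping engine is implemented, and the function name `jamo_class` is my own. Note that the ranges include the two fillers, U+115F and U+1160.

```python
# Inclusive code point ranges for the three classes of conjoining jamo.
LEADING = [(0x1100, 0x115F), (0xA960, 0xA97C)]   # choseong  -> 'ljmo'
VOWEL = [(0x1160, 0x11A7), (0xD7B0, 0xD7C6)]     # jungseong -> 'vjmo'
TRAILING = [(0x11A8, 0x11FF), (0xD7CB, 0xD7FB)]  # jongseong -> 'tjmo'


def jamo_class(cp):
    """Return 'L', 'V', or 'T' for a conjoining jamo, else None."""
    for label, ranges in (("L", LEADING), ("V", VOWEL), ("T", TRAILING)):
        if any(first <= cp <= last for first, last in ranges):
            return label
    return None


# The three-character example from this article.
example = [0xA97C, 0xD7BE, 0xD7EE]
classes = [jamo_class(cp) for cp in example]

for cp, label in zip(example, classes):
    print(f"U+{cp:04X} -> {label}")
```

Anything outside these ranges, such as a precomposed modern syllable like U+AC00, returns `None` in this sketch.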
In order to better assist developers who build CID-keyed OpenType/CFF fonts, I have created a Perl script named mkoldhangul.pl that includes an embedded “features” file template for Old Hangul that uses glyph names. A mapping file, such as glyph-map.txt, serves as STDIN, and simply maps the glyph names to CIDs. For the purpose of this article, the 1,309 glyph names map to CIDs 1 through 1309. The following command line is used:
% mkoldhangul.pl < glyph-map.txt > features
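For those who would rather see the idea than read Perl, the name-to-CID substitution can be sketched in Python. To be clear, this is not mkoldhangul.pl itself. The mapping file layout (one glyph name and one CID per line, separated by whitespace), the glyph names, and the one-line template are all assumptions made up for illustration. The only thing borrowed from real tooling is the `\CID` notation that “features” files use for CID-keyed fonts.

```python
import re

# Assumed token shape for glyph names: letters, digits, '_' and '.'.
GLYPH_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")


def read_glyph_map(lines):
    """Parse 'name CID' lines (assumed layout) into a dict."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, cid = line.split()
        mapping[name] = int(cid)
    return mapping


def names_to_cids(template, mapping):
    """Replace each mapped glyph name in the template with \\CID."""
    def swap(match):
        name = match.group(0)
        if name in mapping:
            return "\\" + str(mapping[name])
        return name  # keywords such as 'sub' and 'by' pass through

    return GLYPH_NAME.sub(swap, template)


# Hypothetical glyph names and a hypothetical one-rule template.
glyph_map = read_glyph_map(["L_a97c 1", "L_a97c.v1 2", "V_d7be 3"])
template = "sub L_a97c' V_d7be by L_a97c.v1;"
result = names_to_cids(template, glyph_map)
print(result)
```

Keeping glyph names in the template and resolving them to CIDs at the last moment means the same template can be reused with any CID assignment by swapping the mapping file.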
Note that all of the lookups that comprise the three Korean-specific GSUB features can be combined into a single ‘calt’ (Contextual Alternates) GSUB feature. The template that I created includes the ‘calt’ GSUB feature definition, mainly for good measure. I did this because it seems that more environments support the ‘calt’ GSUB feature than the three Korean-specific GSUB features. For environments that do support the three Korean-specific GSUB features, the same substitutions in the ‘calt’ GSUB feature will be no-ops (that is, they are harmless and will simply be ignored).
BTW, much of the Old Hangul implementation is described in the KS X 1026-1:2007 (Information Technology — Universal Multiple-Octet Coded Character Set (UCS) — Hangul — Part 1: Hangul processing guide for information interchange) standard. Conveniently, a courtesy English translation of this standard is included in WG2 N3422.
Now that I better understand how Old Hangul is supported in OpenType, it is, in the words of Spock, logical to consider the 1,309 necessary glyphs as candidates for Adobe-Korea1-3.
LLAP