Combining Jamo Test #3

Unlike the first and second similarly-titled articles that I published last month, this article focuses on a minor efficiency gain for the combining jamo feature of the Adobe-branded Source Han Sans and Google-branded Noto Sans CJK Pan-CJK typeface families.

Combining jamo is best described as the use of two- or three-character sequences of jamo. The nominal (aka encoded) glyphs for the jamo are substituted with appropriate forms that combine into the typical box-like syllable. The substitution is performed by the 'ljmo' (Leading Jamo Forms), 'vjmo' (Vowel Jamo Forms), and 'tjmo' (Trailing Jamo Forms) GSUB features.

The 2016-11-07 through 2016-11-30 entries of Source Han Sans Issue #98 involve a discussion that may lead to a modest reduction of seven or eight glyphs, specifically all six 'ljmo' forms of U+115F HANGUL CHOSEONG FILLER and one of the two 'vjmo' forms of U+1160 HANGUL JUNGSEONG FILLER, and possibly the nominal form of U+1160 HANGUL JUNGSEONG FILLER itself.

I built three test fonts using Source Han Sans ExtraLight. The first, CombiningJamoTestAll-ExtraLight.otf, includes only the glyphs necessary for combining jamo, along with the KR version of the space (U+0020 SPACE) for good measure, because Korean uses spaces to separate words. This font serves as the benchmark for the current combining jamo implementation. The second, CombiningJamoTest-ExtraLight.otf, removes all eight glyphs referenced in the previous paragraph, and adjusts the 'cmap' table and GSUB features accordingly. The third, CombiningJamoTest1160-ExtraLight.otf, retains the glyph that maps to the nominal form of U+1160 HANGUL JUNGSEONG FILLER itself, and adjusts the GSUB features accordingly. I have strong reservations about double-mapping the nominal forms of U+115F HANGUL CHOSEONG FILLER and U+1160 HANGUL JUNGSEONG FILLER to the same glyph. These reservations revolve around copy-and-paste and the repurposing of PDFs that contain combining jamo.

In addition to the fonts themselves, I also prepared a text file that contains all 30,222 two- and three-character sequences that include U+115F HANGUL CHOSEONG FILLER or U+1160 HANGUL JUNGSEONG FILLER. The scope of this change does not require testing all possible two- and three-character sequences, of which there are 1,638,750 (11,875 two-character sequences plus 1,626,875 three-character sequences). The much smaller test file is more manageable, yet still complete for the purpose of this change.
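The counts above can be reproduced with a short Python sketch. It assumes that the jamo inventory consists of the Hangul Jamo block plus the Hangul Jamo Extended-A and Extended-B blocks (125 leading, 95 vowel, and 137 trailing jamo, fillers included), which is an assumption on my part that happens to match the totals cited. The function names are purely illustrative, and this is not necessarily how the actual test file was generated.

```python
from itertools import product

CHOSEONG_FILLER = 0x115F   # U+115F HANGUL CHOSEONG FILLER
JUNGSEONG_FILLER = 0x1160  # U+1160 HANGUL JUNGSEONG FILLER

# Assumed inventory: Hangul Jamo plus Extended-A (leading) and Extended-B
# (vowel and trailing). The fillers are part of the leading and vowel sets.
LEADING = [*range(0x1100, 0x1160), *range(0xA960, 0xA97D)]   # 96 + 29 = 125
VOWEL = [*range(0x1160, 0x11A8), *range(0xD7B0, 0xD7C7)]     # 72 + 23 = 95
TRAILING = [*range(0x11A8, 0x1200), *range(0xD7CB, 0xD7FC)]  # 88 + 49 = 137


def all_sequences():
    """Yield every two-character (L+V) and three-character (L+V+T) sequence."""
    yield from product(LEADING, VOWEL)
    yield from product(LEADING, VOWEL, TRAILING)


def has_filler(seq):
    """True if the leading jamo or the vowel of the sequence is a filler."""
    return seq[0] == CHOSEONG_FILLER or seq[1] == JUNGSEONG_FILLER


def filler_sequences():
    """Return only the sequences that involve at least one filler."""
    return [seq for seq in all_sequences() if has_filler(seq)]


def to_text(seq):
    """Convert a tuple of code points into a string for a test file."""
    return "".join(chr(cp) for cp in seq)


if __name__ == "__main__":
    two = len(LEADING) * len(VOWEL)
    three = two * len(TRAILING)
    print(f"two-character sequences:   {two:,}")
    print(f"three-character sequences: {three:,}")
    print(f"all sequences:             {two + three:,}")
    print(f"sequences with a filler:   {len(filler_sequences()):,}")
```

Writing one `to_text(seq)` result per line would produce a test file of the same size as the one described above, although the ordering of the sequences in the actual file may differ.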

While I plan to perform my own testing, any feedback from the community would be greatly appreciated.

