One of my longer-term goals for the open source Source Han Sans project has been to eventually register a Pan-CJK IVD (Ideographic Variation Database) collection that would allow the regional variants to be displayed and preserved in “plain text” environments, and I think that I may have achieved a breakthrough the other day.
One of the hurdles toward this goal has been the extent to which glyphs are shared, or not shared, across regions, which can vary by typeface style and typeface design. This would potentially tie a Pan-CJK IVD collection to a particular Pan-CJK implementation, which in turn would lead to multiple and potentially conflicting Pan-CJK IVD collections. The idea that I came up with is to register a general-purpose Pan-CJK IVD collection that could potentially serve any Pan-CJK implementation, by registering a separate IVS for each region and for each CJK Unified Ideograph.
To make this idea more broadly visible, and to solicit feedback, I launched the PanCJKV IVD Collection open source project this morning, which includes a fully functional example implementation based on Source Han Sans.
The image above, which may look familiar to those who have read my materials that describe Source Han Sans, shows four prototypical characters that exhibit varying degrees of regional variation, displayed using a Source Han Sans font whose Format 14 ‘cmap’ subtable includes UVSes (Unicode Variation Sequences) that correspond to this experimental IVD collection:
<U+4E00,U+E01E8> <U+4E00,U+E01E9> <U+4E00,U+E01EC> <U+4E00,U+E01ED>
<U+5B57,U+E01E8> <U+5B57,U+E01E9> <U+5B57,U+E01EC> <U+5B57,U+E01ED>
<U+9AA8,U+E01E8> <U+9AA8,U+E01E9> <U+9AA8,U+E01EC> <U+9AA8,U+E01ED>
<U+66DC,U+E01E8> <U+66DC,U+E01E9> <U+66DC,U+E01EC> <U+66DC,U+E01ED>
What is significant about the glyphs shown in the image above is that they come from a single font, and that no language tagging is used. The advantage is that if such text is copied, which almost always involves “plain text” and thus the loss of any language tagging, the IVSes are preserved.
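To make the plain-text point concrete, here is a minimal Python sketch that builds the sixteen sequences listed above as ordinary strings and checks that they survive a UTF-8 encode and decode. The helper names (`make_ivs`, `format_ivs`, `plain_text_round_trip`) are my own for illustration, and the UTF-8 round trip is only a stand-in for copy and paste; whether a given application preserves or renders the selectors is outside what this sketch can show.

```python
# Base characters and variation selectors taken from the sequences listed above.
BASES = [0x4E00, 0x5B57, 0x9AA8, 0x66DC]
SELECTORS = [0xE01E8, 0xE01E9, 0xE01EC, 0xE01ED]


def make_ivs(base: int, selector: int) -> str:
    """Return the two-code-point sequence <base, selector> as a string."""
    return chr(base) + chr(selector)


def format_ivs(ivs: str) -> str:
    """Render a sequence in the <U+XXXX,U+XXXXX> notation used in this post."""
    return "<" + ",".join(f"U+{ord(ch):04X}" for ch in ivs) + ">"


def plain_text_round_trip(text: str) -> str:
    """Hypothetical stand-in for copy and paste: encode to UTF-8 and back."""
    return text.encode("utf-8").decode("utf-8")


# One row per base character, one column per variation selector.
rows = [[make_ivs(base, vs) for vs in SELECTORS] for base in BASES]

for row in rows:
    line = " ".join(row)
    # The selectors are ordinary code points, so plain text keeps them.
    assert plain_text_round_trip(line) == line
    print(" ".join(format_ivs(ivs) for ivs in row))
```

Because each variation selector is just another code point that follows the base character, nothing beyond the text itself is needed to carry the regional distinction.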
The table below lists the eight VSes that would be completely consumed by this IVD collection, their code points, and to which region they correspond:
| Variation Selector | Code Point | Region |
|---|---|---|
| VS251 | U+E01EA | HK (Hong Kong SAR) |
| VS252 | U+E01EB | MO (Macau SAR) |
| VS256 | U+E01EF | VN (Việt Nam) |
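The relationship between a variation selector's number and its code point is simple arithmetic: the selectors used for IVSes run from VS17 (U+E0100) through VS256 (U+E01EF). The following Python sketch, with function names of my own choosing, shows the conversion in both directions and prints the eight selectors at the end of that range. The region labels are limited to the rows shown in the table above, and the other selectors are left unlabeled.

```python
VS17_CODE_POINT = 0xE0100  # VS17 is the first selector usable in an IVS
FIRST_VS, LAST_VS = 17, 256


def vs_code_point(n: int) -> int:
    """Return the code point of VSn, for n in the range 17 through 256."""
    if not FIRST_VS <= n <= LAST_VS:
        raise ValueError(f"VS{n} is outside the VS17..VS256 range")
    return VS17_CODE_POINT + (n - FIRST_VS)


def vs_number(code_point: int) -> int:
    """Return n such that VSn has the given code point."""
    n = code_point - VS17_CODE_POINT + FIRST_VS
    if not FIRST_VS <= n <= LAST_VS:
        raise ValueError(f"U+{code_point:04X} is not in the VS17..VS256 range")
    return n


# Regions as given in the table above; other selectors are left unlabeled here.
REGIONS = {251: "HK", 252: "MO", 256: "VN"}

# The last eight selectors, VS249 through VS256.
for n in range(249, 257):
    label = REGIONS.get(n, "")
    print(f"VS{n}\tU+{vs_code_point(n):04X}\t{label}")
```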
Please keep in mind that this IVD collection is experimental and unregistered, and any and all feedback is welcome.