Font Development Via Unicode

Unicode has become the de facto way in which to represent text in digital form, and for good reason: its character set covers the vast majority of the world’s scripts. Other benefits of Unicode include the following:

That it is under active and continuous development, meaning that with each new version, more scripts are supported, and additional characters for existing scripts are standardized.
That it is aligned and kept in sync with ISO/IEC 10646 (available at no charge), which is quite a feat.

With regard to font development, Unicode is considered the default encoding for OpenType fonts, and is expressed through the ‘cmap’ table. The most common ‘cmap’ subtables are Format 4 (BMP-only UTF-16) and Format 12 (UTF-32). The latter is needed only when the font includes mappings outside of the BMP (Basic Multilingual Plane), meaning from one or more of the 16 Supplementary Planes.
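As a rough illustration of the Format 4 versus Format 12 distinction, the following Python sketch decides which subtable formats a set of code points calls for. The function names are hypothetical, and always keeping a Format 4 subtable for the BMP portion is an assumption made for this example, not a rule stated above.

```python
BMP_MAX = 0xFFFF  # last code point of the Basic Multilingual Plane


def cmap_subtable_formats(code_points):
    """Suggest 'cmap' subtable formats for an iterable of Unicode code points.

    Assumption for this sketch: a BMP-only Format 4 subtable is always
    included, and Format 12 is added only when at least one code point
    lies in a Supplementary Plane.
    """
    formats = [4]
    if any(cp > BMP_MAX for cp in code_points):
        formats.append(12)
    return formats


def bmp_subset(code_points):
    """Return the sorted, de-duplicated code points that fit in Format 4."""
    return sorted(cp for cp in set(code_points) if cp <= BMP_MAX)
```

For example, a font that maps only U+528D would get `[4]`, while adding a mapping for U+20000 would yield `[4, 12]`.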

According to the AGL (Adobe Glyph List) Specification, an open specification that is part of the AGL & AGLFN open source project hosted at Open@Adobe, a glyph that is outside of the standard set of named glyphs should be named according to its Unicode code point, based on the following conventions:

Glyphs that map from BMP code points should use the “uni” prefix followed by the four-digit hexadecimal code point, such as “uni528D” for U+528D (劍).
Glyphs that map from Supplementary Plane code points should use the shorter “u” prefix followed by the five- or six-digit hexadecimal code point.
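The two conventions above can be sketched as a small Python function. The name `glyph_name_for` is hypothetical, and rejecting surrogate code points is an assumption added for safety rather than something spelled out above.

```python
def glyph_name_for(code_point):
    """Build a Unicode-based glyph name from an integer code point.

    BMP code points get the "uni" prefix and four uppercase hex digits;
    Supplementary Plane code points get the "u" prefix and five or six
    uppercase hex digits.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point: %r" % (code_point,))
    if 0xD800 <= code_point <= 0xDFFF:
        # Assumption: surrogates are not characters, so they get no name.
        raise ValueError("surrogate code point: U+%04X" % code_point)
    if code_point <= 0xFFFF:
        return "uni%04X" % code_point
    return "u%X" % code_point
```

Calling `glyph_name_for(0x528D)` gives "uni528D", and a Supplementary Plane code point such as U+20000 gives "u20000".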

In my decades of experience in font development, this glyph-naming convention simplifies the development of name-keyed fonts, whose glyph names drive the building of an OpenType font’s ‘cmap’ table. It also aids in the creation of the UTF-32 CMap resource that is used for CID-keyed fonts, because CID-keyed fonts are generally compiled from one or more name-keyed fonts. In both cases, the Unicode mapping is explicit in the glyph name, which is a tremendous benefit. This technique of naming glyphs according to their Unicode code points also scales extraordinarily well.
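Because the mapping is explicit in the glyph name, it can be recovered mechanically. The Python sketch below shows one way a tool might derive a code-point-to-glyph-name table from a list of glyph names. The function names are hypothetical, and the sketch deliberately ignores standard AGL names, multi-code-point "uni" names, and suffixed names such as "uni528D.vert"; those choices are simplifying assumptions, not a description of any particular font tool.

```python
import re

# Assumption: only single-code-point, unsuffixed names are recognized here.
_UNI_NAME = re.compile(r"uni([0-9A-F]{4})")
_U_NAME = re.compile(r"u([0-9A-F]{4,6})")


def code_point_from_name(name):
    """Return the code point encoded in a glyph name, or None."""
    match = _UNI_NAME.fullmatch(name) or _U_NAME.fullmatch(name)
    if match is None:
        return None
    code_point = int(match.group(1), 16)
    if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        return None  # out of range, or a surrogate
    return code_point


def build_mapping(glyph_names):
    """Map each recoverable code point to the first glyph name that encodes it."""
    mapping = {}
    for name in glyph_names:
        code_point = code_point_from_name(name)
        if code_point is not None and code_point not in mapping:
            mapping[code_point] = name
    return mapping
```

A table built this way could feed either a ‘cmap’ subtable or the lines of a UTF-32 CMap resource, since both need the same code-point-to-glyph pairing.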

The bottom line is that if your solution does not make use of Unicode, especially for today’s environments, it’s probably not the right solution.

CJK Type Blog

CJK Fonts, Character Sets & Encodings.

By Dr. Ken Lunde
