OpenType ‘cmap’ Table Ramblings

OpenType fonts are ‘sfnt’ (scalable font) resources that are composed of several well-defined tables. One of these tables, which is the topic of today’s article, is the ‘cmap’ (character map) table. The ‘cmap’ table, put simply, maps character codes to Glyph IDs (GIDs) that refer to glyphs in the ‘glyf’ or ‘CFF’ (Compact Font Format) table, depending on the “flavor” of the OpenType font. What is important about the ‘cmap’ table is that it makes the glyphs usable. Without the ability to map from character codes, which are used by virtually all applications and OSes, the glyphs in a font cannot be readily accessed or used.
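To make the idea of mapping character codes to GIDs concrete, here is a minimal Python sketch. The `toy_cmap` dictionary and its GID values are hypothetical, and a real ‘cmap’ subtable stores this information in a compact binary form rather than as a dictionary. The one convention the sketch relies on is that GID 0 is the .notdef glyph, which is what an unmapped character code falls back to.

```python
# Hypothetical toy data: a real 'cmap' subtable is a binary structure,
# and these GID values are invented for illustration.
NOTDEF_GID = 0  # GID 0 is conventionally the .notdef ("missing") glyph

toy_cmap = {
    0x0041: 34,      # U+0041 LATIN CAPITAL LETTER A
    0x3042: 843,     # U+3042 HIRAGANA LETTER A
    0x20BB7: 13729,  # U+20BB7, a code point outside the BMP
}


def text_to_gids(text, cmap):
    """Map each character in text to a GID, falling back to .notdef."""
    return [cmap.get(ord(ch), NOTDEF_GID) for ch in text]
```

Calling `text_to_gids("A\u3042", toy_cmap)` walks the string one code point at a time, which is roughly what an application does before any glyph can be drawn.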

There are several ‘cmap’ subtable formats, most of which use even-numbered integers as their identifier. Of these, the most important one is Format 12 (UTF-32), followed closely by Format 4 (BMP-only UTF-16). These two are the most widely used ‘cmap’ subtable formats. The Format 12 subtable needs to be present only if the font includes mappings outside the BMP. If all of the mappings are within the BMP, only a Format 4 subtable needs to be present.
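The rule for choosing between Format 4 and Format 12 can be sketched in a few lines of Python. The function names are hypothetical. The sketch also assumes that a Format 4 subtable covering the BMP portion is kept alongside Format 12 for the benefit of older software, which is common practice but is an assumption here rather than something stated above.

```python
BMP_MAX = 0xFFFF  # last code point in the Basic Multilingual Plane


def required_subtable_formats(code_points):
    """Return the Unicode 'cmap' subtable formats a font would need.

    Assumption: Format 4 is always emitted for the BMP mappings, and
    Format 12 is added only when a mapping falls outside the BMP.
    """
    formats = [4]
    if any(cp > BMP_MAX for cp in code_points):
        formats.append(12)
    return formats


def bmp_subset(mapping):
    """Keep only the mappings that a Format 4 subtable can express."""
    return {cp: gid for cp, gid in mapping.items() if cp <= BMP_MAX}
```

With this split, the Format 12 subtable would carry the full mapping, while `bmp_subset` yields the portion that fits in Format 4.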

I consider the Format 13 and 14 subtables to be somewhat special- or limited-purpose. The former is for last-resort fonts that map a range of code points to a single glyph. Apple’s Last Resort Font makes use of the Format 13 subtable. I have observed that support for this particular ‘cmap’ subtable format is not very broad, so it should be used only as a last resort (pun intended). The Format 14 subtable is used for Unicode Variation Sequences (UVSes), such as Ideographic Variation Sequences (IVSes) that have been registered in an IVD collection. The Format 14 subtable works in conjunction with a Format 4 or 12 subtable. The basic principle is that if a glyph is unencoded, but is represented by a UVS, the Format 14 subtable maps the UVS directly to the appropriate glyph. This is referred to as a non-default UVS. If the glyph is encoded, but can still be represented by a UVS, the Format 14 subtable specifies only the UVS itself, and defers to the Format 4 or 12 subtable to map the base character portion of the UVS to the appropriate glyph. This is referred to as a default UVS.
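Both special-purpose lookups can be sketched in Python. This is a simplified model, not the binary layout of either subtable: the group list, the sets and dictionaries, the GID values, and the choice of which sequence is default are all hypothetical. The first function models the Format 13 idea of many code points sharing one glyph, and the second models the default versus non-default UVS distinction.

```python
from bisect import bisect_right

# Format 13-style groups: (start_code, end_code, gid). Every code point
# in a group maps to the same glyph. Values are hypothetical.
FORMAT13_GROUPS = [
    (0x0000, 0x007F, 1),
    (0x3040, 0x309F, 2),
    (0x20000, 0x2A6DF, 3),
]


def lookup_many_to_one(groups, code_point):
    """Find the group containing code_point; return its GID, else 0."""
    starts = [start for start, _end, _gid in groups]
    i = bisect_right(starts, code_point) - 1
    if i >= 0:
        _start, end, gid = groups[i]
        if code_point <= end:
            return gid
    return 0  # .notdef


# Stand-ins for the Format 4/12 subtable and the two Format 14 lists.
BASE_CMAP = {0x845B: 1200}
DEFAULT_UVS = {(0x845B, 0xE0100)}           # glyph comes from BASE_CMAP
NON_DEFAULT_UVS = {(0x845B, 0xE0101): 5000}  # glyph is otherwise unencoded


def resolve_uvs(base, selector):
    """Return a GID for the sequence, or None if it is not listed."""
    key = (base, selector)
    if key in NON_DEFAULT_UVS:
        return NON_DEFAULT_UVS[key]
    if key in DEFAULT_UVS:
        # Defer to the base character's ordinary mapping.
        return BASE_CMAP.get(base, 0)
    return None
```

Returning `None` for an unlisted sequence leaves the fallback policy, such as rendering the base character alone, to the caller.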

Building ‘cmap’ tables depends on the tools that are being used. The AFDKO makeotf tool uses glyph names, in the case of name-keyed fonts, or UTF-32 CMap resources, in the case of CID-keyed fonts, to drive this process. For the former, the rules and conventions are documented in the AGL Specification. For the latter, one simply needs to build a UTF-32 CMap resource that maps UTF-32 character codes to the appropriate CIDs. Adobe Tech Note #5099 (Developing CMap Resources for CID-Keyed Fonts) is a good resource.
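For the CID-keyed case, the heart of a UTF-32 CMap resource is a list of range lines of the form `<start> <end> CID`. The Python sketch below collapses a code-point-to-CID mapping into such lines. The function names and the sample CIDs are hypothetical, the surrounding CMap operators and header are omitted, and the sketch conservatively breaks a range whenever anything other than the last byte of the code changes, so the result should be checked against Tech Note #5099 before use.

```python
def format_range(run):
    """Format [start_code, end_code, start_cid] as a CMap range line."""
    start, end, cid = run
    return f"<{start:08x}> <{end:08x}> {cid}"


def cidrange_lines(mapping):
    """Collapse {code_point: cid} into '<start> <end> cid' lines."""
    lines = []
    run = None  # [start_code, end_code, start_cid] of the open range
    for cp in sorted(mapping):
        cid = mapping[cp]
        extends = (
            run is not None
            and cp == run[1] + 1                # code points contiguous
            and cid == run[2] + (cp - run[0])   # CIDs contiguous too
            and (cp >> 8) == (run[0] >> 8)      # only the last byte varies
        )
        if extends:
            run[1] = cp
        else:
            if run is not None:
                lines.append(format_range(run))
            run = [cp, cp, cid]
    if run is not None:
        lines.append(format_range(run))
    return lines
```

Each returned line covers a run in which code points and CIDs increase in lockstep, so a mapping with long contiguous stretches collapses into far fewer lines than it has entries.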
