Leveraging AFDKO Tools to Convert Name-keyed OpenType Fonts to CID-keyed — Part 1

The easiest way to represent an arbitrary name-keyed OpenType font as a CID-keyed one is to specify the special-purpose Adobe-Identity-0 ROS (/Registry, /Ordering, and /Supplement, the three elements of the /CIDSystemInfo dictionary that is present in CIDFont resource headers). In my experience, the easiest path to conversion is to leverage specific AFDKO tools, such as tx, mergeFonts, stemHist, autohint, and makeotf. As you should discover after reading this article, the conversion process is relatively straightforward.

The first part in this series will focus on the basic conversion process, from name-keyed to CID-keyed, ignoring any OpenType features that were present in the original name-keyed OpenType font, and also not taking advantage of multiple FDArray elements (aka, hint dictionaries) that are possible in CID-keyed fonts. Subsequent parts in this series will cover those topics.

For the purpose of this tutorial-like article, I crafted a simple name-keyed OpenType kana font, KozGoKanaNK-Heavy.otf, as the basis of this conversion. Click here to download an archive that includes the various files and resources that are referenced in this article, including this name-keyed OpenType font. The intent of the files is to enable interested readers to follow along, and repeat the production steps that are outlined below.

Step 1: Creating the Mapping Table

The first step is to create a mapping table that will serve to simplify some of the subsequent steps—including some that are not covered in this article—and this is done by first extracting a list of the glyphs that are present in the font, which includes their GIDs (Glyph IDs), glyph names, and Unicode mapping (if one exists). This is done by using the AFDKO tx tool, and specifying the “-1” option, as follows:

% tx -1 KozGoKanaNK-Heavy.otf > raw.txt

For the purpose of conversion, only the lines after the “## glyph[tag] {name,encoding,LanguageGroup}” line are useful. The resulting file must be processed so that the results are three tab-delimited fields. Simple or regex-based search/replace can be used to accomplish this task. The “LanguageGroup” field should be removed. Consider the following line from the raw tx output:

glyph[1] {space,U+0020,1}

After processing, this line should become the following:

1 space U+0020

I also recommend that the order of the fields be swapped such that the third one (Unicode mapping) becomes first, as follows:

U+0020 1 space

This is because the file should be sorted according to Unicode in preparation for CID-keyed conversion, and having it as the first field can facilitate the sorting process, depending on which tool is used for sorting. The process of sorting involves some amount of human intervention and decision-making. Note that the “.notdef” glyph must remain as the very first line, because it needs to become CID+0, as follows:

- 0 .notdef
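The reformatting described so far (dropping the “LanguageGroup” field and moving the Unicode mapping to the front) can also be scripted. What follows is only a minimal Python sketch, not part of AFDKO. The function names parse_tx_line and convert are hypothetical, and the pattern assumes that every glyph line looks like the “glyph[1] {space,U+0020,1}” example above, with a hyphen in the encoding field for unencoded glyphs.

```python
import re

# Matches glyph lines such as: glyph[1] {space,U+0020,1}
# Assumption: the fields inside the braces are name, encoding, LanguageGroup.
GLYPH_LINE = re.compile(r"^glyph\[(\d+)\] \{([^,]+),([^,]+),[^}]*\}$")


def parse_tx_line(line):
    """Return 'encoding<TAB>GID<TAB>name', or None for non-glyph lines."""
    match = GLYPH_LINE.match(line.strip())
    if match is None:
        return None
    gid, name, encoding = match.groups()
    return "\t".join((encoding, gid, name))


def convert(raw_lines):
    """Convert raw 'tx -1' output lines to three tab-delimited fields."""
    rows = (parse_tx_line(line) for line in raw_lines)
    # Header and comment lines (those starting with '##') are dropped.
    return [row for row in rows if row is not None]
```

Calling convert(open("raw.txt")) would return the reformatted lines in their original GID order. Sorting them is a separate step.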

For the remaining lines, my personal preference is to first specify the encoded glyphs (those with a Unicode mapping in the first field), followed by all unencoded glyphs (those with a hyphen in the first field). If a Unicode mapping is known for the unencoded glyphs, such as because they are variants of encoded glyphs (in this particular font, all of the unencoded glyphs happen to be vertical variants), my personal preference is to order them according to those mappings, mainly for consistency with the glyph order of the to-be-encoded glyphs. After the glyph order is determined, the next step is to add a fourth field, which is a simple enumeration from 0 (zero) to the number of glyphs minus 1 (one). The sample font includes 291 glyphs, so the enumeration is thus 0 through 290. I use the following simple Perl script for this purpose:

#!/usr/bin/perl
# mkdec.pl: print the integers from BEGIN through END, one per line.
use strict;
use warnings;

my ($begin, $end) = @ARGV;

foreach my $num ($begin .. $end) {
    print STDOUT "$num\n";
}

This Perl script, named mkdec.pl, is executed as follows:

% mkdec.pl 0 290 > cids.txt

I recommend that this fourth field be inserted such that it becomes the first field. The Unix paste command does this quite nicely, as follows (“raw.txt” here being the processed and sorted three-field file, not the unmodified tx output):

% paste cids.txt raw.txt > final.txt

The “final.txt” file encapsulates the final result of this step. This file will be important for some of the remaining steps, and for subsequent articles in this series.
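The mechanical part of the ordering and enumeration can be sketched in Python as well. This is an illustration under stated assumptions rather than a replacement for the human decisions described above: each row is a tuple of (Unicode mapping, GID, glyph name), the hypothetical functions sort_key and assign_cids are my own naming, and unencoded glyphs are simply kept in their incoming order instead of being arranged by the encoded glyphs they are variants of.

```python
def sort_key(row):
    """Order rows: .notdef, then encoded glyphs by code point, then the rest."""
    encoding, _gid, name = row
    if name == ".notdef":
        return (0, 0)  # must become CID+0
    if encoding.startswith("U+"):
        return (1, int(encoding[2:], 16))
    return (2, 0)  # unencoded glyphs: hyphen in the first field


def assign_cids(rows):
    """Sort the rows and prepend a CID field that counts up from 0."""
    # sorted() is stable, so unencoded glyphs keep their relative order.
    ordered = sorted(rows, key=sort_key)
    return [(str(cid),) + tuple(row) for cid, row in enumerate(ordered)]
```

Each resulting tuple can be written out with "\t".join(...) to produce the four fields of “final.txt”.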

Step 2: Name-keyed to CID-keyed Conversion

There are three files that are necessary to convert a name-keyed font into a CID-keyed one: a “cidfontinfo” file, an AFDKO mergeFonts tool mapping file, and the name-keyed font itself.

The CJK Type Blog article entitled The “cidfontinfo” File shows how to create a “cidfontinfo” file.

The mergeFonts mapping file is created by simply extracting the first (CID) and fourth (glyph name) fields from the “final.txt” file that was prepared in Step 1. The Unix cut command is useful for this purpose, as follows:

% cut -f1,4 final.txt > map.txt

The only treatment that this file needs is that its very first line must contain only the text “mergeFonts” as its content.
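For readers who prefer to produce the mapping file in one pass, including its required first line, here is a small Python sketch. The function name build_map is hypothetical, and the code assumes that “final.txt” holds four tab-delimited fields per line (CID, Unicode mapping, GID, glyph name) as described in Step 1.

```python
def build_map(final_lines):
    """Return mergeFonts mapping-file text built from 'final.txt' lines."""
    out = ["mergeFonts"]  # the very first line must contain only this text
    for line in final_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        cid, glyph_name = fields[0], fields[3]
        out.append(f"{cid}\t{glyph_name}")
    return "\n".join(out) + "\n"
```

The returned string can be written to “map.txt” as-is.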

The name-keyed font is extracted from the name-keyed OpenType font by using the following AFDKO tx tool command line:

% tx -t1 KozGoKanaNK-Heavy.otf font.pfa

With all three files prepared, the following AFDKO mergeFonts tool command line will produce a well-formed CIDFont resource named “cidfont.raw”:

% mergeFonts -cid cidfontinfo cidfont.raw map.txt font.pfa

The following AFDKO tx tool command line will produce a glyph synopsis as a PDF file named “cidfont.pdf”:

% tx -pdf cidfont.raw cidfont.pdf

This functionality is useful for checking a font resource at various stages of production, to ensure that it is still well-formed. And, in case you haven’t realized, the AFDKO tx tool is quite powerful. I encourage you to spend some time exploring its broad functionality by specifying its “-u” option, and go from there.

Step 3: Hinting the CIDFont Resource

If the hinting parameters that are specified in the CIDFont resource are as intended—referring to the /BlueValues, /StdHW, /StdVW, /StemSnapH, /StemSnapV, and /LanguageGroup settings—the AFDKO autohint tool can be executed immediately, as follows, which will produce a fully-hinted CIDFont resource named “cidfont.ps”:

% autohint -a -o cidfont.ps cidfont.raw

Otherwise, those settings should be adjusted to appropriate values. The header of a CIDFont resource is plain text, followed by a (much larger) binary portion. You must therefore use a binary-friendly text editor to edit the hinting parameters, so as not to damage or corrupt the binary portion of the CIDFont resource. My preference is to use emacs in the Mac OS X Terminal application, but BBEdit should work equally well. If the font consists primarily of non-Latin glyphs, such as kana, ideographs (aka kanji, hanzi, or hanja), and hangul, the following /BlueValues array is recommended, because it places the (required) alignment zones well outside the imaging area for those glyphs:

/BlueValues [-250 -250 1100 1100] def

Also, the /LanguageGroup setting should be 1 (one) for such fonts:

/LanguageGroup 1 def

Lastly, and at a minimum, the /StdHW and /StdVW values should be set to appropriate values. The AFDKO stemHist tool can be used for this purpose, specifically to determine the highest-frequency horizontal and vertical stems, which serve as reasonable default hinting parameters:

% stemHist -all cidfont.raw

The resulting “cidfont.raw.hstm.txt” and “cidfont.raw.vstm.txt” files can be examined to determine the highest frequency stems, or you can use the AFDKO setsnap.pl tool for this purpose:

% setsnap.pl

These would result in the following /StdHW and /StdVW settings:

/StdHW [113] def
/StdVW [153] def

When adding hinting parameters to the /Private dictionary of an FDArray element, which follows the "%ADOBeginPrivateDict" line, be sure to adjust the number of declared /Private dictionary entries as appropriate:

/Private 13 dict dup begin
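If you would rather script the header edits than open the file in an editor, the same binary-safety concern applies: only the plain-text header may be touched. The following Python sketch is one way to do that, under several assumptions that are mine and not the article's. It treats everything before the first occurrence of “StartData” as the header, it only replaces array-valued entries that already exist (so the declared /Private dictionary size is unaffected), and it replaces every occurrence, which suits a font with a single FDArray element. The function name set_array_value is hypothetical.

```python
import re


def set_array_value(data, key, values):
    """Replace '/key [...] def' in the text header of CIDFont resource bytes."""
    # Assumption: the binary portion begins after the first 'StartData'.
    marker = data.find(b"StartData")
    if marker < 0:
        raise ValueError("StartData marker not found")
    header, binary = data[:marker], data[marker:]

    name = key.encode("ascii")
    numbers = " ".join(str(v) for v in values).encode("ascii")
    pattern = re.compile(rb"/" + re.escape(name) + rb"\s*\[[^\]]*\]\s*def")
    replacement = b"/" + name + b" [" + numbers + b"] def"

    # Only the header is rewritten; the binary bytes are passed through as-is.
    header, count = pattern.subn(lambda m: replacement, header)
    if count == 0:
        raise KeyError(key)  # adding a new entry is left to manual editing
    return header + binary
```

The file must be read and written in binary mode, for example with open("cidfont.raw", "rb"), so that no newline or encoding translation is applied to the binary portion.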

Setting hinting parameters, especially optimal ones, is more of an art than a science, or at least it often feels that way. Once the hinting parameters have been set to appropriate (or reasonable or desired) values, the AFDKO autohint tool can be executed as shown above. The hinting parameters can be checked by executing the following AFDKO tx tool command line, whose result is also shown:

% tx -0 cidfont.ps
## Filename cidfont.ps
## Top Dict
Notice "Kozuka Gothic is either a registered trademark or trademark of Adobe Systems Incorporated in the United States and/or other co"
FullName "Kozuka Gothic Kana AI0 OpenType Heavy"
FontBBox {0,-120,1000,880}
XUID {1,11,9273857}
cid.CIDFontName "KozGoKanaAI0-Heavy"
cid.Registry "Adobe"
cid.Ordering "Identity"
cid.Supplement 0
cid.CIDFontVersion 1.000
cid.CIDCount 291
sup.flags 0x00000001 (ABF_CID_FONT)
sup.srcFontType Type 1 (cid-keyed)
sup.nGlyphs 291
## FontDict[0]
FontName "KozGoKanaNK-Heavy"
## Private
BlueValues {-250,-250,1100,1100}
StdHW 113
StdVW 153
StemSnapH {40,113,139}
StemSnapV {40,153}
LanguageGroup 1
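A dump like the one above is easy to check by eye, but it can also be checked mechanically. The Python sketch below collects the entries that follow the “## Private” line and tests whether the /StdHW and /StdVW values also appear in /StemSnapH and /StemSnapV, which is the usual Type 1 convention. It assumes the “Key value” layout shown in the dump above, and the function names are hypothetical.

```python
def parse_private(dump_lines):
    """Collect 'Key value' pairs that follow the '## Private' line."""
    values, in_private = {}, False
    for line in dump_lines:
        line = line.strip()
        if line.startswith("##"):
            # Any other '##' section header ends the Private section.
            in_private = (line == "## Private")
            continue
        if in_private and line:
            key, _, value = line.partition(" ")
            values[key] = value.strip()
    return values


def numbers(text):
    """Turn '113' or '{40,113,139}' into a list of floats."""
    return [float(n) for n in text.strip("{}").split(",")]


def std_in_snap(values, std_key, snap_key):
    """True if the standard stem width is listed in the snap array."""
    return numbers(values[std_key])[0] in numbers(values[snap_key])
```

For the dump shown above, both std_in_snap(values, "StdHW", "StemSnapH") and std_in_snap(values, "StdVW", "StemSnapV") hold.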

Step 4: Building the CMap Resource

Extracting and swapping the first two fields of the "final.txt" file that resulted from Step 1 provides the basic information for building the CMap resource. For future extensibility, I recommend that a UTF-32 CMap resource be built. The "cmap.raw" file represents a CMap resource shell that contains no mappings but provides the appropriate syntax, and the "cmap.txt" file represents the CMap resource that I built for this font.

The Unicode mapping is changed by removing the "U+" prefix, zero-padding to eight digits, and enclosing in angle brackets. In other words, "U+0020" becomes "<00000020>". Glyphs without a Unicode mapping, meaning that the field is a hyphen, should simply be removed, because they are not directly encoded. The CID is used as-is. These mappings are placed between the "0 begincidchar" and "endcidchar" lines of the CMap resource shell, and the 0 (zero) before "begincidchar" should be changed to accurately reflect the total number of mappings, which is 261 for this sample font.
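The transformation from "final.txt" lines to CMap mappings is mechanical, so it lends itself to a short script. Here is a minimal Python sketch, assuming four tab-delimited fields per line with the CID first and the Unicode mapping second. The function name cidchar_lines is hypothetical, and the choice of lowercase hexadecimal digits is mine.

```python
def cidchar_lines(final_lines):
    """Build the begincidchar/endcidchar block from 'final.txt' lines."""
    mappings = []
    for line in final_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip unencoded glyphs, whose Unicode field is a hyphen.
        if len(fields) < 2 or not fields[1].startswith("U+"):
            continue
        cid = fields[0]
        code = int(fields[1][2:], 16)
        # Zero-pad to eight hex digits (UTF-32) and wrap in angle brackets.
        mappings.append(f"<{code:08x}> {cid}")
    # The count before 'begincidchar' reflects the number of mappings.
    return [f"{len(mappings)} begincidchar"] + mappings + ["endcidchar"]
```

The returned lines would replace the empty "0 begincidchar" and "endcidchar" pair in the CMap resource shell.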

Step 5: Create "FontMenuNameDB" and "features" Files

Before the AFDKO makeotf tool can be used to build a fully-functional OpenType font, the "FontMenuNameDB" and "features" files must be created. The former provides menu name information (see Adobe Tech Note #5149: OpenType-CID/CFF CJK Fonts: 'name' Table Tutorial for more details), and the latter provides table overrides and other settings (see the AFDKO documentation for more details). These files deserve their own articles, so for the purpose of expediency I simply prepared them as examples for readers to study.

Step 6: Building the CID-keyed OpenType Font

Now that all of the files and resources have been prepared, the AFDKO makeotf tool can be used to build a fully-functional CID-keyed OpenType font. The following command line can be used:

% makeotf -f cidfont.ps -cs 1 -r -ch cmap.txt

The "FontMenuNameDB" and "features" files are not specified, because they use filenames that are standardized and are in the current working directory. If desired, these files can be explicitly specified, as follows:

% makeotf -f cidfont.ps -mf FontMenuNameDB -ff features -cs 1 -r -ch cmap.txt

This CID-keyed OpenType font can be installed and used with various applications.
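For repeat builds, the AFDKO command lines from Steps 2, 3, and 6 can be collected into one small driver. The Python sketch below only assembles the command lines already shown in this article and, by default, prints them without running anything. It assumes that the AFDKO tools are on the PATH, that "cidfontinfo", "map.txt", "cmap.txt", "FontMenuNameDB", and "features" already exist in the current directory, and that the hinting parameters in "cidfont.raw" need no manual adjustment before autohint runs. The function names are hypothetical.

```python
import subprocess


def build_commands(otf="KozGoKanaNK-Heavy.otf"):
    """Return the AFDKO command lines from this article, in order."""
    return [
        # Step 2: extract the name-keyed font, then convert to CID-keyed.
        ["tx", "-t1", otf, "font.pfa"],
        ["mergeFonts", "-cid", "cidfontinfo", "cidfont.raw", "map.txt", "font.pfa"],
        # Step 3: hint the CIDFont resource.
        ["autohint", "-a", "-o", "cidfont.ps", "cidfont.raw"],
        # Step 6: build the CID-keyed OpenType font.
        ["makeotf", "-f", "cidfont.ps", "-mf", "FontMenuNameDB",
         "-ff", "features", "-cs", "1", "-r", "-ch", "cmap.txt"],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failure
```

Calling run_all(build_commands()) lists the commands, and passing dry_run=False executes them in sequence.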

In summary, the name-keyed to CID-keyed conversion process described above scales easily to fonts with thousands or even tens of thousands of glyphs.

Please stay tuned for the next article in this series…
