Excruciating details about the Adobe Tech Note #5079 update

I spent the early part of this week updating Adobe Tech Note #5079 (The Adobe-GB1-5 Character Collection). The number of glyphs remained the same (30,284), as did the glyphs themselves. So, why the update? Well, mainly to bring it in line, format-wise, with the other three related Adobe Tech Notes: #5078 (The Adobe-Japan1-6 Character Collection), #5080 (The Adobe-CNS1-6 Character Collection), and #5093 (The Adobe-Korea1-2 Character Collection). The biggest effort was creating its 61-page glyph table. Besides announcing the update, this blog post describes how I built that glyph table.

When I embarked on updating Adobe Tech Notes #5080 and #5093 earlier this month, the task was relatively painless, because I was able to use the 47-page glyph table of Adobe Tech Note #5078 as the starting point. (Adobe-CNS1-6 and Adobe-Korea1-2 include approximately 4K fewer glyphs than Adobe-Japan1-6.) In other words, I leveraged an existing resource. I simply trimmed the glyph table down to the appropriate number of rows and changed the font used to render the glyphs. However, the Adobe-GB1-5 character collection has approximately 7K more glyphs than the Adobe-Japan1-6 character collection, so I was forced to recreate the glyph table from scratch. My tools were Perl and Adobe InDesign (CS5.5).

Because I needed to guarantee that the glyph for each CID (Character ID) is the correct one, I created the data as InDesign Tagged Text, a markup language used by Adobe InDesign that has special syntax for specifying glyphs by CID. The following specifies CID+23058:

<cSpecialGlyph:23058><cSpecialGlyph:>

I used the following Perl script to create the InDesign Tagged Text data:

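#!/usr/bin/perl
use strict;
use warnings;

# Highest CID to include (0 through 65534); defaults to 65534
my $max = defined $ARGV[0] ? $ARGV[0] : 65534;
die "Usage: $0 [max-CID]\n" unless $max =~ /^\d+$/ && $max <= 65534;
my $cols = 20;    # glyphs per table row

# Encoding declaration, default paragraph style, then the column-header row
print STDOUT "<SJIS-MAC>\n<ParaStyle:>";
print STDOUT join("\t", "", 0 .. $cols - 1), "\n";

foreach my $cid (0 .. $max) {
    # Each row begins with its first CID as a label
    print STDOUT $cid if $cid % $cols == 0;
    # Tab-separated cell: the CID's glyph between U+230D and U+230C
    # registration marks (<0x####> is Tagged Text notation for a code point)
    print STDOUT "\t<0x230D><cSpecialGlyph:$cid><cSpecialGlyph:><0x230C>";
    # End the row after every 20th glyph
    print STDOUT "\n" if $cid % $cols == $cols - 1;
}

# Terminate a final, partially filled row
print STDOUT "\n" if ($max + 1) % $cols;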

I simply specified “65534” as the argument to create an InDesign Tagged Text file that covers all possible CIDs (0 through 65534), as follows:

% mkcidtable-indd-tt.pl 65534 > data-65534.txt

The next step is to import the InDesign Tagged Text file, data-65534.txt, into an InDesign document, then format it as a multiple-page table. Note that each CID that is specified is surrounded by U+230D and U+230C characters. I built a special-purpose (name-keyed) OpenType font that includes non-spacing glyphs for registration marks, which I encoded using these code points. I’d rather not enter 65,535 pairs of registration marks manually, so they are included as part of the InDesign Tagged Text file.
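Because the whole approach depends on the file containing every CID exactly once, each with its pair of registration marks, it may be worth checking the file before importing it. The Perl sketch below is hypothetical and not part of the workflow described here: the script and sub names are made up, and it assumes that the registration marks are written as the Tagged Text escapes <0x230D> and <0x230C> rather than as literal characters.

```perl
#!/usr/bin/perl
# Hypothetical checker: does a generated Tagged Text file list every CID
# from 0 to a maximum exactly once, in order, each with one pair of marks?
use strict;
use warnings;

# Returns an error message, or undef if the text passes every check
sub check_tagged_text {
    my ($text, $max) = @_;
    my @cids = $text =~ /<cSpecialGlyph:(\d+)>/g;    # closing tags have no digits
    return sprintf("expected %d CIDs, found %d", $max + 1, scalar @cids)
        unless @cids == $max + 1;
    for my $i (0 .. $#cids) {
        return "found CID $cids[$i] where CID $i was expected"
            unless $cids[$i] == $i;
    }
    # Assumption: marks are written as <0x230D> and <0x230C> escapes
    my $open  = () = $text =~ /<0x230D>/gi;    # count U+230D marks
    my $close = () = $text =~ /<0x230C>/gi;    # count U+230C marks
    return "found $open U+230D and $close U+230C marks for " . scalar(@cids) . " CIDs"
        unless $open == @cids && $close == @cids;
    return undef;
}

# Command-line use: <tagged-text-file> <max-CID>
if (@ARGV == 2) {
    my ($file, $max) = @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    my $text = do { local $/; <$fh> };
    close $fh;
    my $error = check_tagged_text($text, $max);
    die "$file: $error\n" if defined $error;
    print "$file: CIDs 0-$max present, each with a registration-mark pair\n";
}
```

For example, `% check-cidtable-tt.pl data-65534.txt 65534` (a hypothetical script name) either confirms the file or reports the first problem it finds.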

Other than formatting the table, I needed to create four named Character Styles: GlyphTableFont, GlyphTableFontInvisible, Tombo, and TomboInvisible. The two Character Styles with “Invisible” in their names have no color specified, so the characters do not display and are thus invisible. The “Tombo” Character Style specifies red as the character color. BTW, tombo (トンボ) is the Japanese word for registration mark.

Once I have a fully formatted table, I create a second layer called “Registration Marks,” and copy the entire table to that layer, so that the same table exists in two layers. In the “Main Text” layer, I apply the “GlyphTableFont” Character Style to the 65,535 glyphs, which is easily done by searching for the glyphs with the Find/Replace feature and applying the Character Style. The same is done in the “Registration Marks” layer, but with the “GlyphTableFontInvisible” Character Style instead. The registration marks are handled in a similar way: I search for the U+230D and U+230C characters, then apply the “TomboInvisible” Character Style in the “Main Text” layer and the “Tombo” Character Style in the “Registration Marks” layer.
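As a possible shortcut for part of the Find/Replace work, Character Styles can in principle be applied when the data is generated, because InDesign Tagged Text marks a character-style run with a <CharStyle:Name> tag and ends it with <CharStyle:>. The Perl sketch below is hypothetical and not the method described above. It assumes that the four Character Styles already exist in the target document, that the registration marks are written as <0x230D> and <0x230C> escapes, and that a special-glyph tag may sit inside a character-style run; the sub names styled_cell and tagged_text are made up for illustration.

```perl
#!/usr/bin/perl
# Hypothetical sketch: emit glyph-table Tagged Text with Character Styles
# already applied, one variant per layer.
use strict;
use warnings;

# [glyph style, registration-mark style] for each layer, per the post
my %LAYER_STYLES = (
    main  => ['GlyphTableFont',          'TomboInvisible'],
    marks => ['GlyphTableFontInvisible', 'Tombo'],
);

# One table cell: a CID bracketed by registration marks, each run styled
sub styled_cell {
    my ($cid, $layer) = @_;
    my $styles = $LAYER_STYLES{$layer} or die "Unknown layer '$layer'\n";
    my ($glyph_style, $mark_style) = @$styles;
    return "<CharStyle:$mark_style><0x230D><CharStyle:>"
         . "<CharStyle:$glyph_style><cSpecialGlyph:$cid><cSpecialGlyph:><CharStyle:>"
         . "<CharStyle:$mark_style><0x230C><CharStyle:>";
}

# Whole file: header row, then $cols cells per row, each row labeled by its first CID
sub tagged_text {
    my ($max, $layer, $cols) = @_;
    $cols = 20 unless defined $cols;
    my $out = "<SJIS-MAC>\n<ParaStyle:>" . join("\t", "", 0 .. $cols - 1) . "\n";
    for my $cid (0 .. $max) {
        $out .= $cid if $cid % $cols == 0;
        $out .= "\t" . styled_cell($cid, $layer);
        $out .= "\n" if $cid % $cols == $cols - 1;
    }
    $out .= "\n" if ($max + 1) % $cols;    # close a partial last row
    return $out;
}

# Command-line use: <max-CID> <main|marks>
if (@ARGV == 2) {
    print tagged_text($ARGV[0], $ARGV[1]);
}
```

The “main” variant would spare the Find/Replace pass in the “Main Text” layer. Copying the formatted table to the second layer, as described above, is still the simpler way to guarantee that both layers line up exactly, so this is at best a partial shortcut.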

When exporting the document to PDF, be sure to select the option to include the layers, and to set the PDF version to 1.5 or higher, because PDF 1.5 is the first version that supports layers. For these glyph tables, I lock the main layer, because it doesn’t make much sense to allow readers to make that portion of the document invisible. The “Registration Marks” layer is the interesting one, because toggling it on and off makes the registration marks appear and disappear.

What I ended up with is a multiple-page, two-layer InDesign table that can be repurposed in the future, and that is not bound to any specific ROS (Registry, Ordering, Supplement). Given that the maximum CID is 65534, I saved myself a ton of work by building a table that encompasses all 65,535 possible CIDs, because any character collection fits within it.
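To make the reuse concrete: with 20 glyphs per row, as in the script above, a collection of n glyphs (CIDs 0 through n-1) needs ⌈n/20⌉ rows after the header row, so trimming the full table means keeping that many rows and clearing the unused cells of the last one. The small Perl sketch below does that arithmetic; the sub names are hypothetical, and the only glyph counts it uses are the two that appear in this post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rows needed for $glyphs CIDs (0 .. $glyphs-1) at $cols glyphs per row,
# i.e., ceil($glyphs / $cols) using integer arithmetic
sub rows_needed {
    my ($glyphs, $cols) = @_;
    return int(($glyphs + $cols - 1) / $cols);
}

# CID that labels the last row (the first CID in that row)
sub last_row_label {
    my ($glyphs, $cols) = @_;
    return $cols * (rows_needed($glyphs, $cols) - 1);
}

for my $case (['all possible CIDs', 65535], ['Adobe-GB1-5', 30284]) {
    my ($name, $glyphs) = @$case;
    printf "%-17s %6d glyphs -> %4d rows, last row starts at CID %d\n",
        $name, $glyphs, rows_needed($glyphs, 20), last_row_label($glyphs, 20);
}
```

The full table therefore has 3,277 rows, the last starting at CID+65520. For Adobe-GB1-5, 1,515 rows are kept, and the last kept row starts at CID+30280, with only its first four cells (CID+30280 through CID+30283) holding glyphs.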

My first use of this table was to update Adobe Tech Note #5079, which was done quickly and easily.
