I like building fonts. I especially like building fonts with a large number of glyphs. Fortunately, my job entails developing OpenType CJK fonts, which means that I need to deal with fonts with thousands or tens of thousands of glyphs.
I built an “extreme” OpenType font last year, and spent the morning making it even more extreme. Given that “extreme” fonts are useful for stress-testing software that consumes fonts, I figured that this post would be a good opportunity to make it available to developers who may benefit from testing with this font.
Did I mention that I like building fonts? ☺
When I first developed this OpenType font, called UnicodeAll, last year, it included the following two “extreme” characteristics:
- The Format 12 (UTF-32) ‘cmap’ subtable includes mappings for all 1,112,030 Unicode code points. That is, it covers all of the code points in the BMP (Basic Multilingual Plane) plus the 16 Supplementary Planes, excluding the 2,048 Surrogates in the BMP, and excluding 0xFFFE and 0xFFFF in the BMP and in each of the 16 Supplementary Planes.
- The ‘CFF’ table includes 256 FDArray elements (aka, hint dictionaries), which is the maximum.
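The 1,112,030 figure is simple arithmetic: 17 planes of 65,536 code points, minus the 2,048 Surrogates, minus the two code points at the end of each plane. The following is a minimal Python sketch that checks it both in closed form and by brute force; the names `is_mapped`, `PLANES`, and so on are illustrative only and do not come from any AFDKO tool.

```python
# Checks the 1,112,030 figure for the Format 12 'cmap' subtable:
# all 17 planes, minus the BMP Surrogates, minus xxFFFE/xxFFFF in every plane.

PLANES = 17                   # the BMP plus the 16 Supplementary Planes
PLANE_SIZE = 0x10000          # 65,536 code points per plane
SURROGATES = 0xE000 - 0xD800  # U+D800 through U+DFFF: 2,048 code points

def is_mapped(cp: int) -> bool:
    """True if the 'cmap' subtable described above maps this code point."""
    if 0xD800 <= cp <= 0xDFFF:
        return False
    return (cp & 0xFFFF) not in (0xFFFE, 0xFFFF)

# Closed form: 17 * 65,536 - 2,048 - 17 * 2
expected = PLANES * PLANE_SIZE - SURROGATES - 2 * PLANES
# Brute force over every code point, as a cross-check
counted = sum(1 for cp in range(PLANES * PLANE_SIZE) if is_mapped(cp))
print(expected, counted)  # 1112030 1112030
```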
This CID-keyed font advertised the special-purpose Adobe-Identity-0 ROS (/Registry, /Ordering, and /Supplement), and included only 257 glyphs. CID+0 acted as the .notdef glyph, and CIDs 1 through 256 used glyphs that correspond to the geta (下駄; U+3013) character (sans registration marks, of course).
CIDs 1 through 256 were simply mapped from UTF-32 values that ended in 0x00 (CID+1) through 0xFF (CID+256).
The third “extreme” characteristic that I added to this font today was the following:
- The ‘CFF’ table includes 65,535 glyphs (CIDs 0 through 65534), which is the maximum number of glyphs for a CIDFont resource.
CID+0 continues to act as the .notdef glyph, and CIDs 1 through 65534 are mapped from the code points at offsets 0x0000 (CID+1) through 0xFFFD (CID+65534) within the BMP and within each of the 16 Supplementary Planes. The glyphs for CIDs 1 through 65534 are identical, and are still that of the geta character.
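Both versions derive the CID purely from the low-order bits of the code point: the first version used the last 8 bits, and the current version uses the last 16 bits. A minimal Python sketch of the two mappings, assuming the code point has already been checked to be one that the ‘cmap’ subtable covers (the function names `cid_v1` and `cid_v2` are hypothetical):

```python
def cid_v1(cp: int) -> int:
    """First version (257 glyphs): the low 8 bits select CIDs 1 through 256."""
    return (cp & 0xFF) + 1

def cid_v2(cp: int) -> int:
    """Current version (65,535 glyphs): the low 16 bits select CIDs 1 through 65534.

    Offsets 0xFFFE and 0xFFFF are never mapped, so the result never exceeds 65534.
    """
    return (cp & 0xFFFF) + 1

# U+0000, U+3013 (GETA MARK), and the last mapped code point in Plane 16
for cp in (0x0000, 0x3013, 0x10FFFD):
    print(f"U+{cp:04X}: v1 CID+{cid_v1(cp)}, v2 CID+{cid_v2(cp)}")
```

Because every CID from 1 upward uses the same geta glyph, the choice of CID for a given code point does not change what is rendered; it only determines how compactly the mapping can be stored.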
I would like to point out that AFDKO tools, specifically tx, mergeFonts, and makeotf, were used to build the CIDFont resource and the resulting OpenType font. Of course, a couple of carefully crafted Perl scripts were used to build some of the data files, such as the mergeFonts mapping files and the mappings for the CMap resource.
Click here to download this font.
And, enjoy!
I would like to add that the ‘CFF’ table has been subroutinized. The unsubroutinized ‘CFF’ table is 2,569,000 bytes (approximately 2.5MB), and the subroutinized one is only 340,943 bytes (approximately 333K, which is 13% of the size of the unsubroutinized version).
And, the ‘cmap’ table, though it covers over one million code points, is a mere 328 bytes. By contrast, the previous version of this font, which contained only 257 glyphs, had a ‘cmap’ table that was 54,208 bytes in size.
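The small ‘cmap’ table follows from how a Format 12 subtable is structured: after a 16-byte header, it stores SequentialMapGroup records (startCharCode, endCharCode, startGlyphID; 12 bytes each), and a single group can cover any run of consecutive code points that map to consecutive glyph IDs. The Python sketch below counts the groups for the two mappings described earlier. It assumes that GID equals CID in the OpenType font, and its byte counts cover only the Format 12 subtable; the ‘cmap’ header and any other subtables in the real font are not modeled, and the figures are computed rather than read from the font.

```python
from typing import Callable, Iterator

def covered_code_points() -> Iterator[int]:
    """All code points in UnicodeAll's 'cmap': no Surrogates, no xxFFFE/xxFFFF."""
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF or (cp & 0xFFFF) >= 0xFFFE:
            continue
        yield cp

def count_groups(to_gid: Callable[[int], int]) -> int:
    """Count Format 12 SequentialMapGroups: a new group starts whenever the
    code point or the glyph ID does not follow on from the previous one."""
    groups = 0
    prev_cp = prev_gid = None
    for cp in covered_code_points():
        gid = to_gid(cp)
        if prev_cp is None or cp != prev_cp + 1 or gid != prev_gid + 1:
            groups += 1
        prev_cp, prev_gid = cp, gid
    return groups

def format12_size(groups: int) -> int:
    """16-byte subtable header plus 12 bytes per SequentialMapGroup."""
    return 16 + 12 * groups

# Assumption: GID == CID, i.e. the glyphs are stored in CID order.
old_groups = count_groups(lambda cp: (cp & 0xFF) + 1)    # first version
new_groups = count_groups(lambda cp: (cp & 0xFFFF) + 1)  # current version
print(old_groups, format12_size(old_groups))  # 4344 52144
print(new_groups, format12_size(new_groups))  # 18 232
```

With the 8-bit mapping, the glyph ID wraps back to 1 every 256 code points, so each block of 256 needs its own group. With the 16-bit mapping, each plane collapses into a single group, except that the BMP needs two because of the Surrogates gap.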
That CFF subroutine compression number was just cheating, since all the glyphs had the same spline outlines. Of course a great compression ratio could be achieved.
The 2^16 limit on the number of subroutines makes CJK subroutine compression less useful than it sounds.
Thank you for posting a comment.
For a font such as this, where all of the glyphs are the same, it makes perfect sense to subroutinize the ‘CFF’ table. In fact, doing so is not cheating, but rather it is precisely the point of subroutinization. Of course, by cheating, you were referring to the high compression ratio that was achieved.
As for the subroutine limit, subroutinization has nonetheless proven to be very useful for CJK fonts. I have built only one font that exceeded that limit, and I found a workaround. Keep in mind that the true subroutine limit is “64K minus 3” (65,533). The limit applies to each FDArray element (that is, to its local subroutines), but the global subroutines count against it too. In other words, the global subroutines plus the local subroutines of any one FDArray element cannot exceed the limit.
If a CID-keyed ‘CFF’ table, which is typical for CJK fonts, has a single FDArray element, then the global and local subroutines are effectively interchangeable, because every glyph can call both.
Also note that some implementations have a lower subroutine limit. Mac OS X Panther (10.3), Tiger (10.4), and possibly earlier versions, along with Distiller Version 7.0 and earlier, were limited to “32K minus 3” (32,765) subroutines.
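The rule above, that the global subroutines plus the local subroutines of any single FDArray element must stay within the limit, amounts to a simple per-FDArray check. Here is a minimal Python sketch, with the limit as a parameter so that the lower “32K minus 3” limit of older implementations can be checked too. The function name `over_limit` is hypothetical, and the subroutine counts in the example are made-up numbers, not taken from a real font.

```python
from typing import Sequence

SUBR_LIMIT = 65_533         # "64K minus 3"
LEGACY_SUBR_LIMIT = 32_765  # "32K minus 3": Mac OS X 10.3/10.4, Distiller 7.0 and earlier

def over_limit(global_subrs: int, local_subrs: Sequence[int],
               limit: int = SUBR_LIMIT) -> list[int]:
    """Return the indexes of the FDArray elements whose local subroutines,
    combined with the shared global subroutines, exceed the limit."""
    return [fd for fd, count in enumerate(local_subrs)
            if global_subrs + count > limit]

# Hypothetical font: 1,200 global subroutines and three FDArray elements
locals_per_fd = [64_500, 31_800, 500]
print(over_limit(1_200, locals_per_fd))                     # [0]
print(over_limit(1_200, locals_per_fd, LEGACY_SUBR_LIMIT))  # [0, 1]
```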
In terms of working around this limit, simply distributing the number of local subroutines across multiple FDArray elements has the effect of creating more global subroutines, because they are shared across more than one FDArray element. In my experience, a typical CID-keyed OpenType/CFF CJK font has approximately a dozen FDArray elements, and the number of global subroutines is relatively small. The technique is to split an FDArray with a large number of subroutines into two (or more) FDArray elements, which has the effect of forcing some local subroutines to become global ones.
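Splitting works because a local subroutine belongs to exactly one FDArray element, so any subroutine that is called from glyphs in more than one FDArray element has to be global. The Python sketch below models only that one rule. It is a hypothetical illustration, not how the AFDKO subroutinizer actually selects or places subroutines, and the subroutine names and usage data are invented.

```python
from collections import Counter

def place_subrs(usage: dict[str, set[int]]) -> tuple[int, Counter]:
    """Given, for each subroutine, the set of FDArray indexes whose glyphs
    call it, count the global subroutines and the local ones per element.

    Simplified model: a subroutine used by more than one FDArray element
    must be global; one used by a single element stays local to it."""
    global_count = 0
    local_counts: Counter = Counter()
    for fds in usage.values():
        if len(fds) > 1:
            global_count += 1
        elif len(fds) == 1:
            local_counts[next(iter(fds))] += 1
    return global_count, local_counts

# Hypothetical: six subroutines, all used only by glyphs in FDArray element 0
before = {f"s{i}": {0} for i in range(6)}
# After splitting element 0 into elements 0 and 1, glyphs calling s0-s3
# land in both elements, while s4 and s5 stay with one element each.
after = {"s0": {0, 1}, "s1": {0, 1}, "s2": {0, 1}, "s3": {0, 1},
         "s4": {0}, "s5": {1}}
print(place_subrs(before))  # (0, Counter({0: 6}))
print(place_subrs(after))   # (4, Counter({0: 1, 1: 1}))
```

In this toy example, the global-plus-local total for each FDArray element drops from 6 to 5, which is the effect that relieves the per-element limit.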
The following AFDKO tx command line displays the global and local subroutines (replace “<file>” with the filename of the font):
% tx -dcf -1 -T gl <file> | grep -E '^--- FD|^count'