Archive for April, 2012

Never Say Never

In the realm of CJK Unified Ideographs, there is always talk that no more characters remain to be encoded, or that any new characters are simply unifiable variants. This is, in large part, merely wishful thinking.

In my experience, there are three important words to embrace: Never Say Never.
Continue reading…

Kazuraki Poster Redux

Almost three years ago, in a September 2009 article on the sister blog, Typblography, we showed a poster for our Kazuraki (かづらき) typeface, which was designed by Ryoko Nishizuka (西塚涼子). A request came in today for a PDF version of the poster, and instead of posting it into that relatively old (and now buried) article where it would not likely be noticed, I figured that it’d be best to post it here, today.

Click ☞ here ☜ to get the PDF version of the Kazuraki poster.

Enjoy! And for those in Japan, have a safe and enjoyable Golden Week!

Know Your Documentation

For those who develop fonts—professionally or otherwise—it is prudent to know where the latest and greatest documentation is located. This is useful not only when searching for specific documentation, but also to check whether there are any updates to existing documentation.

The right-side navigation bar of this blog’s landing page includes links to relevant documentation and resource pages, such as those for font-related Adobe Technical Notes, the OpenType Specification (hosted on Microsoft’s website), and even AFDKO (Adobe Font Development Kit for OpenType). Also, don’t forget about the excellent font developer resources offered by Apple (Fonts), FontLab, and Microsoft (Microsoft Typography).

The AFDKO ‘tx’ Tool

Among the many excellent and powerful tools included in AFDKO (Adobe Font Development Kit for OpenType) is one with a two-letter name: tx. Although it has the shortest name, it is arguably one of the most powerful AFDKO tools.

The tx tool is best thought of as a multi-purpose font-file–manipulation tool. For those who don’t leverage this tool in their font development activities, I strongly encourage you to explore its capabilities, which is best done by perusing its built-in help and through experimentation.
Continue reading…

The All-Important Macron

When transliterating Japanese text using Latin characters, there are three systems or methods for doing so. Of these, the Hepburn system (ヘボン式 hebon shiki) is the most commonly used one, and differs in one important way: long vowels are represented with a macron (U+00AF MACRON or U+0304 COMBINING MACRON) diacritic. Almost all signage in Japan that includes transliterated text, such as in train and subway stations, uses the Hepburn system. However, if we look back to the 1990s and earlier, it was not common to include glyphs for macroned vowels in fonts, whether they were for Latin or Japanese use.

The two other systems, the Kunrei system (訓令式 kunrei shiki) and the Nippon system (日本式 nippon shiki), represent long vowels with a circumflex (U+005E CIRCUMFLEX ACCENT or U+0302 COMBINING CIRCUMFLEX ACCENT) diacritic. It was common for Latin fonts to include glyphs for circumflexed vowels, meaning U+00C2/U+00E2 (Ââ), U+00CA/U+00EA (Êê), U+00CE/U+00EE (Îî), U+00D4/U+00F4 (Ôô), and U+00DB/U+00FB (Ûû), by virtue of being included in ISO/IEC 8859-1 (aka Latin 1). However, due to limitations of Shift-JIS encoding, even Japanese fonts did not include glyphs for these characters.
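The difference between the two diacritics can be checked with a few lines of Python. This is only a minimal sketch using the standard library: the `can_encode` helper is a hypothetical name introduced here, and Python's `latin-1` and `shift_jis` codecs are assumed to be reasonable stand-ins for ISO/IEC 8859-1 and Shift-JIS as discussed above.

```python
import unicodedata

def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Hepburn-style long vowel: base letter plus U+0304 COMBINING MACRON,
# composed into a single precomposed character by NFC normalization.
macron_o = unicodedata.normalize("NFC", "o\u0304")

# Kunrei/Nippon-style long vowel: base letter plus
# U+0302 COMBINING CIRCUMFLEX ACCENT, composed the same way.
circumflex_o = unicodedata.normalize("NFC", "o\u0302")

for label, ch in (("macron", macron_o), ("circumflex", circumflex_o)):
    print(
        label,
        f"U+{ord(ch):04X}",
        "latin-1:", can_encode(ch, "latin-1"),
        "shift_jis:", can_encode(ch, "shift_jis"),
    )
```

The circumflexed vowel composes to U+00F4, which falls within Latin 1, while the macroned vowel composes to U+014D, which does not. Neither is representable in the `shift_jis` codec.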
Continue reading…

ISO/IEC 14496-28:2012 Published

Born from the conclusion that OpenType’s 64K glyph barrier cannot be broken in the context of the format itself, ISO/IEC 14496-28:2012 (Composite Font Representation) was developed, and was subsequently published three days ago, on April 17, 2012, as a new ISO standard. As described in the January 26, 2012 CJK Type Blog article, CID-keyed fonts can include a maximum of 65,535 glyphs (CIDs 0 through 65534). Considering that Unicode Version 6.1 includes over 100K characters, approximately 75K of which are CJK Unified Ideographs, it becomes immediately apparent that a single font resource cannot support all of Unicode, let alone all of the characters for a single script (referring to CJK Unified Ideographs).
Continue reading…

Adobe-Japan1-6 Radical/Stroke Database

I spent approximately two weeks in August of 2004 developing a radical/stroke database for the 14,664 kanji in Adobe-Japan1-6 (CIDs 656, 1125–7477, 7633–7886, 7961–8004, 8266, 8267, 8284, 8285, 8359–8717, 13320–15443, 16779–20316, and 21071–23057), which is available as a tab-delimited text file that is keyed by Adobe-Japan1-6 CIDs, and as a PDF file that is keyed by indexing radical, then by the number of strokes of the indexing radical instance, followed by the number of remaining strokes, and finally by Adobe-Japan1-6 CID.
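As a quick sanity check, the CID ranges listed above can be expanded and counted. The following is a small sketch, not one of the tools distributed with the database; the `expand_cid_ranges` function and the `KANJI_CID_RANGES` constant are names made up for this illustration.

```python
# The kanji CID ranges quoted in the paragraph above, written with ASCII hyphens.
KANJI_CID_RANGES = (
    "656, 1125-7477, 7633-7886, 7961-8004, 8266, 8267, 8284, 8285, "
    "8359-8717, 13320-15443, 16779-20316, 21071-23057"
)

def expand_cid_ranges(ranges: str) -> list[int]:
    """Expand a comma-separated list of CIDs and CID ranges into a flat list."""
    cids: list[int] = []
    # Accept en dashes as well, since ranges are often typeset that way.
    for part in ranges.replace("\u2013", "-").split(","):
        first, _, last = part.strip().partition("-")
        # A single CID has no range end, so it expands to itself.
        cids.extend(range(int(first), int(last or first) + 1))
    return cids

kanji_cids = expand_cid_ranges(KANJI_CID_RANGES)
print(len(kanji_cids))  # 14664
```

The total agrees with the 14,664 kanji stated above.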
Continue reading…

CID vs GID (Cont’d)

In yesterday’s CJK Type Blog post, I introduced and provided three Perl tools for listing the CIDs and GIDs in font resources: extract-cids.pl, extract-gids.pl, and mkrange.pl. In my continued effort to provide font developers with useful tools, I spent a few minutes this morning enhancing two of these tools, specifically extract-cids.pl and mkrange.pl.
Continue reading…

CID vs GID

When working with OpenType/CFF fonts, particularly those that are CID-keyed, CIDs (Character IDs) and GIDs (Glyph IDs) are often referenced as ways to uniquely identify glyphs in a font resource. But, how are CIDs and GIDs different, and perhaps more importantly, under what circumstances are they different, or the same? These are good questions, and the answers can be found in today’s article.
Continue reading…

Advantages of Numeric Character References

Unicode has become the preferred way in which to represent text in digital form, and for good reason. Its broad coverage of our planet’s scripts and languages is the single greatest reason why this has happened. All of the major OSes have embraced Unicode. In other words, if you develop a product that makes use of text data, and if it doesn’t support Unicode, you’re doing something wrong.

Unicode comes in a variety of representations called encoding forms. The three most basic Unicode encoding forms are UTF-8, UTF-16, and UTF-32. The latter two are also available in explicit little- or big-endian flavors: UTF-16LE, UTF-16BE, UTF-32LE, and UTF-32BE. These are covered in Chapter 4 of CJK Information Processing, Second Edition. But, there are times when a bomb-proof way of representing Unicode characters is needed, or when an otherwise ASCII-only web document requires the occasional Unicode character. For these purposes, and in the context of web documents, Numeric Character References (aka NCRs) have great advantages. One advantage is their human-readability: each one conveys an explicit Unicode code point. Another advantage is that only ASCII characters are used for this notation, which is its bomb-proof aspect.
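To make the notation concrete, here is a minimal Python sketch that converts non-ASCII characters to NCRs and back. The `to_ncr` function is a hypothetical helper written for this illustration. It leaves ASCII characters untouched, so it does not escape markup-significant characters such as & or <.

```python
import html

def to_ncr(text: str, hexadecimal: bool = True) -> str:
    """Replace each non-ASCII character with a Numeric Character Reference."""
    pieces = []
    for ch in text:
        code_point = ord(ch)
        if code_point < 0x80:
            # ASCII characters pass through unchanged.
            pieces.append(ch)
        elif hexadecimal:
            # Hexadecimal form, which mirrors the U+XXXX notation.
            pieces.append(f"&#x{code_point:X};")
        else:
            # Decimal form of the same code point.
            pieces.append(f"&#{code_point};")
    return "".join(pieces)

encoded = to_ncr("Kazuraki (かづらき)")
print(encoded)                  # Kazuraki (&#x304B;&#x3065;&#x3089;&#x304D;)
print(encoded.isascii())        # True
print(html.unescape(encoded))   # Kazuraki (かづらき)
```

The hexadecimal form makes the code point readable at a glance, and the result consists solely of ASCII characters. For decimal NCRs, Python's built-in `"xmlcharrefreplace"` error handler, as in `text.encode("ascii", "xmlcharrefreplace")`, produces equivalent output.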
Continue reading…