Author Archive: Dr. Ken Lunde

The All-Important Macron

There are three systems for transliterating Japanese text using Latin characters. Of these, the Hepburn system (ヘボン式 hebon shiki) is the most commonly used, and it differs from the other two in one important way: long vowels are represented with a macron (U+00AF MACRON or U+0304 COMBINING MACRON) diacritic. Almost all signage in Japan that includes transliterated text, such as in train and subway stations, uses the Hepburn system. However, if we look back to the 1990s and earlier, it was not common to include glyphs for macroned vowels in fonts, whether they were for Latin or Japanese use.

The two other systems, the Kunrei system (訓令式 kunrei shiki) and the Nippon system (日本式 nippon shiki), represent long vowels with a circumflex (U+005E CIRCUMFLEX ACCENT or U+0302 COMBINING CIRCUMFLEX ACCENT) diacritic. It was common for Latin fonts to include glyphs for circumflexed vowels, meaning U+00C2/U+00E2 (Ââ), U+00CA/U+00EA (Êê), U+00CE/U+00EE (Îî), U+00D4/U+00F4 (Ôô), and U+00DB/U+00FB (Ûû), by virtue of being included in ISO/IEC 8859-1 (aka Latin 1). However, due to limitations of Shift-JIS encoding, even Japanese fonts did not include glyphs for these characters.
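
The difference between the two diacritics can be checked with a few lines of Python. This is only a minimal sketch using the standard `unicodedata` module. The helper names `with_diacritic` and `in_latin1` are made up for illustration, and Python's `latin-1` codec stands in for ISO/IEC 8859-1.

```python
import unicodedata

COMBINING_MACRON = "\u0304"      # used by the Hepburn system
COMBINING_CIRCUMFLEX = "\u0302"  # used by the Kunrei and Nippon systems


def with_diacritic(vowel: str, combining: str) -> str:
    """Compose a base vowel and a combining mark into one precomposed character (NFC)."""
    return unicodedata.normalize("NFC", vowel + combining)


def in_latin1(text: str) -> bool:
    """Report whether every character of text is encodable in ISO/IEC 8859-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


hepburn_o = with_diacritic("o", COMBINING_MACRON)     # U+014D
kunrei_o = with_diacritic("o", COMBINING_CIRCUMFLEX)  # U+00F4

for label, ch in (("macron", hepburn_o), ("circumflex", kunrei_o)):
    print(f"{label}: {ch} U+{ord(ch):04X} in Latin 1: {in_latin1(ch)}")
```

The circumflexed vowel falls inside Latin 1, while the macroned vowel does not, which matches the font coverage described above.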

ISO/IEC 14496-28:2012 Published

Born from the conclusion that OpenType’s 64K glyph barrier cannot be broken in the context of the format itself, ISO/IEC 14496-28:2012 (Composite Font Representation) was developed, and was subsequently published three days ago, on April 17, 2012, as a new ISO standard. As described in the January 26, 2012 CJK Type Blog article, CID-keyed fonts can include a maximum of 65,535 glyphs (CIDs 0 through 65534). Considering that Unicode Version 6.1 includes over 100K characters, approximately 75K of which are CJK Unified Ideographs, it becomes immediately apparent that a single font resource cannot support all of Unicode, let alone all of the characters for a single script (referring to CJK Unified Ideographs).
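
The arithmetic behind that conclusion is simple enough to sketch in Python. The character counts are the rounded figures quoted above. The one-glyph-per-character assumption is mine, and it understates what a real font needs, so the results are lower bounds only.

```python
MAX_GLYPHS_PER_FONT = 65_535  # CIDs 0 through 65534


def min_font_resources(glyph_count: int) -> int:
    """Smallest number of font resources needed to hold glyph_count glyphs."""
    # Integer ceiling division, assuming exactly one glyph per character.
    return -(-glyph_count // MAX_GLYPHS_PER_FONT)


print(min_font_resources(75_000))   # roughly the CJK Unified Ideographs alone
print(min_font_resources(100_000))  # roughly all of Unicode Version 6.1
```

Under these assumptions, either figure needs at least two component fonts.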

Adobe-Japan1-6 Radical/Stroke Database

I spent approximately two weeks in August of 2004 developing a radical/stroke database for the 14,664 kanji in Adobe-Japan1-6 (CIDs 656, 1125–7477, 7633–7886, 7961–8004, 8266, 8267, 8284, 8285, 8359–8717, 13320–15443, 16779–20316, and 21071–23057), which is available as a tab-delimited text file that is keyed by Adobe-Japan1-6 CIDs, and as a PDF file that is keyed by indexing radical, then by the number of strokes of the indexing radical instance, followed by the number of remaining strokes, and finally by Adobe-Japan1-6 CID.
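
As a quick sanity check, the CID ranges listed above can be expanded and counted. The following Python is a minimal sketch, not part of the database or its tooling, and the function names `parse_ranges` and `count_cids` are hypothetical.

```python
CID_RANGES = (
    "656, 1125-7477, 7633-7886, 7961-8004, 8266, 8267, 8284, 8285, "
    "8359-8717, 13320-15443, 16779-20316, 21071-23057"
)


def parse_ranges(spec: str) -> list[tuple[int, int]]:
    """Turn '656, 1125-7477' into [(656, 656), (1125, 7477)]."""
    ranges = []
    for item in spec.split(","):
        # Accept an en dash as well as a hyphen between the two endpoints.
        item = item.strip().replace("\u2013", "-")
        first, _, last = item.partition("-")
        ranges.append((int(first), int(last or first)))
    return ranges


def count_cids(ranges: list[tuple[int, int]]) -> int:
    """Count the CIDs covered by a list of inclusive ranges."""
    return sum(last - first + 1 for first, last in ranges)


kanji_ranges = parse_ranges(CID_RANGES)
total = count_cids(kanji_ranges)
print(total)  # 14664
```

The total agrees with the 14,664 kanji stated in the text.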

CID vs GID (Cont’d)

In yesterday’s CJK Type Blog post, I introduced and provided three Perl tools for listing the CIDs and GIDs in font resources: extract-cids.pl, extract-gids.pl, and mkrange.pl. In my continued effort to provide font developers with useful tools, I spent a few minutes this morning to enhance two of these tools, specifically extract-cids.pl and mkrange.pl.

CID vs GID

When working with OpenType/CFF fonts, particularly those that are CID-keyed, CIDs (Character IDs) and GIDs (Glyph IDs) are often referenced as ways to uniquely identify glyphs in a font resource. But, how are CIDs and GIDs different, and perhaps more importantly, under what circumstances are they different, or the same? These are good questions, and the answers can be found in today’s article.

Advantages of Numeric Character References

Unicode has become the preferred way in which to represent text in digital form, and for good reason. Its broad coverage of our planet’s scripts and languages is the single greatest reason why this has happened. All of the major OSes have embraced Unicode. In other words, if you develop a product that makes use of text data, and if it doesn’t support Unicode, you’re doing something wrong.

Unicode comes in a variety of representations called encoding forms. The three most basic Unicode encoding forms are UTF-8, UTF-16, and UTF-32. The latter two are also available in explicit little- or big-endian flavors: UTF-16LE, UTF-16BE, UTF-32LE, and UTF-32BE. These are covered in Chapter 4 of CJK Information Processing, Second Edition. But, there are times when a bomb-proof way of representing Unicode characters is needed, or when an otherwise ASCII-only web document requires the occasional Unicode character. For these purposes, and in the context of web documents, Numeric Character References (aka NCRs) have great advantages. One advantage is their human-readability, in that each one conveys an explicit Unicode code point. Another advantage is that only ASCII characters are used for this notation, which is its bomb-proof aspect.
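
To make the notation concrete, here is a minimal Python sketch that converts non-ASCII characters to hexadecimal NCRs and back. The function name `to_ncr` is hypothetical, while `html.unescape` is the standard-library decoder.

```python
import html


def to_ncr(text: str) -> str:
    """Replace every non-ASCII character with a hexadecimal NCR (&#xHHHH;)."""
    return "".join(
        ch if ord(ch) < 0x80 else f"&#x{ord(ch):X};"
        for ch in text
    )


sample = "Kanji: 日本語"
encoded = to_ncr(sample)
print(encoded)  # Kanji: &#x65E5;&#x672C;&#x8A9E;

# The result is pure ASCII, and decoding it restores the original text.
assert encoded.isascii()
assert html.unescape(encoded) == sample
```

Python's built-in `"xmlcharrefreplace"` error handler produces the decimal form instead, so `"日".encode("ascii", "xmlcharrefreplace")` gives `b"&#26085;"`. The hexadecimal form is used here because it matches the usual U+XXXX way of writing code points.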

The BlueValues Array & Setting Its Values

When hinting name- or CID-keyed fonts, appropriate hinting parameters are required. Among these parameters are alignment zones, whose purpose is to snap shapes to pixel boundaries. Alignment zones are specified in the required /BlueValues array, and also in the optional /OtherBlues array.

The required /BlueValues array is specified in the /Private dictionary of name-keyed fonts, and in the /Private dictionary of each FDArray element of CID-keyed fonts. The purpose of this array is to specify alignment zones that are at the baseline or above, such as for the baseline, x-height, and cap-height. The optional /OtherBlues array is used to specify alignment zones that are below the baseline, such as for the descender. This article will demonstrate how the AFDKO stemHist tool can be used to determine appropriate alignment zone values.
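
The structure of these arrays can be illustrated with a short Python sketch. Each array is a flat list of integers read as (bottom, top) pairs, one pair per alignment zone. The limits of seven pairs for /BlueValues and five for /OtherBlues come from the Type 1 font format. The numeric values below are invented for illustration and are not output from stemHist, and the function names are hypothetical.

```python
def alignment_zones(values: list[int], max_pairs: int) -> list[tuple[int, int]]:
    """Split a flat /BlueValues-style array into (bottom, top) zone pairs."""
    if len(values) % 2 != 0:
        raise ValueError("alignment zone arrays must hold an even number of values")
    zones = list(zip(values[0::2], values[1::2]))
    if len(zones) > max_pairs:
        raise ValueError(f"at most {max_pairs} zones are allowed")
    for bottom, top in zones:
        if bottom > top:
            raise ValueError(f"zone ({bottom}, {top}) is upside down")
    return zones


def zone_for(y: int, zones: list[tuple[int, int]]):
    """Return the zone containing the y coordinate, or None if there is none."""
    for bottom, top in zones:
        if bottom <= y <= top:
            return (bottom, top)
    return None


# Hypothetical values: baseline, x-height, and cap-height zones, plus a descender zone.
blue_values = [-12, 0, 480, 492, 700, 712]
other_blues = [-212, -200]

zones = alignment_zones(blue_values, max_pairs=7)
descender_zones = alignment_zones(other_blues, max_pairs=5)
print(zone_for(486, zones))  # (480, 492): an overshoot inside the x-height zone
print(zone_for(300, zones))  # None: not in any alignment zone
```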

ATypI Hong Kong 2012

I am extraordinarily pleased that the upcoming ATypI (Association Typographique Internationale) conference will be held in Hong Kong: ATypI Hong Kong 2012. The dates are October 10th through the 14th, 2012, and the theme is between black and white (墨 in Chinese). For font developers who relish the thought of discussing font-related issues and ideas with others in the same industry, the annual ATypI conference represents a unique opportunity. And, given the venue for this year’s iteration, a larger-than-usual number of CJK font developers are likely to attend, and the number of CJK-related presentations and workshops should be greater than usual.

In any case, I am planning to attend and present at this conference, and very much look forward to meeting other CJK font developers there.

Always Check Your Outlines

The photo below, which was recently taken by my long-term Adobe colleague Dirk Meyer in Beijing, serves as a not-so-gentle reminder that intersecting outlines can result in very obvious printing errors:

The photo depicts the two ideographs 出口, which represent the word meaning exit. The glyphs are obviously designed through the use of components whose outlines necessarily intersect, and under some circumstances—including the circumstance that led to the printing of this signage—can result in a negative or reverse fill.

Not One, But Three, IVD Code Charts

Thanks to an excellent suggestion from Taichi Kawabata (川幡太一), the 2012-03-02 version of the IVD (Ideographic Variation Database) includes three IVD Code Charts, which were released today. The two earlier versions of the IVD—2007-12-14 and 2010-11-14—included only one IVD Code Chart, named IVD_Charts.pdf.