“Houston, we have a problem… …with U+4548”

Recent work has led me to more closely explore U+4548 (☞䕈☜), which is in CJK Unified Ideographs Extension A. (What is shown in parentheses in the previous sentence is likely to be different than what is shown in the excerpt above.)

The image above is an excerpt from the latest Extension A Code Charts. At first glance, everything seems normal. The differences between the G (China) and T (Taiwan) glyphs are expected, and perhaps more importantly, unifiable.

However…

JIS X 0213 versus kIRG_JSource—Redux

In the previous article I mentioned that 85 kanji that correspond to JIS X 0213:2004 currently have kIRG_JSource JA source references, but I made no mention of possible glyph differences between what is shown in the Code Charts and JIS X 0213:2004. I found at least seven kanji, among these 85, that have significant glyph differences between these two Japanese sources. I prepared this table that shows these glyph differences, using excerpts from the Extension A Code Charts for the kIRG_JSource JA glyphs and Heisei Mincho W3 (平成明朝W3) for the JIS X 0213:2004 glyphs.

JIS X 0213 versus kIRG_JSource

To continue yesterday’s article about different prototypical glyphs for Unicode code points that are common between JIS X 0212-1990 and JIS X 0213:2004, today’s article will focus on the normative references that correspond to JIS X 0213:2004, or rather the lack thereof.



JIS X 0212 versus JIS X 0213

Most Japanese font developers are—perhaps painfully—aware of the 168 kanji whose prototypical glyphs changed in 2004 via the JIS X 0213:2004 standard. What is not broadly known are those kanji whose prototypical glyphs are different between JIS X 0212-1990 and JIS X 0213 (both versions).

JIS X 0212-1990 was established in 1990, and included 5,801 kanji in a single block. JIS X 0213:2000 was established a full ten years later, and included 3,685 kanji in two levels (1,249 kanji in Level 3, and 2,436 in Level 4). Ten additional kanji were added in JIS X 0213:2004, bringing the total to 3,695. When the Unicode code points that correspond to these two JIS standards are compared, 2,743 of them are common, 3,058 are specific to JIS X 0212-1990, and 952 are specific to JIS X 0213:2004.
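
As a quick sanity check, the counts quoted above can be cross-checked with a few lines of Python. This is only an arithmetic sketch: the variable names are my own, and the numbers are copied from the paragraph rather than computed from actual mapping tables.

```python
# Counts quoted in the paragraph above (not derived from real mapping data).
JIS_X_0212_TOTAL = 5801   # kanji in JIS X 0212-1990
LEVEL_3 = 1249            # JIS X 0213:2000 Level 3 kanji
LEVEL_4 = 2436            # JIS X 0213:2000 Level 4 kanji
ADDED_IN_2004 = 10        # kanji added in JIS X 0213:2004
COMMON = 2743             # code points shared by both standards
ONLY_0212 = 3058          # code points specific to JIS X 0212-1990
ONLY_0213 = 952           # code points specific to JIS X 0213:2004

# Totals for the two versions of JIS X 0213.
jis_x_0213_2000_total = LEVEL_3 + LEVEL_4
jis_x_0213_2004_total = jis_x_0213_2000_total + ADDED_IN_2004

# The shared and standard-specific counts should add up to each total.
sum_0212 = COMMON + ONLY_0212
sum_0213 = COMMON + ONLY_0213

print(jis_x_0213_2000_total, jis_x_0213_2004_total, sum_0212, sum_0213)
```

The shared and standard-specific counts add up to the stated totals for both standards, so the figures are internally consistent.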

Interestingly, when the prototypical glyphs of the 2,743 kanji that are in common—in terms of having a shared Unicode code point—are compared, 31 of them are different. I prepared a single-page table that shows the differences using genuine Heisei Mincho W3 (平成明朝W3) glyphs, which also provides Adobe-Japan1-6 CIDs for all but three of the JIS X 0212-1990 prototypical glyphs (these three glyphs are thus candidates for Adobe-Japan1-7). Also, all of the JIS X 0213 kanji are from the original 2000 version, except for the one that corresponds to U+7626 that was introduced in 2004. This character’s entry is shaded in the PDF.

Unicode Beyond-BMP Top Ten List—2014 Redux

It’s hard to imagine that it has been nearly three years since I posted the always-enjoyable Unicode Beyond-BMP Top Ten List, so I figured that an updated version, which takes into account developments that have transpired since then, was in order for the current year of 2014.

Enjoy!

Adobe Blank AJ16

Although today is April 1st, this is actually a brief non-joke article. Honestly and truly. (However, I cannot say the same about Toshiya SUZUKI’s WG2 N4572. ☺)

The background is that during my last visit to Japan, which was mainly to attend IRG #41 in Tokyo during the latter half of November of 2013, Kunihiko OKANO (岡野邦彦) requested an Adobe-Japan1-6 version of Adobe Blank during a dinner at a restaurant called かつ吉. The purpose of such a font is to serve as a template for font development purposes, meaning that its structure—in terms of ‘sfnt’ tables, FDArray elements, and number of glyphs (CIDs 0 through 23057)—is identical to a genuine Adobe-Japan1-6 font, but that all of its functional glyphs are non-spacing and blank, like Adobe Blank.

I am pleased to announce that the Adobe-Japan1-6 version of Adobe Blank, called Adobe Blank AJ16, is now available in the Downloads section of the open source project, specifically in the AJ16 subdirectory. Of course, this font is not intended to be installed and used in applications, but rather to be opened or inspected by font development tools.

Okano-san also requested Adobe-Japan1-3, Adobe-Japan1-4, and kana subset versions, which will soon be added to the “Adobe Blank OpenType Font” open source project.

China’s 通用规范汉字表 (Tōngyòng Guīfàn Hànzìbiǎo)

As the title makes blatantly obvious, today we will cover a topic about China (中华人民共和国 zhōnghuá rénmín gònghéguó).

IDS + OpenType: Pseudo-encoding Unencoded Glyphs


For those who are not aware, there are twelve IDCs (Ideographic Description Characters) in Unicode, from U+2FF0 through U+2FFB. They are used in IDSes (Ideographic Description Sequences), which are intended to visually describe the structure of ideographs by enumerating their components and arrangement in a hierarchical fashion. Any Unicode character can serve as an IDS component, and the IDCs describe their arrangement. The IRG uses IDSes as a way to detect potentially duplicate characters in new submissions. All existing CJK Unified Ideographs have an IDS, and new submissions require an IDS.
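
To make the hierarchical structure concrete, here is a minimal Python sketch that checks whether a string is a structurally complete IDS. It relies on the fact that, among the twelve IDCs, U+2FF2 and U+2FF3 take three operands and the others take two. The function names are hypothetical, and the check covers only operand counts; it does not judge whether the components or the description make sense for any real ideograph.

```python
# The two IDCs that take three operands; the other ten take two.
TERNARY_IDCS = {"\u2FF2", "\u2FF3"}


def idc_arity(ch: str) -> int:
    """Return the operand count if ch is one of the twelve IDCs, else 0."""
    if ch in TERNARY_IDCS:
        return 3
    if "\u2FF0" <= ch <= "\u2FFB":
        return 2
    return 0


def is_well_formed_ids(ids: str) -> bool:
    """Check that every IDC in the prefix-notation string gets its operands."""
    if not ids:
        return False
    needed = 1  # number of operands still expected
    for ch in ids:
        if needed == 0:
            return False  # extra characters after a complete sequence
        # This character fills one slot; an IDC opens new slots of its own.
        needed += idc_arity(ch) - 1
    return needed == 0


# U+2FF0 (left-to-right) followed by two components is complete.
print(is_well_formed_ids("\u2FF0\u6728\u6728"))
# A lone IDC with a single component is incomplete.
print(is_well_formed_ids("\u2FF0\u6728"))
```

Because an IDS is written in prefix notation, a single counter of outstanding operands is enough; no explicit tree needs to be built just to validate the structure.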

This article describes a technique that uses IDSes combined with OpenType functionality to pseudo-encode glyphs that are unencoded or not yet encoded. If memory serves, it was Taichi KAWABATA (川幡太一) who originally suggested this technique.

From The Archives: JIS2004 CMap Resource History

[For those who are interested in reading my own release notes for the Adobe-Japan1-6 UTF-32 CMap resource history, which includes the non-JIS2004 ones, I made them available here on January 20, 2016.]

I was recently asked, indirectly via Twitter, about changes and additions that were made to our JIS2004-savvy CMap resources, specifically UniJIS2004-UTF32-H and UniJISX02132004-UTF32-H. The former also includes UTF-8 (UniJIS2004-UTF8-H) and UTF-16 (UniJIS2004-UTF16-H) versions that are kept in sync with the master UTF-32 version by being automagically generated by the CMap resource compiler (and decompiler), cmap-tool.pl, which I developed years ago.
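
The reason the UTF-8 and UTF-16 versions can be generated mechanically is that all three encoding forms represent the same code points. The Python sketch below illustrates only that relationship; it is not how cmap-tool.pl works internally, and the function names are hypothetical.

```python
def utf32_to_utf16_hex(code_point: int) -> str:
    """Return the UTF-16BE code units for a code point as uppercase hex."""
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    return chr(code_point).encode("utf-16-be").hex().upper()


def utf32_to_utf8_hex(code_point: int) -> str:
    """Return the UTF-8 bytes for a code point as uppercase hex."""
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    return chr(code_point).encode("utf-8").hex().upper()


# A BMP code point and a beyond-BMP code point.
for cp in (0x4548, 0x20000):
    print(f"U+{cp:04X}", utf32_to_utf16_hex(cp), utf32_to_utf8_hex(cp))
```

A BMP code point such as U+4548 stays a single UTF-16 code unit, while a beyond-BMP code point such as U+20000 becomes a surrogate pair in UTF-16 and four bytes in UTF-8.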

Of course, all of these CMap resources also have vertical versions that use a “V” at the end of their names in lieu of the “H,” but in the context of OpenType font development, the vertical CMap resources are virtually unused and worthless because it is considered much better practice to explicitly define a ‘vert’ GSUB feature for handling vertical substitution. In the absence of an explicit definition, the AFDKO makeotf tool will synthesize a ‘vert’ GSUB feature by using the corresponding vertical CMap resources.

With all that being said, what follows in this article is a complete history of these two CMap resources, which also assigns dates, and sometimes notes, to each version.

OpenType Collections—Redux

As described in last month’s article, our tools engineer developed two Python scripts for assembling and disassembling ‘sfnt’ collections. Both operate on TrueType-based source fonts to produce a traditional TrueType Collection (TTC) font or to break one apart, and both also operate on CFF-based source fonts to produce a new font species known as an OpenType Collection (OTC).

The purpose of this follow-up article is to convey the news that these scripts have been tweaked slightly, and have been included in a new version of AFDKO that was released on 2014-02-18 as Build 61250. One of the benefits of the integration with AFDKO is that the tools are now easier to run, as a simple command.