Posts in Category "Unicode"

ISO/IEC 10646:2012 Published!

ISO/IEC 10646:2012 (Third Edition) was just published. This is the first version of the standard that includes multiple-column Code Charts for CJK Unified Ideographs Extension B and for CJK Compatibility Ideographs. Another significant aspect of ISO/IEC 10646:2012 is that it is equivalent to Unicode Version 6.1.

For Adobe, the publication of this new edition of the standard represents a significant milestone, because it means that every Adobe-Japan1-6 kanji is either directly encoded or directly associated with a registered IVS (Ideographic Variation Sequence) in the IVD (Ideographic Variation Database).

Speaking of Unicode Version 6.1, the printed version of the Core Specification is available via print on demand (POD) from Lulu, and at a very attractive price.

Adobe-Japan1-6 Unicode Version 6.1 Tables

Years ago, I wrote a Perl script, called unicode-rows.pl, that takes a fully-qualified PostScript name (a CIDFont resource name, two hyphens, and a UTF-32 CMap resource name) and generates a PostScript file that can be distilled into a PDF. The resulting PDF file is a Unicode table, arranged in groups of 256 code points. A page is created for a particular group of 256 code points if the UTF-32 CMap resource includes even a single mapping in that group.
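The grouping logic described above can be sketched in a few lines of Python. This is not unicode-rows.pl itself, only an illustration of the idea: the function names, the CIDFont name, and the sample code points are all hypothetical, and the real script reads its mappings from a CMap resource rather than from a list.

```python
def split_qualified_name(name):
    """Split 'CIDFont--CMap' into its two resource names."""
    cidfont, sep, cmap = name.partition("--")
    if not sep or not cidfont or not cmap:
        raise ValueError(f"expected 'CIDFont--CMap', got {name!r}")
    return cidfont, cmap


def pages_for(mapped_code_points):
    """Return the first code point of every 256-code-point group that
    contains at least one mapping; each such group becomes one page."""
    return sorted({cp // 256 * 256 for cp in mapped_code_points})


# Hypothetical input: an example font name and a handful of mapped code points.
cidfont, cmap = split_qualified_name("KozMinPr6N-Regular--UniJIS2004-UTF32-H")
pages = pages_for([0x0041, 0x00FF, 0x3042, 0x4E00, 0x4E01, 0x2000B])
for start in pages:
    print(f"U+{start:04X}..U+{start + 0xFF:04X}")
```

With this sample input, six mappings fall into four groups, so four pages would be produced. Groups without any mapping never appear in the result, which matches the "even a single mapping" rule.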

I have prepared examples that are based on the UniJIS2004-UTF32-H and UniJIS-UTF32-H CMap resources.

“All Of Unicode” CFR Object

As alluded to at the end of the May 9, 2012 CJK Type Blog article, I had plans to build additional CFR objects for testing purposes. That particular article supplied two 64K-glyph OpenType/CFF fonts, which provided BMP and Plane 1 coverage, and served as component fonts for the supplied CFR object, UnicodeGetaCFR.cfr. In today’s article, I will supply a CFR object that encompasses all of Unicode, meaning the BMP and the 16 Supplementary Planes, along with the component fonts that it references. In other words, it provides coverage for 1,112,030 code points, each of which has a unique glyph. These represent valuable testing resources for developers who plan to support CFR objects in their products as a way to break the 64K glyph barrier.
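The 1,112,030 figure is consistent with the arithmetic below. The breakdown is an assumption, not something stated in the excerpt: it takes all 17 planes, then excludes the 2,048 surrogate code points and the last two code points (U+xxFFFE and U+xxFFFF) of each plane.

```python
PLANES = 17                            # the BMP plus 16 Supplementary Planes
CODE_POINTS_PER_PLANE = 0x10000        # 65,536
SURROGATES = 0xDFFF - 0xD800 + 1       # 2,048 (U+D800..U+DFFF)
# Assumption: only the two plane-final noncharacters are excluded per plane.
PLANE_END_NONCHARACTERS = 2 * PLANES   # 34

total = PLANES * CODE_POINTS_PER_PLANE
covered = total - SURROGATES - PLANE_END_NONCHARACTERS
print(f"{total:,} code points in total, {covered:,} covered")
```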

CJK Compatibility Ideographs

Unicode Version 6.1 includes a total of 1,002 CJK Compatibility Ideographs. The February 22, 2012 CJK Type Blog article includes a table that details, by version, when they were added to Unicode.

Of the 1,002 CJK Compatibility Ideographs that are in Unicode, 89 have Japanese sources. The Japanese sources are JIS X 0213:2004, Jinmei-yō Kanji (人名用漢字), IBM, and ARIB STD-B24. In addition, some have multiple Japanese sources. Most of these are intended to use the same glyph regardless of the source, but a very small number (three, to be precise) are not.

IVD Version 2012-03-02 Released

As the IVD Registrar, I am very pleased to announce that a new version of the IVD (Ideographic Variation Database) was released on March 2nd, 2012. It incorporates the results of PRI 183 and PRI 187.

IUC36

I am pleased to announce that Adobe once again has the privilege and honor of being a Gold Sponsor of the Internationalization & Unicode Conference, the 36th iteration of which will take place in October of this year.

For those who have had the opportunity to attend this conference in the past, I am preaching to the choir when I state that much of the benefit of attending comes not from listening to the scheduled sessions, though they have incredible value, but rather from the opportunity to have face-to-face discussions with others in the industry.

If you plan to attend IUC36, I hope to see you there!

CJK Unified/Compatibility Ideographs in Unicode Version 6.1

Unicode Version 6.1 was released on January 31, 2012, and now includes 74,617 CJK Unified Ideographs, along with 1,002 CJK Compatibility Ideographs. This version added 732 characters, bringing the standard to a staggering 110,116 characters.

Speaking of staggering, as Unicode grows, it becomes more important to keep track of what character is encoded where, and sometimes it is useful to know when a character was encoded. For this purpose, the DerivedAge.txt datafile is an incredibly useful resource.
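As a rough illustration of how DerivedAge.txt can answer the "when was it encoded" question, here is a minimal Python sketch. It assumes the file's usual layout of a code point or range, a semicolon, the version, and a trailing `#` comment. The `SAMPLE` lines are illustrative entries written in that layout rather than a verbatim excerpt, and the function names are hypothetical.

```python
SAMPLE = """\
# Illustrative lines in the DerivedAge.txt layout (not a verbatim excerpt)
4E00..9FA5    ; 1.1 # CJK Unified Ideographs (original block)
3400..4DB5    ; 3.0 # CJK Unified Ideographs Extension A
9FCC          ; 6.1 # CJK UNIFIED IDEOGRAPH-9FCC
FA2E..FA2F    ; 6.1 # CJK COMPATIBILITY IDEOGRAPH-FA2E..FA2F
"""


def parse_derived_age(text):
    """Return a sorted list of (first, last, version) tuples."""
    ranges = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()   # drop comments and blanks
        if not data:
            continue
        span, age = (field.strip() for field in data.split(";"))
        first, _, last = span.partition("..")
        ranges.append((int(first, 16), int(last or first, 16), age))
    return sorted(ranges)


def age_of(ranges, code_point):
    """Return the version in which code_point was encoded, or None."""
    for first, last, age in ranges:
        if first <= code_point <= last:
            return age
    return None


ranges = parse_derived_age(SAMPLE)
for cp in (0x4E00, 0x3400, 0x9FCC, 0xFA2E):
    print(f"U+{cp:04X} was encoded in Unicode {age_of(ranges, cp)}")
```

For the full file, the same parser should work if `SAMPLE` is replaced with the contents of a local copy of DerivedAge.txt. A linear scan is used here for clarity; a binary search over the sorted ranges would be the natural next step for bulk lookups.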

In terms of CJK Unified Ideographs and CJK Compatibility Ideographs, I spent part of the morning assembling a single-page PDF file that encapsulates many important details of their history. I hope that readers of this blog find it to be useful.