Introducing & Building OpenType Collections (OTCs)

I would like to use this opportunity to introduce two new things.

First, OpenType Collections. TrueType Collections have been around for many years, and are commonplace for OS-bundled fonts. What I am speaking of are ‘sfnt’ Collections that include a ‘CFF ’ (PostScript charstrings) table rather than a ‘glyf’ (TrueType outlines) one. The advantage of an ‘sfnt’ Collection is that fonts that differ in minor ways can be combined into a single resource, which can provide substantial size savings.

Second, brand new AFDKO tools, in the form of two Python scripts, for building, breaking apart, and displaying a synopsis of an OTC’s tables. These scripts were developed by our incredibly talented font tools engineer, Read Roberts, so all thanks should go to him for preparing them.
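
To make the size savings concrete, here is a minimal Python sketch that reads the header of an ‘sfnt’ Collection and reports which tables are stored once and shared by every font. It is not one of the AFDKO scripts mentioned above. The function names (parse_collection, shared_tables, build_demo) are hypothetical, and the demo data is a synthetic two-font collection with zeroed checksums, not a usable font.

```python
import struct

def parse_collection(data):
    """Return one dict per font, mapping table tag -> (offset, length)."""
    # Collection header: 'ttcf' tag, major/minor version, number of fonts.
    tag, _major, _minor, num_fonts = struct.unpack_from(">4sHHI", data, 0)
    if tag != b"ttcf":
        raise ValueError("not an 'sfnt' Collection")
    # One offset per font, each pointing at that font's table directory.
    font_offsets = struct.unpack_from(">%dI" % num_fonts, data, 12)
    fonts = []
    for base in font_offsets:
        _version, num_tables = struct.unpack_from(">4sH", data, base)
        tables = {}
        for i in range(num_tables):
            # Table records are 16 bytes each and follow a 12-byte header.
            record = base + 12 + 16 * i
            t, _checksum, offset, length = struct.unpack_from(">4sIII", data, record)
            tables[t.decode("latin-1")] = (offset, length)
        fonts.append(tables)
    return fonts

def shared_tables(fonts):
    """Tags whose (offset, length) is identical in every font."""
    first = fonts[0]
    return sorted(t for t, loc in first.items()
                  if all(f.get(t) == loc for f in fonts[1:]))

def build_demo():
    """Synthetic collection: two fonts share 'CFF ' but have separate 'cmap' data."""
    def directory(records):
        out = struct.pack(">4sHHHH", b"OTTO", len(records), 0, 0, 0)
        for t, offset, length in records:
            out += struct.pack(">4sIII", t, 0, offset, length)
        return out
    header = struct.pack(">4sHHI2I", b"ttcf", 1, 0, 2, 20, 64)
    font1 = directory([(b"CFF ", 108, 8), (b"cmap", 116, 4)])
    font2 = directory([(b"CFF ", 108, 8), (b"cmap", 120, 4)])
    return header + font1 + font2 + b"\0" * 16

print(shared_tables(parse_collection(build_demo())))
```

In the demo, both table directories point at the same ‘CFF ’ offset, so the glyph data is stored only once while each font keeps its own ‘cmap’ table.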

GB 18030 Oddity or Design Flaw?

I spent a couple of days curling up with GB 18030 (both versions: 2000 and 2005), which is the PRC’s latest and greatest national character set standard, and came across an oddity that my gut tells me is a design flaw. At the very least, it is an issue of which font developers need to be aware.

What I found were eight instances of CJK Unified Ideographs with a left-side Radical #130 that uses the Traditional Chinese or Taiwan-style form, instead of the expected Simplified Chinese or PRC-style form that looks the same as Radical #74. Screen captures from the latest Unicode Code Charts, whose glyphs agree with both versions of GB 18030, are shown below:


PRI 259

As the IVD Registrar, I am very pleased to announce PRI 259 (Public Review Issue #259), which is the combined registration of the new Moji_Joho IVD Collection and sequences for that IVD collection. According to procedures set forth in UTS #37 (Unicode Technical Standard #37, Unicode Ideographic Variation Database), the 90-day public review, which commences today, allows interested parties to submit comments, suggestions, and errors to the registrant via Unicode’s reporting form.

U+4E00 versus U+2F00

Not all PDF authoring applications are the same, in terms of the extent to which they preserve the text content of the original document. Of course, this is not necessarily the fault of the PDF authoring application, but rather it is due to a disconnect between the PDF authoring process and access to the text content of the original document.

The best way to demonstrate this is to create a document that includes the two kanji 一 (U+4E00) and ⼀ (U+2F00). These two characters make a good example because in mainstream Japanese fonts, mainly those that are based on the Adobe-Japan1-x ROS, both map to the same glyph, specifically CID+1200.

If you download and unpack the 4E00vs2F00.zip file, you will find two PDF files, an Adobe InDesign file, and an MS Word file. If you open the original documents and search for 一 (U+4E00), you will find only a single instance, which is the one that is marked by the Unicode scalar value. However, if you open the respective PDF files, you will notice a difference. The one that is based on the MS Word file now includes two instances of 一 (U+4E00), and ⼀ (U+2F00) is no longer included in its content. You can search a PDF file by Unicode scalar value by using the “\uXXXX” notation, such as \u4E00 for U+4E00 (一). (Note: Depending on the version of MS Word that is being used, the PDF file may instead include two instances of ⼀ (U+2F00). I am using Microsoft Word for Mac 2011 Version 14.3.8.)

Adobe InDesign has a built-in PDF library that has direct access to the text content, and is thus able to inject it into the text layer of the PDF file that it produces. MS Word uses a different pathway for producing a PDF file, one that does not have access to the text content of the original document.
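
The following Python sketch models one way this loss could happen. It assumes that a PDF producer without access to the original text must recover each character from its glyph. The only facts taken from above are that both characters map to CID+1200. The reverse tables and function names are hypothetical, and this does not claim to be what MS Word actually does.

```python
# Both code points map to the same glyph, CID+1200, as described above.
CMAP = {0x4E00: 1200, 0x2F00: 1200}

# Hypothetical reverse tables: a glyph can map back to only one code point,
# so the producer has to pick one of the two.
REVERSE_PREFERS_IDEOGRAPH = {1200: 0x4E00}
REVERSE_PREFERS_RADICAL = {1200: 0x2F00}

def to_cids(text):
    """Text -> glyph IDs (what ends up drawn on the page)."""
    return [CMAP[ord(ch)] for ch in text]

def recover_text(cids, reverse):
    """Glyph IDs -> text, as a producer without the original text would do."""
    return "".join(chr(reverse[cid]) for cid in cids)

original = "\u4E00\u2F00"
cids = to_cids(original)

for reverse in (REVERSE_PREFERS_IDEOGRAPH, REVERSE_PREFERS_RADICAL):
    recovered = recover_text(cids, reverse)
    print(["U+%04X" % ord(ch) for ch in recovered])
```

Once the text has passed through the glyph, the distinction between the two characters cannot be recovered. Depending on which reverse mapping is used, the result is two instances of U+4E00 or two instances of U+2F00, which matches both outcomes described above.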

ISO/IEC 14496-28:2012 (CFR) Is Now Freely Available!

For those who have been interested in ISO/IEC 14496-28:2012 (Composite Font Representation), which standardizes an XML format for defining font objects (aka CFR objects) that can reference more than one font resource and thus break the 64K glyph barrier, I am pleased to let this blog’s readership know that it is now among ISO’s Freely Available Standards. I am particularly pleased about this news, mainly because some developers have indicated that purchasing the standard effectively served as a barrier to supporting it. Well, the barrier has been removed!

Note that this change makes a whole lot of sense, because two ISO standards that are closely tied to CFR, ISO/IEC 10646 (Universal Coded Character Set, aka Unicode) and ISO/IEC 14496-22 (Open Font Format), are already among these freely available standards.

Also note that there is no direct download URL for this or other freely available ISO standards, because one must first agree to the no-cost licensing terms by clicking a button.

National Standards vs ISO/IEC 10646 & Unicode

Some people naïvely think that ISO/IEC 10646 and Unicode, which are joined at the hip, make the development of national standards an obsolete practice. As my IRG41 contribution, IRG N1964 (Continued National Standards Development & Horizontal Extensions), makes clear, nothing is further from the truth, especially when it involves CJK Unified Ideographs.

The content of this paper had been brewing in my head since IRG38, and only recently has congealed into a concise one-page paper that should be daunting to no one. If you are interested in such issues, please read the paper and provide feedback.

A Glimpse At Unicode Version 7.0

While the finishing touches are being put on Unicode Version 6.3, which will include the 1,002 Standardized Variants that I already mentioned, everything appears to be on track for Unicode Version 7.0, which will be in sync with ISO/IEC 10646:2014 (4th Edition).

Extension E, which adds 5,762 new CJK Unified Ideographs, is on track to be included in Version 7.0. This will bring the total number of CJK Unified Ideographs to a staggering 80,379 characters. I spent part of this morning preparing an updated version of my CJK Unified/Compatibility Ideographs table that provides a glimpse at Unicode Version 7.0.

(Note that neither Unicode Version 7.0 nor ISO/IEC 10646:2014 has been released or published, so implementers should treat these figures as provisional, hence the use of “glimpse” in the title of this article.)

Standardized Variants—Part 4

As I described in Part 1, Part 2, and Part 3 of this series, Standardized Variants offer a Normalization-proof representation for the 1,002 CJK Compatibility Ideographs, which are encoded in the BMP, and at the end of Plane 2. These 1,002 Standardized Variants have been approved, and will be included in Unicode Version 6.3. They will, of course, also be included in ISO/IEC 10646.
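
To see what “Normalization-proof” means in practice, here is a small Python sketch that uses the standard unicodedata module. It uses U+F900, a CJK Compatibility Ideograph whose canonical decomposition is U+8C48. The pairing of U+8C48 with VS1 (U+FE00) as its Standardized Variant is my assumption for illustration and should be checked against the StandardizedVariants.txt data file.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

compat = "\uF900"          # CJK COMPATIBILITY IDEOGRAPH-F900
unified = "\u8C48"         # its canonical equivalent
variant = "\u8C48\uFE00"   # assumed Standardized Variant: base + VS1

def survives(text):
    """True if every normalization form leaves the text unchanged."""
    return all(unicodedata.normalize(form, text) == text for form in FORMS)

# The compatibility ideograph is replaced by the unified ideograph,
# so the distinction is lost as soon as the text is normalized.
print(unicodedata.normalize("NFC", compat) == unified)

# The variation sequence is left alone by all four normalization forms.
print(survives(compat), survives(variant))
```

The variation selector has no decomposition and a combining class of zero, so normalization has nothing to change in the sequence.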

In an effort to provide font developers with advance support for the Standardized Variants that correspond to glyphs in Adobe’s public ROSes, the next version of AFDKO will include a new version of the Adobe-Japan1_sequences.txt file that appends entries that correspond to 89 of these Standardized Variants, along with Adobe-CNS1_sequences.txt and Adobe-Korea1_sequences.txt files that specify 14 and 270 entries, respectively, that correspond to these Standardized Variants. If you click on the file names, you can download the files and use them immediately. These are used with the AFDKO makeotf tool, and specified as the argument of the “-ci” command-line option.

Baby steps…

UTR #50 Released!

The Unicode Consortium announced the release of UTR #50, Unicode Vertical Text Layout, today, via Twitter and their blog. Although I was involved in this Unicode Technical Report to some extent, any congratulatory comments should be directed toward its original and current editors, Eric Muller and Koji ISHII (石井宏治), respectively.

A Tale of Three (OpenType) Features

In an effort to make sure that the infrastructure to support UTR #50 (Unicode Vertical Text Layout) will be in place—sooner rather than later—I spent a significant part of last week working with key people within Adobe, and at Microsoft and W3C, to put together a proposal for a new OpenType feature, to be tagged ‘vrtr’, for supporting this soon-to-be published standard. Below is the full description that we came up with, and which was submitted for inclusion in the OpenType Specification and in OFF (ISO/IEC 14496-22 or Open Font Format):

Tag: ‘vrtr’

Friendly name: Vertical Alternates For Rotation

Registered by: Adobe/Microsoft/W3C

Function: Transforms default glyphs into glyphs that are appropriate for sideways presentation in vertical writing mode. While the glyphs for most characters in East Asian writing systems remain upright when set in vertical writing mode, glyphs for other characters—such as those of other scripts or for particular Western-style punctuation—are expected to be presented sideways in vertical writing.

Example: As a first example, the glyphs for FULLWIDTH LESS-THAN SIGN (U+FF1C; “<”) and FULLWIDTH GREATER-THAN SIGN (U+FF1E; “>”) in a font with a non-square em-box are transformed into glyphs whose aspect ratio differs from the default glyphs, which are properly sized for sideways presentation in vertical writing mode. As a second example, the glyph for LEFT SQUARE BRACKET (U+005B; “[”) in a brush-script font that exhibits slightly rising horizontal strokes may use an obtuse angle for its upper-left corner when in horizontal writing mode, but an alternate glyph with an acute angle for that corner is supplied for vertical writing mode.

Recommended implementation: The font includes versions of the glyphs covered by this feature that, when rotated 90 degrees clockwise by the layout engine for sideways presentation in vertical writing, differ in some visual way from rotated versions of the default glyphs, such as by shifting or shape. The vrtr feature maps the default glyphs to the corresponding to-be-rotated glyphs (GSUB lookup type 1).

Application interface: For GIDs found in the vrtr coverage table, the layout engine passes GIDs to the feature, then gets back new GIDs.

UI suggestion: This feature should be active by default for sideways runs in vertical writing mode.

Script/language sensitivity: Applies to any script when set in vertical writing mode.

Feature interaction: The vrtr and vert features are intended to be used in conjunction: vrtr for glyphs intended to be presented sideways in vertical writing, and vert for glyphs to be presented upright. Since they must never be activated simultaneously for a given glyph, there should be no interaction between the two features. These features are intended for layout engines that graphically rotate glyphs for sideways runs in vertical writing mode, such as those conforming to UTR#50. (Layout engines that instead depend on the font to supply pre-rotated glyphs for all sideways glyphs should use the vrt2 feature in lieu of vrtr and vert.) Because vrt2 supplies pre-rotated glyphs, the vrtr feature should never be used with vrt2, but may be used in addition to any other feature.
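
The division of labor described in this proposal can be sketched in a few lines of Python. This is only an illustration of the intended behavior, not an implementation from any layout engine. The GIDs, the two lookup tables, and the shape_vertical function are all hypothetical. Each feature is modeled as a one-to-one substitution, as GSUB lookup type 1 implies.

```python
# Hypothetical single-substitution tables: default GID -> alternate GID.
VERT = {10: 110}   # upright alternates for vertical writing
VRTR = {20: 120}   # to-be-rotated alternates for sideways runs

def shape_vertical(glyphs):
    """Apply vert or vrtr per glyph, never both.

    glyphs: list of (gid, orientation), orientation is "upright" or "sideways".
    Returns a list of (gid, clockwise rotation in degrees).
    """
    shaped = []
    for gid, orientation in glyphs:
        if orientation == "sideways":
            # GIDs outside the vrtr coverage pass through unchanged,
            # then the engine rotates the glyph 90 degrees clockwise.
            shaped.append((VRTR.get(gid, gid), 90))
        else:
            # Upright glyphs get vert and are not rotated.
            shaped.append((VERT.get(gid, gid), 0))
    return shaped

run = [(10, "upright"), (20, "sideways"), (30, "sideways"), (20, "upright")]
print(shape_vertical(run))
```

Because the orientation of each glyph selects exactly one of the two features, the same GID (20 in the run above) is substituted only when it is set sideways, and the two features never interact.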
