Dr. Ken Lunde – CJK Type Blog http://ccjktype.fonts.adobe.com/ CJK Fonts, Character Sets & Encodings.

オントロ & グスーム? https://ccjktype.fonts.adobe.com/2019/08/angstrom-square-ligatures.html Tue, 20 Aug 2019 20:43:33 +0000

What in the world could オントロ (ontoro) and グスーム (gusūmu) possibly mean? (If you wait a few seconds, a hint will flash in the animated GIF above.)

Besides U+332C ㌬ SQUARE PAATU, which was covered over three years ago, there are a number of interesting katakana square ligatures, both those that are encoded in Unicode and those that are included in Adobe-Japan1-7 and are accessible via the 'dlig' (Discretionary Ligatures) GSUB feature.

The four-katakana square ligatures for オントロ and グスーム are indeed interesting. Although these katakana square ligatures are not in Unicode, they were included in Adobe-Japan1-4 as CIDs 11885 and 11897, respectively, with their vertical forms at CIDs 11969 and 11981. Adobe-Japan1-4 also included the horizontal and vertical glyphs for the eight-katakana square ligature オングストローム (angstrom or Å) at CIDs 11883 and 11967, respectively, which manages to pack all eight component katakana into the em-box.

As far as I can tell, the four-katakana square ligatures for オントロ and グスーム originated in Morisawa’s proprietary glyph set that is called MORCODE, and their presence in MORCODE is the reason why they were included in Adobe-Japan1-4:

The upper character codes are for MORCODE (aka MORCODE I), and the lower ones are for MORCODE II. Note how they are positioned next to each other and in sequence. The same is true for their vertical forms, though they are side-by-side, not stacked, because this is a glyph table.

Unlike other katakana square ligatures that can be combined, such as the submultiples and multiples U+3330 ㌰ SQUARE PIKO (pico), U+3328 ㌨ SQUARE NANO (nano), U+3343 ㍃ SQUARE MAIKURO (micro), U+3349 ㍉ SQUARE MIRI (milli), U+3322 ㌢ SQUARE SENTI (centi), U+3325 ㌥ SQUARE DESI (deci), U+3314 ㌔ SQUARE KIRO (kilo), U+334B ㍋ SQUARE MEGA (mega), and U+3310 ㌐ SQUARE GIGA (giga), which combine with base units such as U+3318 ㌘ SQUARE GURAMU (gram), U+3339 ㌹ SQUARE HERUTU (hertz), U+334D ㍍ SQUARE MEETORU (meter), U+3351 ㍑ SQUARE RITTORU (liter), U+3357 ㍗ SQUARE WATTO (watt), and others, the four-katakana square ligatures for オントロ and グスーム must always appear together and in a specific sequence; otherwise they mean nothing. This also applies to their separate vertical forms in vertical layout. (Interestingly, katakana square ligatures for three of the multiples, デカ (deka), ヘクト (hecto), and テラ (tera), are missing from Unicode, but are included in Adobe-Japan1-7 as CIDs 11908, 11927, and 11910, respectively.)
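The hint resolves once the two squares sit next to each other: reading across both squares, row by row, spells オングストローム. The following Python sketch assumes that each ligature stacks its four katakana as a 2×2 grid read row by row; that layout is inferred from the puzzle rather than stated above, and split_into_square_pair and read_square_pair are hypothetical helper names:

```python
# A minimal sketch, assuming each square ligature stacks its four katakana as a
# 2x2 grid read row by row, and that the two squares are set side by side.
# This reading is inferred from the puzzle; it is not stated in the article.


def split_into_square_pair(word: str) -> tuple[str, str]:
    """Split an eight-character word into the contents of two adjacent 2x2 squares."""
    if len(word) != 8:
        raise ValueError("expected exactly eight characters")
    top_row, bottom_row = word[:4], word[4:]
    # The left square gets the first two characters of each row, the right square the rest.
    left_square = top_row[:2] + bottom_row[:2]
    right_square = top_row[2:] + bottom_row[2:]
    return left_square, right_square


def read_square_pair(left_square: str, right_square: str) -> str:
    """Read two adjacent 2x2 squares row by row, across both squares."""
    return left_square[:2] + right_square[:2] + left_square[2:] + right_square[2:]


if __name__ == "__main__":
    left, right = split_into_square_pair("オングストローム")
    print(left, right)                    # オントロ グスーム
    print(read_square_pair(left, right))  # オングストローム
    print(read_square_pair(right, left))  # グスオンームトロ (wrong order, meaningless)
```

Swapping the two squares produces a meaningless string, which is why the pair only works in one order.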

As an aside, some of the submultiples and multiples combine with base units to form a tighter square ligature, such as U+3315 ㌕ SQUARE KIROGURAMU (kilogram), U+3316 ㌖ SQUARE KIROMEETORU (kilometer), and U+3317 ㌗ SQUARE KIROWATTO (kilowatt).
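Unicode's compatibility decompositions reflect this relationship: NFKC normalization expands each square ligature into its component katakana, so a prefix ligature followed by a base-unit ligature normalizes to the same string as the tighter combined ligature. A short sketch using Python's standard unicodedata module (the comparison concerns text equivalence only, not how the glyphs look):

```python
import unicodedata

# (prefix ligature, base-unit ligature, combined ligature) triples from the paragraph above
TRIPLES = [
    ("\u3314", "\u3318", "\u3315"),  # ㌔ + ㌘ vs ㌕ SQUARE KIROGURAMU
    ("\u3314", "\u334D", "\u3316"),  # ㌔ + ㍍ vs ㌖ SQUARE KIROMEETORU
    ("\u3314", "\u3357", "\u3317"),  # ㌔ + ㍗ vs ㌗ SQUARE KIROWATTO
]


def same_under_nfkc(prefix: str, unit: str, combined: str) -> bool:
    """True if prefix + unit and the combined ligature share one NFKC form."""
    separate = unicodedata.normalize("NFKC", prefix + unit)
    return separate == unicodedata.normalize("NFKC", combined)


for prefix, unit, combined in TRIPLES:
    print(unicodedata.name(combined),
          unicodedata.normalize("NFKC", combined),
          same_under_nfkc(prefix, unit, combined))
```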

One of the most widely abused katakana square ligatures is probably U+3320 ㌠ SQUARE SANTIIMU (centime). It is used to refer to a particular person’s team: サン (san) is a katakana rewrite of さん, which is the most common honorific that is appended to a name, and チーム (chīmu) means “team.” In other words, I can refer to my team as ランディ㌠ or 小林㌠.

In closing, those who can read Japanese may enjoy this article that was published by Asahi Shimbun in 2012.

🐡


2019 “State of the Unification” Report https://ccjktype.fonts.adobe.com/2019/07/2019-sotu.html Mon, 29 Jul 2019 02:00:25 +0000

The UTC #160 meeting took place last week at Microsoft’s HQ in Redmond, Washington.

For CJK enthusiasts, the big news is that the UTC accepted CJK Unified Ideographs Extension G (aka IRG Working Set 2015), which includes 4,939 characters. Additional CJK Unified Ideographs were appended to the URO (13), Extension A (10), and Extension B (7). As a result, the Extension A block will be completely full, and the URO and Extension B blocks will be nearly full. The total number of CJK Unified Ideographs in Unicode Version 13.0 is expected to be 92,856, which will represent approximately 65% of its characters.
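The block-fill claims can be checked with simple arithmetic. This sketch takes the additions (13, 10, and 7) from the paragraph above; the block ranges and the last code points assigned before Unicode 13.0 (U+9FEF, U+4DB5, and U+2A6D6) come from the Unicode code charts rather than from this article:

```python
# name: (block_last, last_assigned_before_13_0, additions_in_13_0)
# The additions come from the text; the other values are from the Unicode code charts.
BLOCKS = {
    "URO": (0x9FFF, 0x9FEF, 13),
    "Extension A": (0x4DBF, 0x4DB5, 10),
    "Extension B": (0x2A6DF, 0x2A6D6, 7),
}


def remaining_code_points(block_last: int, last_assigned: int, additions: int) -> int:
    """Unassigned code points left at the end of a block after appending ideographs."""
    new_last = last_assigned + additions
    if new_last > block_last:
        raise ValueError("additions overflow the block")
    return block_last - new_last


for name, (block_last, assigned, added) in BLOCKS.items():
    left = remaining_code_points(block_last, assigned, added)
    print(f"{name}: ends at U+{assigned + added:04X}, {left} code point(s) left")
```

Under those assumptions, Extension A ends exactly at U+4DBF, while the URO and Extension B are left with three and two unassigned code points, respectively.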

The Pipeline accurately reflects the additional CJK Unified Ideographs that are targeted for Unicode Version 13.0. I suspect that the image at the beginning of this article will be helpful, and when clicked, will provide a PDF version that also includes a table for CJK Compatibility Ideographs.

These additions reflect two issues about which developers need to be aware:

  • Plane 3 (aka TIP or Tertiary Ideographic Plane) is now open for business, and Extension G represents the first block encoded among its 65,534 available code points.
  • This is the first time that CJK Unified Ideographs have been appended to a block other than the URO.
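One practical consequence of Plane 3 for developers is that Extension G ideographs, like those in Plane 2, lie outside the BMP, so they require surrogate pairs in UTF-16 and four bytes in UTF-8; code that assumes ideographs live only in Planes 0 and 2 (for example, in range checks) would need updating. A minimal Python sketch, assuming only that Extension G begins at U+30000:

```python
# Extension G starts at U+30000, the first code point of Plane 3 (the TIP).
EXT_G_FIRST = 0x30000


def plane_of(cp: int) -> int:
    """Unicode plane number: each plane spans 0x10000 code points."""
    return cp >> 16


def utf16_code_units(cp: int) -> list:
    """Encode a code point as UTF-16 code units (a surrogate pair above U+FFFF)."""
    if cp <= 0xFFFF:
        return [cp]
    offset = cp - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


# 65,536 code points per plane, minus the two noncharacters U+3FFFE and U+3FFFF
AVAILABLE_IN_PLANE_3 = 0x10000 - 2

print(plane_of(EXT_G_FIRST))
print(" ".join(f"{u:04X}" for u in utf16_code_units(EXT_G_FIRST)))
print(chr(EXT_G_FIRST).encode("utf-8").hex(" "))
```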

🐡


Implementing Variable Widths for Both Horizontal and Vertical Layout via Variable Fonts (横組みと縦組みのどちらにも対応する可変字幅のバリアブルフォントによる実現方法) https://ccjktype.fonts.adobe.com/2019/07/hv-compression-expansion-via-vfonts-ja.html Tue, 16 Jul 2019 14:07:36 +0000

English (英語) here

(Translation: Taro Yamamoto (山本太郎), Adobe Type team)

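I recently devised a model that enables variable glyph widths while also handling glyphs that must rotate in vertical layout, such as those in Western text, as well as 縦中横 (tatechūyoko, or TCY: horizontal elements set within a vertical line) in Japanese typesetting.

The purpose of this article is to call attention to the open source font that I developed, along with a description of the model by which it works. Both are intended to be used by developers who implement support for such fonts in application software and layout engines.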

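The Test Font

The test font is a two-axis Variable Font that maps 1,111,998 Unicode code points to one of the following two glyphs, based on the Unicode 12.1 version of the UAX #50 (Unicode Vertical Text Layout) data file, “VerticalOrientation.txt”:

  • The 785,553 code points that are assigned the Vertical_Orientation (“vo”) property value of “R” or “Tr” map to the Western glyphs.
  • The 326,445 code points that are assigned the “vo” property value of “U” or “Tu” map to the CJK glyphs.

Because the distinction between upright and rotated orientation is important in vertical layout, the above mappings between code points and glyphs are considered appropriate.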

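The test font introduced here includes 256 instances each of the Western and CJK glyphs, at GIDs 1 through 256 and GIDs 257 through 512, respectively. Their shapes and parameters are as follows:

  • The Western glyphs are hollow rectangles whose default width is 600 units, which rest on the Western baseline (Y = 0), and whose width can vary from 450 to 750 units, up to 25% narrower or wider than the default width.
  • The CJK glyphs are full-width squares whose default width is 1000 units, which are positioned 12% below the Western baseline (from Y = −120 to Y = 880), and whose width can vary from 750 to 1250 units, up to 25% narrower or wider than the default width.

The reason for giving each glyph 256 instances is that doing so simplifies the Unicode mappings that are specified in the 'cmap' table. This is important when there are over a million mappings.

GID+513 is an explicit half-width glyph for use in TCY, and it is used to substitute the Western glyphs via the 'hwid' (Half Widths) GSUB feature.

The test font includes the registered 'wdth' (Width) and unregistered 'VWID' (Vertical Width) design-variation axes. For the Japanese term 縦組み用字幅 (“vertical width”), I was aware that 垂直字送り量 (“vertical advance”) would be technically more accurate, but 字幅 (“width”) pairs better with the 'wdth' axis. For both axes, the default setting is 500, which corresponds to the 600-unit horizontal advance of the Western glyphs in horizontal layout, and to the 1000×1000 square body (full-width) of the CJK glyphs. The minimum and maximum settings are 1 and 1000, which correspond to 25% narrower and 25% wider widths, respectively.

The test font is available in both OpenType/CFF2 and TrueType formats in the open source Width & Vertical Width VF project on GitHub.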

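Variable Widths in Horizontal Layout

When performing horizontal layout, the model described in this article is relatively simple. The 'wdth' axis is used to make glyphs narrower or wider along the X-axis as desired, and the 'VWID' axis is fixed at its default setting, which corresponds to full-width, or 1000 units, for the CJK glyphs. In other words, the relative height of the glyphs does not change in horizontal layout.

The animated GIF below was created using the test font, and shows horizontal layout with variable widths (the character string that was used actually corresponds to “かなABC漢字”, but this is not apparent because only rectangular glyphs are displayed).

The glyph height does not change whether the glyphs become narrower or wider, because the 'VWID' axis is fixed at its default setting. In terms of animation timing, the default setting is shown for five seconds, the two extremes for two seconds, and the intermediate settings for one second.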

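Variable Widths in Vertical Layout

When performing vertical layout, CJK glyphs, which remain upright, are relatively straightforward to handle: the 'VWID' axis is used to make glyphs shorter or taller along the vertical Y-axis as desired, and the 'wdth' axis is fixed at its default setting, which corresponds to full-width, or 1000 units, for the CJK glyphs. The following two cases require special handling:

  • The handling of glyphs that must rotate in vertical layout, such as Western ones, is complex: the 'VWID' axis setting is applied to the 'wdth' axis instead, and the 'VWID' axis is fixed at its default setting. In other words, the result is equivalent to simply rotating text that was laid out horizontally. The rotated glyphs can become narrower or wider in vertical layout, but their relative height does not change.
  • TCY is a special case in which the 'VWID' axis setting is used as-is, and the 'wdth' axis is fixed at its default setting. In other words, TCY behaves like the CJK glyphs: the glyphs become shorter or taller in vertical layout, but their width is fixed and does not change. The test font includes the 'hwid' GSUB feature so that two-character TCY can be simulated using the Western glyphs.

The animated GIF below was created using the test font, and shows vertical layout with variable widths, including both rotated strings and TCY (as in the previous example, the character string that was used corresponds to “あAB漢字12国”, but this is not apparent because only rectangular glyphs are displayed).

Whether the rotated Western glyphs become narrower or wider, their relative height does not change; it is bound to the width of the CJK glyphs. Likewise, the TCY glyphs become shorter or taller, but their width is bound to the width of the CJK glyphs [when TCY is limited to two characters]. The animation timing is the same as for horizontal layout above.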

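Model Summary

The following table summarizes the model, in terms of the settings and setting ranges of the two design-variation axes under the layout conditions that were described in the previous two sections:

Axis   Horizontal   Vertical (Upright)   Vertical (Rotated)   Vertical (TCY)
wdth   1–1000       500                  1–1000               500
VWID   500          1–1000               500                  1–1000

Of course, the actual settings and setting ranges used here are based on the test font that I developed, and other Variable Fonts that follow this model may use slightly different values. The main point is that one of the axes is fixed at its default setting, which is 500 in the font that I developed.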

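UI Considerations

Although the 'wdth' and 'VWID' design-variation axes are implemented as separate axes in a single Variable Font, and one of them needs to be fixed as described above, it would be appropriate for UIs to expose only a single functional axis named 可変字幅 (“Variable Width”). What matters is that its setting is applied to the appropriate axis depending on the layout direction (horizontal or vertical) and on whether the characters are rotated or used in a TCY context. Another possible approach is to fix one of the axes by default, but to allow that default behavior to be overridden by unfixing it.

In either case, how these axes are exposed in app UIs may largely depend on how sophisticated the apps themselves are. For less-sophisticated apps, exposing only a single functional axis would be preferable, whereas more-sophisticated apps could expose both axes and allow the default behavior to be overridden, as described above.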

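Design-Variation Axis Registration

If the model described in this article becomes generally accepted, the 'VWID' design-variation axis will need to be registered, which means that it would have an all-lowercase tag, such as 'vwid', 'vadv', or similar.

In closing, everything described in this article is, at this point, still only a proposal for a model that I hope will become the standard way of implementing CJK Variable Fonts that support variable widths (narrower widths, wider widths, or both) in both horizontal and vertical layout.

If you have any comments, please let me know.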

🐡


Adobe-Japan1-7 GSUB Features https://ccjktype.fonts.adobe.com/2019/07/aj17-gsub-features.html Sat, 13 Jul 2019 13:13:56 +0000

For Adobe-Japan1–based OpenType/CFF Japanese fonts that support the glyph or glyphs for U+32FF ㋿ SQUARE ERA NAME REIWA (Unicode Version 12.1), meaning Adobe-Japan1-7 CID+23058 or CIDs 23058 and 23059, font developers need to be aware that adjustments to a small number of GSUB (Glyph SUBstitution) features are necessary to make them more easily accessible or simply usable.

Regardless of whether a font supports one or both glyphs, the 'dlig' (Discretionary Ligatures) GSUB feature benefits from the following additional substitution, which substitutes the glyphs for U+4EE4 令 and U+548C 和, CIDs 4009 and 4072, with the glyph for U+32FF ㋿, CID+23058:

substitute \4009 \4072 by \23058;

Click here to get the complete Adobe-Japan1-7 feature definition in AFDKO makeotf “features” file syntax.

The primary GSUB features for vertical layout, meaning the 'vert' (Vertical Alternates) feature and the 'vrt2' (Vertical Alternates and Rotation) feature, the latter now deprecated in favor of UAX #50 (Unicode Vertical Text Layout) compliance, require the following additional substitution if the font supports the vertical form of U+32FF ㋿, CID+23059, which corresponds to fonts that support Adobe-Japan1-4 and higher:

substitute \23058 by \23059;

Click here to get the complete Adobe-Japan1-7 feature definitions in AFDKO makeotf “features” file syntax.
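To see how the 'dlig' and 'vert' additions work together, here is a toy Python model keyed by the CIDs above. It is not how any real shaping engine works, and it assumes that the ligature lookup is applied before the vertical substitution:

```python
# Toy model of the substitutions above, keyed by Adobe-Japan1 CIDs.
# Not a real shaper: it only shows how the two lookups compose.
DLIG = {(4009, 4072): 23058}  # 令 + 和 -> ㋿ (horizontal form)
VERT = {23058: 23059}         # ㋿ -> its vertical form


def apply_ligatures(cids, ligatures):
    """Replace each matching component sequence with its ligature, left to right."""
    out, i = [], 0
    while i < len(cids):
        for components, ligature in ligatures.items():
            if tuple(cids[i:i + len(components)]) == components:
                out.append(ligature)
                i += len(components)
                break
        else:
            out.append(cids[i])
            i += 1
    return out


def apply_singles(cids, singles):
    """One-to-one substitution; CIDs without an entry pass through unchanged."""
    return [singles.get(cid, cid) for cid in cids]


def shape(cids, dlig=False, vertical=False):
    if dlig:
        cids = apply_ligatures(cids, DLIG)
    if vertical:
        cids = apply_singles(cids, VERT)
    return cids


print(shape([4009, 4072], dlig=True))                 # horizontal ligature
print(shape([4009, 4072], dlig=True, vertical=True))  # vertical ligature
```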

Lastly, and only for fonts that support the vertical form of U+32FF ㋿, the following additional substitutions for the 'aalt' (Access All Alternates) GSUB feature are helpful:

substitute \23058 by \23059;
substitute \23059 by \23058;

This particular GSUB feature is rather large, and varies depending on the actual glyphs in the font, so only the actual additional substitutions are provided above.

Happy font building!

🐡


Horizontal & Vertical Compression/Expansion via Variable Fonts https://ccjktype.fonts.adobe.com/2019/07/hv-compression-expansion-via-vfonts.html Mon, 01 Jul 2019 15:58:54 +0000

Japanese (日本語) here

I recently came up with a Variable Font model to handle glyph compression and expansion in horizontal and vertical layout that includes support for characters whose glyphs rotate in vertical layout, such as the glyphs for Western characters, along with TCY (縦中横 tatechūyoko in Japanese, which literally means “horizontal in vertical”) support.

The purpose of this article is to call attention to the open source test font that I developed, along with a description of the model itself, which are intended to be used by developers to implement such support in apps and layout engines.

The Test Font

The test font is a two-axis Variable Font that maps all 1,111,998 Unicode code points to one of two glyphs, based on the Unicode Version 12.1 version of the UAX #50 (Unicode Vertical Text Layout) data file, VerticalOrientation.txt:

  • 785,553 code points that are assigned the Vertical_Orientation (“vo”) property value of “R” or “Tr” map to Western glyphs
  • 326,445 code points that are assigned the “vo” property value of “U” or “Tu” map to CJK glyphs

Given that upright versus rotated orientation plays an important role in vertical layout, the above mappings seemed quite appropriate.
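The figures above are internally consistent: 1,111,998 is every Unicode code point minus the 2,048 surrogates and 66 noncharacters, and 785,553 + 326,445 adds up to it. The Python sketch below checks that arithmetic and shows how Vertical_Orientation values could be bucketed into the two glyph classes. The SAMPLE lines only imitate the format of VerticalOrientation.txt (code point or range, semicolon, value, comment); they are not an excerpt of the actual file:

```python
# Code point arithmetic behind the 1,111,998 figure
TOTAL_CODE_POINTS = 0x110000         # U+0000..U+10FFFF
SURROGATES = 0xE000 - 0xD800         # U+D800..U+DFFF
NONCHARACTERS = 32 + 2 * 17          # U+FDD0..U+FDEF, plus U+xxFFFE and U+xxFFFF in all 17 planes
MAPPED = TOTAL_CODE_POINTS - SURROGATES - NONCHARACTERS


def glyph_class(vo: str) -> str:
    """Bucket a Vertical_Orientation value into one of the test font's two glyph classes."""
    if vo in ("R", "Tr"):
        return "Western"  # rotated in vertical layout
    if vo in ("U", "Tu"):
        return "CJK"      # upright in vertical layout
    raise ValueError(f"unexpected vo value: {vo!r}")


def count_by_class(lines) -> dict:
    """Count code points per glyph class from lines in the 'range ; value # comment' format."""
    counts = {"Western": 0, "CJK": 0}
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # blank or comment-only line
        code_points, vo = (part.strip() for part in data.split(";"))
        first, _, last = code_points.partition("..")
        counts[glyph_class(vo)] += int(last or first, 16) - int(first, 16) + 1
    return counts


SAMPLE = """\
# Illustrative lines in the VerticalOrientation.txt format
0041..005A    ; R  # Latin capital letters
3001          ; Tu # IDEOGRAPHIC COMMA
4E00..9FEF    ; U  # CJK Unified Ideographs (URO, as of Unicode 12.1)
FF08          ; Tr # FULLWIDTH LEFT PARENTHESIS
"""

print(MAPPED, 785_553 + 326_445 == MAPPED)
print(count_by_class(SAMPLE.splitlines()))
```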

The test font includes 256 instances of the Western and CJK glyphs, from GIDs 1 through 256 and GIDs 257 through 512, respectively, whose shapes and parameters are described as follows:

  • The Western glyphs are hollow rectangles whose default width is 600 units, which rest on the Western baseline (Y=0), and which compress or expand by 25% (from 450 to 750 units)
  • The CJK glyphs are solid rectangles whose default width is 1000 units, which rest 12% below the Western baseline (from Y=−120 to Y=880), and which compress or expand by 25% (from 750 to 1250 units)

Why 256 instances of each glyph? It simplified the Unicode mappings that are specified in the 'cmap' table. This is important when dealing with over a million mappings.
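The article does not spell out why 256 instances simplify the mappings. One plausible reading, offered here only as an assumption, is that they let contiguous runs of code points map onto consecutive GIDs, which is exactly what a 'cmap' Format 12 sequential map group expresses; with a single glyph per class, every code point would need its own Format 12 group (a Format 13 many-to-one group being the alternative). The GID rule and the helper names below are hypothetical:

```python
# Hypothetical GID rule (an assumption, not documented in the article):
# GID = base + (code point mod 256), so same-class runs land on consecutive GIDs.
BASE_GID = {"Western": 1, "CJK": 257}


def sequential_groups(runs):
    """Build Format 12-style (start_char, end_char, start_glyph) groups.

    runs: (first_cp, last_cp, glyph_class) tuples for contiguous same-class ranges.
    Each group maps start_char..end_char onto start_glyph, start_glyph + 1, and so on.
    """
    groups = []
    for first, last, cls in runs:
        cp = first
        while cp <= last:
            # A group must end where the low byte wraps from 0xFF back to 0x00.
            group_end = min(last, cp | 0xFF)
            groups.append((cp, group_end, BASE_GID[cls] + (cp & 0xFF)))
            cp = group_end + 1
    return groups


def glyph_for(cp, groups):
    """Look up a code point the way a Format 12 subtable would."""
    for start, end, start_gid in groups:
        if start <= cp <= end:
            return start_gid + (cp - start)
    return 0  # .notdef


runs = [(0x0000, 0x02FF, "Western"), (0x4E00, 0x4EFF, "CJK")]
groups = sequential_groups(runs)
print(len(groups), "groups instead of", sum(last - first + 1 for first, last, _ in runs))
```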

GID+513 is an explicit half-width glyph that is intended for TCY purposes, and it substitutes the Western glyphs via the 'hwid' (Half Widths) GSUB feature.

The test font includes the registered 'wdth' (Width) and unregistered 'VWID' (Vertical Width) design-variation axes (and yes, I am aware that “Vertical Advance” would be technically more correct, but “Width” pairs better with the 'wdth' axis). For both axes, the default setting is 500, which corresponds to a 600-unit horizontal advance for the Western glyphs, and a 1000×1000 box (aka full-width) for the CJK glyphs. The minimum and maximum axis settings are 1 and 1000, which correspond to 25% compression and 25% expansion, respectively.
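How an axis setting becomes an advance can be sketched with OpenType's default axis normalization, which maps the minimum, default, and maximum settings (1, 500, 1000) to −1, 0, and +1. This Python sketch assumes no 'avar' table and symmetric ±25% deltas, which reproduces the 450–750 and 750–1250 ranges described above; the test font's actual variation data may be structured differently:

```python
def normalize(value, minimum=1.0, default=500.0, maximum=1000.0):
    """OpenType default normalization of a user axis value to -1..1 (no 'avar')."""
    value = max(minimum, min(maximum, value))
    if value < default:
        return (value - default) / (default - minimum)
    if value > default:
        return (value - default) / (maximum - default)
    return 0.0


def advance(value, default_advance, delta):
    """Interpolated advance, assuming a delta of +/-'delta' units at the axis extremes."""
    return default_advance + normalize(value) * delta


WESTERN = (600, 150)  # 600 units by default, 450..750 at the extremes
CJK = (1000, 250)     # 1000 units by default, 750..1250 at the extremes

for setting in (1, 250, 500, 750, 1000):
    print(setting, advance(setting, *WESTERN), advance(setting, *CJK))
```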

The test font is available in the open source Width & Vertical Width VF project on GitHub in both OpenType/CFF2 and TrueType formats.

Compression/Expansion in Horizontal Layout

The model that is being described in this article is relatively simple when performing horizontal layout: the 'wdth' axis is used to compress or expand glyphs along the X-axis as desired, and the 'VWID' axis is constrained to its default setting, which corresponds to full-width or 1000 units for the CJK glyphs. In other words, the relative height of the glyphs remains unchanged in horizontal layout.

The animated GIF below was created using the test font, and illustrates compression and expansion in horizontal layout (although it’s not obvious, the character string that was used was “かなABC漢字”):

Note how the glyph height remains unchanged regardless of the compression or expansion, which is due to constraining the 'VWID' axis to its default setting. In terms of animation timing, the default setting is shown for five seconds, the two extremes for two seconds, and the intermediate settings for one second.

Compression/Expansion in Vertical Layout

When performing vertical layout, the handling of CJK glyphs that remain upright is relatively straightforward: the 'VWID' axis is used to compress or expand glyphs along the Y-axis as desired, and the 'wdth' axis is constrained to its default setting, which corresponds to full-width or 1000 units for the CJK glyphs. The following are the two cases that require special handling:

  • The handling of glyphs that rotate in vertical layout, such as Western ones, makes this more complex: the 'VWID' axis setting is instead applied to the 'wdth' axis, and the 'VWID' axis is constrained to its default setting. In other words, it is equivalent to simply rotating text that was laid out horizontally. The rotated glyphs can become narrower or wider in vertical layout, but their relative height remains unchanged.
  • The handling of TCY is a special case whereby the 'VWID' axis setting is actually used, and the 'wdth' axis is constrained to its default setting. In other words, TCY behaves like CJK glyphs: the glyphs become shorter or taller in vertical layout, but their width remains unchanged. The test font includes the 'hwid' GSUB feature so that two-character TCY can be simulated using the Western glyphs.

The animated GIF below was created using the test font, and illustrates compression and expansion in vertical layout, which includes both rotated and TCY strings (once again, it’s not obvious that the character string that was used was “あAB漢字12国”):

Note how the rotated Western glyphs become narrower or wider while their relative height remains unchanged; that height is bound to the width of the CJK glyphs. Likewise, the TCY glyphs become shorter or taller, but their widths are bound to the width of the CJK glyphs. The animation timing is identical to the horizontal one.

Model Summary

The following table summarizes the model, in terms of the settings and setting ranges for the two design-variation axes in the layout conditions that were described in the previous two sections of this article:

Axis   Horizontal   Vertical (Upright)   Vertical (Rotated)   Vertical (TCY)
wdth   1–1000       500                  1–1000               500
VWID   500          1–1000               500                  1–1000

Of course, the actual settings and setting ranges are based on the test font that I developed, and other Variable Fonts that follow this particular model may use slightly different values. The main point is that one of the axes is constrained to its default setting, which is 500 in the test font that I developed.
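The table can be read as a dispatch rule: a single user-facing setting goes to one axis, and the other stays at its default. The Python sketch below encodes that rule for the test font's default of 500; the Context names and the axis_settings function are hypothetical, not part of any app or library:

```python
from enum import Enum


class Context(Enum):
    HORIZONTAL = "horizontal"
    VERTICAL_UPRIGHT = "vertical-upright"
    VERTICAL_ROTATED = "vertical-rotated"
    VERTICAL_TCY = "vertical-tcy"


DEFAULT = 500  # default setting of both axes in the test font


def axis_settings(context: Context, width: float) -> dict:
    """Route one user-facing width setting to the axis that varies in this context."""
    if context in (Context.HORIZONTAL, Context.VERTICAL_ROTATED):
        # Rotated glyphs behave like horizontally set text that was then rotated.
        return {"wdth": width, "VWID": DEFAULT}
    # Upright CJK glyphs and TCY vary along the Y-axis instead.
    return {"wdth": DEFAULT, "VWID": width}


for ctx in Context:
    print(ctx.value, axis_settings(ctx, 750))
```

Keeping the rule in one function makes it straightforward for a UI to expose a single functional “Width” control and still drive the correct axis.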

UI Considerations

Although the 'wdth' and 'VWID' design-variation axes are implemented as separate axes in a Variable Font, and given that one of the axes needs to be constrained per the descriptions above, UIs should expose only a single functional axis, named “Width,” whose setting is applied to the appropriate axis, depending on the layout direction—horizontal or vertical—and whether the characters are rotated or used in a TCY context. Another alternative is to lock one of the axes by default, but to permit unlocking to override the default behavior.

In any case, how these axes are exposed to users in app UIs may largely depend on the sophistication of the apps themselves. Less-sophisticated apps would benefit from exposing only a single functional axis, and more-sophisticated ones may allow the default behavior to be overridden by exposing both, as described at the end of the previous paragraph.

Design-Variation Axis Registration

If the model described in this article becomes generally accepted, the 'VWID' design-variation axis will need to be registered, which means that it would become an all-lowercase tag, such as 'vwid', 'vadv', or similar.

In closing, everything that is described in this article is a proposal for a model that I hope will become the standard way in which CJK Variable Fonts that support compression, expansion, or both, in horizontal and vertical layout, are implemented.

Of course, comments are welcome and encouraged.

🐡


Source Han Mono Version 1.001 Update https://ccjktype.fonts.adobe.com/2019/05/source-han-mono-v1001.html Thu, 30 May 2019 20:54:28 +0000

The new open source Source Han Mono (源ノ等幅 in Japanese, 본모노 in Korean, 思源等宽 in Simplified Chinese, 思源等寬 in Traditional Chinese (Taiwan), and 思源等寬 香港 in Traditional Chinese (Hong Kong SAR)) typeface was released only four days ago, and an earlier article provides details about its 70-font Super OTC (OpenType/CFF Collection). This article simply serves as an announcement for the Version 1.001 update that was released today. There are two main changes about which users should be aware:

  • The alignment zones and hinting parameters for the FDArray elements whose glyphs were derived from Source Code Pro were improved. Many thanks to Twitter user @KiYugadgeter for raising this issue here, and for confirming the fix here.
  • Our designer, Ryoko Nishizuka (西塚涼子), opted to improve the glyphs for the half-width katakana (半角片仮名) that were expanded to have 667-unit horizontal advances via anisotropic techniques. The image above shows glyphs from Source Han Sans, then from Source Han Mono Version 1.000, and then from Source Han Mono Version 1.001.

I also updated the 143-font Source Han Mega OTC and 216-font Ultra OTC in the Source Han & Noto CJK Mega/Ultra OTCs project earlier today.

Enjoy!

🐡


To UVS, Or Not To UVS https://ccjktype.fonts.adobe.com/2019/05/to-uvs-or-not-to-uvs.html Tue, 28 May 2019 21:44:26 +0000

Several months ago I updated the Adobe-Japan1-UCS2 “ToUnicode” mapping file in the open source Mapping Resources for PDF project specifically to accommodate the two Adobe-Japan1-7 CIDs, CIDs 23058 and 23059.

However, that ToUnicode mapping file is long overdue for a rather extensive update for other reasons, and part of the delay was intentional on my part. The purpose of this article is to outline the reason for the delay, along with providing more concrete update plans.

The ToUnicode Mapping File

The purpose of the ToUnicode mapping file is to “derive content” from PDFs whose embedded fonts include glyphs that are referenced only by CIDs (if CID-keyed) or GIDs, and do not already include an embedded ToUnicode mapping table. The premise is that a CID—or GID—is meaningless without knowing from which code point it was originally mapped, or could have been mapped for the small number of ambiguous cases. Deriving content from PDFs allows text to be repurposed via Copy&Paste, so this is important.

A ToUnicode mapping file does exactly what its name suggests: it maps CIDs to Unicode code points, or to code point sequences. Unlike CMap resources that map Unicode code points to CIDs, or 'cmap' tables that map code points to GIDs that may also be CIDs, a ToUnicode mapping file specifies the inverse mapping. Some omissions and ambiguities can arise, either because a glyph is represented as a sequence, or it is mapped from multiple code points. An excellent example of the former is Adobe-Japan1-7 CID+16246 (ㇷ゚, which should not be confused with U+30D7 プ that corresponds to CID+979), which is not mapped from the Adobe-Japan1-7 Unicode CMap resources, because it is represented as the sequence <U+31F7, U+309A> (CIDs 16243 and 16327), and supported via the 'ccmp' (Glyph Composition/Decomposition) GSUB feature as the same sequence:

substitute \16243 \16327 by \16246;

The Adobe-Japan1-UCS2 ToUnicode mapping file maps this CID to the following sequence (3f76 is the hexadecimal form of decimal 16246):

<3f76> <31f7309a>

An excellent example of the latter is Adobe-Japan1-7 CID+1200, which is mapped from U+2F00 ⼀ KANGXI RADICAL ONE and U+4E00 一 (a CJK Unified Ideograph). If CID+1200 is included in a PDF, one would naturally expect U+4E00 一 to be copied, not U+2F00 ⼀ as its use is more obscure. The Adobe-Japan1-UCS2 ToUnicode mapping file makes this mapping preference explicit (04b0 is the zero-padded hexadecimal form of decimal 1200):

<04b0> <4e00>
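Both examples above follow from inverting a code point → CID mapping and then formatting each entry with the CID as hexadecimal and the text as UTF-16BE hexadecimal. A minimal Python sketch of that process; invert_cmap and to_unicode_entry are hypothetical names, and the lowest-code-point fallback is an arbitrary rule chosen only to show why an explicit preference is needed:

```python
def to_unicode_entry(cid: int, text: str) -> str:
    """Format one entry: the CID in hex, then the text as UTF-16BE hex."""
    return f"<{cid:04x}> <{text.encode('utf-16-be').hex()}>"


def invert_cmap(cmap: dict, preferred: dict) -> dict:
    """Invert a code point -> CID mapping into CID -> text.

    When several code points map to the same CID, 'preferred' picks the winner;
    otherwise the lowest code point is used (an arbitrary rule for this sketch).
    """
    sources = {}
    for cp, cid in cmap.items():
        sources.setdefault(cid, []).append(cp)
    return {cid: chr(preferred.get(cid, min(cps))) for cid, cps in sources.items()}


# CID+1200 is mapped from both U+2F00 and U+4E00; prefer the CJK Unified Ideograph.
targets = invert_cmap({0x2F00: 1200, 0x4E00: 1200}, preferred={1200: 0x4E00})
# CID+16246 is reached only through the 'ccmp' sequence, so it maps to that sequence.
targets[16246] = "\u31f7\u309a"

for cid in sorted(targets):
    print(to_unicode_entry(cid, targets[cid]))
```

Without the explicit preference, the sketch would map CID+1200 to U+2F00 ⼀, the less useful of the two.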

The primary client of ToUnicode mapping files is, of course, Adobe Acrobat. Other PDF-consuming apps can also make use of these mapping files.

As the contents of the pdf2unicode directory suggest, there are ToUnicode mapping files for our public ROSes (an abbreviation for Registry, Ordering, and Supplement, which is a fancy way of referring to our public CJK glyph sets), meaning Adobe-CNS1-7, Adobe-GB1-5, Adobe-Japan1-7, Adobe-Korea1-2 (though deprecated), and Adobe-KR-9.

Some apps, such as Adobe InDesign, embed a ToUnicode mapping table when exporting PDFs, but other apps, such as Adobe Illustrator, do not. The ToUnicode mapping files become critical when opening PDFs that are exported from the latter app and others like it.

The recent Adobe-Japan1-UCS2 ToUnicode mapping file update involved only CIDs 23058 and 23059, both of which map to U+32FF ㋿ SQUARE ERA NAME REIWA. This was done due to their expected high-profile nature. However, I have intentionally held off on updating this ToUnicode mapping file to accommodate other changes, along with corrections, because I was waiting for UVS (Unicode Variation Sequence) support to become more widespread, in both fonts that specify them in a Format 14 (Unicode Variation Sequences) 'cmap' subtable, and in OSes and apps.

Mapping to UVSes

The first ToUnicode mapping file to map CIDs to UVSes (Unicode Variation Sequences)—an umbrella term for Standardized Variation Sequences (SVSes), Ideographic Variation Sequences (IVSes), and Emoji Variation Sequences (EVSes)—is the one for Adobe-KR-9, Adobe-KR-UCS2, and does so for CIDs 22462 through 22479, which represent the glyphs that are associated with 18 KRName IVSes that are treated as non-default UVSes:

<57be> <537fdb40dd09>
<57bf> <5795db40dd02>
<57c0> <57cedb40dd05>
<57c1> <5abadb40dd04>
<57c2> <6210db40dd06>
<57c3> <665fdb40dd06>
<57c4> <6674db40dd05>
<57c5> <695edb40dd04>
<57c6> <6d77db40dd05>
<57c7> <76dbdb40dd05>
<57c8> <8056db40dd06>
<57c9> <83bddb40dd08>
<57ca> <865cdb40dd05>
<57cb> <8941db40dd04>
<57cc> <8aa0db40dd05>
<57cd> <8acbdb40dd05>
<57ce> <927cdb40dd04>
<57cf> <9f9cdb40dd07>
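Each destination above is UTF-16BE: a BMP base character followed by a surrogate pair that encodes an ideographic variation selector. For example, <537fdb40dd09> is U+537F followed by U+E0109 (VS26). A small Python sketch that decodes an entry and identifies its selector; parse_entry and variation_selector_number are hypothetical helper names:

```python
def parse_entry(line: str) -> tuple:
    """Parse '<cid> <utf16be-hex>' into a CID and the Unicode text it maps to."""
    src, dst = line.split()
    return int(src.strip("<>"), 16), bytes.fromhex(dst.strip("<>")).decode("utf-16-be")


def variation_selector_number(ch: str):
    """Return n for VSn, or None if ch is not a variation selector."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00 + 1    # VS1..VS16
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - 0xE0100 + 17  # VS17..VS256, used by IVSes
    return None


cid, text = parse_entry("<57be> <537fdb40dd09>")
base, selector = text[0], text[1]
print(cid, f"U+{ord(base):04X}", f"U+{ord(selector):05X}",
      f"VS{variation_selector_number(selector)}")
```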

So, how does that relate to updating the Adobe-Japan1-UCS2 ToUnicode mapping file? The following methodology that I plan to employ, at least for updating the mappings for the nearly 15K glyphs for kanji (aka ideographs), should explain:

  1. CIDs that unambiguously map from a single CJK Unified Ideograph code point shall map to that code point. The vast majority of CIDs that correspond to kanji are covered by this.
  2. CIDs that map from multiple CJK Unified Ideograph code points shall map to a preferred one.
  3. CIDs that map from CJK Compatibility Ideograph code points shall map to the corresponding SVS.
  4. All remaining CIDs should correspond to Adobe-Japan1 IVSes, and shall map to them.
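The four steps amount to a precedence order. A Python sketch of that order follows; the KanjiCid record is hypothetical, and apart from CID+1200 → U+4E00 and the SVS <U+8C48, U+FE00> (which corresponds to U+F900 豈), the CIDs and values in the usage are made up:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class KanjiCid:
    cid: int
    unified: list = field(default_factory=list)  # CJK Unified Ideographs that map to this CID
    preferred: Optional[int] = None              # chosen code point when several map to it
    compat_svs: Optional[str] = None             # SVS for a CJK Compatibility Ideograph mapping
    ivs: Optional[str] = None                    # Adobe-Japan1 IVS


def target(k: KanjiCid) -> str:
    """Pick the ToUnicode target for a kanji CID, following steps 1 through 4 in order."""
    if len(k.unified) == 1:                      # step 1: unambiguous
        return chr(k.unified[0])
    if len(k.unified) > 1:                       # step 2: preferred code point
        if k.preferred not in k.unified:
            raise ValueError(f"CID {k.cid}: no valid preferred code point")
        return chr(k.preferred)
    if k.compat_svs is not None:                 # step 3: SVS
        return k.compat_svs
    if k.ivs is not None:                        # step 4: Adobe-Japan1 IVS
        return k.ivs
    raise ValueError(f"CID {k.cid}: no mapping target")


examples = [
    KanjiCid(cid=1200, unified=[0x4E00]),                             # step 1
    KanjiCid(cid=90001, unified=[0x3400, 0x3401], preferred=0x3401),  # step 2, made-up values
    KanjiCid(cid=90002, compat_svs="\u8c48\ufe00"),                   # step 3, made-up CID
    KanjiCid(cid=90003, ivs="\u8fbb\U000e0101"),                      # step 4, made-up CID and IVS
]
for k in examples:
    print(k.cid, " ".join(f"U+{ord(ch):04X}" for ch in target(k)))
```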

A small number of CIDs for non-kanji will also map to SVSes, such as the proportional, italic, and full-width forms of the slashed zero, along with the pre-rotated forms of the first two. The CIDs for the proportional and italic forms, along with their pre-rotated forms, will map to the sequence <U+0030,U+FE00>, and the CID for the full-width form will map to <U+FF10,U+FE00>.

For those who are concerned with mapping CIDs to UVSes, don’t be. The VS (Variation Selector) is default ignorable, meaning that if the consuming app does not support the variation sequence, the BC (Base Character) is displayed as-is, which is considered the ideal fallback for variation sequences. The VS will still be present, in case the UVS is repurposed in an environment that does support it.
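The fallback can be modeled as display-time filtering that never touches the stored text. A minimal Python sketch; real renderers handle default-ignorable code points during shaping, so this only approximates the visible result:

```python
def is_variation_selector(ch: str) -> bool:
    """True for VS1..VS16 (U+FE00..U+FE0F) and VS17..VS256 (U+E0100..U+E01EF)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def fallback_display(text: str) -> str:
    """Approximate what a renderer without UVS support shows: the base characters only."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


copied = "\u537f\U000e0109"       # Adobe-KR-UCS2 target of CID 22462
print(len(copied))                # the VS is still in the text
print(fallback_display(copied))   # only the base character is displayed
```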

My new best friend, GitHub user t-tk, has been kindly suggesting, via Issue #6, some changes for the Adobe-Japan1-UCS2 ToUnicode mapping file, though some of them would result from performing the steps in the above list, or may be overridden by them. In any case, I genuinely value the feedback that he has been providing.

In closing, I hope to complete this project within the next month or so, but possibly sooner.

🐡


Source Han Mono Version 1.000 Technical Nuggets https://ccjktype.fonts.adobe.com/2019/05/source-han-mono-v1000.html Sun, 26 May 2019 23:56:23 +0000

As the readership of this blog should know, I updated the Source Han Sans and Noto Sans CJK fonts to Version 2.001 early last month, mainly to accommodate the glyphs for U+32FF ㋿ SQUARE ERA NAME REIWA, which is the two-ideograph square ligature form of Japan’s new era, Reiwa (令和), that began on 2019-05-01. I then seized the opportunity to update our corporate Adobe Clean Han typeface family, to bring it into alignment with Source Han Sans Version 2.001. The updated Adobe Clean Han fonts are now being served to this blog.

I then decided to embark on a somewhat ambitious project to develop a new open source typeface named Source Han Mono, which is best described as a Pan-CJK version of Source Han Code JP, first developed four years ago by my esteemed colleague in our Tōkyō office, Masataka Hattori (服部正貴). You can read the background here. This effectively closes Issue #2 in the Source Han Code JP project.

Source Han Mono is a derivative typeface design of Source Han Sans, designed by my colleague Ryoko Nishizuka (西塚涼子), and Source Code Pro, designed by my colleague Paul D. Hunt. Its localized names are 源ノ等幅 (Japanese), 본모노 (Korean), 思源等宽 (Simplified Chinese), 思源等寬 (Traditional Chinese, Taiwan), and 思源等寬 香港 (Traditional Chinese, Hong Kong SAR). (As an aside, the reason why the Hong Kong SAR name, 思源等寬 香港, appears correctly is due to the updated Adobe Clean Han fonts. This benefitted the glyphs for U+7B49 等 and U+9999 香.)

This article will detail some of the challenges that I faced, along with some of the decisions that I made, while developing this new Pan-CJK typeface family.

Source Han Mono is deployed only as a 70-font OpenType/CFF Collection (OTC) that supports seven weights, five languages, and two styles. The weights are EL (ExtraLight), L (Light), N (Normal), Regular, M (Medium), Bold, and Heavy (H); the languages are Japanese, Korean, Simplified Chinese, and two flavors of Traditional Chinese (Taiwan and Hong Kong SAR); and the styles are upright and italic. This deployment format saves approximately 15MB, which is equivalent to the size of one 10-font weight-specific OTC that supports the five languages and two styles. In other words, the size savings is approximately 15%, which is significant. The savings come from sharing large 'sfnt' tables, such as the 'cmap' and 'GSUB' tables, across weights.

The image below is Genesis 11:1, shown in three of the weights, in all of the supported languages, and in both styles:

Three Horizontal Advances

One of the main guiding principles of Source Han Mono is that all of the glyphs must have one of three possible horizontal advances with no exceptions: 0 (zero), 667, or 1000 units. For kana, ideographs, and full-width symbols, the Source Han Sans glyphs could be used as-is. For the Western glyphs that are proportional, it meant replacing the Source Sans Pro–derived glyphs with Source Code Pro–derived ones. Following in the footsteps of Source Han Code JP, I scaled the Source Code Pro glyphs to 111.2%, which resulted in expanding their horizontal advances from 600 to 667 units.
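The 111.2% figure follows from the target advance: 667 ÷ 600 ≈ 1.1117, and 600 × 1.112 = 667.2, which rounds to 667. The sketch below also checks the three-advance rule over a glyph-name → advance table whose names are invented for illustration:

```python
ALLOWED_ADVANCES = {0, 667, 1000}
SCALE = 1.112  # Source Code Pro glyphs were scaled to 111.2%


def scaled_advance(advance: int, scale: float = SCALE) -> int:
    """Advance after uniform scaling, rounded to whole font units."""
    return round(advance * scale)


def violations(advances: dict) -> dict:
    """Glyphs whose horizontal advance is not one of the three allowed values."""
    return {name: adv for name, adv in advances.items() if adv not in ALLOWED_ADVANCES}


# Hypothetical glyph names, with advances taken from the text
sample = {"zero": scaled_advance(600), "kanji": 1000, "combining": 0,
          "hangul": 920, "halfwidth-katakana": 500}
print(violations(sample))
```

The two glyph classes that the check flags are exactly the ones that needed the special handling described below.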

The glyphs for hangul (한글) letters and syllables, which have 920-unit horizontal advances in Source Han Sans, along with those for half-width katakana (半角片仮名), which have 500-unit horizontal advances, presented an interesting design challenge whose solution will be described later in this article.

Torpedoed Glyphs

Some of the glyphs in Source Han Sans have no purpose being included in Source Han Mono, because they are too wide, too tall, or redundant. The glyphs for U+2E3A ⸺ TWO-EM DASH and U+2E3B ⸻ THREE-EM DASH fall into the first two categories (remember that they include vertical forms that are two- and three-em tall, respectively). The glyphs for U+3031 〱 VERTICAL KANA REPEAT MARK and U+3032 〲 VERTICAL KANA REPEAT WITH VOICED SOUND MARK fall into the second category. The explicit half-width glyphs fall into the third category. Other glyphs were excluded because they have no corresponding glyph in Source Code Pro, specifically U+FB00 ﬀ LATIN SMALL LIGATURE FF, U+FB03 ﬃ LATIN SMALL LIGATURE FFI, U+FB04 ﬄ LATIN SMALL LIGATURE FFL, U+1F16A 🅪 RAISED MC SIGN, U+1F16B 🅫 RAISED MD SIGN, and U+1F16C 🅬 RAISED MR SIGN.

In order to make room for several hundred italic glyphs, I decided to remove the glyphs for the 500 high-frequency archaic hangul syllables. These 500 syllables are still supported via combining jamo.
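Archaic syllables stay usable because Unicode has no precomposed code points for them. They are always stored as sequences of conjoining jamo, which the font then arranges. Python's standard unicodedata module shows the difference. NFC composes a modern leading consonant + vowel + trailing consonant sequence into a single precomposed syllable. It leaves an archaic sequence, here one using U+119E HANGUL JUNGSEONG ARAEA, as three jamo. How a font renders such a sequence, whether with a dedicated glyph or with positioned jamo glyphs, is not visible at this level.

```python
import unicodedata

def describe(text):
    """Return each code point in text as a 'U+XXXX NAME' string."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

# Modern syllable: choseong HIEUH + jungseong A + jongseong NIEUN.
modern = "\u1112\u1161\u11AB"
# Archaic syllable: the same consonants around ARAEA (U+119E), an obsolete vowel
# that has no precomposed syllables in Unicode.
archaic = "\u1112\u119E\u11AB"

print(unicodedata.normalize("NFC", modern))        # 한 (U+D55C), a single code point
print(len(unicodedata.normalize("NFC", archaic)))  # 3: still three conjoining jamo
print(describe(archaic))
```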

Two Styles, One Glyph Set

Unlike conventional fonts that use a separate glyph set for the Italic style, such as Source Code Pro, Source Han Mono includes the italic glyphs in a unified glyph set. For CJK or Pan-CJK fonts, this makes a lot of sense, because only a small number of glyphs need to be italic, and maintaining separate glyph sets would be comparable to the tail wagging the dog. The upright (non-italic) font instances are able to access the italic glyphs by either selecting the Italic style in apps that support style-linking, or via the 'ital' (Italics) GSUB feature.

In addition, because all CIDs are precious when developing a Pan-CJK typeface, italic glyphs are not included for Western characters whose glyphs do not vary in the Italic style, at least in Source Code Pro, such as characters that are used for rudimentary math. In the Italic style, those characters simply map to the non-italic glyphs. The number of affected Latin characters is 26.
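One way to picture a unified glyph set is as a single pool of glyphs plus a mapping from upright glyphs to italic ones. In this model, the Italic instance's 'cmap' follows that mapping wherever an italic glyph exists and otherwise falls back to the upright glyph. The 'ital' feature would apply the same mapping as a substitution. The following is a minimal sketch with hypothetical glyph names. It is not how Source Han Mono was actually built.

```python
def build_italic_cmap(upright_cmap, italic_glyphs):
    """Map each code point to its italic glyph if one exists, else to the upright glyph.

    upright_cmap: code point -> glyph name (hypothetical names)
    italic_glyphs: upright glyph name -> italic glyph name
    """
    return {cp: italic_glyphs.get(glyph, glyph) for cp, glyph in upright_cmap.items()}

upright = {0x0041: "A", 0x0061: "a", 0x002B: "plus", 0x4E00: "uni4E00"}
# Only glyphs that actually differ in the Italic style get italic counterparts.
italics = {"A": "A.ital", "a": "a.ital"}

italic_cmap = build_italic_cmap(upright, italics)
print(italic_cmap[0x0061])  # a.ital
print(italic_cmap[0x002B])  # plus (shared with the upright style)
```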

Anisotropic Techniques

Two particular glyph classes presented an interesting challenge, because their glyphs didn't match any of the three horizontal advances: hangul letters and syllables, which have 920-unit horizontal advances in Source Han Sans, and half-width katakana, which have 500-unit horizontal advances. For these two classes, I ended up leveraging anisotropic techniques to expand the glyphs to 1000 and 667 units, respectively. The animated image below compares text set using Source Han Sans and Source Han Mono, and the second and fourth lines include glyphs that resulted from applying anisotropic techniques:

For the half-width katakana glyphs, only the katakana in the range U+FF66 ｦ through U+FF9D ﾝ were adjusted in this way. The glyphs for the half-width punctuation, parentheses, and annotations were adjusted only by expanding their horizontal advances to 667 units and repositioning their glyphs along the X-axis as necessary. The glyph for U+FF65 ･ HALFWIDTH KATAKANA MIDDLE DOT in the above image illustrates this point well.
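As a quick check of the ranges involved, the sketch below splits the Halfwidth Katakana range (U+FF61 through U+FF9F) into two groups. The first group holds the 56 katakana from U+FF66 through U+FF9D that were expanded anisotropically. The second group holds the remaining punctuation, brackets, and sound marks, which only received wider advances. The centering shift at the end is an assumption that suits a symmetric glyph such as the middle dot. The text says only that glyphs were repositioned "as necessary."

```python
import unicodedata

OLD_ADVANCE, NEW_ADVANCE = 500, 667

def treatment(cp):
    """Classify a Halfwidth Katakana code point (U+FF61..U+FF9F)."""
    if 0xFF66 <= cp <= 0xFF9D:
        return "anisotropic expansion"
    return "wider advance + reposition"

def centered_shift(old_advance=OLD_ADVANCE, new_advance=NEW_ADVANCE):
    """X offset that keeps a glyph centered when its advance grows (hypothetical rule)."""
    return (new_advance - old_advance) / 2

expanded = [cp for cp in range(0xFF61, 0xFFA0) if treatment(cp) == "anisotropic expansion"]
print(len(expanded))                                    # 56
print(unicodedata.name(chr(0xFF65)), treatment(0xFF65))
print(centered_shift())                                 # 83.5
```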

The technique involved first expanding the ExtraLight and Heavy masters to the desired horizontal advance by specifying a transformation matrix via the AFDKO (Adobe Font Development Kit for OpenType) rotatefont tool. The command lines below were used to expand the thousands of hangul letters and syllables from 920 to 1000 units (I first needed to convert the masters into UFOs):

rotatefont -ufo -matrix 1.087 0 0 1 0 0 Hangul_0_920.ufo Hangul_0_1000.ufo
rotatefont -ufo -matrix 1.087 0 0 1 0 0 Hangul_1_920.ufo Hangul_1_1000.ufo
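The six matrix values describe an affine transformation. The sketch below assumes the usual PostScript ordering [a b c d tx ty]. Because b and c are both zero here, only the first value, which scales X, and the fourth value, which leaves Y unchanged, matter. The sketch confirms that 1.087 is approximately 1000/920. It also shows why a purely horizontal scale is not enough on its own: vertical stems get about 8.7% thicker, while horizontal stems keep their thickness. The stem coordinates are hypothetical.

```python
def transform(point, matrix):
    """Apply a six-value affine matrix (a, b, c, d, tx, ty) to an (x, y) point.

    Assumes the PostScript ordering: x' = a*x + c*y + tx and y' = b*x + d*y + ty.
    """
    a, b, c, d, tx, ty = matrix
    x, y = point
    return (a * x + c * y + tx, b * x + d * y + ty)

# The six values given to rotatefont in the command lines above.
MATRIX = (1.087, 0, 0, 1, 0, 0)

def stem_width(left_x, right_x, matrix=MATRIX):
    """Width of a vertical stem after transforming its left and right edges."""
    return transform((right_x, 0), matrix)[0] - transform((left_x, 0), matrix)[0]

def stem_height(bottom_y, top_y, matrix=MATRIX):
    """Thickness of a horizontal stem after transforming its bottom and top edges."""
    return transform((0, top_y), matrix)[1] - transform((0, bottom_y), matrix)[1]

print(round(transform((920, 0), MATRIX)[0]))  # 1000: the 920-unit advance, treated as a point
print(round(stem_width(400, 500), 1))         # 108.7: a hypothetical 100-unit vertical stem
print(stem_height(200, 300))                  # 100: horizontal stems are unchanged
```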

The next step was to create a designspace file that specified two axes, weight ('wght') and width ('wdth'), defined by the original (920-unit) and expanded (1000-unit) ExtraLight and Heavy masters. I also needed to define two instances that would become the new 1000-unit ExtraLight and Heavy masters. For these instances, the xvalue of the weight axis is adjusted so that the outlines are interpolated at different rates horizontally and vertically (the yvalue was set to 0 and 1000 for the two instances, respectively), and the xvalue of the width axis is set to 1000 for both instances to maintain the desired 1000-unit width throughout the interpolation. I determined that xvalues of −25 (ExtraLight) and 900 (Heavy) gave the desired results for the weight axis. The command line below was used to produce the new ExtraLight and Heavy masters for the hangul letters and syllables:

makeinstancesufo -a -c --ufo-version 2 -d Hangul.designspace

My extraordinarily talented colleague, Miguel Sousa, taught me this technique, which should be used only for relatively minor adjustments. If the glyphs are expanded, or compressed, too much, the design will begin to degrade.
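To see why separate xvalue and yvalue settings help, here is a simplified model of the designspace step described above. The four masters sit at the corners of the weight/width design space. Each point is interpolated bilinearly, with X coordinates using the weight xvalue and Y coordinates using the yvalue. This is an illustration under those assumptions, not the code that makeinstancesufo runs, and linear extrapolation is assumed for the −25 xvalue. The one-point masters are hypothetical. In both new masters, the weight used for X is slightly lighter than the weight used for Y. As a result, vertical stems, which the horizontal expansion had made thicker, presumably come out a little lighter.

```python
def lerp(a, b, t):
    """Linear interpolation, or extrapolation when t is outside 0..1."""
    return a + (b - a) * t

def normalize(value, lo, hi):
    """Map a design-space value to 0..1 along an axis."""
    return (value - lo) / (hi - lo)

def anisotropic_instance(masters, wght_x, wght_y, wdth,
                         wght_range=(0, 1000), wdth_range=(920, 1000)):
    """Interpolate points from four corner masters at an anisotropic weight location.

    masters maps (weight, width) -> list of (x, y) points with identical structure.
    X coordinates use wght_x, Y coordinates use wght_y, and both use the same width.
    """
    w_lo, w_hi = wght_range
    v_lo, v_hi = wdth_range
    tx = normalize(wght_x, w_lo, w_hi)
    ty = normalize(wght_y, w_lo, w_hi)
    tv = normalize(wdth, v_lo, v_hi)
    result = []
    for p00, p01, p10, p11 in zip(masters[(w_lo, v_lo)], masters[(w_lo, v_hi)],
                                  masters[(w_hi, v_lo)], masters[(w_hi, v_hi)]):
        # Interpolate along width first, at the light and heavy ends of the weight axis.
        light = [lerp(p00[i], p01[i], tv) for i in (0, 1)]
        heavy = [lerp(p10[i], p11[i], tv) for i in (0, 1)]
        # Then interpolate along weight, using different positions for X and for Y.
        result.append((lerp(light[0], heavy[0], tx), lerp(light[1], heavy[1], ty)))
    return result

# Hypothetical one-point masters: (weight, width) -> [(x, y)]
masters = {
    (0, 920): [(100, 50)], (0, 1000): [(110, 50)],
    (1000, 920): [(200, 80)], (1000, 1000): [(220, 80)],
}
# New ExtraLight master: weight xvalue -25, yvalue 0; width 1000.
print(anisotropic_instance(masters, -25, 0, 1000))
# New Heavy master: weight xvalue 900, yvalue 1000; width 1000.
print(anisotropic_instance(masters, 900, 1000, 1000))
```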

Slashed Zero

In order to distinguish 0 (zero) from O (uppercase letter), Source Code Pro uses a center dot in its zero glyphs. I wrote “glyphs” in the plural, because there are four glyphs for zero with a center dot: upright and italic, of course, but also standard and cap-height versions for both styles. The cap-height glyphs for zero, along with the nine other digits, are the defaults (encoded), and language-tagging the text as English will invoke the standard-height digits. This behavior is identical to Source Han Sans and Source Han Serif.

Oh, right. Source Code Pro also includes slashed forms of the zero glyph, so Source Han Mono includes all four versions of those as well, just as it does for the center-dot form. The slashed forms are accessible via the aptly tagged 'zero' (Slashed Zero) GSUB feature.
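The three independent choices described above (style, digit height, and zero form) combine into eight zero glyphs. The sketch below makes that selection logic explicit. The glyph names are hypothetical and are not the names used in the fonts.

```python
def zero_glyph(italic=False, english=False, slashed=False):
    """Pick one of eight zero glyphs (hypothetical glyph names).

    Cap-height digits are the default, English language-tagging selects
    standard-height digits, and the 'zero' feature selects the slashed form.
    """
    name = "zero"
    name += ".slash" if slashed else ".dot"
    name += ".std" if english else ".cap"
    if italic:
        name += ".ital"
    return name

print(zero_glyph())                                         # zero.dot.cap
print(zero_glyph(italic=True, english=True, slashed=True))  # zero.slash.std.ital
```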

More Extensive Style Linking

Like Source Han Sans and Source Han Serif, the Source Han Mono font instances are style-linked, but more extensively thanks to the presence of italic glyphs and the separate italic font instances whose 'cmap' tables map to them. The fonts for the former two families style-link only the Regular weight to the Bold weight. For Source Han Mono, all weights are additionally style-linked to the corresponding italic font instance. This is useful in apps that support style-linking via buttons or controls for the Bold and Italic styles.
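The difference in style-linking can be summarized as a lookup from an app's Bold and Italic buttons to a font instance. The following is a minimal sketch with hypothetical instance names. For weights other than Regular, the Bold button has no linked instance here. An app might apply synthetic emboldening in that case, which this sketch ignores.

```python
WEIGHTS = ["ExtraLight", "Light", "Normal", "Regular", "Medium", "Bold", "Heavy"]

def linked_instance(weight, bold=False, italic=False):
    """Resolve Bold/Italic buttons to a font instance (hypothetical names).

    Only Regular is style-linked to Bold; every weight is style-linked to its italic.
    """
    if weight not in WEIGHTS:
        raise ValueError(f"unknown weight: {weight}")
    if bold and weight == "Regular":
        weight = "Bold"
    return f"{weight} Italic" if italic else weight

print(linked_instance("Regular", bold=True, italic=True))  # Bold Italic
print(linked_instance("Light", italic=True))               # Light Italic
```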

Source Han & Noto CJK Mega/Ultra OTCs—Updated!

The last time that I updated the Mega OTCs and Ultra OTC was over a year ago, and the release of Source Han Mono gave me a good reason to update the entire project. The Source Han and Noto CJK Mega OTCs now have 143 (was 78) and 73 (was 64) fonts, respectively. The former Mega OTC grew to approximately 400MB (was approximately 300MB). The Ultra OTC that combines the two Mega OTCs now has 216 (was 142) fonts, but is a mere 200KB larger than the Source Han Mega OTC due to massive 'sfnt' table sharing between the Source Han and Noto CJK families.

I would like to emphasize that the 216-font Ultra OTC is incredibly size-efficient. The total footprint of its 216 font instances, if they were installed as 216 separate OpenType/CFF fonts, would be nearly 4GB. I can therefore claim that the Ultra OTC deployment format shrinks that footprint to roughly one tenth, approximately 400MB. That also means that each 65,535-glyph font instance represents less than 2MB.

Oh, and guess which OTC I currently have installed.
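The size figures above can be sanity-checked with a few lines of arithmetic. The check uses the approximate numbers from the text and treats 1GB as 1000MB.

```python
source_han_fonts, noto_cjk_fonts = 143, 73
ultra_fonts = source_han_fonts + noto_cjk_fonts  # fonts in the Ultra OTC

separate_mb = 4000  # "nearly 4GB" as separate fonts, used here as a round figure
ultra_mb = 400      # "approximately 400MB" for the Ultra OTC

print(ultra_fonts)                       # 216
print(separate_mb / ultra_mb)            # 10.0: roughly one tenth of the footprint
print(round(ultra_mb / ultra_fonts, 2))  # 1.85: MB per font instance
```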

For More Information…

For more information about Source Han Mono, I encourage you to take the time to read the extensive ReadMe (PDF) that I prepared. Heck, you may even notice that I used Source Han Mono to typeset it.

🐡

Adobe Clean Han Version 2.001 https://ccjktype.fonts.adobe.com/2019/05/adobe-clean-han-v2001.html Tue, 21 May 2019 18:08:07 +0000

The recent Source Han Sans Version 2.001 update provided me with an excellent opportunity to bring Adobe Clean Han, Adobe’s corporate Pan-CJK typeface, into alignment with it. I am pleased to announce that, as of yesterday, the updated Adobe Clean Han fonts are now being served to this blog via Adobe Fonts.

To celebrate this significant update, I decided that it would be appropriate to illustrate, using live text that can be easily copied and repurposed elsewhere, the 68 ideographs that have five separate glyphs, one for each of the five supported regions/languages:

Simplified Chinese
傑僭割劘匾叟喝塌姿嬴幰廋扇扉搨摩榻溲潛瀛瘦瞎磨窖竇箭篠簉糙綢纛羸翁翦翩肓臝艘花裯褐謁譖豁贏轄返迷途造週遍遭選遼鄰釁閼雕靠靡颼飯驎鬣魔麗麟
Traditional Chinese—Taiwan
傑僭割劘匾叟喝塌姿嬴幰廋扇扉搨摩榻溲潛瀛瘦瞎磨窖竇箭篠簉糙綢纛羸翁翦翩肓臝艘花裯褐謁譖豁贏轄返迷途造週遍遭選遼鄰釁閼雕靠靡颼飯驎鬣魔麗麟
Traditional Chinese—Hong Kong SAR
傑僭割劘匾叟喝塌姿嬴幰廋扇扉搨摩榻溲潛瀛瘦瞎磨窖竇箭篠簉糙綢纛羸翁翦翩肓臝艘花裯褐謁譖豁贏轄返迷途造週遍遭選遼鄰釁閼雕靠靡颼飯驎鬣魔麗麟
Japanese
傑僭割劘匾叟喝塌姿嬴幰廋扇扉搨摩榻溲潛瀛瘦瞎磨窖竇箭篠簉糙綢纛羸翁翦翩肓臝艘花裯褐謁譖豁贏轄返迷途造週遍遭選遼鄰釁閼雕靠靡颼飯驎鬣魔麗麟
Korean
傑僭割劘匾叟喝塌姿嬴幰廋扇扉搨摩榻溲潛瀛瘦瞎磨窖竇箭篠簉糙綢纛羸翁翦翩肓臝艘花裯褐謁譖豁贏轄返迷途造週遍遭選遼鄰釁閼雕靠靡颼飯驎鬣魔麗麟

It’s comforting to know that I can now language-tag text in CJK Type Blog articles using five different languages, whereas I was previously limited to only four.
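Live text is language-tagged so that a browser can select the appropriate region-specific glyphs, typically through the font's 'locl' feature. A minimal sketch that generates such markup with Python's standard html module is shown below. The BCP 47 tags are an assumption about reasonable choices and are not necessarily the tags used on this blog.

```python
from html import escape

IDEOGRAPHS = "傑僭割劘匾叟喝塌姿嬴幰廋扇扉搨摩榻溲潛瀛瘦瞎磨窖竇箭篠簉糙綢纛羸翁翦翩肓臝艘花裯褐謁譖豁贏轄返迷途造週遍遭選遼鄰釁閼雕靠靡颼飯驎鬣魔麗麟"

# (label, BCP 47 language tag) pairs; the specific tags are assumptions.
LANGUAGES = [
    ("Simplified Chinese", "zh-Hans"),
    ("Traditional Chinese (Taiwan)", "zh-Hant-TW"),
    ("Traditional Chinese (Hong Kong SAR)", "zh-Hant-HK"),
    ("Japanese", "ja"),
    ("Korean", "ko"),
]

def tagged_html(text, languages=LANGUAGES):
    """Wrap the same text in one lang-tagged <p> element per language."""
    return "\n".join(
        f'<p lang="{tag}">{escape(label)}: {escape(text)}</p>' for label, tag in languages
    )

print(len(IDEOGRAPHS))  # 68
print(tagged_html(IDEOGRAPHS))
```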

🐡

Adobe Blank VF & Friends https://ccjktype.fonts.adobe.com/2019/05/adobe-blank-vf.html Tue, 07 May 2019 19:55:16 +0000

I spent the last couple of weeks developing a Variable Font version of the infamous Adobe Blank, and the open source project, named Adobe Blank VF & Friends, was released yesterday evening. But, before I detail what makes the Variable Font versions special, besides being Variable Fonts, let’s briefly go over the history of Adobe Blank and Adobe Blank 2.

Adobe Blank

First released in 2013 as open source, Adobe Blank simply maps all 1,111,998 Unicode code points to non-spacing and non-marking glyphs. What made the project interesting for me was to find the right balance between the number of glyphs and the size of the 'cmap' table. When mapping over a million code points, this becomes a valid concern. After some experimentation, I found that 2,049 glyphs was the sweet spot that resulted in 'CFF ' and 'cmap' tables of a relatively small size.

Adobe Blank 2

Adobe Blank 2, which was first released in 2015, is a two-glyph version of Adobe Blank that includes a Format 13 (Many-to-one range mappings) 'cmap' subtable that maps all 1,111,998 Unicode code points to GID+1. At the time, there was no convenient way to create a Format 13 subtable, so I used ttx, and supplied the actual hex values of the compiled subtable. The current version of ttx can successfully compile a Format 13 subtable by explicitly specifying all 1,111,998 mappings.
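A Format 13 subtable maps whole ranges to a single glyph, so covering every code point takes only a handful of groups. The sketch below builds such a subtable with Python's struct module, following the OpenType layout: a 16-byte header (format, reserved, length, language, numGroups) followed by 12-byte groups (startCharCode, endCharCode, glyphID). The choice of ranges, all Unicode scalar values except the 66 noncharacters, is an assumption about what "all 1,111,998 Unicode code points" covers, although the count matches. These are not the hex values that were actually used for Adobe Blank 2.

```python
import struct

def format13_groups(glyph_id=1):
    """(start, end, glyph_id) groups covering every scalar value except noncharacters."""
    # BMP: skip the surrogates (U+D800..U+DFFF), U+FDD0..U+FDEF, and U+FFFE..U+FFFF.
    ranges = [(0x0000, 0xD7FF), (0xE000, 0xFDCF), (0xFDF0, 0xFFFD)]
    # Supplementary planes 1-16: skip the last two code points of each plane.
    ranges += [(plane << 16, (plane << 16) | 0xFFFD) for plane in range(1, 17)]
    return [(start, end, glyph_id) for start, end in ranges]

def pack_format13(groups):
    """Serialize a 'cmap' Format 13 subtable as big-endian bytes."""
    length = 16 + 12 * len(groups)
    data = struct.pack(">HHIII", 13, 0, length, 0, len(groups))
    for start, end, gid in groups:
        data += struct.pack(">III", start, end, gid)
    return data

groups = format13_groups()
print(len(groups))                                  # 19
print(sum(end - start + 1 for start, end, _ in groups))  # 1111998
print(len(pack_format13(groups)))                   # 244 bytes
```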

That then brings us to the Variable Font versions…

Adobe Blank VF & Adobe Black VF

Unlike Adobe Blank and Adobe Blank 2 that are CID-keyed and specify the special-purpose Adobe-Identity-0 ROS (Registry, Ordering, and Supplement), the Variable Font versions are not CID-keyed, because the name- versus CID-keyed distinction does not exist in the 'CFF2' table that is used for Variable Fonts.

Like Adobe Blank, Format 4 (Segment mapping to delta values) and Format 12 (Segmented coverage) 'cmap' subtables are included. The former subtable includes 63,454 mappings. The latter includes all 1,111,998 mappings.
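Both mapping counts can be reproduced with simple arithmetic. The calculation assumes, consistent with the numbers, that the only code points left unmapped are the 2,048 surrogates and the 66 noncharacters. The Format 4 subtable can address only the BMP, which explains its smaller count.

```python
PLANES = 17
ALL_CODE_POINTS = 0x110000           # U+0000 through U+10FFFF: 1,114,112
SURROGATES = 0xE000 - 0xD800         # U+D800 through U+DFFF: 2,048, all in the BMP
BMP_NONCHARACTERS = 32 + 2           # U+FDD0..U+FDEF, plus U+FFFE and U+FFFF
ALL_NONCHARACTERS = 32 + 2 * PLANES  # U+FDD0..U+FDEF, plus the last two of every plane

all_mapped = ALL_CODE_POINTS - SURROGATES - ALL_NONCHARACTERS
bmp_mapped = 0x10000 - SURROGATES - BMP_NONCHARACTERS

print(f"{all_mapped:,}")  # 1,111,998 (the Format 12 subtable)
print(f"{bmp_mapped:,}")  # 63,454 (the Format 4 subtable)
```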

I originally developed a marking version, named Adobe Black VF, in order to visually test its two design axes, 'wdth' (Width) and 'HGHT' (Height), and later figured that it serves as an excellent test font. The latter axis tag is all uppercase, because it is not yet registered. In keeping with the spirit of Adobe Blank in terms of being non-spacing, the default value of its two axes is zero (0). As the axis values increase, with 1000 being the maximum value, the horizontal or vertical advance also increases as appropriate.

What really makes these Variable Fonts special is the presence of the 'VVAR' (Vertical Metrics Variations) table, which is necessary to accommodate the variable metrics that are associated with the 'HGHT' axis. The AFDKO (Adobe Font Development Kit for OpenType) tools only recently started to support this table.
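In a Variable Font, a metric that changes along an axis is stored as a default value plus deltas. 'HVAR' holds these deltas for horizontal metrics, and 'VVAR' holds them for vertical ones. The sketch below first applies the standard OpenType axis normalization, ignoring any 'avar' table. It then applies a single hypothetical delta of 1000 units at the axis maximum, which would make the advance equal the axis value. The deltas actually stored in Adobe Blank VF are not given in this article.

```python
def normalize_axis(value, minimum, default, maximum):
    """OpenType default normalization (no 'avar'): map an axis value to -1..0..1."""
    value = max(minimum, min(maximum, value))
    if value < default:
        return (value - default) / (default - minimum)
    if value > default:
        return (value - default) / (maximum - default)
    return 0.0

def advance(default_advance, delta_at_max, normalized):
    """Advance at an instance, assuming one linear delta that peaks at the axis maximum."""
    return default_advance + delta_at_max * normalized

# 'HGHT' per the text: minimum 0, default 0, maximum 1000. The delta of 1000 is hypothetical.
for hght in (0, 500, 1000):
    n = normalize_axis(hght, 0, 0, 1000)
    print(hght, n, advance(0, 1000, n))
```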

It is best to use the marking version, Adobe Black VF, to explore how the 'wdth' and 'HGHT' axes are expected to behave in horizontal and vertical writing modes:

Axis Horizontal Vertical
wdth The glyph and its horizontal advance expand along the X-axis to the right from the horizontal origin as the value increases The glyph expands along the X-axis from the center of the em-box to its left and right edges as the value increases
HGHT The glyph expands along the Y-axis from the center of the em-box to its top and bottom edges as the value increases The glyph and its vertical advance expand along the Y-axis downward from the vertical origin as the value increases
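The behavior in the table above can be expressed as a glyph bounding box. The sketch rests on three assumptions that this article does not state. First, axis values map one-to-one to font units. Second, the em-box spans −120 to 880 units vertically, which is typical of Adobe's CJK fonts. Third, the vertical origin sits at the top of the em-box.

```python
EM = 1000
# Assumed em-box extent; the fonts' actual values may differ.
EM_BOTTOM, EM_TOP = -120, 880

def glyph_box(wdth, hght, vertical=False):
    """Return (x_min, y_min, x_max, y_max) for the Adobe Black VF glyph (a sketch)."""
    center_y = (EM_BOTTOM + EM_TOP) / 2
    if not vertical:
        # wdth grows rightward from the horizontal origin; HGHT grows out from the em-box center.
        return (0, center_y - hght / 2, wdth, center_y + hght / 2)
    # Vertical: wdth grows out from the em-box center; HGHT grows downward from the
    # vertical origin, assumed here to be the top of the em-box.
    center_x = EM / 2
    return (center_x - wdth / 2, EM_TOP - hght, center_x + wdth / 2, EM_TOP)

print(glyph_box(1000, 1000))                 # fills the em-box in horizontal mode
print(glyph_box(500, 0))                     # half-width, zero-height glyph
print(glyph_box(1000, 1000, vertical=True))  # fills the em-box in vertical mode
```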

The five-frame animated image below shows how the two axes behave in horizontal writing mode, starting from axis values that are zero (0), alternately incrementing 'wdth' then 'HGHT' to 500 then to 1000 (surrounded by 1000×1000 cyan-colored boxes to better demonstrate the varying horizontal advances):

The five-frame animated image below illustrates the same, but in vertical writing mode, again starting from axis values that are zero (0), but alternately incrementing 'HGHT' then 'wdth' to 500 then to 1000:

Neat, eh?

Adobe Blank 2 VF & Adobe Black 2 VF

Like Adobe Blank 2, a Format 13 'cmap' subtable is used to map all 1,111,998 Unicode code points to GID+1. Adobe Blank 2 VF and Adobe Black 2 VF are otherwise identical to Adobe Blank VF and Adobe Black VF, and should be used only in environments that support the Format 13 'cmap' subtable.

Enjoy!

🐡
