Source Han Unicode

One of my hobbies is apparently to explore various ways to stress-test Adobe products, and today’s article describes my recent adventures with Adobe InDesign and our Source Han families.

The background is that I produced Unicode-based glyph synopses as part of the Source Han Sans and Source Han Serif releases, but those PDFs show only up to 256 code points per page, and it takes several hundred pages to show their complete Unicode coverage. I also produced single-page PDFs that show all 65,535 glyphs. A Source Han Sans one is available here, and a Source Han Serif one is available here. However, they are not Unicode-based.

In the spirit of combining both of these attributes—Unicode and large 256×256 tables—I used Adobe InDesign to create a three-page, three-layer PDF that shows both Source Han families, and the extent to which they support Unicode.
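Because every Unicode plane holds exactly 65,536 (0x10000) code points, one plane fills one 256×256 table. The minimal Python sketch below shows how a code point could be placed in such a table. The function name `table_position` is hypothetical, and using the high byte for rows and the low byte for columns is an assumption, not a description of how the InDesign document was actually built.

```python
def table_position(code_point: int) -> tuple[int, int, int]:
    """Return (plane, row, column) for a code point in a 256x256 plane table.

    Each plane holds 0x10000 = 65,536 code points, exactly 256 rows x 256
    columns, so one plane fills one table (one page of the document).
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"U+{code_point:04X} is outside the Unicode range")
    plane = code_point >> 16          # bits 16-20 select the plane
    row = (code_point >> 8) & 0xFF    # high byte within the plane
    column = code_point & 0xFF        # low byte within the plane
    return plane, row, column


PLANE_NAMES = {0: "BMP", 1: "SMP", 2: "SIP"}

for cp in (0x4E00, 0x2E3A, 0x1F600, 0x20000):
    plane, row, column = table_position(cp)
    print(f"U+{cp:04X}: {PLANE_NAMES.get(plane, plane)} row {row:02X}, column {column:02X}")
# U+4E00: BMP row 4E, column 00
# U+2E3A: BMP row 2E, column 3A
# U+1F600: SMP row F6, column 00
# U+20000: SIP row 00, column 00
```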

First, I created a three-page document in which each page measures 1000mm by 1000mm, in other words, one square meter. I did this so that I could specify reasonable point sizes. (I used 9-point.) Besides, such a document is unlikely to be printed, and if it were, one square meter is a reasonable size. The three pages are for the BMP (Basic Multilingual Plane, aka Plane 0), the SMP (Supplementary Multilingual Plane, aka Plane 1), and the SIP (Supplementary Ideographic Plane, aka Plane 2).
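As a rough check on why 9-point works, the arithmetic below divides the 1000mm page width evenly into 256 cells and converts the result to points (1 inch = 25.4mm = 72pt). This is a simplification that ignores margins and the space taken by the code point markers, so the real cells are somewhat smaller. The helper `cell_size_pt` is hypothetical.

```python
MM_PER_INCH = 25.4
PT_PER_INCH = 72.0  # PostScript points, the unit InDesign uses for type


def cell_size_pt(page_mm: float, cells: int = 256) -> float:
    """Cell width in points when page_mm is split evenly into `cells` columns."""
    return page_mm / cells / MM_PER_INCH * PT_PER_INCH


cell = cell_size_pt(1000.0)
print(f"cell width: {1000.0 / 256:.2f} mm = {cell:.2f} pt")
# cell width: 3.91 mm = 11.07 pt
print(f"left over around a 9-point glyph: {cell - 9:.2f} pt")
# left over around a 9-point glyph: 2.07 pt
```

Under these assumptions, a 9-point em leaves about 2 points of breathing room per cell, which is why a somewhat larger size would start to crowd the grid.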

Second, I created three layers. One layer is the grid with code point markers, which is locked in the exported PDF. The other two layers are for Source Han Serif and Source Han Sans, and use transparency so that their differences can be seen more clearly.

Third, I used the Simplified Chinese fonts for both families, mainly for greater glyph consistency in the URO (Unified Repertoire and Ordering, that is, the CJK Unified Ideographs block) and Extension A.

The Source Han Serif and Source Han Sans layers of the exported PDF can be toggled on and off to display either typeface or both. Because Source Han Serif includes glyphs for approximately 50 additional characters that are encoded in newer versions of Unicode, and because Source Han Sans supports Hong Kong SCS-2008 at the code point level, some characters have a glyph in one typeface but not in the other, especially in Plane 2.
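Which characters have a glyph in only one typeface can also be determined outside InDesign by comparing the fonts’ Unicode cmaps. The sketch below assumes the fontTools library for reading fonts (`TTFont` and its `getBestCmap` method). The helper names and font file names are hypothetical, and the demonstration uses toy cmaps rather than the actual coverage of either family.

```python
from collections import defaultdict


def coverage_differences(cmap_a, cmap_b):
    """Group code points mapped in only one of two cmaps by Unicode plane.

    Returns {plane: (only_in_a, only_in_b)}, each list sorted by code point.
    """
    by_plane = defaultdict(lambda: ([], []))
    for cp in sorted(cmap_a.keys() - cmap_b.keys()):
        by_plane[cp >> 16][0].append(cp)
    for cp in sorted(cmap_b.keys() - cmap_a.keys()):
        by_plane[cp >> 16][1].append(cp)
    return dict(by_plane)


def load_cmap(path):
    """Read a font's preferred Unicode cmap as {code point: glyph name}.

    Requires fontTools (pip install fonttools); it is imported here so the
    toy demonstration below runs without it.
    """
    from fontTools.ttLib import TTFont
    return TTFont(path).getBestCmap()


# Toy cmaps for illustration only, not the fonts' real coverage.
serif = {0x4E00: "glyph1", 0x2EBF0: "glyph2"}
sans = {0x4E00: "glyph1", 0x20000: "glyph3"}

for plane, (only_serif, only_sans) in sorted(coverage_differences(serif, sans).items()):
    print(f"Plane {plane}: serif only {[f'U+{cp:04X}' for cp in only_serif]}, "
          f"sans only {[f'U+{cp:04X}' for cp in only_sans]}")
# Plane 2: serif only ['U+2EBF0'], sans only ['U+20000']

# With real fonts (hypothetical file names):
# diffs = coverage_differences(load_cmap("SourceHanSerifSC-Regular.otf"),
#                              load_cmap("SourceHanSansSC-Regular.otf"))
```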

While the exported PDF is a little over 30MB in size, the InDesign source file is almost 350MB in size. Yowza!

In closing, creating this document brought InDesign to its knees, so anyone who wishes to develop similar documents should keep in mind that some operations may take longer than expected.


One Response to Source Han Unicode

  1. For those who didn’t notice (😉), I needed to scale U+2E3A ⸺ TWO-EM DASH and U+2E3B ⸻ THREE-EM DASH along the X-axis so that their glyphs would fit within the code point box, and I needed to do the same for U+3031 〱 VERTICAL KANA REPEAT MARK and U+3032 〲 VERTICAL KANA REPEAT WITH VOICED SOUND MARK, but along the Y-axis.
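The scale factors involved can be estimated with a little arithmetic. This sketch assumes that the TWO-EM DASH and THREE-EM DASH glyphs are exactly two and three ems wide, that they are set at 9-point, and that each cell is about 11.07 points wide (1000mm split evenly into 256 cells, ignoring margins and markers). The actual scale values used are not stated, and `fit_scale` is a hypothetical helper. The same function would apply to the heights of U+3031 and U+3032 along the Y-axis, but their glyph heights are not given here, so they are left out.

```python
def fit_scale(extent_em: float, point_size: float, cell_pt: float) -> float:
    """Return the scale factor (1.0 = unscaled) that fits a glyph extent in a cell.

    extent_em is the glyph's width (X-axis) or height (Y-axis) in ems.
    """
    extent_pt = extent_em * point_size
    return min(1.0, cell_pt / extent_pt)


# Assumed cell width: 1000mm split into 256 equal cells, converted to points.
CELL_PT = 1000 / 256 * 72 / 25.4  # about 11.07 pt

for name, ems in (("TWO-EM DASH", 2), ("THREE-EM DASH", 3)):
    print(f"{name}: X-axis scale {fit_scale(ems, 9, CELL_PT):.0%}")
# TWO-EM DASH: X-axis scale 62%
# THREE-EM DASH: X-axis scale 41%
```

A one-em glyph at 9-point needs no scaling under these assumptions, which is consistent with only these multi-em characters needing special treatment.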