Unicode | CJK Type Blog

Posts in Category "Unicode"

Standards 101

Posted on September 13, 2016 by Dr. Ken Lunde | Comments (0)

Attention, students! Class is in session.

In my experience, the following two statements about standards are seemingly conflicting yet accurate:

Standards are incredibly useful—and required—for product development.
Standards cannot be completely trusted.

On one hand, developing products, such as typeface designs and their fonts, depends on standards.

On the other hand, standards themselves are developed by humans, meaning that they are prone to error, especially when they happen to be character set or glyph standards that include thousands or tens of thousands of representative glyphs.

UTC #148, Extension F & Unicode Version 10.0

Posted on August 6, 2016 by Dr. Ken Lunde | Comments (0)

UTC #148 took place in Redmond, Washington last week, hosted by our friends at Microsoft. It was a four-day working meeting, and many important Unicode-related issues and proposals were discussed. A total of 7,888 new characters were formally accepted into the standard during this meeting. Among them were the 7,473 CJK Unified Ideographs of Extension F, along with the lone CJK Unified Ideograph U+9FEA, which is appended to the URO (Unified Repertoire & Ordering) and is the result of the disunification of 㸂 U+3E02. Both were accepted on 2016-08-04 for inclusion in Unicode Version 10.0, which is slated for a June 2017 release. This means that my table above is now less tentative (clicking on the image will reveal the entire PDF file that includes details about the unchanged CJK Compatibility Ideographs).

Other CJK Unified Ideographs that are slated to be included in Unicode Version 10.0 are the 20 characters, U+9FD6 through U+9FE9, which were accepted on 2014-10-28 (UTC #141).

This will bring the total number of CJK Unified Ideographs to 87,882, and as the table at the top of this article suggests, there is not much room left in Plane 2, and Extension G is just around the corner.

😱

For those who are curious about the 414 other new characters that were accepted during UTC #148, please click here, here, here, here, here, here, and here.
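For readers who want to check the arithmetic, here is a minimal Python sketch that uses only the figures quoted in this article. The variable names are illustrative, and the pre-10.0 total on the last line is merely implied by those figures rather than stated in the article.

```python
# Figures quoted in the article for UTC #148.
extension_f = 7_473      # CJK Unified Ideographs in Extension F
uro_addition = 1         # U+9FEA, appended to the URO
other_new = 414          # non-ideograph characters accepted at UTC #148

total_accepted = extension_f + uro_addition + other_new
print(total_accepted)    # 7888

# U+9FD6 through U+9FE9 (inclusive), accepted earlier at UTC #141.
earlier_uro = 0x9FE9 - 0x9FD6 + 1
print(earlier_uro)       # 20

# New CJK Unified Ideographs slated for Version 10.0.
new_cjk_in_v10 = extension_f + uro_addition + earlier_uro
print(new_cjk_in_v10)    # 7494

# Derived, not quoted: the ideograph total before these additions.
total_v10 = 87_882
print(total_v10 - new_cjk_in_v10)  # 80388
```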

🐡

Unicode Version 9.0

Posted on July 17, 2016 by Dr. Ken Lunde | Comments (0)

For those who missed the memo, Unicode Version 9.0 was released on June 21, 2016, and added exactly 7,500 characters to the standard. Unicode now includes a total of 128,172 characters, which is just shy of 3,000 characters under two full 256×256 planes.
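The "two full planes" comparison is easy to verify. The following is a small sketch with illustrative names, using only the total quoted above:

```python
PLANE_SIZE = 256 * 256           # 65,536 code points per plane
total_v9 = 128_172               # character total quoted for Version 9.0

two_planes = 2 * PLANE_SIZE
shortfall = two_planes - total_v9

print(two_planes)   # 131072
print(shortfall)    # 2900
```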

While Version 9.0 does not add any new CJK Unified Ideographs, I used this opportunity to enhance my single-page CJK Unified/Compatibility Ideographs document to better track unassigned code points for the relevant blocks and planes. The image at the top of this article shows the first half of the document, and if you click on it, you’ll access the original PDF file that can be squirreled away for reference purposes.

I also used this opportunity to update my tentative Unicode Version 10.0 document in the same way.

As usual, enjoy!

🐡

Glyph Names versus CIDs

Posted on June 16, 2016 by Dr. Ken Lunde | Comments (0)

This will be a short, sweet, and to-the-point article. Sorry, no graphics nor photos.

When developing name-keyed fonts, glyph names matter. They matter a lot. When developing new fonts, the glyph names should either be explicitly listed in AGLFN (Adobe Glyph List For New Fonts) or derivable via the AGL Specification. Glyph names that adhere to AGLFN or the AGL Specification result in fonts with well-formed 'cmap' tables, which means that their glyphs will behave better in a broader range of environments. I cannot stress the importance of this enough.
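To make "derivable via the AGL Specification" a little more concrete, here is a simplified Python sketch of how a glyph name can be turned into Unicode text. It is an illustration under stated assumptions, not the reference implementation: `SAMPLE_AGLFN` is a three-entry stand-in for the real AGLFN table, and `glyph_name_to_text` is a hypothetical helper name. It handles the suffix after the first period, underscore-separated components, and the `uniXXXX` and `uXXXX` through `uXXXXXX` forms.

```python
import re

# Assumption: a tiny stand-in for the real AGLFN table.
SAMPLE_AGLFN = {"A": 0x0041, "space": 0x0020, "Euro": 0x20AC}

# "uni" + one or more groups of four uppercase hex digits.
_UNI = re.compile(r"uni((?:[0-9A-F]{4})+)")
# "u" + four to six uppercase hex digits.
_U = re.compile(r"u([0-9A-F]{4,6})")


def _is_scalar(cp):
    """True for code points that are in range and not surrogates."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def glyph_name_to_text(name):
    """Derive Unicode text from a glyph name (simplified sketch)."""
    base = name.split(".", 1)[0]          # drop ".alt", ".sc", and so on
    out = []
    for comp in base.split("_"):          # ligature-style components
        if comp in SAMPLE_AGLFN:
            out.append(chr(SAMPLE_AGLFN[comp]))
            continue
        m = _UNI.fullmatch(comp)
        if m:
            digits = m.group(1)
            cps = [int(digits[i:i + 4], 16) for i in range(0, len(digits), 4)]
            if all(_is_scalar(cp) for cp in cps):
                out.extend(chr(cp) for cp in cps)
            continue
        m = _U.fullmatch(comp)
        if m:
            cp = int(m.group(1), 16)
            if _is_scalar(cp):
                out.append(chr(cp))
    return "".join(out)                   # empty string: nothing derivable
```

With this sketch, `glyph_name_to_text("uni4E00")` yields the ideograph U+4E00 and `glyph_name_to_text("u1F421")` yields U+1F421 BLOWFISH, while an arbitrary name such as `glyph123` yields an empty string, which is exactly the kind of name that cannot contribute to a well-formed 'cmap' table.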

CIDs (Character IDs), on the other hand, represent a completely different beast. If a font is genuinely CID-keyed, it means that there are absolutely no glyph names, regardless of whether the source font or fonts that were used to build the CID-keyed font were name-keyed. Once a font resource becomes CID-keyed, the original glyph names are literally jettisoned, and the only way in which to map Unicode values to glyphs is via the 'cmap' table, which is usually built using a UTF-32 CMap resource. In other words, when developing fonts that are intended to be deployed in a CID-keyed fashion, the source glyph names play absolutely no role in how such fonts are processed.

🐡

Tofu, Or Not Tofu

Posted on May 20, 2016 by Dr. Ken Lunde | Comments (0)

One of my more popular open source fonts is Adobe Blank, and, to a lesser extent, the related Adobe Blank 2, whose 'cmap' table format, Format 13, is not broadly supported. Actually, Adobe Blank provides absolutely nothing, because it maps all 1,111,998 Unicode code points to a range of 2,048 non-spacing and non-marking glyphs, yet such a font is useful for particular scenarios, such as addressing the FOUT (Flash Of Unstyled Text) problem.
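The 1,111,998 figure can be reproduced by starting from the full Unicode code space and subtracting the surrogates and the noncharacters. The sketch below assumes that this is how the count is reached. The excerpt above does not spell out the exclusions, although the arithmetic matches exactly.

```python
# Assumption: the count excludes surrogates and noncharacters.
TOTAL_CODE_POINTS = 17 * 0x10000            # 17 planes of 65,536 code points
SURROGATES = 0xDFFF - 0xD800 + 1            # 2,048 high and low surrogates
NONCHARACTERS = (0xFDEF - 0xFDD0 + 1) + 2 * 17  # 32 in the BMP + 2 per plane

mapped = TOTAL_CODE_POINTS - SURROGATES - NONCHARACTERS

print(TOTAL_CODE_POINTS)  # 1114112
print(NONCHARACTERS)      # 66
print(mapped)             # 1111998
```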

Allow me to introduce Adobe NotDef, which is modeled after Adobe Blank in that it covers all of Unicode and maps to a range of 2,048 glyphs, but differs in that the functional glyphs are spacing and marking. The original suggestion for Adobe NotDef came from Dave Crossland. The glyphs match the shape and advance width of the standard Adobe .notdef glyph that is invoked in environments that do not support font fallback when the selected font does not include a glyph for a particular character, and as Dave wrote, Adobe NotDef is useful for font fallback purposes in that it can be used to prevent the display of non-standard .notdef glyphs that may be present in some fonts in the font fallback chain.

25 Years of Unicode

Posted on May 12, 2016 by Dr. Ken Lunde | Comments (2)

The Unicode Consortium celebrated its 25th anniversary in January of this year. The photo above is the celebratory 🥕 (U+1F955 CARROT; a Unicode Version 9.0 candidate) cake that was enjoyed during the UTC (Unicode Technical Committee) #146 meeting that was hosted by IBM in San José from January 25th through 28th, 2016.

Soon To Open: Plane 3, the Tertiary Ideographic Plane

Posted on April 27, 2016 by Dr. Ken Lunde | Comments (0)

Guess what.

🤔

Plane 2, the SIP (Supplementary Ideographic Plane), is almost full.

Right off the bat, in Unicode Version 3.1 (March of 2001), Extension B filled it nearly two-thirds of the way with its 42,711 characters, along with 542 CJK Compatibility Ideographs. Extension C with 4,149 characters was added in Version 5.2 (October of 2009), Extension D with a mere 222 characters was added in Version 6.0 (October of 2010), and Extension E with 5,762 characters was added in Version 8.0 (June of 2015). On tap for Unicode Version 10.0, scheduled for a June 2017 release, is Extension F, which currently includes 7,473 characters (U+2CEB0 through U+2EBE0).
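To see just how full Plane 2 is, the character counts above can be tallied against the 65,536 code points of a single plane. This is a rough sketch with illustrative names. The remaining figure is an upper bound, because it ignores the plane's two noncharacters and any unassigned code points left inside or between blocks.

```python
PLANE_SIZE = 0x10000  # 65,536 code points per plane

# Character counts quoted in the article for Plane 2.
plane2 = {
    "Extension B": 42_711,
    "CJK Compatibility Ideographs": 542,
    "Extension C": 4_149,
    "Extension D": 222,
    "Extension E": 5_762,
    "Extension F": 7_473,
}

# Sanity check: the quoted Extension F range matches its character count.
ext_f_span = 0x2EBE0 - 0x2CEB0 + 1
print(ext_f_span)                   # 7473

used = sum(plane2.values())
remaining = PLANE_SIZE - used
print(used)                         # 60859
print(remaining)                    # 4677

# Extension B alone: "nearly two-thirds" of the plane.
ext_b_share = plane2["Extension B"] / PLANE_SIZE
print(round(ext_b_share, 3))        # 0.652
```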

Badges? We don’t need no stinkin’ badges!

Posted on April 16, 2016 by Dr. Ken Lunde | Comments (0)

Actually, we do.

As pointed out in Matthew Rechs’ recent and excellent Typekit Blog article about Unicode’s Adopt a Character campaign, these badges were designed by the very talented Jake Giltsoff of the Typekit team at Adobe. Mine for U+1F421 🐡 BLOWFISH is shown above.

Introducing “Width Test”

Posted on April 12, 2016 by Dr. Ken Lunde | Comments (0)

It seems that I am on a roll, having released two new open source fonts on GitHub within the past week. The previous, and brief, article about the LOCL Test OpenType/CFF font simply pointed to the repository. This article will be longer. I promise.

Introducing “LOCL Test”

Posted on April 8, 2016 by Dr. Ken Lunde | Comments (0)

Inspired by the font that I prepared for and referenced in the previous article, I decided to launch a dedicated open source project for this useful test font, LOCL Test.

Enjoy!

🐡
