Never Say Never

In the realm of CJK Unified Ideographs, there is always talk about no more characters to encode, or that any new characters are simply unifiable variants. This is, in large part, merely wishful thinking.

In my experience, there are three important words to embrace: Never Say Never.

While perusing IWATA Corporations’s website, I came across the page about their extension to Kyodo News’ U-PRESS character set, which included a convenient PDF. I checked all of the characters, mainly to establish as many mappings to Adobe-Japan1-6 as possible, and found that 8 of the kanji were not in Unicode, and this effort involved checking the latest version of Extension E (aka IRG N1830), which is soon to become standardized. The image below highlights in yellow the 8 kanji that are not yet in Unicode:

What this demonstrates is simply that CJK Unified Ideographs are genuinely an open-ended script, and that there is always a possibility that new characters will be coined or discovered.

2 Responses to Never Say Never

  1. Simon Birtwistle says:

    The baseline on the penultimate line is disappointingly variable. kcal/kW˚ is particularly glaring.

    • While I cannot speak to IWATA Corporation’s decisions about baselines, I can state that specialized Latin ligatures, such as those on the penultimate line, are treated as symbols. When mixed with Japanese text, the relative baseline may be different than what is considered the Latin baseline. The distinction with these particular examples sort of confirms this, in that those that include an uppercase Latin character as a component have a lower relative baseline.