CMap Resource Names Explained

For the longest time I have felt that the names used for many of our CMap resources deserve some amount of explanation. I see these names written in books from time to time, and it usually gives me a chuckle, mainly because I am the one responsible for coining many of them. This post is an opportunity for me to provide (some) definitive answers, along with some history. Of course, if this post raises more questions, please submit a comment, and I will make an honest effort to provide a timely answer.

In general, and with few exceptions, a CMap resource name is composed of a character set name, an encoding name, and a writing direction. For the most part, it is the character set names that deserve some explanation, because the encoding and writing direction names are fairly straightforward. Also, whenever I mention a CMap resource name, which I give in its horizontal (-H) form, it almost always has a corresponding vertical (-V) CMap resource.
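The three-part pattern can be sketched in Python as follows. This is only an illustration of the general rule, not a tool that Adobe ships: the names `CMapName`, `parse_cmap_name`, and `vertical_counterpart` are hypothetical, and the sketch deliberately returns `None` for any name that does not fit the pattern, because the exceptions are not enumerated here.

```python
from typing import NamedTuple, Optional


class CMapName(NamedTuple):
    """The three components of a typical CMap resource name."""
    charset: str    # e.g. "90ms", "UniJIS"
    encoding: str   # e.g. "RKSJ", "UTF16"
    direction: str  # "H" (horizontal) or "V" (vertical)


def parse_cmap_name(name: str) -> Optional[CMapName]:
    """Split a name of the form <charset>-<encoding>-<direction>.

    Returns None for names that do not follow the general pattern
    (the exceptions are intentionally not handled in this sketch).
    """
    parts = name.split("-")
    if len(parts) != 3 or parts[2] not in ("H", "V"):
        return None
    return CMapName(*parts)


def vertical_counterpart(name: str) -> Optional[str]:
    """Derive the *name* a vertical counterpart would have.

    This only builds a string; it does not check that such a resource
    exists (83pv-RKSJ-H, for example, has no vertical counterpart).
    """
    parsed = parse_cmap_name(name)
    if parsed is None or parsed.direction != "H":
        return None
    return f"{parsed.charset}-{parsed.encoding}-V"
```

For example, `parse_cmap_name("90ms-RKSJ-H")` yields the components `90ms`, `RKSJ`, and `H`, and `vertical_counterpart("90pv-RKSJ-H")` yields `90pv-RKSJ-V`.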

Let’s begin where it all started, meaning OCF (Original Composite Format) names…

OCF Names That Became Adobe-Japan1-x Names

It makes a lot of sense to cover some of the names used in our OCF fonts, all of which carried forward to our CID-keyed fonts, specifically as the CMap resource names for the Adobe-Japan1-0 character collection. These OCF names provided some amount of inspiration for the names of new CMap resources.

The only encoding name that deserves an explanation is RKSJ, which is short for [single-byte] Roman, [half-width] Katakana, and Shift-JIS.

Perhaps the most controversial character set name is 83pv, which is used only for the 83pv-RKSJ-H CMap resource, and which corresponds to the Apple Macintosh Japanese character set up through KanjiTalk6. It is clear that the “83” portion refers to 1983, which in turn refers to JIS X 0208-1983 (or probably JIS C 6226-1983, which was its original designation). What may be unclear is what “pv” actually stands for. The two common theories at Adobe are that it refers to “plus verticals” or “proportional [plus] verticals.” I asked the person who coined most of the names used in OCF fonts, and he indicated that the former is the correct interpretation. 83pv-RKSJ-H is unique in that it has no corresponding vertical CMap resource: the vertical variants are encoded using separate code points, hence the use of “pv” in its name. In addition, its single-byte ASCII range maps to proportional Latin glyphs.

Another interesting name is Ext, which corresponds to the NEC Japanese character set, and which is obviously short for “Extended.”

Yet another interesting name is Add, which corresponds to the Fujitsu FMR Japanese character set. My source indicated that it is short for “Additional.”

The NWP name is an abbreviation for “NEC Word Processor.”

Note that everything that follows is specific to CMap resources for CID-keyed fonts, meaning that the names were never used by OCF fonts.

Adobe-Japan1-x Names

The 83pv-RKSJ-H CMap resource was intended to support (up to) the KanjiTalk6 character set. When the KanjiTalk7 character set was established, a new CMap resource was necessary. Not being one who likes to break from tradition, and because the KanjiTalk7 character set was based on JIS90 (aka, JIS X 0208-1990), I chose to use 90pv-RKSJ-H as the name. I even created an accompanying vertical CMap resource, 90pv-RKSJ-V. An accompanying vertical CMap resource for 83pv-RKSJ-H was never made, but it was considered. Given that it was used for so long without an accompanying vertical CMap resource, it was deemed unnecessary.

When Adobe-Japan1-2 was defined, adding the glyphs for the kanji necessary to support the IBM Japanese character set, it became possible to support the Microsoft Windows Japanese character set. If memory serves, this was based on Windows Version 3.1. This character set is based on JIS X 0208-1990, with Microsoft’s extensions, and encoded according to Shift-JIS, so 90ms-RKSJ-H is what I came up with. When we needed to make a version of this CMap resource that mapped the single-byte ASCII range to proportional Latin glyphs, I simply appended a “p” to the character set name: 90msp-RKSJ-H.

After JIS X 0213:2004 (aka, JIS2004) was established, and Microsoft, followed by Apple, decided to support its prototypical glyphs as the default in the fonts bundled in their respective OSes, I needed to come up with new CMap resource names. Although I am getting a bit ahead of myself by talking about Unicode, the only CMap resources that support JIS2004 are ones that support Unicode. I simply appended “2004” to the character set portion of the existing Unicode CMap resources, as follows:

Standard (JIS90)	JIS2004
UniJIS-UTF8-H	UniJIS2004-UTF8-H
UniJIS-UTF16-H	UniJIS2004-UTF16-H
UniJIS-UTF32-H	UniJIS2004-UTF32-H
UniJISX0213-UTF32-H	UniJISX02132004-UTF32-H
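
The renaming rule in the table above (append “2004” to the character set portion, leave the rest alone) can be expressed in a few lines of Python. The function name `jis2004_name` is hypothetical and used only for illustration; the sketch assumes the input is one of the Unicode CMap resource names from the left-hand column.

```python
def jis2004_name(jis90_name: str) -> str:
    """Append "2004" to the character set portion of a CMap resource name.

    Assumes the character set portion is everything before the first
    hyphen, which holds for the four names in the table above.
    """
    charset, rest = jis90_name.split("-", 1)
    return f"{charset}2004-{rest}"


# The four JIS90 names from the table, mapped to their JIS2004 names.
for name in ("UniJIS-UTF8-H", "UniJIS-UTF16-H",
             "UniJIS-UTF32-H", "UniJISX0213-UTF32-H"):
    print(name, "->", jis2004_name(name))
```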

Adobe-GB1-x Names

The GB name, of course, refers to GB 2312-80, which is the most widely implemented Simplified Chinese character set standard. It serves as the foundation for GB/T 12345-90, GBK, and GB 18030. The GBT name refers to GB/T 12345-90 (the “GB” and “T”), and not to the Traditional version of GB 2312-80.

The pc identifier is used after both GB and GBT, and refers to “proportional characters,” and not “personal computer.” The reasoning here is that the default single-byte Latin set at the time was half-width, not proportional.

The GBK character set name, of course, refers to GBK. When GB 18030, which was an extension of GBK, was established in 2000, I opted to use GBK2K (GBK 2000) as the character set name.

Adobe-CNS1-x Names

The most common encoding name for these CMap resources is B5, which is short for Big Five. As with Adobe-GB1-x CMap resources, the pc identifier refers to “proportional characters,” and is used for Macintosh.

The HKm314 and HKm471 character set names refer to two different Hong Kong extensions to Big Five that were defined by Monotype Imaging. The numbers, 314 and 471, refer to the number of characters above and beyond Big Five. Likewise, the HKdla and HKdlb character set names refer to two different Hong Kong extensions to Big Five that were defined by DynaComware. The HKgccs name corresponds to Hong Kong GCCS (Government Chinese Character Set), which was an initial attempt at defining a national standard for Hong Kong. All of these character set names have been effectively superseded by the HKscs character set name, which corresponds to Hong Kong SCS (Supplementary Character Set), whose latest version is 2008 (previous versions were dated 1999, 2001, and 2004).

Adobe-Korea1-x Names

It is somewhat unfortunate that I chose to use KSC as a character set name, which corresponded to KS C 5601. I should have used KS. Keep in mind that this name was coined in 1995, and on August 20, 1997, KS C 5601 was redesignated as KS X 1001.

As described in the Adobe-GB1-x section, the pc identifier refers to “proportional characters.” It is used for the KSCpc-EUC-H CMap resource. Again, the reasoning here is that the default single-byte Latin set at the time was half-width, not proportional.

Interestingly, the use of ms in the KSCms-UHC-H CMap resource name is somewhat redundant, because the UHC (Unified Hangul Code) is specific to Microsoft Windows.

Unicode CMap Resources

I built the first Unicode CMap resources in the earlier half of 1996, mainly as an experiment. The first ones were for Japanese (Adobe-Japan1-x). Before I knew it, and because they were deemed to be very useful, they were bundled with a product, specifically Illustrator Version 7.0.

In any case, I chose to use “Uni” followed by another identifier, usually tied to an ROS (/Registry, /Ordering, and /Supplement, which correspond to the three elements of the /CIDSystemInfo dictionary in CIDFont and CMap resources), as the character set name. The table below details the names that I came up with:

ROS	Character Set Names (Unicode)
Adobe-Japan1-x	UniJIS, UniJIS2004, UniJISX0213 & UniJISX02132004
Adobe-Japan2-0	UniHojo
Adobe-GB1-x	UniGB
Adobe-CNS1-x	UniCNS
Adobe-Korea1-x	UniKS

Of course, most of these names make sense.
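
To make the relationship between an ROS and these character set names concrete, here is a small Python sketch. It splits an ROS string into its /Registry, /Ordering, and /Supplement parts and looks up the Unicode character set names from the table above. The names `ROS`, `parse_ros`, `UNICODE_CHARSETS`, and `unicode_charsets_for` are hypothetical, and the lookup table contains only what the table above lists.

```python
from typing import List, NamedTuple


class ROS(NamedTuple):
    """The three elements of a /CIDSystemInfo dictionary."""
    registry: str    # e.g. "Adobe"
    ordering: str    # e.g. "Japan1"
    supplement: int  # e.g. 2


def parse_ros(ros: str) -> ROS:
    """Split a string such as "Adobe-Japan1-2" into its three parts.

    Raises ValueError if the supplement is not a number, as with the
    placeholder form "Adobe-Japan1-x".
    """
    registry, ordering, supplement = ros.split("-")
    return ROS(registry, ordering, int(supplement))


# Unicode character set names per (registry, ordering), copied from the table.
UNICODE_CHARSETS = {
    ("Adobe", "Japan1"): ["UniJIS", "UniJIS2004", "UniJISX0213", "UniJISX02132004"],
    ("Adobe", "Japan2"): ["UniHojo"],
    ("Adobe", "GB1"): ["UniGB"],
    ("Adobe", "CNS1"): ["UniCNS"],
    ("Adobe", "Korea1"): ["UniKS"],
}


def unicode_charsets_for(ros: str) -> List[str]:
    """Return the Unicode character set names listed for an ROS, if any."""
    parsed = parse_ros(ros)
    return UNICODE_CHARSETS.get((parsed.registry, parsed.ordering), [])
```

The supplement is ignored in the lookup because the table keys the names by registry and ordering only.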

For Adobe-Japan2-0, which supported JIS X 0212-1990, I obviously couldn’t use “JIS,” because UniJIS was already being used by Adobe-Japan1-x. In retrospect, I could have used UniJIS0212, but I opted to use UniHojo instead. “Hojo” corresponds to 補助 (hojo) in Japanese, which means “supplemental,” and is part of the name of JIS X 0212-1990: 情報交換用漢字符号—補助漢字 (jōhō kōkan yō kanji fugō—hojo kanji).

For Adobe-Korea1-x, note that I used UniKS, and not UniKSC.

In summary, and in retrospect, I could have used different, better, or more descriptive names for CMap resources, but changing names after they were established would do more harm than good. Perhaps what is more important is the history of the names that were chosen for use in our CMap resources, and I hope that this post provides some answers.

CJK Type Blog

CJK Fonts, Character Sets & Encodings.