URO

The first set of ideographs encoded in Unicode (Version 1.1), which occupies the block named CJK Unified Ideographs, is also referred to as the URO, an abbreviation for Unified Repertoire and Ordering. None of the extensions carries this label. Extensions A through D have been standardized, and Extension E will soon be standardized. Of the extensions, only Extension A is in the BMP (Basic Multilingual Plane); Extension B and beyond are in Plane 2, which is called the SIP (Supplementary Ideographic Plane). What makes the URO special or unique?
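The plane that contains a code point is simply its value shifted right by 16 bits, because each plane spans 65,536 (0x10000) code points. That is why the URO and Extension A, both below U+10000, sit in the BMP (Plane 0), while Extension B, which begins at U+20000, lands in Plane 2. A minimal Python sketch follows; the function name `plane_of` and the sample code points are illustrative choices, not part of any standard API.

```python
def plane_of(code_point: int) -> int:
    """Return the Unicode plane (0 through 16) that contains a code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    # Each plane spans 0x10000 (65,536) code points.
    return code_point >> 16

# First code point of the URO, Extension A, and Extension B.
for label, cp in [("URO", 0x4E00), ("Extension A", 0x3400), ("Extension B", 0x20000)]:
    print(f"{label}: U+{cp:04X} is in Plane {plane_of(cp)}")
```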

The URO began with 20,902 characters (U+4E00 through U+9FA5). This repertoire is unique in that it combined, or unified, the ideographs from the most widely used national standards of the time, meaning the early 1990s. Beyond establishing a repertoire, the effort also determined the ordering, using specific regional dictionaries. In a nutshell, these are the reasons for the URO name.

The URO is also unique in that it is the only CJK Unified Ideographs block to which characters have been appended: 22 in Version 4.1 (U+9FA6 through U+9FBB), eight in Version 5.1 (U+9FBC through U+9FC3), another eight in Version 5.2 (U+9FC4 through U+9FCB), and one in Version 6.1 (U+9FCC). Thus, the URO currently includes 20,941 characters. Its block ends at U+9FFF, meaning that 51 code points (U+9FCD through U+9FFF) remain available to accommodate smaller repertoires.
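The counts above follow directly from the code point ranges, because an inclusive range from `first` to `last` holds `last - first + 1` code points. The short Python sketch below reproduces the arithmetic. The `URO_RANGES` table simply transcribes the ranges listed in this article (as of Version 6.1), and the names are illustrative.

```python
# Inclusive URO ranges, keyed by the Unicode version that added them (as of Version 6.1).
URO_RANGES = [
    ("1.1", 0x4E00, 0x9FA5),
    ("4.1", 0x9FA6, 0x9FBB),
    ("5.1", 0x9FBC, 0x9FC3),
    ("5.2", 0x9FC4, 0x9FCB),
    ("6.1", 0x9FCC, 0x9FCC),
]
BLOCK_END = 0x9FFF  # last code point of the CJK Unified Ideographs block

def range_size(first: int, last: int) -> int:
    """Number of code points in the inclusive range first..last."""
    return last - first + 1

for version, first, last in URO_RANGES:
    print(f"Version {version}: U+{first:04X}..U+{last:04X} = {range_size(first, last)}")

total = sum(range_size(first, last) for _, first, last in URO_RANGES)
last_assigned = URO_RANGES[-1][2]
free = range_size(last_assigned + 1, BLOCK_END)
print(f"Total: {total}; unassigned: {free} (U+{last_assigned + 1:04X}..U+{BLOCK_END:04X})")
# Total: 20941; unassigned: 51 (U+9FCD..U+9FFF)
```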

Because the URO is the only CJK Unified Ideographs block to which characters have been appended, the appended characters are easily overlooked, whether in font implementations or when searching for characters. The so-called dirty dozen (the twelve CJK Unified Ideographs in the BMP’s CJK Compatibility Ideographs block) are in the same situation: they, too, are easily overlooked.
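One way for a font or search implementation to avoid overlooking these characters is to check for them explicitly. The Python sketch below lists the 39 characters appended to the URO through Version 6.1, plus the twelve unified ideographs in the CJK Compatibility Ideographs block (U+FA0E, U+FA0F, U+FA11, U+FA13, U+FA14, U+FA1F, U+FA21, U+FA23, U+FA24, U+FA27, U+FA28, and U+FA29), and reports which of them a given set of supported code points lacks. The `supported` set is a hypothetical stand-in for something like a font's character map; how that set is obtained is left as an assumption, and the function and variable names are illustrative.

```python
# Ideographs that are easy to overlook (as of Unicode 6.1):
# the 39 characters appended to the URO after Version 1.1 ...
APPENDED_URO = list(range(0x9FA6, 0x9FCC + 1))
# ... and the twelve CJK Unified Ideographs in the CJK Compatibility Ideographs block.
DIRTY_DOZEN = [0xFA0E, 0xFA0F, 0xFA11, 0xFA13, 0xFA14, 0xFA1F,
               0xFA21, 0xFA23, 0xFA24, 0xFA27, 0xFA28, 0xFA29]

def missing_overlooked(supported: set[int]) -> list[str]:
    """Return the easily overlooked ideographs absent from `supported`, as U+XXXX strings."""
    return [f"U+{cp:04X}" for cp in APPENDED_URO + DIRTY_DOZEN if cp not in supported]

# Hypothetical coverage: only the original 20,902 URO characters (U+4E00..U+9FA5).
original_only = set(range(0x4E00, 0x9FA5 + 1))
missing = missing_overlooked(original_only)
print(len(missing), missing[:3])
```

A font that covers only the original URO range would be flagged as missing all 51 of these characters, which is exactly the gap the paragraph above warns about.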

The February 22, 2012 article provided a PDF file that details the CJK Unified (and Compatibility) Ideographs in Unicode Version 6.1. If you missed that article and have an interest in this topic, I encourage you to check it out.

CJK Type Blog

CJK Fonts, Character Sets & Encodings.

By Dr. Ken Lunde