Sequences

Sequences are important in the context of Unicode, and UAX #34 (Unicode Named Character Sequences) is a good reference for Unicode sequences. The first type of sequence that typically comes to mind in the context of Japanese is the Ideographic Variation Sequence (IVS). IVSes are registered and maintained by The Unicode Consortium via the Ideographic Variation Database (IVD). There are also Standardized Variation Sequences, which are much more closely bound to the standard because they are defined in the standard's own data files rather than in a separate registry.
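
As a rough illustration of how the two kinds of variation sequence differ at the code point level, the following Python sketch classifies variation selectors by range: VS1 through VS16 (U+FE00..U+FE0F) and VS17 through VS256 (U+E0100..U+E01EF), the latter being the range that IVSes draw from. The function names and the sample string are made up for this example, and the code checks only the selector ranges. It does not check whether a sequence is actually registered in the IVD or defined by the standard.

```python
def selector_kind(ch):
    """Classify a character by variation-selector range, or return None."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:
        return "VS1-VS16"      # range used by Standardized Variation Sequences
    if 0xE0100 <= cp <= 0xE01EF:
        return "VS17-VS256"    # range used by IVSes
    return None


def find_variation_sequences(text):
    """Return (base, selector, kind) for each character followed by a selector."""
    found = []
    for base, following in zip(text, text[1:]):
        kind = selector_kind(following)
        # Skip pairs whose "base" is itself a variation selector.
        if kind is not None and selector_kind(base) is None:
            found.append((base, following, kind))
    return found


# Hypothetical sample: an ideograph followed by VS17, then a plain ideograph.
sample = "\u845B\U000E0100\u57CE"
for base, selector, kind in find_variation_sequences(sample):
    print(f"U+{ord(base):04X} U+{ord(selector):04X} {kind}")
```

Iterating over a Python string yields code points rather than UTF-16 code units, so the supplementary-plane selectors need no special handling here.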

Many people are blissfully unaware that a small number of characters in the JIS X 0213 standard, including some kana, can be represented in Unicode only as sequences of two characters. There are 25 such characters, and they are shown in the table below. Depending on the font that your web browser uses, each entry in the “Character” column may or may not appear correctly as a single glyph.

JIS X 0213 Code Point  Unicode Sequence  Character
1-04-87                <U+304B,U+309A>   か゚
1-04-88                <U+304D,U+309A>   き゚
1-04-89                <U+304F,U+309A>   く゚
1-04-90                <U+3051,U+309A>   け゚
1-04-91                <U+3053,U+309A>   こ゚
1-05-87                <U+30AB,U+309A>   カ゚
1-05-88                <U+30AD,U+309A>   キ゚
1-05-89                <U+30AF,U+309A>   ク゚
1-05-90                <U+30B1,U+309A>   ケ゚
1-05-91                <U+30B3,U+309A>   コ゚
1-05-92                <U+30BB,U+309A>   セ゚
1-05-93                <U+30C4,U+309A>   ツ゚
1-05-94                <U+30C8,U+309A>   ト゚
1-06-88                <U+31F7,U+309A>   ㇷ゚
1-11-36                <U+00E6,U+0300>   æ̀
1-11-40                <U+0254,U+0300>   ɔ̀
1-11-41                <U+0254,U+0301>   ɔ́
1-11-42                <U+028C,U+0300>   ʌ̀
1-11-43                <U+028C,U+0301>   ʌ́
1-11-44                <U+0259,U+0300>   ə̀
1-11-45                <U+0259,U+0301>   ə́
1-11-46                <U+025A,U+0300>   ɚ̀
1-11-47                <U+025A,U+0301>   ɚ́
1-11-69                <U+02E9,U+02E5>   ˩˥
1-11-70                <U+02E5,U+02E9>   ˥˩
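
To make the point of the table concrete, here is a small Python sketch that takes a handful of the sequences above and checks that NFC normalization does not collapse them into a single code point, because Unicode has no precomposed character for them. The dictionary holds only a sample of the 25 entries, and the helper names are invented for this illustration.

```python
import unicodedata

# A sample of the sequences from the table, keyed by JIS X 0213 code point.
SEQUENCES = {
    "1-04-87": "\u304B\u309A",
    "1-05-94": "\u30C8\u309A",
    "1-06-88": "\u31F7\u309A",
    "1-11-36": "\u00E6\u0300",
    "1-11-69": "\u02E9\u02E5",
}


def code_points(text):
    """Format a string in the <U+XXXX,U+XXXX> notation used in the table."""
    return "<" + ",".join(f"U+{ord(ch):04X}" for ch in text) + ">"


def stays_a_sequence(text):
    """True if NFC normalization still leaves more than one code point."""
    return len(unicodedata.normalize("NFC", text)) > 1


for jis, seq in SEQUENCES.items():
    print(jis, code_points(seq), stays_a_sequence(seq))

# Contrast: U+304B followed by U+3099 (dakuten) composes to U+304C under NFC.
print(code_points(unicodedata.normalize("NFC", "\u304B\u3099")))
```

The contrast at the end shows why these cases are special: the voiced kana have precomposed code points, whereas the combinations with U+309A in the table do not.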

 
So, what is the point of discussing sequences? It brings us full circle, back to solutions for the issue that plagues the CJK Compatibility Ideographs: applying any of the four normalization forms (NFC, NFD, NFKC, or NFKD) converts them to their canonical equivalents, which are CJK Unified Ideographs, and thus removes the very distinctions that they are meant to preserve. Given that sequences must already be used to fully represent JIS X 0213 in Unicode, where is the harm in using sequences to represent CJK Compatibility Ideographs? Such sequences can take the form of registered IVSes from one of the existing IVD collections, or of Standardized Variation Sequences, such as those that are currently in Unicode’s Pipeline Table.
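
The normalization problem, and the reason sequences avoid it, can be sketched in a few lines of Python. The example uses U+FA10, a CJK Compatibility Ideograph whose canonical equivalent is U+585A. Pairing U+585A with VS1 (U+FE00) or VS17 (U+E0100) is done here purely for illustration and is not a claim about which sequences are registered or standardized. The helper name is likewise made up.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

compat = "\uFA10"    # CJK Compatibility Ideograph
unified = "\u585A"   # its canonical equivalent, a CJK Unified Ideograph

# Illustrative sequences only: base ideograph plus a variation selector.
with_vs1 = unified + "\uFE00"
with_vs17 = unified + "\U000E0100"


def survives_normalization(text):
    """True if every normalization form leaves the text unchanged."""
    return all(unicodedata.normalize(form, text) == text for form in FORMS)


for label, text in (("compatibility ideograph", compat),
                    ("base + VS1", with_vs1),
                    ("base + VS17", with_vs17)):
    print(label, survives_normalization(text))
```

Variation selectors have no decomposition mapping, so a base character followed by a selector passes through all four forms intact, while the compatibility ideograph is replaced by its unified counterpart.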
