DOC HOME SITE MAP MAN PAGES GNU INFO SEARCH PRINT BOOK
 

(mysql.info) charset-cp932

Info Catalog (mysql.info) charset-asian-sets (mysql.info) charset-asian-sets
 
 10.9.7.1 The `cp932' Character Set
 ..................................
 
 *Why is `cp932' needed?*
 
 In MySQL, the `sjis' character set corresponds to the `Shift_JIS'
 character set defined by IANA, which supports JIS X0201 and JIS X0208
 characters. (See `http://www.iana.org/assignments/character-sets'.)
 
 However, the meaning of `SHIFT JIS' as a descriptive term has become
 very vague and it often includes the extensions to `Shift_JIS' that are
 defined by various vendors.
 
 For example, `SHIFT JIS' used in Japanese Windows environments is a
 Microsoft extension of `Shift_JIS' and its exact name is `Microsoft
 Windows Codepage : 932' or `cp932'. In addition to the characters
 supported by `Shift_JIS', `cp932' supports extension characters such as
 NEC special characters, NEC selected -- IBM extended characters, and
 IBM extended characters.
 
 Many Japanese users have experienced problems using these extension
 characters. These problems stem from the following factors:
 
    * MySQL automatically converts character sets.
 
    * Character sets are converted via Unicode (`ucs2').
 
    * The `sjis' character set does not support the conversion of these
      extension characters.
 
    * There are several conversion rules from so-called `SHIFT JIS' to
      Unicode, and some characters are converted to Unicode differently
      depending on the conversion rule. MySQL supports only one of these
      rules (described later).
 
 The MySQL `cp932' character set is designed to solve these problems. It
 is available as of MySQL 5.0.3.
 
 Because MySQL supports character set conversion, it is important to
 separate IANA `Shift_JIS' and `cp932' into two different character sets
 because they provide different conversion rules.
 
 *How does `cp932' differ from `sjis'?*
 
 The `cp932' character set differs from `sjis' in the following ways:
 
    * `cp932' supports NEC special characters, NEC selected -- IBM
      extended characters, and IBM selected characters.
 
    * Some `cp932' characters have two different code points, both of
      which convert to the same Unicode code point. When converting from
      Unicode back to `cp932', one of the code points must be selected.
      For this `round trip conversion,' the rule recommended by
      Microsoft is used. (See
      `http://support.microsoft.com/kb/170559/EN-US/'.)
 
      The conversion rule works like this:
 
         * If the character is in both JIS X 0208 and NEC special
           characters, use the code point of JIS X 0208.
 
         * If the character is in both NEC special characters and IBM
           selected characters, use the code point of NEC special
           characters.
 
         * If the character is in both IBM selected characters and NEC
           selected -- IBM extended characters, use the code point of
           IBM extended characters.
 
      The table shown at
      `http://www.microsoft.com/globaldev/reference/dbcs/932.htm'
      provides information about the Unicode values of `cp932'
      characters. For `cp932' table entries with characters under which
      a four-digit number appears, the number represents the
      corresponding Unicode (`ucs2') encoding. For table entries with an
      underlined two-digit value appears, there is a range of `cp932'
      character values that begin with those two digits. Clicking such a
      table entry takes you to a page that displays the Unicode value
      for each of the `cp932' characters that begin with those digits.
 
      The following links are of special interest. They correspond to
      the encodings for the following sets of characters:
 
         * NEC special characters:
 
                `http://www.microsoft.com/globaldev/reference/dbcs/932/932_87.htm'
 
         * NEC selected -- IBM extended characters:
 
                `http://www.microsoft.com/globaldev/reference/dbcs/932/932_ED.htm'
                `http://www.microsoft.com/globaldev/reference/dbcs/932/932_EE.htm'
 
         * IBM selected characters:
 
                `http://www.microsoft.com/globaldev/reference/dbcs/932/932_FA.htm'
                `http://www.microsoft.com/globaldev/reference/dbcs/932/932_FB.htm'
                `http://www.microsoft.com/globaldev/reference/dbcs/932/932_FC.htm'
 
    * Starting from version 5.0.3, `cp932' supports conversion of
      user-defined characters in combination with `eucjpms', and solves
      the problems with `sjis'/`ujis' conversion. For details, please
      refer to `http://www.opengroup.or.jp/jvc/cde/sjis-euc-e.html'.
 
    * For some characters, conversion to and from `ucs2' is different for
      `sjis' and `cp932'. The following tables illustrate these
      differences.
 
      Conversion to `ucs2':
 
      *`sjis'/`cp932' Value* *`sjis' -> `ucs2'      *`cp932' -> `ucs2'
                             Conversion*            Conversion*
      5C                     005C                   005C
      7E                     007E                   007E
      815C                   2015                   2015
      815F                   005C                   FF3C
      8160                   301C                   FF5E
      8161                   2016                   2225
      817C                   2212                   FF0D
      8191                   00A2                   FFE0
      8192                   00A3                   FFE1
      81CA                   00AC                   FFE2
 
      Conversion from `ucs2':
 
      *`ucs2' value*         *`ucs2' -> `sjis'      *`ucs2' -> `cp932'
                             Conversion*            Conversion*
      005C                   815F                   5C
      007E                   7E                     7E
      00A2                   8191                   3F
      00A3                   8192                   3F
      00AC                   81CA                   3F
      2015                   815C                   815C
      2016                   8161                   3F
      2212                   817C                   3F
      2225                   3F                     8161
      301C                   8160                   3F
      FF0D                   3F                     817C
      FF3C                   3F                     815F
      FF5E                   3F                     8160
      FFE0                   3F                     8191
      FFE1                   3F                     8192
      FFE2                   3F                     81CA
 
Info Catalog (mysql.info) charset-asian-sets (mysql.info) charset-asian-sets
automatically generated byinfo2html