DOC HOME SITE MAP MAN PAGES GNU INFO SEARCH PRINT BOOK
 

(mysql.info) charset-unicode-sets

Info Catalog (mysql.info) charset-charsets (mysql.info) charset-charsets (mysql.info) charset-we-sets
 
 10.9.1 Unicode Character Sets
 -----------------------------
 
 MySQL has two Unicode character sets. You can store text in about 650
 languages using these character sets.
 
    * `ucs2' (UCS-2 Unicode) collations:
 
         * `ucs2_bin'
 
         * `ucs2_czech_ci'
 
         * `ucs2_danish_ci'
 
         * `ucs2_esperanto_ci'
 
         * `ucs2_estonian_ci'
 
         * `ucs2_general_ci' (default)
 
         * `ucs2_hungarian_ci'
 
         * `ucs2_icelandic_ci'
 
         * `ucs2_latvian_ci'
 
         * `ucs2_lithuanian_ci'
 
         * `ucs2_persian_ci'
 
         * `ucs2_polish_ci'
 
         * `ucs2_roman_ci'
 
         * `ucs2_romanian_ci'
 
         * `ucs2_slovak_ci'
 
         * `ucs2_slovenian_ci'
 
         * `ucs2_spanish2_ci'
 
         * `ucs2_spanish_ci'
 
         * `ucs2_swedish_ci'
 
         * `ucs2_turkish_ci'
 
         * `ucs2_unicode_ci'
 
    * `utf8' (UTF-8 Unicode) collations:
 
         * `utf8_bin'
 
         * `utf8_czech_ci'
 
         * `utf8_danish_ci'
 
         * `utf8_esperanto_ci'
 
         * `utf8_estonian_ci'
 
         * `utf8_general_ci' (default)
 
         * `utf8_hungarian_ci'
 
         * `utf8_icelandic_ci'
 
         * `utf8_latvian_ci'
 
         * `utf8_lithuanian_ci'
 
         * `utf8_persian_ci'
 
         * `utf8_polish_ci'
 
         * `utf8_roman_ci'
 
         * `utf8_romanian_ci'
 
         * `utf8_slovak_ci'
 
         * `utf8_slovenian_ci'
 
         * `utf8_spanish2_ci'
 
         * `utf8_spanish_ci'
 
         * `utf8_swedish_ci'
 
         * `utf8_turkish_ci'
 
         * `utf8_unicode_ci'
 
 The `ucs2_esperanto_ci' and `utf8_esperanto_ci' collations were added in
 MySQL 5.0.13. The `ucs2_hungarian_ci' and `utf8_hungarian_ci'
 collations were added in MySQL 5.0.19.
 
 MySQL implements the `utf8_unicode_ci' collation according to the
 Unicode Collation Algorithm (UCA) described at
 `http://www.unicode.org/reports/tr10/'. The collation uses the
 version-4.0.0 UCA weight keys:
 `http://www.unicode.org/Public/UCA/4.0.0/allkeys-4.0.0.txt'.  The
 following discussion uses `utf8_unicode_ci', but it is also true for
 `ucs2_unicode_ci'.
 
 Currently, the `utf8_unicode_ci' collation has only partial support for
 the Unicode Collation Algorithm. Some characters are not supported yet.
 Also, combining marks are not fully supported. This affects primarily
 Vietnamese and some minority languages in Russia such as Udmurt, Tatar,
 Bashkir, and Mari.
 
 The most significant feature in `utf8_unicode_ci' is that it supports
 expansions; that is, when one character compares as equal to
 combinations of other characters. For example, in German and some other
 languages ‘`ss'’ is equal to ‘`ss'’.
 
 `utf8_general_ci' is a legacy collation that does not support
 expansions. It can make only one-to-one comparisons between characters.
 This means that comparisons for the `utf8_general_ci' collation are
 faster, but slightly less correct, than comparisons for
 `utf8_unicode_ci'.
 
 For example, the following equalities hold in both `utf8_general_ci' and
 `utf8_unicode_ci':
 
      A" = A
      O" = O
      U" = U
 
 A difference between the collations is that this is true for
 `utf8_general_ci':
 
      ss = s
 
 Whereas this is true for `utf8_unicode_ci':
 
      ss = ss
 
 MySQL implements language-specific collations for the `utf8' character
 set only if the ordering with `utf8_unicode_ci' does not work well for a
 language. For example, `utf8_unicode_ci' works fine for German and
 French, so there is no need to create special `utf8' collations for
 these two languages.
 
 `utf8_general_ci' also is satisfactory for both German and French,
 except that ‘`ss'’ is equal to ‘`s'’, and not to ‘`ss'’. If
 this is acceptable for your application, then you should use
 `utf8_general_ci' because it is faster.  Otherwise, use
 `utf8_unicode_ci' because it is more accurate.
 
 `utf8_swedish_ci', like other `utf8' language-specific collations, is
 derived from `utf8_unicode_ci' with additional language rules. For
 example, in Swedish, the following relationship holds, which is not
 something expected by a German or French speaker:
 
      U" = Y < O"
 
 The `utf8_spanish_ci' and `utf8_spanish2_ci' collations correspond to
 modern Spanish and traditional Spanish, respectively. In both
 collations, ‘`ñ'’ (n-tilde) is a separate letter between ‘`n'’
 and ‘`o'’. In addition, for traditional Spanish, ‘`ch'’ is a
 separate letter between ‘`c'’ and `d', and ‘`ll'’ is a separate
 letter between ‘`l'’ and ‘`m'’
 
Info Catalog (mysql.info) charset-charsets (mysql.info) charset-charsets (mysql.info) charset-we-sets
automatically generated byinfo2html