(mysql.info) charset-unicode

 
 10.7 Unicode Support
 ====================
 
 MySQL 5.0 supports two character sets for storing Unicode data:
 
    * `ucs2', the UCS-2 Unicode character set.
 
    * `utf8', the UTF-8 encoding of the Unicode character set.
 
 In UCS-2 (binary Unicode representation), every character is
 represented by a two-byte Unicode code with the most significant byte
 first. For example, `LATIN CAPITAL LETTER A' has the code `0x0041' and
 is stored as the two-byte sequence `0x00 0x41'. `CYRILLIC SMALL
 LETTER YERU' (Unicode `0x044B') is stored as the two-byte sequence
 `0x04 0x4B'. For Unicode characters and their codes, see the
 Unicode Home Page (http://www.unicode.org/).
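 
 The byte layout described above can be checked outside the server. The
 following is a minimal Python sketch, not part of MySQL; the helper
 name `ucs2_bytes' is hypothetical. Python has no codec named `ucs2',
 so the sketch assumes that `utf-16-be' is an acceptable stand-in, which
 holds for code points up to `0xFFFF'.

```python
def ucs2_bytes(ch: str) -> bytes:
    """Return the two-byte, big-endian UCS-2 form of one character.

    Hypothetical helper for illustration only. For code points up to
    U+FFFF, UTF-16BE produces the same two bytes as UCS-2.
    """
    code = ord(ch)
    if code > 0xFFFF:
        raise ValueError("UCS-2 cannot represent code points above U+FFFF")
    return ch.encode("utf-16-be")


print(ucs2_bytes("A").hex())        # 0041  (LATIN CAPITAL LETTER A)
print(ucs2_bytes("\u044b").hex())   # 044b  (CYRILLIC SMALL LETTER YERU)
```

 The most significant byte comes first in both results, matching the
 sequences `0x00 0x41' and `0x04 0x4B' given in the text.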
 
 Currently, UCS-2 cannot be used as a client character set, which means
 that `SET NAMES 'ucs2'' does not work.
 
 The UTF-8 character set (a transformation format for Unicode) is an
 alternative way to store Unicode data. It is implemented according to
 RFC 3629. UTF-8 encodes Unicode characters using byte sequences of
 different lengths:
 
    * Basic Latin letters, digits, and punctuation signs use one byte.
 
    * Most European and Middle Eastern script letters fit into a
      two-byte sequence: extended Latin letters (with tilde, macron,
      acute, grave, and other accents), Cyrillic, Greek, Armenian,
      Hebrew, Arabic, Syriac, and others.
 
    * Korean, Chinese, and Japanese ideographs use three-byte sequences.
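 
 The sequence lengths listed above can be observed with any UTF-8
 encoder. The following Python sketch is illustrative only; the helper
 name `utf8_length' and the sample characters are choices made for this
 example and do not come from MySQL.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes in the UTF-8 encoding of a single character."""
    return len(ch.encode("utf-8"))


# Sample characters (chosen for illustration), one or more per group.
samples = [
    ("A", "Basic Latin letter"),
    ("\u00f1", "Latin small letter n with tilde"),
    ("\u044b", "Cyrillic small letter yeru"),
    ("\u05d0", "Hebrew letter alef"),
    ("\u4e2d", "CJK ideograph"),
]

for ch, label in samples:
    print(f"U+{ord(ch):04X} {label}: {utf8_length(ch)} byte(s)")
```

 The Basic Latin letter takes one byte, the accented Latin, Cyrillic,
 and Hebrew letters take two bytes each, and the ideograph takes three.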
 
 RFC 3629 describes encoding sequences that take from one to four bytes.
 Currently, MySQL support for UTF-8 does not include four-byte
 sequences, so characters with code points above `0xFFFF' cannot be
 stored in a `utf8' column. (An older standard for UTF-8 encoding is
 given by RFC 2279, which describes UTF-8 sequences that take from one
 to six bytes. RFC 3629 renders RFC 2279 obsolete; for this reason,
 sequences with five and six bytes are no longer used.)
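 
 In UTF-8, four-byte sequences are the ones that encode code points
 above `0xFFFF'. An application can therefore check a string before
 sending it to a `utf8' column. The following Python sketch assumes that
 such a client-side check is wanted; the function name
 `fits_three_byte_utf8' is hypothetical and is not a MySQL API.

```python
def fits_three_byte_utf8(text: str) -> bool:
    """True if every character encodes to at most three UTF-8 bytes.

    Hypothetical client-side check: code points above U+FFFF need a
    four-byte sequence, which MySQL 5.0 `utf8' does not support.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


g_clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the three-byte range

print(len(g_clef.encode("utf-8")))           # 4
print(fits_three_byte_utf8("MySQL \u4e2d"))  # True
print(fits_three_byte_utf8(g_clef))          # False
```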
 
 *Tip*: To save space with UTF-8, use `VARCHAR' instead of `CHAR'.
 Otherwise, MySQL must reserve three bytes for each character in a `CHAR
 CHARACTER SET utf8' column because that is the maximum possible length.
 For example, MySQL must reserve 30 bytes for a `CHAR(10) CHARACTER SET
 utf8' column.
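 
 The arithmetic behind this tip can be sketched as follows. This Python
 fragment is illustrative only; the function names are hypothetical, and
 the comparison counts character data only (it assumes that the small
 length prefix stored with each `VARCHAR' value can be ignored for the
 purpose of the comparison).

```python
MAX_BYTES_PER_CHAR = 3  # longest UTF-8 sequence supported by MySQL 5.0


def char_reserved_bytes(n: int) -> int:
    """Bytes reserved for a CHAR(n) CHARACTER SET utf8 column."""
    return n * MAX_BYTES_PER_CHAR


def utf8_data_bytes(value: str) -> int:
    """Bytes of character data actually needed to hold `value` in UTF-8."""
    return len(value.encode("utf-8"))


print(char_reserved_bytes(10))        # 30
print(utf8_data_bytes("hello"))       # 5
print(utf8_data_bytes("\u044b" * 5))  # 10
```

 A five-letter Basic Latin value needs 5 bytes of data and five Cyrillic
 letters need 10, while `CHAR(10)' reserves 30 bytes in either case.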
 