DOC HOME SITE MAP MAN PAGES GNU INFO SEARCH PRINT BOOK
 

(mysql.info) fulltext-search

Info Catalog (mysql.info) mysql-calendar (mysql.info) functions (mysql.info) cast-functions
 
 12.7 Full-Text Search Functions
 ===============================
 

Menu

 
* fulltext-boolean             Boolean Full-Text Searches
* fulltext-query-expansion     Full-Text Searches with Query Expansion
* fulltext-stopwords           Full-Text Stopwords
* fulltext-restrictions        Full-Text Restrictions
* fulltext-fine-tuning         Fine-Tuning MySQL Full-Text Search
 
      MATCH (COL1,COL2,...) AGAINST (EXPR [SEARCH_MODIFIER])
 
      SEARCH_MODIFIER: { IN BOOLEAN MODE | WITH QUERY EXPANSION }
 
 MySQL has support for full-text indexing and searching:
 
    * A full-text index in MySQL is an index of type `FULLTEXT'.
 
    * Full-text indexes can be used only with `MyISAM' tables, and can
      be created only for `CHAR', `VARCHAR', or `TEXT' columns.
 
    * A `FULLTEXT' index definition can be given in the `CREATE TABLE'
      statement when a table is created, or added later using `ALTER
      TABLE' or `CREATE INDEX'.
 
    * For large datasets, it is much faster to load your data into a
      table that has no `FULLTEXT' index and then create the index after
      that, than to load data into a table that has an existing
      `FULLTEXT' index.
 
 Full-text searching is performed using `MATCH() ...  AGAINST' syntax.
 `MATCH()' takes a comma-separated list that names the columns to be
 searched.  `AGAINST' takes a string to search for, and an optional
 modifier that indicates what type of search to perform.  The search
 string must be a literal string, not a variable or a column name. There
 are three types of full-text searches:
 
    * A boolean search interprets the search string using the rules of a
      special query language. The string contains the words to search
      for. It can also contain operators that specify requirements such
      that a word must be present or absent in matching rows, or that it
      should be weighted higher or lower than usual. Common words such
      as `some' or `then' are stopwords and do not match if present in
      the search string. The `IN BOOLEAN MODE' modifier specifies a
      boolean search. For more information, see  fulltext-boolean.
 
    * A natural language search interprets the search string as a phrase
      in natural human language (a phrase in free text).  There are no
      special operators. The stopword list applies. In addition, words
      that are present in more than 50% of the rows are considered
      common and do not match. Full-text searches are natural language
      searches if no modifier is given.
 
    * A query expansion search is a modification of a natural language
      search. The search string is used to perform a natural language
      search. Then words from the most relevant rows returned by the
      search are added to the search string and the search is done
      again. The query returns the rows from the second search. The
      `WITH QUERY EXPANSION' modifier specifies a query expansion
      search. For more information, see  fulltext-query-expansion.
 
 Constraints on full-text searching are listed in 
 fulltext-restrictions.
 
      mysql> CREATE TABLE articles (
          ->   id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
          ->   title VARCHAR(200),
          ->   body TEXT,
          ->   FULLTEXT (title,body)
          -> );
      Query OK, 0 rows affected (0.00 sec)
 
      mysql> INSERT INTO articles (title,body) VALUES
          -> ('MySQL Tutorial','DBMS stands for DataBase ...'),
          -> ('How To Use MySQL Well','After you went through a ...'),
          -> ('Optimizing MySQL','In this tutorial we will show ...'),
          -> ('1001 MySQL Tricks','1. Never run mysqld as root. 2. ...'),
          -> ('MySQL vs. YourSQL','In the following database comparison ...'),
          -> ('MySQL Security','When configured properly, MySQL ...');
      Query OK, 6 rows affected (0.00 sec)
      Records: 6  Duplicates: 0  Warnings: 0
 
      mysql> SELECT * FROM articles
          -> WHERE MATCH (title,body) AGAINST ('database');
      +----+-------------------+------------------------------------------+
      | id | title             | body                                     |
      +----+-------------------+------------------------------------------+
      |  5 | MySQL vs. YourSQL | In the following database comparison ... |
      |  1 | MySQL Tutorial    | DBMS stands for DataBase ...             |
      +----+-------------------+------------------------------------------+
      2 rows in set (0.00 sec)
 
 The `MATCH()' function performs a natural language search for a string
 against a text collection. A collection is a set of one or more columns
 included in a `FULLTEXT' index. The search string is given as the
 argument to `AGAINST()'. For each row in the table, `MATCH()' returns a
 relevance value; that is, a similarity measure between the search
 string and the text in that row in the columns named in the `MATCH()'
 list.
 
 By default, the search is performed in case-insensitive fashion.
 However, you can perform a case-sensitive full-text search by using a
 binary collation for the indexed columns. For example, a column that
 uses the `latin1' character set of can be assigned a collation of
 `latin1_bin' to make it case sensitive for full-text searches.
 
 When `MATCH()' is used in a `WHERE' clause, as in the example shown
 earlier, the rows returned are automatically sorted with the highest
 relevance first. Relevance values are non-negative floating-point
 numbers. Zero relevance means no similarity. Relevance is computed
 based on the number of words in the row, the number of unique words in
 that row, the total number of words in the collection, and the number
 of documents (rows) that contain a particular word.
 
 For natural-language full-text searches, it is a requirement that the
 columns named in the `MATCH()' function be the same columns included in
 some `FULLTEXT' index in your table. For the preceding query, note that
 the columns named in the `MATCH()' function (`title' and `body') are the
 same as those named in the definition of the `article' table's
 `FULLTEXT' index. If you wanted to search the `title' or `body'
 separately, you would need to create separate `FULLTEXT' indexes for
 each column.
 
 It is also possible to perform a boolean search or a search with query
 expansion. These search types are described in 
 fulltext-boolean, and  fulltext-query-expansion.
 
 The preceding example is a basic illustration that shows how to use the
 `MATCH()' function where rows are returned in order of decreasing
 relevance. The next example shows how to retrieve the relevance values
 explicitly. Returned rows are not ordered because the `SELECT' statement
 includes neither `WHERE' nor `ORDER BY' clauses:
 
      mysql> SELECT id, MATCH (title,body) AGAINST ('Tutorial')
          -> FROM articles;
      +----+-----------------------------------------+
      | id | MATCH (title,body) AGAINST ('Tutorial') |
      +----+-----------------------------------------+
      |  1 |                        0.65545833110809 |
      |  2 |                                       0 |
      |  3 |                        0.66266459226608 |
      |  4 |                                       0 |
      |  5 |                                       0 |
      |  6 |                                       0 |
      +----+-----------------------------------------+
      6 rows in set (0.00 sec)
 
 The following example is more complex. The query returns the relevance
 values and it also sorts the rows in order of decreasing relevance. To
 achieve this result, you should specify `MATCH()' twice: once in the
 `SELECT' list and once in the `WHERE' clause. This causes no additional
 overhead, because the MySQL optimizer notices that the two `MATCH()'
 calls are identical and invokes the full-text search code only once.
 
      mysql> SELECT id, body, MATCH (title,body) AGAINST
          -> ('Security implications of running MySQL as root') AS score
          -> FROM articles WHERE MATCH (title,body) AGAINST
          -> ('Security implications of running MySQL as root');
      +----+-------------------------------------+-----------------+
      | id | body                                | score           |
      +----+-------------------------------------+-----------------+
      |  4 | 1. Never run mysqld as root. 2. ... | 1.5219271183014 |
      |  6 | When configured properly, MySQL ... | 1.3114095926285 |
      +----+-------------------------------------+-----------------+
      2 rows in set (0.00 sec)
 
 The MySQL `FULLTEXT' implementation regards any sequence of true word
 characters (letters, digits, and underscores) as a word. That sequence
 may also contain apostrophes (`''), but not more than one in a row.
 This means that `aaa'bbb' is regarded as one word, but `aaa''bbb' is
 regarded as two words. Apostrophes at the beginning or the end of a
 word are stripped by the `FULLTEXT' parser; `'aaa'bbb'' would be parsed
 as `aaa'bbb'.
 
 The `FULLTEXT' parser determines where words start and end by looking
 for certain delimiter characters; for example, ‘` '’ (space),
 ‘`,'’ (comma), and ‘`.'’ (period). If words are not separated
 by delimiters (as in, for example, Chinese), the `FULLTEXT' parser
 cannot determine where a word begins or ends. To be able to add words
 or other indexed terms in such languages to a `FULLTEXT' index, you must
 preprocess them so that they are separated by some arbitrary delimiter
 such as ‘`"'’.
 
 Some words are ignored in full-text searches:
 
    * Any word that is too short is ignored. The default minimum length
      of words that are found by full-text searches is four characters.
 
    * Words in the stopword list are ignored. A stopword is a word such
      as `the' or `some' that is so common that it is considered to have
      zero semantic value.  There is a built-in stopword list, but it
      can be overwritten by a user-defined list.
 
 The default stopword list is given in  fulltext-stopwords. The
 default minimum word length and stopword list can be changed as
 described in  fulltext-fine-tuning.
 
 Every correct word in the collection and in the query is weighted
 according to its significance in the collection or query.
 Consequently, a word that is present in many documents has a lower
 weight (and may even have a zero weight), because it has lower semantic
 value in this particular collection. Conversely, if the word is rare,
 it receives a higher weight. The weights of the words are combined to
 compute the relevance of the row.
 
 Such a technique works best with large collections (in fact, it was
 carefully tuned this way). For very small tables, word distribution
 does not adequately reflect their semantic value, and this model may
 sometimes produce bizarre results. For example, although the word
 `MySQL' is present in every row of the `articles' table shown earlier,
 a search for the word produces no results:
 
      mysql> SELECT * FROM articles
          -> WHERE MATCH (title,body) AGAINST ('MySQL');
      Empty set (0.00 sec)
 
 The search result is empty because the word `MySQL' is present in at
 least 50% of the rows. As such, it is effectively treated as a
 stopword. For large datasets, this is the most desirable behavior: A
 natural language query should not return every second row from a 1GB
 table. For small datasets, it may be less desirable.
 
 A word that matches half of the rows in a table is less likely to
 locate relevant documents. In fact, it most likely finds plenty of
 irrelevant documents. We all know this happens far too often when we
 are trying to find something on the Internet with a search engine. It
 is with this reasoning that rows containing the word are assigned a low
 semantic value for _the particular dataset in which they occur_. A
 given word may exceed the 50% threshold in one dataset but not another.
 
 The 50% threshold has a significant implication when you first try
 full-text searching to see how it works: If you create a table and
 insert only one or two rows of text into it, every word in the text
 occurs in at least 50% of the rows. As a result, no search returns any
 results. Be sure to insert at least three rows, and preferably many
 more. Users who need to bypass the 50% limitation can use the boolean
 search mode; see  fulltext-boolean.
 
Info Catalog (mysql.info) mysql-calendar (mysql.info) functions (mysql.info) cast-functions
automatically generated byinfo2html