(mysql.info) mysql-cluster-faq

 
 15.10 MySQL Cluster FAQ
 =======================
 
 This section answers questions that are often asked about MySQL Cluster.
 
    * _What does `NDB' mean?_
 
      This stands for `*N*etwork *D*ata*b*ase.'
 
    * _What's the difference in using Cluster vs. using replication?_
 
      In a replication setup, a master MySQL server updates one or more
      slaves. Transactions are committed sequentially, and a slow
      transaction can cause the slave to lag behind the master.  This
      means that if the master fails, it is possible that the slave
      might not have recorded the last few transactions. If a
      transaction-safe engine such as `InnoDB' is being used, a
      transaction will either be complete on the slave or not applied at
      all, but replication does not guarantee that all data on the
      master and the slave will be consistent at all times. In MySQL
      Cluster, all data nodes are kept in synchrony, and a transaction
      committed by any one data node is committed for all data nodes. In
      the event of a data node failure, all remaining data nodes remain
      in a consistent state.
 
      In short, whereas standard MySQL replication is asynchronous,
      MySQL Cluster is synchronous.
 
      We are planning to implement (asynchronous) replication for
      Cluster in MySQL 5.1. This will include the capability to
      replicate both between two clusters and between a MySQL cluster
      and a non-Cluster MySQL server.
 
    * _Do I need to do any special networking to run Cluster? (How do
      computers in a cluster communicate?)_
 
      MySQL Cluster is intended to be used in a high-bandwidth
      environment, with computers connecting via TCP/IP. Its performance
      depends directly upon the connection speed between the cluster's
      computers. The minimum connectivity requirements for Cluster
      include a typical 100-megabit Ethernet network or the equivalent.
      We recommend you use gigabit Ethernet whenever available.
 
      The faster SCI protocol is also supported, but requires special
      hardware. See mysql-cluster-interconnects for more
      information about SCI.
 
    * _How many computers do I need to run a cluster, and why?_
 
      A minimum of three computers is required to run a viable cluster.
      However, the minimum *recommended* number of computers in a MySQL
      Cluster is four: one each to run the management and SQL nodes, and
      two computers to serve as data nodes. The purpose of the two
      data nodes is to provide redundancy; the management node must run
      on a separate machine to guarantee continued arbitration services
      in the event that one of the data nodes fails.
 
    * _What do the different computers do in a cluster?_
 
      A MySQL Cluster has both a physical and logical organization, with
      computers being the physical elements. The logical or functional
      elements of a cluster are referred to as nodes, and a computer
      housing a cluster node is sometimes referred to as a cluster host.
      Ideally, there will be one node per cluster host, although it is
      possible to run multiple nodes on a single host. There are three
      types of nodes, each corresponding to a specific role within the
      cluster. These are:
 
         * *Management node (MGM node)*: Provides management services
           for the cluster as a whole, including startup, shutdown,
           backups, and configuration data for the other nodes. The
           management node server is implemented as the application
           `ndb_mgmd'; the management client used to control MySQL
           Cluster via the MGM node is `ndb_mgm'.
 
         * *Data node*: Stores and replicates data. Data node
           functionality is handled by an instance of the NDB data node
           process `ndbd'.
 
         * *SQL node*: This is simply an instance of MySQL Server
           (`mysqld') that is built with support for the `NDB Cluster'
           storage engine and started with the `--ndbcluster' option to
           enable the engine.
 
    * _With which operating systems can I use Cluster?_
 
      MySQL Cluster is officially supported on Linux, Mac OS X, and
      Solaris. We are working to add Cluster support for other
      platforms, including Windows, and our goal is eventually to offer
      MySQL Cluster on all platforms for which MySQL itself is supported.
 
      It may be possible to run Cluster processes on other operating
      systems. We have had reports from users who say that they have run
      Cluster successfully on FreeBSD. However, Cluster on any but the
      three platforms mentioned here should be considered alpha software
      (at best), cannot be guaranteed reliable in a production setting,
      and _is not supported by MySQL AB_.
 
    * _What are the hardware requirements for running MySQL Cluster?_
 
      Cluster should run on any platform for which NDB-enabled binaries
      are available. Naturally, faster CPUs and more memory will improve
      performance, and 64-bit CPUs will likely be more effective than
      32-bit processors. There must be sufficient memory on machines
      used for data nodes to hold each node's share of the database (see
      _How much RAM do I need?_ for more information). Nodes can
      communicate via a standard TCP/IP network and hardware. For SCI
      support, special networking hardware is required.
 
    * _How much RAM do I need? Is it possible to use disk memory at all?_
 
      Currently, Cluster is in-memory only. This means that all table
      data (including indexes) is stored in RAM. Therefore, if your data
      takes up 1GB of space and you want to replicate it once in the
      cluster, you need 2GB of memory to do so. This is in addition to the
      memory required by the operating system and any applications
      running on the cluster computers.
 
      You can use the following formula for obtaining a rough estimate
      of how much RAM is needed for each data node in the cluster:
 
           (SizeofDatabase × NumberOfReplicas × 1.1 ) / NumberOfDataNodes
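
      As a minimal sketch, the formula above can be written as a small
      Python function. The function and parameter names are chosen here
      for illustration only, and the sample figures (a 1GB database, two
      replicas, two data nodes) are hypothetical.

```python
def estimate_ram_per_data_node(database_size_gb, number_of_replicas,
                               number_of_data_nodes):
    """Rough estimate: (size * replicas * 1.1) / number of data nodes."""
    if number_of_data_nodes <= 0:
        raise ValueError("number_of_data_nodes must be positive")
    return database_size_gb * number_of_replicas * 1.1 / number_of_data_nodes


# Hypothetical sizing: 1GB of data, 2 replicas, 2 data nodes.
estimate = estimate_ram_per_data_node(1.0, 2, 2)
print(f"{estimate:.2f} GB per data node")  # prints "1.10 GB per data node"
```

      The result covers Cluster data only; memory for the operating
      system and other applications must be added on top of it.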
 
      To calculate the memory requirements more exactly requires
      determining, for each table in the cluster database, the storage
      space required per row (see storage-requirements for
      details), and multiplying this by the number of rows. You must
      also remember to account for any column indexes as follows:
 
         * Each primary key or hash index created for an `NDBCluster'
           table requires 21-25 bytes per record. These indexes use
           `IndexMemory'.
 
         * Each ordered index requires 10 bytes storage per record,
           using `DataMemory'.
 
         * Creating a primary key or unique index also creates an
           ordered index, unless this index is created with `USING
           HASH'. In other words, if created without `USING HASH', a
           primary key or unique index on a Cluster table takes up 31-35
           bytes per record in MySQL 5.0.
 
           Note that creating MySQL Cluster tables with `USING HASH' for
           all primary keys and unique indexes will generally cause
           table updates to run more quickly. This is due to the fact
           that less memory is required (because no ordered indexes are
           created), and that less CPU must be utilized (because fewer
           indexes must be read and possibly updated).
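
      The per-record index figures above can be turned into a rough
      calculation. The following Python sketch is illustrative only: the
      function name and the sample table of one million rows are
      assumptions, and the byte counts are the ones quoted in the list
      above.

```python
HASH_INDEX_BYTES = (21, 25)   # per record, charged to IndexMemory
ORDERED_INDEX_BYTES = 10      # per record, charged to DataMemory


def index_overhead(rows, hash_indexes, ordered_indexes):
    """Return ((min, max) IndexMemory bytes, DataMemory bytes)."""
    low, high = HASH_INDEX_BYTES
    index_memory = (rows * hash_indexes * low, rows * hash_indexes * high)
    data_memory = rows * ordered_indexes * ORDERED_INDEX_BYTES
    return index_memory, data_memory


# Hypothetical table: 1,000,000 rows with a primary key created without
# USING HASH, which means one hash index plus one ordered index.
index_memory, data_memory = index_overhead(1_000_000, 1, 1)
print(index_memory, data_memory)
```

      Creating the same primary key with `USING HASH' corresponds to
      calling `index_overhead(1_000_000, 1, 0)', which removes the
      `DataMemory' share entirely.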
 
      It is especially important to keep in mind that _every MySQL
      Cluster table must have a primary key_. The `NDB' storage engine
      creates a primary key automatically if none is defined, and this
      primary key is created without `USING HASH'.
 
      There is no easy way to determine exactly how much memory is being
      used for storage of Cluster indexes at any given time; however,
      warnings are written to the Cluster log when 80% of available
      `DataMemory' or `IndexMemory' is in use, and again when use
      reaches 85%, 90%, and so on.
 
      We often see questions from users who report that, when they are
      trying to populate a Cluster database, the loading process
      terminates prematurely and an error message like this one is
      observed:
 
           ERROR 1114: The table 'my_cluster_table' is full
 
      When this occurs, the cause is very likely to be that your setup
      does not provide sufficient RAM for all table data and all
      indexes, _including the primary key required by the `NDB' storage
      engine and automatically created in the event that the table
      definition does not include the definition of a primary key_.
 
      It is also worth noting that all data nodes should have the same
      amount of RAM, as no data node in a cluster can use more memory
      than the least amount available to any individual data node. In
      other words, if there are three computers hosting Cluster data
      nodes, with two of these having 3GB of RAM available to store
      Cluster data, and one having only 1GB RAM, then each data node can
      devote only 1GB to clustering.
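
      The example above amounts to taking the minimum across all data
      nodes. A small Python sketch of that rule follows; the function
      name is hypothetical and the figures are the ones from the
      example.

```python
def usable_memory_per_node(ram_per_node_gb):
    """Each data node can use only as much as the smallest node offers."""
    return min(ram_per_node_gb)


ram_gb = [3, 3, 1]                         # RAM available on each data node
usable = usable_memory_per_node(ram_gb)    # memory each node can devote
unused = sum(ram_gb) - usable * len(ram_gb)
print(usable, unused)
```

      Here each node can devote 1GB, and the remaining 4GB spread over
      the two larger machines goes unused for Cluster data.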
 
    * _Because MySQL Cluster uses TCP/IP, does that mean I can run it
      over the Internet, with one or more nodes in a remote location?_
 
      It is very doubtful in any case that a cluster would perform
      reliably under such conditions, as MySQL Cluster was designed and
      implemented with the assumption that it would be run under
      conditions guaranteeing dedicated high-speed connectivity such as
      that found in a LAN setting using 100 Mbps or gigabit Ethernet
      (preferably the latter). We neither test nor warrant its
      performance using anything slower than this.
 
      Also, it is extremely important to keep in mind that
      communications between the nodes in a MySQL Cluster are not
      secure; they are neither encrypted nor safeguarded by any other
      protective mechanism. The most secure configuration for a cluster
      is in a private network behind a firewall, with no direct access
      to any Cluster data or management nodes from outside. (For SQL
      nodes, you should take the same precautions as you would with any
      other instance of the MySQL server.)
 
    * _Do I have to learn a new programming or query language to use
      Cluster?_
 
      No. Although some specialized commands are used to manage and
      configure the cluster itself, only standard (My)SQL queries and
      commands are required for the following operations:
 
         * Creating, altering, and dropping tables
 
         * Inserting, updating, and deleting table data
 
         * Creating, changing, and dropping primary and unique indexes
 
         * Configuring and managing SQL nodes (MySQL servers)
 
    * _How do I find out what an error or warning message means when
      using Cluster?_
 
      There are two ways in which this can be done:
 
         * From within the `mysql' client, use `SHOW ERRORS' or `SHOW
           WARNINGS' immediately upon being notified of the error or
           warning condition. Errors and warnings can also be displayed
           in MySQL Query Browser.
 
         * From a system shell prompt, use `perror --ndb ERROR_CODE'.
 
    * _Is MySQL Cluster transaction-safe? What isolation levels are
      supported?_
 
      _Yes_: For tables created with the `NDB' storage engine,
      transactions are supported. In MySQL 5.0, Cluster supports only
      the `READ COMMITTED' transaction isolation level.
 
    * _What storage engines are supported by MySQL Cluster?_
 
      Clustering in MySQL is supported only by the `NDB' storage engine.
      That is, in order for a table to be shared between nodes in a
      cluster, it must be created using `ENGINE=NDB' (or
      `ENGINE=NDBCLUSTER', which is equivalent).
 
      (It is possible to create tables using other storage engines such
      as `MyISAM' or `InnoDB' on a MySQL server being used for
      clustering, but these non-`NDB' tables will *not* participate in
      the cluster.)
 
    * _Which versions of the MySQL software support Cluster? Do I have
      to compile from source?_
 
      Cluster is supported in all MySQL-max binaries in the 5.0 release
      series, except as noted in the following paragraph. You can
      determine whether your server has NDB support using either the
      `SHOW VARIABLES LIKE 'have_%'' or `SHOW ENGINES' statement. (See
      mysqld-max for more information.)
 
      Linux users, please note that `NDB' is _not_ included in the
      standard MySQL server RPMs. Beginning with MySQL 5.0.4, there are
      separate RPM packages for the NDB storage engine and accompanying
      management and other tools; see the NDB RPM Downloads section of
      the MySQL 5.0 Downloads page for these. (Prior to 5.0.4, you had
      to use the `-max' binaries supplied as `.tar.gz' archives. This is
      still possible, but is not required, so you can use your Linux
      distribution's RPM manager if you prefer.) You can also obtain NDB
      support by compiling the `-max' binaries from source, but it is
      not necessary to do so simply to use MySQL Cluster. To download
      the latest binary, RPM, or source distribution in the MySQL 5.0
      series, visit `http://dev.mysql.com/downloads/mysql/5.0.html'.
 
    * _In the event of a catastrophic failure -- say, for instance, the
      whole city loses power *and* my UPS fails -- would I lose all my
      data?_
 
      All committed transactions are logged. Therefore, although it is
      possible that some data could be lost in the event of a
      catastrophe, this should be quite limited. Data loss can be
      further reduced by minimizing the number of operations per
      transaction. (It is not a good idea to perform large numbers of
      operations per transaction in any case.)
 
    * _Is it possible to use `FULLTEXT' indexes with Cluster?_
 
      `FULLTEXT' indexing is not currently supported by the `NDB'
      storage engine, or by any storage engine other than `MyISAM'. We
      are working to add this capability in a future release.
 
    * _Can I run multiple nodes on a single computer?_
 
      It is possible but not advisable. One of the chief reasons to run
      a cluster is to provide redundancy. To enjoy the full benefits of
      this redundancy, each node should reside on a separate machine. If
      you place multiple nodes on a single machine and that machine
      fails, you lose all of those nodes.  Given that MySQL Cluster can
      be run on commodity hardware loaded with a low-cost (or even
      no-cost) operating system, the expense of an extra machine or two
      is well worth it to safeguard mission-critical data. It is also worth
      noting that the requirements for a cluster host running a
      management node are minimal. This task can be accomplished with a
      200 MHz Pentium CPU and sufficient RAM for the operating system
      plus a small amount of overhead for the `ndb_mgmd' and `ndb_mgm'
      processes.
 
    * _Can I add nodes to a cluster without restarting it?_
 
      Not at present. A simple restart is all that is required for
      adding new MGM or SQL nodes to a Cluster. When adding data nodes
      the process is more complex, and requires the following steps:
 
        1. Make a complete backup of all Cluster data.
 
        2. Completely shut down the cluster and all cluster node
           processes.
 
        3. Restart the cluster, using the `--initial' startup option.
 
        4. Restore all cluster data from the backup.
 
      In a future MySQL Cluster release series, we hope to implement a
      `hot' reconfiguration capability for MySQL Cluster to minimize (if
      not eliminate) the requirement for restarting the cluster when
      adding new nodes.
 
    * _Are there any limitations that I should be aware of when using
      Cluster?_
 
      `NDB' tables in MySQL are subject to the following limitations:
 
         * Not all character sets and collations are supported.
 
         * `FULLTEXT' indexes and index prefixes are not supported. Only
           complete columns may be indexed.
 
         * Spatial data types are not supported. See
           spatial-extensions.
 
         * Only complete rollbacks for transactions are supported.
           Partial rollbacks and rollbacks to savepoints are not
           supported.
 
         * The maximum number of attributes allowed per table is 128,
           and attribute names cannot be any longer than 31 characters.
           For each table, the maximum combined length of the table and
           database names is 122 characters.
 
         * The maximum size for a table row is 8 kilobytes, not counting
           `BLOB's. There is no set limit for the number of rows per
           table. Table size limits depend on a number of factors, in
           particular on the amount of RAM available to each data node.
 
         * The `NDB' engine does not support foreign key constraints. As
           with `MyISAM' tables, these are ignored.
 
         * Query caching is not supported.
 
      For additional information on Cluster limitations, see 
      mysql-cluster-limitations.
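
      The numeric limits listed above can be checked before a table is
      created. The Python sketch below is a hypothetical helper, not
      part of MySQL; it assumes that "8 kilobytes" means 8192 bytes and
      that the caller supplies the row size, excluding `BLOB' columns.

```python
MAX_ATTRIBUTES = 128
MAX_ATTRIBUTE_NAME_LENGTH = 31
MAX_COMBINED_NAME_LENGTH = 122   # database name plus table name
MAX_ROW_BYTES = 8 * 1024         # assumption: 8 kilobytes = 8192 bytes


def check_ndb_limits(database, table, column_names, row_bytes):
    """Return a list of limit violations (empty if none were found)."""
    problems = []
    if len(column_names) > MAX_ATTRIBUTES:
        problems.append(f"{len(column_names)} attributes exceeds "
                        f"{MAX_ATTRIBUTES}")
    for name in column_names:
        if len(name) > MAX_ATTRIBUTE_NAME_LENGTH:
            problems.append(f"attribute name '{name}' is longer than "
                            f"{MAX_ATTRIBUTE_NAME_LENGTH} characters")
    if len(database) + len(table) > MAX_COMBINED_NAME_LENGTH:
        problems.append("combined database and table name is too long")
    if row_bytes > MAX_ROW_BYTES:
        problems.append(f"row size {row_bytes} bytes exceeds "
                        f"{MAX_ROW_BYTES}")
    return problems


# Hypothetical table definitions.
ok = check_ndb_limits("shop", "orders", ["id", "total"], 200)
bad = check_ndb_limits("shop", "orders", ["id", "x" * 32], 9000)
print(ok, len(bad))
```

      The check covers only the numeric limits; unsupported features
      such as `FULLTEXT' indexes or spatial types are not detected.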
 
    * _How do I import an existing MySQL database into a cluster?_
 
      You can import databases into MySQL Cluster much as you would with
      any other version of MySQL. Other than the limitations mentioned in
      the previous question, the only special requirement is that
      any tables to be included in the cluster must use the `NDB'
      storage engine. This means that the tables must be created with
      `ENGINE=NDB' or `ENGINE=NDBCLUSTER'. It is also possible to
      convert existing tables using other storage engines to `NDB
      Cluster' using `ALTER TABLE', but this requires an additional
      workaround. See mysql-cluster-limitations for details.
 
    * _How do cluster nodes communicate with one another?_
 
      Cluster nodes can communicate via any of three different
      protocols: TCP/IP, SHM (shared memory), and SCI (Scalable Coherent
      Interface). Where available, SHM is used by default between nodes
      residing on the same cluster host. SCI is a high-speed (1 gigabit
      per second and higher), high-availability protocol used in
      building scalable multi-processor systems; it requires special
      hardware and drivers. See mysql-cluster-interconnects for
      more about using SCI as a transport mechanism in MySQL Cluster.
 
    * _What is an `arbitrator'?_
 
      If one or more nodes in a cluster fail, it is possible that not
      all cluster nodes will be able to `see' one another. In fact, it
      is possible that two sets of nodes might become isolated from one
      another in a network partitioning, also known as a `split brain'
      scenario. This type of situation is undesirable because each set
      of nodes tries to behave as though it is the entire cluster.
 
      When cluster nodes go down, there are two possibilities. If more
      than 50% of the remaining nodes can communicate with each other,
      we have what is sometimes called a `majority rules' situation, and
      this set of nodes is considered to be the cluster. The arbitrator
      comes into play when there is an even number of nodes: in such
      cases, the set of nodes to which the arbitrator belongs is
      considered to be the cluster, and nodes not belonging to this set
      are shut down.
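
      The simplified rule just described can be sketched in Python as
      follows. This is one reading of the text, not the actual `NDB'
      implementation: it treats "an even number of nodes" as an even
      split between two sets, and the function name is hypothetical.

```python
def partition_survives(partition_size, total_alive, has_arbitrator):
    """Decide whether a set of nodes is considered to be the cluster."""
    if 2 * partition_size > total_alive:
        return True              # more than 50%: "majority rules"
    if 2 * partition_size == total_alive:
        return has_arbitrator    # even split: the arbitrator's set wins
    return False                 # minority set is shut down


print(partition_survives(3, 4, False))  # True
print(partition_survives(2, 4, True))   # True
print(partition_survives(2, 4, False))  # False
```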
 
      The preceding information is somewhat simplified. A more complete
      explanation taking into account node groups follows:
 
      When all nodes in at least one node group are alive, network
      partitioning is not an issue, because no other portion of the
      cluster can form a functional cluster. The real problem arises
      when no single node group has all its nodes alive, in which case
      network partitioning (the `split-brain' scenario) becomes
      possible. Then an arbitrator is required.  All cluster nodes
      recognize the same node as the arbitrator, which is normally the
      management server; however, it is possible to configure any of the
      MySQL Servers in the cluster to act as the arbitrator instead. The
      arbitrator accepts the first set of cluster nodes to contact it,
      and tells the remaining set to shut down. Arbitrator selection is
      controlled by the `ArbitrationRank' configuration parameter for
      MySQL Server and management server nodes. (See
      mysql-cluster-mgm-definition for details.) It should also be
      noted that the role of arbitrator does not in and of itself impose
      any heavy demands upon the host so designated, and thus the
      arbitrator host does not need to be particularly fast or to have
      extra memory especially for this purpose.
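
      The node group condition described above can be expressed as a
      short check. The Python sketch below is illustrative only; the
      function name, the node IDs, and the layout of two node groups
      with two nodes each are assumptions.

```python
def arbitration_needed(node_groups, alive_nodes):
    """True when no single node group has all of its nodes alive."""
    alive = set(alive_nodes)
    return not any(set(group) <= alive for group in node_groups)


# Hypothetical layout: two node groups of two data nodes each.
node_groups = [[1, 2], [3, 4]]

print(arbitration_needed(node_groups, [1, 2, 3]))  # False
print(arbitration_needed(node_groups, [1, 3]))     # True
```

      In the first call, node group `[1, 2]' is complete, so no
      arbitrator is required. In the second, neither group is complete
      and the arbitrator decides which set of nodes continues.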
 
    * _What data types are supported by MySQL Cluster?_
 
      MySQL Cluster supports all of the usual MySQL data types, with the
      exception of those associated with MySQL's spatial extensions.
      (See spatial-extensions.) In addition, there are some
      differences with regard to indexes when used with `NDB' tables.

      *Note*: MySQL Cluster tables (that is, tables created with
      `ENGINE=NDBCLUSTER') have only fixed-width rows. This means that
      (for example) each record containing a `VARCHAR(255)' column will
      require space for 255 characters (as required for the character
      set and collation being used for the table), regardless of the
      actual number of characters stored therein. This issue is expected
      to be fixed in a future MySQL release series.
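
      To illustrate the effect of fixed-width rows, the Python sketch
      below computes the space reserved for a character column from its
      declared length. The per-character maxima (1 byte for `latin1',
      3 bytes for `utf8') are assumptions about the character sets in
      use, and any additional per-column overhead is ignored.

```python
# Assumed maximum bytes per character for two common character sets.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3}


def fixed_width_bytes(declared_length, charset):
    """Space reserved per record, regardless of the stored value."""
    return declared_length * MAX_BYTES_PER_CHAR[charset]


print(fixed_width_bytes(255, "latin1"))  # 255
print(fixed_width_bytes(255, "utf8"))    # 765
```

      A ten-character value stored in such a column still occupies the
      full reserved width in every record.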
 
      See mysql-cluster-limitations for more information about
      these issues.
 
    * _How do I start and stop MySQL Cluster?_
 
      It is necessary to start each node in the cluster separately, in
      the following order:
 
        1. Start the management node with the `ndb_mgmd' command.
 
        2. Start each data node with the `ndbd' command.
 
        3. Start each MySQL server (SQL node) using `mysqld_safe
           --user=mysql &'.
 
      Each of these commands must be run from a system shell on the
      machine housing the affected node. You can verify the cluster is
      running by starting the management client `ndb_mgm' on the
      machine housing the MGM node.
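
      The startup order can be captured as an ordered plan. The Python
      sketch below only builds a list of (host, command) pairs and does
      not run anything; the function name and host names are
      hypothetical, and any options the commands need beyond those
      shown in the steps above (such as configuration file locations)
      are omitted.

```python
def startup_plan(mgm_hosts, data_hosts, sql_hosts):
    """Return (host, command) pairs in the order they should be run."""
    plan = []
    plan += [(host, "ndb_mgmd") for host in mgm_hosts]        # step 1
    plan += [(host, "ndbd") for host in data_hosts]           # step 2
    plan += [(host, "mysqld_safe --user=mysql &")             # step 3
             for host in sql_hosts]
    return plan


# Hypothetical four-host cluster.
plan = startup_plan(["mgm1"], ["data1", "data2"], ["sql1"])
for host, command in plan:
    print(f"{host}: {command}")
```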
 
    * _What happens to cluster data when the cluster is shut down?_
 
      The data held in memory by the cluster's data nodes is written to
      disk, and is reloaded in memory the next time that the cluster is
      started.
 
      To shut down the cluster, enter the following command in a shell
      on the machine hosting the MGM node:
 
           shell> ndb_mgm -e shutdown
 
      This causes the `ndb_mgm', `ndb_mgmd', and any `ndbd' processes to
      terminate gracefully. MySQL servers running as Cluster SQL nodes
      can be stopped using `mysqladmin shutdown'.
 
      For more information, see mgm-client-commands and
      multi-shutdown-restart.
 
    * _Is it helpful to have more than one management node for a
      cluster?_
 
      It can be helpful as a fail-safe. Only one MGM node controls the
      cluster at any given time, but it is possible to configure one MGM
      as primary, and one or more additional management nodes to take
      over in the event that the primary MGM node fails.
 
    * _Can I mix different kinds of hardware and operating systems in a
      Cluster?_
 
      Yes, so long as all machines and operating systems have the same
      endianness (all big-endian or all little-endian). It is also
      possible to use different MySQL Cluster releases on different
      nodes. However, we recommend this be done only as part of a
      rolling upgrade procedure.
 
    * _Can I run two data nodes on a single host? Two SQL nodes?_
 
      Yes, it is possible to do this. In the case of multiple data
      nodes, each node must use a different data directory. If you want
      to run multiple SQL nodes on one machine, each instance of
      `mysqld' must use a different TCP/IP port.
 
    * _Can I use hostnames with MySQL Cluster?_
 
      Yes, it is possible to use DNS and DHCP for cluster hosts.
      However, if your application requires `five nines' availability,
      we recommend using fixed IP addresses. Making communication
      between Cluster hosts dependent on services such as DNS and DHCP
      introduces additional points of failure, and the fewer of these,
      the better.
 