DOC HOME SITE MAP MAN PAGES GNU INFO SEARCH PRINT BOOK
 

cluster(5)




CLUSTER()                 SQL Commands                  CLUSTER()


NAME

     CLUSTER - cluster a table according to an index


SYNOPSIS

     CLUSTER indexname ON tablename
     CLUSTER tablename
     CLUSTER


DESCRIPTION

     CLUSTER instructs PostgreSQL to cluster the table  specified
     by  tablename based on the index specified by indexname. The
     index must already have been defined on tablename.

     When a table is clustered, it is physically reordered  based
     on  the  index  information. Clustering is a one-time opera-
     tion:  when the table is subsequently updated,  the  changes
     are  not clustered. That is, no attempt is made to store new
     or updated rows according  to  their  index  order.  If  one
     wishes,  one  can periodically recluster by issuing the com-
     mand again.

     When a table is clustered,  PostgreSQL  remembers  on  which
     index  it  was clustered. The form CLUSTER tablename reclus-
     ters the table on the  same  index  that  it  was  clustered
     before.

     CLUSTER without any parameter reclusters all the  tables  in
     the  current  database  that  the  calling user owns, or all
     tables if called by a superuser. (Never-clustered tables are
     not  included.)  This  form  of  CLUSTER  cannot be executed
     inside a transaction block.

     When a table is being clustered, an ACCESS EXCLUSIVE lock is
     acquired  on it. This prevents any other database operations
     (both reads and writes) from operating on  the  table  until
     the CLUSTER is finished.


PARAMETERS

     indexname
          The name of an index.

     tablename
          The name (possibly schema-qualified) of a table.


NOTES

     CLUSTER loses all visibility information  of  tuples,  which
     makes  the  table  look empty to any snapshot that was taken
     before the CLUSTER  command  finished.  That  makes  CLUSTER
     unsuitable  for  applications where transactions that access
     the table being clustered are run concurrently with CLUSTER.


SQL - Language StatementLast change: 2008-01-03 1


CLUSTER()                 SQL Commands                  CLUSTER()

     This is most visible with serializable transactions, because
     they take only one snapshot at the beginning of the transac-
     tion, but read-committed transactions are also affected.

     In cases where you are accessing single rows randomly within
     a  table, the actual order of the data in the table is unim-
     portant. However, if you tend to access some data more  than
     others, and there is an index that groups them together, you
     will benefit from using CLUSTER.  If you  are  requesting  a
     range  of  indexed  values from a table, or a single indexed
     value that has multiple rows that match, CLUSTER  will  help
     because  once  the  index  identifies the table page for the
     first row that matches, all other rows that match are  prob-
     ably  already  on  the same table page, and so you save disk
     accesses and speed up the query.

     During the cluster operation, a temporary copy of the  table
     is  created that contains the table data in the index order.
     Temporary copies of each index on the table are  created  as
     well.  Therefore, you need free space on disk at least equal
     to the sum of the table size and the index sizes.

     Because CLUSTER remembers the  clustering  information,  one
     can  cluster  the  tables  one  wants clustered manually the
     first time, and setup a timed event  similar  to  VACUUM  so
     that the tables are periodically reclustered.

     Because the planner records statistics about the ordering of
     tables,  it  is advisable to run ANALYZE [analyze(5)] on the
     newly clustered table.  Otherwise, the planner may make poor
     choices of query plans.

     There is another way to cluster data.  The  CLUSTER  command
     reorders  the  original table by scanning it using the index
     you specify. This can be slow on large  tables  because  the
     rows  are  fetched from the table in index order, and if the
     table is disordered, the entries are  on  random  pages,  so
     there is one disk page retrieved for every row moved. (Post-
     greSQL has a cache, but the majority of a big table will not
     fit  in  the cache.)  The other way to cluster a table is to
     use

     CREATE TABLE newtable AS
         SELECT * FROM table ORDER BY columnlist;

     which uses  the  PostgreSQL  sorting  code  to  produce  the
     desired  order;  this  is  usually much faster than an index
     scan for disordered data.  Then you drop the old table,  use
     ALTER  TABLE  ... RENAME to rename newtable to the old name,
     and recreate the table's indexes.  The big  disadvantage  of
     this  approach  is  that  it  does  not  preserve OIDs, con-
     straints, foreign key relationships, granted privileges, and


SQL - Language StatementLast change: 2008-01-03 2


CLUSTER()                 SQL Commands                  CLUSTER()

     other  ancillary  properties  of  the table - all such items
     must be manually recreated.  Another  disadvantage  is  that
     this  way requires a sort temporary file about the same size
     as the table itself, so peak disk usage is about three times
     the table size instead of twice the table size.


EXAMPLES

     Cluster the table  employees  on  the  basis  of  its  index
     emp_ind:

     CLUSTER emp_ind ON emp;

     Cluster the employees table using the same  index  that  was
     used before:

     CLUSTER emp;

     Cluster all tables in the database that have previously been
     clustered:

     CLUSTER;


COMPATIBILITY

     There is no CLUSTER statement in the SQL standard.


SEE ALSO

     clusterdb [clusterdb(1)]


SQL - Language StatementLast change: 2008-01-03 3



Man(1) output converted with man2html