DOC HOME SITE MAP MAN PAGES GNU INFO SEARCH PRINT BOOK
 

(mysql.info) nested-joins

Info Catalog (mysql.info) left-join-optimization (mysql.info) query-speed (mysql.info) outer-join-simplification
 
 7.2.10 Nested Join Optimization
 -------------------------------
 
 As of MySQL 5.0.1, the syntax for expressing joins allows nested joins.
 The following discussion refers to the join syntax described in 
 join.
 
 The syntax of TABLE_FACTOR is extended in comparison with the SQL
 Standard. The latter accepts only TABLE_REFERENCE, not a list of them
 inside a pair of parentheses. This is a conservative extension if we
 consider each comma in a list of TABLE_REFERENCE items as equivalent to
 an inner join. For example:
 
      SELECT * FROM t1 LEFT JOIN (t2, t3, t4)
                       ON (t2.a=t1.a AND t3.b=t1.b AND t4.c=t1.c)
 
 is equivalent to:
 
      SELECT * FROM t1 LEFT JOIN (t2 CROSS JOIN t3 CROSS JOIN t4)
                       ON (t2.a=t1.a AND t3.b=t1.b AND t4.c=t1.c)
 
 In MySQL, `CROSS JOIN' is a syntactic equivalent to `INNER JOIN' (they
 can replace each other). In standard SQL, they are not equivalent.
 `INNER JOIN' is used with an `ON' clause; `CROSS JOIN' is used
 otherwise.
 
 In versions of MySQL prior to 5.0.1, parentheses in TABLE_REFERENCES
 were just omitted and all join operations were grouped to the left. In
 general, parentheses can be ignored in join expressions containing only
 inner join operations.
 
 After removing parentheses and grouping operations to the left, the
 join expression:
 
      t1 LEFT JOIN (t2 LEFT JOIN t3 ON t2.b=t3.b OR t2.b IS NULL)
         ON t1.a=t2.a
 
 transforms into the expression:
 
      (t1 LEFT JOIN t2 ON t1.a=t2.a) LEFT JOIN t3
          ON t2.b=t3.b OR t2.b IS NULL
 
 Yet, the two expressions are not equivalent. To see this, suppose that
 the tables `t1', `t2', and `t3' have the following state:
 
    * Table `t1' contains rows `{1}', `{2}'
 
    * Table `t2' contains row `{1,101}'
 
    * Table `t3' contains row `{101}'
 
 In this case, the first expression returns a result set including the
 rows `{1,1,101,101}', `{2,NULL,NULL,NULL}', whereas the second
 expression returns the rows `{1,1,101,101}', `{2,NULL,NULL,101}':
 
      mysql> SELECT *
          -> FROM t1
          ->      LEFT JOIN
          ->      (t2 LEFT JOIN t3 ON t2.b=t3.b OR t2.b IS NULL)
          ->      ON t1.a=t2.a;
      +------+------+------+------+
      | a    | a    | b    | b    |
      +------+------+------+------+
      |    1 |    1 |  101 |  101 |
      |    2 | NULL | NULL | NULL |
      +------+------+------+------+
 
      mysql> SELECT *
          -> FROM (t1 LEFT JOIN t2 ON t1.a=t2.a)
          ->      LEFT JOIN t3
          ->      ON t2.b=t3.b OR t2.b IS NULL;
      +------+------+------+------+
      | a    | a    | b    | b    |
      +------+------+------+------+
      |    1 |    1 |  101 |  101 |
      |    2 | NULL | NULL |  101 |
      +------+------+------+------+
 
 In the following example, an outer join operation is used together with
 an inner join operation:
 
      t1 LEFT JOIN (t2, t3) ON t1.a=t2.a
 
 That expression cannot be transformed into the following expression:
 
      t1 LEFT JOIN t2 ON t1.a=t2.a, t3.
 
 For the given table states, the two expressions return different sets
 of rows:
 
      mysql> SELECT *
          -> FROM t1 LEFT JOIN (t2, t3) ON t1.a=t2.a;
      +------+------+------+------+
      | a    | a    | b    | b    |
      +------+------+------+------+
      |    1 |    1 |  101 |  101 |
      |    2 | NULL | NULL | NULL |
      +------+------+------+------+
 
      mysql> SELECT *
          -> FROM t1 LEFT JOIN t2 ON t1.a=t2.a, t3;
      +------+------+------+------+
      | a    | a    | b    | b    |
      +------+------+------+------+
      |    1 |    1 |  101 |  101 |
      |    2 | NULL | NULL |  101 |
      +------+------+------+------+
 
 Therefore, if we omit parentheses in a join expression with outer join
 operators, we might change the result set for the original expression.
 
 More exactly, we cannot ignore parentheses in the right operand of the
 left outer join operation and in the left operand of a right join
 operation. In other words, we cannot ignore parentheses for the inner
 table expressions of outer join operations. Parentheses for the other
 operand (operand for the outer table) can be ignored.
 
 The following expression:
 
      (t1,t2) LEFT JOIN t3 ON P(t2.b,t3.b)
 
 is equivalent to this expression:
 
      t1, t2 LEFT JOIN t3 ON P(t2.b,t3.b)
 
 for any tables `t1,t2,t3' and any condition `P' over attributes `t2.b'
 and `t3.b'.
 
 Whenever the order of execution of the join operations in a join
 expression (JOIN_TABLE) is not from left to right, we talk about nested
 joins. Consider the following queries:
 
      SELECT * FROM t1 LEFT JOIN (t2 LEFT JOIN t3 ON t2.b=t3.b) ON t1.a=t2.a
        WHERE t1.a > 1
 
      SELECT * FROM t1 LEFT JOIN (t2, t3) ON t1.a=t2.a
        WHERE (t2.b=t3.b OR t2.b IS NULL) AND t1.a > 1
 
 Those queries are considered to contain these nested joins:
 
      t2 LEFT JOIN t3 ON t2.b=t3.b
      t2, t3
 
 The nested join is formed in the first query with a left join
 operation, whereas in the second query it is formed with an inner join
 operation.
 
 In the first query, the parentheses can be omitted: The grammatical
 structure of the join expression will dictate the same order of
 execution for join operations. For the second query, the parentheses
 cannot be omitted, although the join expression here can be interpreted
 unambiguously without them.  (In our extended syntax the parentheses in
 `(t2, t3)' of the second query are required, although theoretically the
 query could be parsed without them: We still would have unambiguous
 syntactical structure for the query because `LEFT JOIN' and `ON' would
 play the role of the left and right delimiters for the expression
 `(t2,t3)'.)
 
 The preceding examples demonstrate these points:
 
    * For join expressions involving only inner joins (and not outer
      joins), parentheses can be removed. You can remove parentheses and
      evaluate left to right (or, in fact, you can evaluate the tables
      in any order).
 
    * The same is not true, in general, for outer joins or for outer
      joins mixed with inner joins. Removal of parentheses may change
      the result.
 
 Queries with nested outer joins are executed in the same pipeline
 manner as queries with inner joins. More exactly, a variation of the
 nested-loop join algorithm is exploited. Recall by what algorithmic
 schema the nested-loop join executes a query. Suppose that we have a
 join query over 3 tables `T1,T2,T3' of the form:
 
      SELECT * FROM T1 INNER JOIN T2 ON P1(T1,T2)
                       INNER JOIN T3 ON P2(T2,T3)
        WHERE P(T1,T2,T3).
 
 Here, `P1(T1,T2)' and `P2(T3,T3)' are some join conditions (on
 expressions), whereas `P(t1,t2,t3)' is a condition over columns of
 tables `T1,T2,T3'.
 
 The nested-loop join algorithm would execute this query in the
 following manner:
 
      FOR each row t1 in T1 {
        FOR each row t2 in T2 such that P1(t1,t2) {
          FOR each row t3 in T3 such that P2(t2,t3) {
            IF P(t1,t2,t3) {
               t:=t1||t2||t3; OUTPUT t;
            }
          }
        }
      }
 
 The notation `t1||t2||t3' means `a row constructed by concatenating the
 columns of rows `t1', `t2', and `t3'.' In some of the following
 examples, `NULL' where a row name appears means that `NULL' is used for
 each column of that row. For example, `t1||t2||NULL' means `a row
 constructed by concatenating the columns of rows `t1' and `t2', and
 `NULL' for each column of `t3'.'
 
 Now let's consider a query with nested outer joins:
 
      SELECT * FROM T1 LEFT JOIN
                    (T2 LEFT JOIN T3 ON P2(T2,T3))
                    ON P1(T1,T2)
        WHERE P(T1,T2,T3).
 
 For this query, we modify the nested-loop pattern to get:
 
      FOR each row t1 in T1 {
        BOOL f1:=FALSE;
        FOR each row t2 in T2 such that P1(t1,t2) {
          BOOL f2:=FALSE;
          FOR each row t3 in T3 such that P2(t2,t3) {
            IF P(t1,t2,t3) {
              t:=t1||t2||t3; OUTPUT t;
            }
            f2=TRUE;
            f1=TRUE;
          }
          IF (!f2) {
            IF P(t1,t2,NULL) {
              t:=t1||t2||NULL; OUTPUT t;
            }
            f1=TRUE;
          }
        }
        IF (!f1) {
          IF P(t1,NULL,NULL) {
            t:=t1||NULL||NULL; OUTPUT t;
          }
        }
      }
 
 In general, for any nested loop for the first inner table in an outer
 join operation, a flag is introduced that is turned off before the loop
 and is checked after the loop. The flag is turned on when for the
 current row from the outer table a match from the table representing
 the inner operand is found. If at the end of the loop cycle the flag is
 still off, no match has been found for the current row of the outer
 table. In this case, the row is complemented by `NULL' values for the
 columns of the inner tables. The result row is passed to the final
 check for the output or into the next nested loop, but only if the row
 satisfies the join condition of all embedded outer joins.
 
 In our example, the outer join table expressed by the following
 expression is embedded:
 
      (T2 LEFT JOIN T3 ON P2(T2,T3))
 
 Note that for the query with inner joins, the optimizer could choose a
 different order of nested loops, such as this one:
 
      FOR each row t3 in T3 {
        FOR each row t2 in T2 such that P2(t2,t3) {
          FOR each row t1 in T1 such that P1(t1,t2) {
            IF P(t1,t2,t3) {
               t:=t1||t2||t3; OUTPUT t;
            }
          }
        }
      }
 
 For the queries with outer joins, the optimizer can choose only such an
 order where loops for outer tables precede loops for inner tables.
 Thus, for our query with outer joins, only one nesting order is
 possible. For the following query, the optimizer will evaluate two
 different nestings:
 
      SELECT * T1 LEFT JOIN (T2,T3) ON P1(T1,T2) AND P2(T1,T3)
        WHERE P(T1,T2,T3)
 
 The nestings are these:
 
      FOR each row t1 in T1 {
        BOOL f1:=FALSE;
        FOR each row t2 in T2 such that P1(t1,t2) {
          FOR each row t3 in T3 such that P2(t1,t3) {
            IF P(t1,t2,t3) {
              t:=t1||t2||t3; OUTPUT t;
            }
            f1:=TRUE
          }
        }
        IF (!f1) {
          IF P(t1,NULL,NULL) {
            t:=t1||NULL||NULL; OUTPUT t;
          }
        }
      }
 
 and:
 
      FOR each row t1 in T1 {
        BOOL f1:=FALSE;
        FOR each row t3 in T3 such that P2(t1,t3) {
          FOR each row t2 in T2 such that P1(t1,t2) {
            IF P(t1,t2,t3) {
              t:=t1||t2||t3; OUTPUT t;
            }
            f1:=TRUE
          }
        }
        IF (!f1) {
          IF P(t1,NULL,NULL) {
            t:=t1||NULL||NULL; OUTPUT t;
          }
        }
      }
 
 In both nestings, `T1' must be processed in the outer loop because it
 is used in an outer join.  `T2' and `T3' are used in an inner join, so
 that join must be processed in the inner loop.  However, because the
 join is an inner join, `T2' and `T3' can be processed in either order.
 
 When discussing the nested-loop algorithm for inner joins, we omitted
 some details whose impact on the performance of query execution may be
 huge. We did not mention so-called `pushed-down' conditions. Suppose
 that our `WHERE' condition `P(T1,T2,T3)' can be represented by a
 conjunctive formula:
 
      P(T1,T2,T2) = C1(T1) AND C2(T2) AND C3(T3).
 
 In this case, MySQL actually uses the following nested-loop schema for
 the execution of the query with inner joins:
 
      FOR each row t1 in T1 such that C1(t1) {
        FOR each row t2 in T2 such that P1(t1,t2) AND C2(t2)  {
          FOR each row t3 in T3 such that P2(t2,t3) AND C3(t3) {
            IF P(t1,t2,t3) {
               t:=t1||t2||t3; OUTPUT t;
            }
          }
        }
      }
 
 You see that each of the conjuncts `C1(T1)', `C2(T2)', `C3(T3)' are
 pushed out of the most inner loop to the most outer loop where it can
 be evaluated. If `C1(T1)' is a very restrictive condition, this
 condition pushdown may greatly reduce the number of rows from table
 `T1' passed to the inner loops. As a result, the execution time for the
 query may improve immensely.
 
 For a query with outer joins, the `WHERE' condition is to be checked
 only after it has been found that the current row from the outer table
 has a match in the inner tables. Thus, the optimization with pushing
 condition out of the inner nested loops cannot be applied directly to
 queries with outer joins. Here we have to introduce conditional
 pushed-down predicates guarded by the flags that are turned on when a
 match has been encountered.
 
 For our example with outer joins with:
 
      P(T1,T2,T3)=C1(T1) AND C(T2) AND C3(T3)
 
 the nested-loop schema using guarded pushed-down conditions looks like
 this:
 
      FOR each row t1 in T1 such that C1(t1) {
        BOOL f1:=FALSE;
        FOR each row t2 in T2
            such that P1(t1,t2) AND (f1?C2(t2):TRUE) {
          BOOL f2:=FALSE;
          FOR each row t3 in T3
              such that P2(t2,t3) AND (f1&&f2?C3(t3):TRUE) {
            IF (f1&&f2?TRUE:(C2(t2) AND C3(t3))) {
              t:=t1||t2||t3; OUTPUT t;
            }
            f2=TRUE;
            f1=TRUE;
          }
          IF (!f2) {
            IF (f1?TRUE:C2(t2) && P(t1,t2,NULL)) {
              t:=t1||t2||NULL; OUTPUT t;
            }
            f1=TRUE;
          }
        }
        IF (!f1 && P(t1,NULL,NULL)) {
            t:=t1||NULL||NULL; OUTPUT t;
        }
      }
 
 In general, pushed-down predicates can be extracted from join
 conditions such as `P1(T1,T2)' and `P(T2,T3)'. In this case, a
 pushed-down predicate is guarded also by a flag that prevents checking
 the predicate for the `NULL'-complemented row generated by the
 corresponding outer join operation.
 
 Note that access by key from one inner table to another in the same
 nested join is prohibited if it is induced by a predicate from the
 `WHERE' condition. (We could use conditional key access in this case,
 but this technique is not employed yet in MySQL 5.0.)
 
Info Catalog (mysql.info) left-join-optimization (mysql.info) query-speed (mysql.info) outer-join-simplification
automatically generated byinfo2html