An idea how to speed up nested-loops join a little

Working on subquery optimizations, got an idea how to speed up join execution a little. Read on.

The idea

Consider a query:

select * from t1, t2, t3 where t3.key=t1.col1 and t2.key=t1.col2

Suppose the join order is t1, t2, t3, i.e. the EXPLAIN is like

+-------+------+---------------+------+---------+--------------+-..
| table | type | possible_keys | key  | key_len | ref          |
+-------+------+---------------+------+---------+--------------+-..
| t1    | ALL  | NULL          | NULL | NULL    | NULL         |
| t2    | ref  | key           | key  | 5       | test.t1.col1 |
| t3    | ref  | key           | key  | 5       | test.t1.col2 |
+-------+------+---------------+------+---------+--------------+-..

The important property is that access to t3 is independent of access to t2. MySQL’s nested loops join algorithm will run this as in this swimlane diagram:

NL execution diagram

Here we assume that

  • table t2 has 4 rows such that t2.key=t1.col1
  • table t3 doesn’t have any rows such that t3.key=t1.col2

As soon as we make first index lookup in table t3 (the one with blue border), we see that there will be no matching row combinations for this row of t1. Nevertheless, MySQL executioner will proceed to examine different rows in table t2 (marked with red border). This is redundant and can be easily avoided.

The implementation

The executioner part is easy: just maked the nested-loops join code to “jump back” in the cases like the illustrated. If we don’t find a match and table access/selection condition does not depend on the preceding table(s), then go back to the last table that the table access depends on. I don’t have the ready term for this, the working name is “short-cutting”.

The optimizer part is (as usual) more complicated. One way is to take the easy route: let the join optimizer (the part of code that chooses the join order) remain unaware of the short-cutting. Once the join order is produced, set up the executioner to perform short-cutting where appopriate.

The bad part is that join optimizer doesn’t account for short-cutting when comparing costs of various join orders, which may lead it to choose non-optimal join orders. However, taking short-cutting into account won’t be easy:

  • First, we’ll need to generate selection conditions for the join orders we consider (in other words, figure out if the EXPLAIN will have “Using WHERE”). Our current way of doing this will probably be too expensive to be done for each considered join order.
  • Second, we’ll need to get somewhere an estimate of how often short-cutting will occur. That is, we’ll need to know probability that for some *arbitrary* values of columns from preceding tables, table T will have no rows that would satisfy given condition COND(preceding_tables, T). This estimate is likely to end up being some while guess like “lets use
    0.5″.

At the moment the easy route seems like the way to go.

2007-02-11: Fixed typos in EXPLAIN ouput

Posted in ideas on January 29th, 2007 by spetrunia | | 5 Comments