Partition pruning tip: Use comparisons of column, not of partitioning function value

A customer issue has drawn my attention to this this pecularity: if partitioning is done by the value of some function, then partition pruning module will make use of comparisons of the partitioning column(s), but not of comparisons of the value of the partitioning function. Here is an example:

CREATE TABLE t1 ( recdate  DATETIME NOT NULL, ... )
PARTITION BY RANGE( TO_DAYS(recdate) ) (
  PARTITION p0 VALUES LESS THAN ( TO_DAYS('2007-01-01') ),
  PARTITION p1 VALUES LESS THAN ( TO_DAYS('2007-02-01') ),
  ...
);
EXPLAIN PARTITIONS SELECT * FROM t1 WHERE recdate='2007-01-15';
+----+-------------+-------+------------+------+-...
| id | select_type | table | partitions | type |
+----+-------------+-------+------------+------+-...
|  1 | SIMPLE      | t1    | p1         | ALL  |
+—-+————-+——-+————+——+-…

EXPLAIN PARTITIONS SELECT * FROM t1 WHERE TO_DAYS(recdate)=TO_DAYS(’2007-01-15′);
+—-+————-+——-+—————-+——+-…
| id | select_type | table | partitions     | type |
+—-+————-+——-+—————-+——+-…
|  1 | simple      | t1    | p0,p1,p2,p3,p4 | all  |
+—-+————-+——-+—————-+——+-…

This also applies to cases where partitioning function is a function of multiple columns. This gives us this tip:


For partition pruning to work, the WHERE clause must use comparisons of columns used by partitioning function, not of the result value of the partitioning function.

This doesn’t look like a totally natural limitation, but that is what is in effect for the time being.

The impact of this limitation and whether it can be lifted

The limitation can be easily worked around in cases where partitioning function is unary and monotonic, i.e where one can easily convert back and forth between comparison of partitioning function:

  monotonic_part_func(const1) < monotonic_part_func(part_col) < monotonic_part_func(const2)

and comparison of partitioning column:

const1 < part_col < const2

Partition pruning module can do such conversions, too, so it is not difficult to make MySQL handle such cases automatically. The case with non-monotonic or n-ary partitioning function is more difficult. Partition pruning uses range analysis over column ordering, that is, it analyzes the WHERE clause, converts it to a disjoint list of intervals like:

(a < part_col < b) OR (part_col = c) OR ...

and then finds which partitions those intervals fall into. We could switch to part_func(part_col) ordering, i.e. collect intervals like

(a < part_func(part_col) < b) OR (part_func(part_col) = c) OR ...

but this will make it hard to handle predicates like “a < part_col < b” (remember I’m talking about non-monotonic case, where X < Y does not mean that part_func(X) < part_func(Y)).

We could do two partition pruning passes, one with part_col ordering and the other with part_func(part_col) ordering, but that is slow, ugly, and will not handle cases like

part_col=const1 OR part_func(part_col)=const2

We could stop doing the range analysis altogether and operate on sets of used partitions, that is, recursively process the WHERE clause using a set of rules like this:


# AND/OR recursion
partitions(cond1 AND cond2) := intersect(partitions(cond1), partitions(cond2))
partitions(cond1 OR cond2) := union(partitions(cond1), partitions(cond2))

# Various atomic predicates
partitions(part_col = C) := { partition_no(part_func(C)) }
partitions(part_col < C) := ...
...

but that will be slow, especially when there are many partitions. Also this technique will work poorly (or get very complicated) in cases where partitioning function uses several columns, and comparisons of those columns are on different AND/OR branches in the WHERE clause.

Neither solution is prefect, and each of them adds some overhead. It seems we’ll need to collect some user input before we decide what (if anything) we should do.

Posted in partitioning on May 27th, 2007 by spetrunia | | 1 Comments

Use of join buffer is now visible in EXPLAIN

UPDATE:
* s/Using join cache/Using join buffer/, changed to show the final variants of EXPLAIN output as described here
* s/join_buff_size/join_buffer_size/



Starting from 5.1.18, EXPLAIN output may show “Using join cache“, like in this example:

mysql> explain select * from t1, t2 where t1.col < 10 and t2.col < 'bar';
+----+-------------+-------+-------+-...-+--------------------------------+
| id | select_type | table | type  |     | Extra                          |
+----+-------------+-------+-------+-...-+--------------------------------+
|  1 | SIMPLE      | t1    | range |     | Using where                    |
|  1 | SIMPLE      | t2    | range |     | Using where; Using join buffer |
+—-+————-+——-+——-+-…-+——————————–+

The join cache is actually not a new feature. It has been available in MySQL at least since version 4.0, and for all this time it has remained invisible and undocumented. The only thing that indicated its presense was the @@join_buffer_size server variable.

We’re trying to gradually make EXPLAIN show more information. Georgi Kodinov was fixing BUG#27531 and has used that occasion to make join buffering show up in EXPLAIN output.

If you already know how MySQL’s join buffering works, that’s all the news. If not, the remainder of this post has a hopefully readable explanation of how join buffering works and when it is used.

How join buffering works

Let’s start with regular Nested Loops Join. Suppose we have a join query

select * from t1, t2, t3 where t2.key1=t1.col1 and t3.key1<40;

and the query plain is like shown in this EXPLAIN output:

...-+-------+-------+---------------+------+---------+--------------+------+-------------+
    | table | type  | possible_keys | key  | key_len | ref          | rows | Extra       |
...-+-------+-------+---------------+------+---------+--------------+------+-------------+
    | tbl1  | ALL   | NULL          | NULL | NULL    | NULL         |   10 |             |
    | tbl2  | ref   | key1          | key1 | 5       | db.tbl1.col1 |    2 | Using where |
    | tbl3  | range | key1          | key1 | 5       | NULL         |   40 | Using where |
...-+-------+-------+---------------+------+---------+--------------+------+-------------+

When no join buffering is used, the query will be executed as follows:

  for each record t1rec in table tbl1
  {
    for each record t2rec in tbl2 such that t2rec.key1=t1rec.col
    {
      for each record t3rec in tbl3 such that t3rec.key1<40
      {
        pass the (t1rec, t2rec, t3rec) row combination to output;
      }
    }
  }

Graphically the execution flow can be depicted as follows (yellow are the table scans, blue are the table rows):

nl-join-no-buffering.png

From the code and picture we see that:

  • Table tbl2 is scanned several times, but each scan accesses a different part of the table
  • Table tbl3 is scanned many times, and all performed scans are identical

It is apparent that the second and the third scans of table tbl3 bring no new information and can be removed. We do not have to re-scan tbl3 for any row combination from tables tbl1, tbl2. Instead, we could accumulate a back of such row combination, and then do one tbl3 scan for all of them. And this is what join buffering is.

In pseudo-code, the execution will look as follows:

  for each record t1rec in table tbl1
  {
    for each record t2rec in tbl2 such that t2rec.key1=t1rec.col
    {
      put (t1rec, t2rec) into the buffer
      if (buffer is full)
        flush_buffer();
    }
  }

  flush_buffer() {
    for each record t3rec in tbl3 such that t3rec.key1<40
    {
      for each record in the buffer
        pass (t1rec, t2rec, t3rec) row combination to output;
    }
    empty the buffer;
  }

And graphically it will look as follows:

nl-join-buffering.png

The EXPLAIN output will be as follows:

explain select * from t1,t2,t3 where t2.key1 = t1.col1 and t3.key1<40;
...-+-------+-------+---------------+------+---------+--------------+------+--------------------------------+
    | table | type  | possible_keys | key  | key_len | ref          | rows | Extra                          |
...-+-------+-------+---------------+------+---------+--------------+------+--------------------------------+
    | t1    | ALL   | NULL          | NULL | NULL    | NULL         |   10 |                                |
    | t2    | ref   | key1          | key1 | 5       | test.t1.col1 |    2 | Using where                    |
    | t3    | range | key1          | key1 | 5       | NULL         |   40 | Using where; Using join buffer |
…-+——-+——-+—————+——+———+————–+——+——————————–+

In this example join buffering is used for one table, but it can be used for several tables as well. MySQL uses join buffering whenever it can, access to some table tbl_x will be bufferred if

  • The SELECT does not have an ORDER BY clause
  • We’re not at top level “select” of a multi-table UPDATE
  • tbl_x is accessed using an “independent” access method: ALL, index, range, or index_merge.
  • tbl_x is not inner w.r.t. some outer join

The server variable @@join_buffer_size specifies how much MySQL should allocate for each buffer. That is, if two tables use buffering, MySQL will allocate two buffers of @@join_buffer_size bytes each.

Posted in EXPLAIN, how-it-works on May 16th, 2007 by spetrunia | | 6 Comments

Thoughts on partitioning optimizations

I’m supposed to be working on subquery optimization but I can’t get BUG#26630 out of my head.

The bug test case is small and easy to understand: looking at the EXPLAIN:

create table tbl_test( ... ) partition by range(num) (...);

explain partitions
select * from tbl_co c straight_join tbl_test t where t.num=c.num and reg=8;
..+-------+-------------+------+---------------+------+---------+------+------+-------------+
  | table | partitions  | type | possible_keys | key  | key_len | ref  | rows | Extra       |
..+-------+-------------+------+---------------+------+---------+------+------+-------------+
  | c     | NULL        | ALL  | NULL          | NULL | NULL    | NULL |   17 | Using where |
  | t     | p0,p1,p2,p3 | ALL  | NULL          | NULL | NULL    | NULL |   17 | Using where |
..+-------+-------------+------+---------------+------+---------+------+------+-------------+

we see that for table t no partition pruning is performed. While it is apparent that MySQL could do this (pieces in red font indicate what’s missing):

  for each row in table c
  {
     $p= <find partition that has rows such that t.num = c.num>;
     for each row in table t, within partition $p
       if (where clause matches)
         pass row combination to output;
  }

Partition Selection does something similar, but it will work only if there is an INDEX(t.num) and the optimizer choses to access table t via ref(t.num=c.num). This doesn’t hold in our case, so all partitions will be scanned every time.

Ok, now on to the general thoughts. MySQL has two kinds of partitioning optimizations:

  1. Partition pruning is performed statically, i.e. we look at the query and infer a set of partitions that do not need to be accessed, no matter which query execution plan will be used.
  2. Partition selection is the opposite: when we’re executing the query, and do an index lookup on
      tbl.keypart1=const1 AND tbl.keypart2=const2 AND ... AND tbl.keypartN=constN
    

    we check if those equalities allow us to determine one single partition that needs to be accessed. If yes, we access only that partition.

That is, we have

Partition pruning
  • static (can use predicates that depend only on this table)
  • thorough predicate analysis (can use >, <, BETWEEN, etc)
Partition selection
  • dynamic (can use predicates like tbl.x=tbl2.y)
  • can use equalities only

This dualism has its direct counterpart in MySQL table access methods. If we ignore “special” methods like fulltext and loose index scan, we have

range/index_merge
  • “static” access methods (can use predicates that depend only on this table)
  • thorough predicate analysis
ref-family methods
  • dynamic (can use predicates like tbl.x=tbl2.y)
  • can use equalities only

In fact, Partition Pruning is performed by creating a fake “index description”, running range/index_merge analyzer and then analyzing the obtained ranges.

Partition Selection could have been implemented in a similar way by re-using the ref-family methods analyzer: we could create a fake index description, run the analyzer, and then check if there are any potential ref accesses that use that index. If we have a ref-access candidate on

  tbl.partition_col = some_expresion(some_tables)

then we will know that we only need to access one partition, and we’ll know how to find out which. This solution is better than Partition Selection because

  • The requirement that we use an index that covers all partitioned columns will be lifted
  • The optimizer will know that only one partition will be accessed (currently this is discovered only at runtime) and will be able to take this into account
    • and it will be easy to show it in EXPLAIN, too
  • Architecturally, everything will look very nice:
    Engine User
    range/index_merge analyzer partition pruning
    ref-family analyzer partition selection

Having written all this I now realize that the example of BUG#26630 will probably not be resolved - it will still have to use all partitions because of the automatic unconditional use of “join buffering”. Well, hopefully the reporter does not really intend to run cross joins and has some will be satisfied with ability to use indexes that do not cover all partitioned fields.

Rant

Now a small rant. It seems all this is an example of Conway’s Law:

  1. Partition pruning was designed/implemented by people who were familiar with range/index_merge analyzer. Hence the reuse.
  2. Partition selection was designed/implemented by people who I beleive were not familiar with ref-family analyzer. Hence, no reuse. They were familiar with table handler interface and so partition selection went into the handler interface.

Funny.

Posted in partitioning on May 3rd, 2007 by spetrunia | | 5 Comments