Optimizations for derived tables in MySQL 5.6 and MariaDB 5.5+

I had been involved with subquery optimizations fairly closely, but last week I was surprised to find out that MySQL 5.6 does not support derived table merging. This feature was among the subquery features in the abandoned MySQL 6.0. In MariaDB, it was finished and released as part of MariaDB 5.3/5.5. As for MySQL, neither MySQL 5.6, nor MySQL 5.7 has this feature.

So what is this “derived merge”? It’s simple to understand. When one writes complex queries, it is common to use FROM-clause subqueries as a way to structure the query:

  (select * from orders where o_orderpriority=’1-URGENT’) as high_prio_orders
  o_orderdate between ‘1995-01-01′ and ‘1995-01-07′

MySQL optimizer processes this syntax very poorly. The basic problem is that FROM-subqueries are always materialized exactly as-specified. Conditions from outside the subquery are applied only after the materialization.

In our example, table orders has an index on o_orderdate, and there is a highly selective condition o_orderdate BETWEEN ... which one can use for reading through the index. But the condition is located outside the subquery, so it will not be used when reading the table. Instead, we will get the following plan:

| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
|  1 | PRIMARY     | <derived2> | ALL  | NULL          | NULL | NULL    | NULL | 1505799 | Using where |
|  2 | DERIVED     | orders     | ALL  | NULL          | NULL | NULL    | NULL | 1505799 | Using where |

The meaning of it is:

  1. Do a full table scan is on table `orders`. We expect to read 1.5M rows. Write rows that match o_orderpriority='1-URGENT' into a temporary table
  2. Read the temporary table back. Filter rows that match o_orderdate between ... and compute the query result

MySQL 5.6 has added some improvements to this (link to the manual). They are:

  • The temporary table is materialized as late as possible. This has no effect of the speed of our example query, but it may have an effect for more complex queries.
    • EXPLAIN also will not materialize the temporary table
  • The optimizer has an option to create and use an index on the temporary table.

However, the base problem of materializing FROM subquery before applying any other optimization still remains.

In MariaDB, EXPLAIN will be different:

| id   | select_type | table  | type  | possible_keys | key           | key_len | ref  | rows | Extra                              |
|    1 | SIMPLE      | orders | range | i_o_orderdate | i_o_orderdate | 4       | NULL | 4358 | Using index condition; Using where |

Note that we see only one line, and the table orders is accessed through an index on o_orderdate. Running EXPLAIN EXTENDED will show why:

Message: select sum(`dbt3sf1`.`orders`.`o_totalprice`) AS `sum(o_totalprice)` from `dbt3sf1`.`orders` where ((`dbt3sf1`.`orders`.`o_orderpriority` = ‘1-URGENT’) and (`dbt3sf1`.`orders`.`o_orderDATE` between ‘1995-01-01′ and ‘1995-01-07′))

There is no FROM-clause subquery anymore. It has been merged into the upper select. This allowed the optimizer to avoid doing materialization, and also to use the condition and index on o_orderdate to construct a range access.

Query execution time for this particular example went down from 15 sec to 0.25 sec, but generally, the difference can be as big as your table is big.

Posted in how-it-works, mysql, mariadb on June 30th, 2014 by spetrunia | | 0 Comments

Slides: New subquery optimizations in MySQL 6.0 (new revision)

I’m now at MySQL Conference and have just finished my New Subquery Optimizations in MySQL 6.0 talk. If you’re eager to get the slides, here is the link to the pdf file and a mirror.

The talk has the same name as this MySQL University session but the slides have been thoroughly re-worked, and also there is new content:

  • More detailed explanation of subquery handling in 4.1/5.x
  • More reliable/justified benchmark numbers
  • Observations about subquery demographics
  • An attempt at comparative analysis of how MySQL’s subquery strategies compare to PostgreSQL’s
    • And a slide #28 that explains why we’re looking at PostgreSQL all the time

Have a nice viewing. And, if you get subqueries that should be fast but aren’t, we’re now actively looking at every subquery performance bug we get and will appreciate any input.

Posted in subqueries, how-it-works, benchmarks on April 16th, 2008 by spetrunia | | 0 Comments

Slides: New subquery optimizations in MySQL 6.0

A bunch of (hopefully) self-explanatory slides about new subquery optimizations in MySQL 6.0 is available here (UPDATE: here’s a working link). The slides are from this MySQL University session, so there was an audio stream but there were some difficulties with it and it is not available now.

MySQL Conference & Expo 2008