My proposals for Percona Live: Window Functions and ANALYZE for statements

I’ve made two session proposals for Percona Live conference:

if you feel these talks are worth it, please vote!

Posted in mysql, mariadb, percona live on November 30th, 2015 by spetrunia | | 0 Comments

MariaDB 10.1: Better query optimization for ORDER BY … LIMIT

For some reason, we’ve been getting a lot of issues with ORDER BY optimization recently. The fixes have passed Elena Stepanova’s scrutiny and I’ve pushed them to MariaDB 10.1. Now, MariaDB’s ORDER BY ... LIMIT optimizer:

  • Doesn’t make stupid choices when several multi-part keys and potential range accesses are present (MDEV-6402)
  • Always uses “range” and (not full “index” scan) when it switches to an index to satisfy ORDER BY … LIMIT (MDEV-6657)
  • Tries hard to be smart and use cost/number of records estimates from other parts of the optimizer (MDEV-6384, MDEV-465, MySQL Bug#36817)
  • Takes full advantage of InnoDB’s Extended Keys feature when checking if filesort() can be skipped (MDEV-6796)

so, if you are struggling with poor optimization of ORDER BY ... LIMIT queries, help might be underway.

I have to admit that optimizations for ORDER BY... LIMIT are still far from perfect (both in MySQL and in MariaDB). The code is old and ugly and having spent some time debugging it, I can easily believe that it still does a poor job in some cases. The good part is that now we have some fresh knowledge of the ORDER BY … LIMIT optimization code, as well as RQG scripts for testing ORDER BY .. LIMIT queries. If you have any issues with ORDER BY LIMIT optimization, we will appreciate to see bugs filed for them.

Posted in mysql, mariadb on October 11th, 2014 by spetrunia | | 0 Comments

A discovery - Index Condition Pushdown can cause a slowdown after all

MariaDB 5.5 and then MySQL 5.6 got Index Condition Pushdown (ICP) optimization (initially coded by yours truly). The idea of ICP is simple: after reading the index record, check the part of WHERE condition that can be computed using index columns, and only then read the table record. That way, we avoid reading table rows that don’t satisfy index condition:

It seems apparent that ICP can never make things slower. The WHERE clause has to be checked anyway, and not reading certain records can only make things faster.

That was what I thought, too, until recently Joffrey Michaie observed the contrary “in the wild”: we’ve got a real-world case where using Index Condition Pushdown was slower than not using it: MDEV-6713. The slowdown was about 20%, both on MariaDB and MySQL.

From what I could investigate so far, the slowdown is caused by these three factors working together:

  • a VARCHAR(255) column in the index. MySQL’s in-memory data representation for VARCHARs is not space efficient. If a column is defined as VARCHAR(255), any value will occupy the entire 255 bytes.
  • InnoDB’s row prefetch cache. When InnoDB detects that one is reading a lot of rows from an index, it pre-fetches index records and stores them in an internal cache. The cache uses the inefficient in-memory data representation.
  • Design of MySQL’s Item classes. One can’t evaluate an expression over a table record that is in InnoDB prefetch cache. Expression evaluation functions expect to find column values in the table’s “primary location”, internally known as table->record[0]. In order for ICP to check the condition, index columns have to be copied to table->record[0], first.

I hope we will be able to investigate this problem and post more about it soon. For now, the news is that ICP can cause a slowdown, when the index has big VARCHAR columns.

Posted in mysql, mariadb on October 11th, 2014 by spetrunia | | 0 Comments

Priority queue optimization for filesort is now visible in MariaDB 10.0

TL;DR: Priority queue optimization for filesort with small LIMIT is now visible in MariaDB: there is a status variable and you can also see it in the slow query log (KB page link).

A longer variant:
One of the new optimizations in MySQL 5.6 is ability to use a priority queue instead of sorting for ORDER BY … LIMIT queries. The optimization was ported into MariaDB 10.0 long ago, but we still get questions if/when it will be ported. I guess, the reason for this is that, besides the query speed, you can’t see this optimization. Neither EXPLAIN, nor EXPLAIN FORMAT=JSON or PERFORMANCE_SCHEMA or status variables give any indication whether filesort used priority queue or the regular quicksort+merge algorithm.

MySQL 5.6 has only one way one can check whether filesort used priority queue. You need to enable optimizer_trace (set optimizer_trace=1), and then run the query (not EXPLAIN, but the query itself). Then, you can look into the optimizer trace and find something like this:

          "filesort_priority_queue_optimization": {
            "limit": 10,
            "rows_estimate": 198717,
            "row_size": 215,
            "memory_available": 262144,
            "chosen": true

MariaDB doesn’t support optimizer_trace at the moment. Even if it did, I think it would be wrong to require one to look into the optimizer trace to find out about the picked query plan.

The natural place to show the optimization would be EXPLAIN output. We could show something like “Using filesort (priority queue)”. This was my initial intent. After looking into the source code, this turned out to be difficult to do. The logic that makes the choice between using quicksort+merge and using priority queue is buried deep inside query execution code. (As if the mess caused by late optimizations of ORDER BY and UNIONs didn’t teach anybody in MySQL team anything).

As for query execution, there are two facilities where one could record execution-time details about the query plan. They are the status variables and the slow query log.

Status variables

We’ve added Sort_priority_queue_sorts status variable. Now, the list of sort-related status variables is:

MariaDB [test]> show status like 'Sort%';
| Variable_name             | Value |
| Sort_merge_passes         | 0     |
| Sort_priority_queue_sorts | 1     |
| Sort_range                | 0     |
| Sort_rows                 | 11    |
| Sort_scan                 | 1     |

(Sort_range + Sort_scan) gives total number of sorts. Sort_priority_queue_sorts gives number of sorts that were done using priority queue.

Slow query log

Percona’s Extended statistics in the slow query log shows Filesort/Filesort_on_disk fields. We thought that adding information about priority queue use would be appropriate. Now, slow query log entries look like this:

# Time: 140714 18:30:39
# User@Host: root[root] @ localhost []
# Thread_id: 3  Schema: test  QC_hit: No
# Query_time: 0.053857  Lock_time: 0.000188  Rows_sent: 11  Rows_examined: 100011
# Full_scan: Yes  Full_join: No  Tmp_table: No  Tmp_table_on_disk: No
# Filesort: Yes  Filesort_on_disk: No  Merge_passes: 0  Priority_queue: Yes
SET timestamp=1405348239;
select * from t1 where col1 between 10 and 20 order by col2 limit 100;

pt-query-digest is able to parse slow query logs with the new field.


What about PERFORMANCE_SCHEMA? After all, it is the most powerful tool for tracking query execution. It has “absorbed” some status variables into events_statements_history table. For sorting, it has these columns:

| SORT_MERGE_PASSES       | bigint(20) unsigned              | NO   |     | NULL    |       |
| SORT_RANGE              | bigint(20) unsigned              | NO   |     | NULL    |       |
| SORT_ROWS               | bigint(20) unsigned              | NO   |     | NULL    |       |
| SORT_SCAN               | bigint(20) unsigned              | NO   |     | NULL    |       |

Should we add a SORT_PRIORITY_QUEUE_SORTS column there? We didn’t add it into 10.0 right now because of compatibility concerns. Some tools may rely on the structure of PERFORMANCE_SCHEMA tables. Also, PERFORMANCE_SCHEMA table definitions are stored on disk, and one would have to run mysql_fix_privilege_tables after a minor upgrade, which is not good.

Posted in EXPLAIN, mysql, mariadb on July 14th, 2014 by spetrunia | | 0 Comments

New feature in MariaDB 10.1: ANALYZE statement

Last week, yours truly has pushed a new feature into MariaDB 10.1 tree: ANALYZE statement.

The idea of this feature is to make it easy to compare query plan with query execution. ANALYZE statement will run the statement, and produce EXPLAIN-like output, where optimizer’s estimates are followed by numbers that were observed when running the query. The output looks like this:



  • Next to rows there is r_rows column which shows how many records were read from the table.
  • Next to filtered there is r_filtered column which shows which fraction of records was left after the part of the WHERE condition attached to the table was checked.

I think this should explain the feature. If you want more details, please refer to the KB article ANALYZE statement. It also discusses the meaning of the above EXPLAIN output.

Technical details and further plans

ANALYZE currently uses its own counters. Counting is done for all queries, including non-ANALYZE queries. This should be okay (not have visible overhead) as long as counting just increments integer variables in the query plan, without doing any atomic operations or making syscalls.

The upside of this approach is that it’s now trivial to make Explain in the slow query log also print ANALYZE output. When a query runs slowly, you will be able to know where exactly the optimizer was wrong.

The downside is that getting more data will not be as easy. So far, the most requested numbers beyond r_rows and r_filtered were r_time(amount of time spent in reading the table) and r_io(amount of IO that we did on the table). Counting the amount of time that was spent while reading each row will impose CPU overhead, it is a known problem. Counting IO is just incrementing a counter, but it will require interaction between ANALYZE code and storage engine(s) code, which will add complexity.

There is PERFORMANCE_SCHEMA feature, where others have already spent a lot of effort to count wait time and IO. It’s tempting to reuse it. The problem is, P_S collects the wrong data. P_S counters are global, while ANALYZE needs to count IO for each table reference separately. Consider a self-join. From P_S point of view, it is reading from the same table. From ANALYZE point of view, it is reads from two different table references. I’m currently not sure whether ANALYZE should/could rely on PERFORMANCE_SCHEMA.

A totally different angle is that tabular EXPLAIN output doesn’t allow to show much data (for example, how many rows were there before/after GROUP BY?). Here the solution is clear, I think: support EXPLAIN FORMAT=JSON and then add ANALYZE FORMAT=JSON where we can provide lots of detail.

Posted in EXPLAIN, mysql, mariadb on June 30th, 2014 by spetrunia | | 3 Comments

Optimizations for derived tables in MySQL 5.6 and MariaDB 5.5+

I had been involved with subquery optimizations fairly closely, but last week I was surprised to find out that MySQL 5.6 does not support derived table merging. This feature was among the subquery features in the abandoned MySQL 6.0. In MariaDB, it was finished and released as part of MariaDB 5.3/5.5. As for MySQL, neither MySQL 5.6, nor MySQL 5.7 has this feature.

So what is this “derived merge”? It’s simple to understand. When one writes complex queries, it is common to use FROM-clause subqueries as a way to structure the query:

  (select * from orders where o_orderpriority=’1-URGENT’) as high_prio_orders
  o_orderdate between ‘1995-01-01′ and ‘1995-01-07′

MySQL optimizer processes this syntax very poorly. The basic problem is that FROM-subqueries are always materialized exactly as-specified. Conditions from outside the subquery are applied only after the materialization.

In our example, table orders has an index on o_orderdate, and there is a highly selective condition o_orderdate BETWEEN ... which one can use for reading through the index. But the condition is located outside the subquery, so it will not be used when reading the table. Instead, we will get the following plan:

| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
|  1 | PRIMARY     | <derived2> | ALL  | NULL          | NULL | NULL    | NULL | 1505799 | Using where |
|  2 | DERIVED     | orders     | ALL  | NULL          | NULL | NULL    | NULL | 1505799 | Using where |

The meaning of it is:

  1. Do a full table scan is on table `orders`. We expect to read 1.5M rows. Write rows that match o_orderpriority='1-URGENT' into a temporary table
  2. Read the temporary table back. Filter rows that match o_orderdate between ... and compute the query result

MySQL 5.6 has added some improvements to this (link to the manual). They are:

  • The temporary table is materialized as late as possible. This has no effect of the speed of our example query, but it may have an effect for more complex queries.
    • EXPLAIN also will not materialize the temporary table
  • The optimizer has an option to create and use an index on the temporary table.

However, the base problem of materializing FROM subquery before applying any other optimization still remains.

In MariaDB, EXPLAIN will be different:

| id   | select_type | table  | type  | possible_keys | key           | key_len | ref  | rows | Extra                              |
|    1 | SIMPLE      | orders | range | i_o_orderdate | i_o_orderdate | 4       | NULL | 4358 | Using index condition; Using where |

Note that we see only one line, and the table orders is accessed through an index on o_orderdate. Running EXPLAIN EXTENDED will show why:

Message: select sum(`dbt3sf1`.`orders`.`o_totalprice`) AS `sum(o_totalprice)` from `dbt3sf1`.`orders` where ((`dbt3sf1`.`orders`.`o_orderpriority` = ‘1-URGENT’) and (`dbt3sf1`.`orders`.`o_orderDATE` between ‘1995-01-01′ and ‘1995-01-07′))

There is no FROM-clause subquery anymore. It has been merged into the upper select. This allowed the optimizer to avoid doing materialization, and also to use the condition and index on o_orderdate to construct a range access.

Query execution time for this particular example went down from 15 sec to 0.25 sec, but generally, the difference can be as big as your table is big.

Posted in how-it-works, mysql, mariadb on June 30th, 2014 by spetrunia | | 0 Comments

Engineering behind EXPLAIN FORMAT=JSON (or lack thereof)

MySQL 5.6 has added support for EXPLAIN FORMAT=JSON. The basic use case for that feature is that one can look at the JSON output and see more details about the query plan. More advanced/specific use cases are difficult, though. The problem is, you can’t predict what EXPLAIN FORMAT=JSON will produce. There is no documentation or any kind of convention regarding the contents of JSON document that you will get.

To make sure I’m not missing something, I looked at MySQL Workbench. MySQL Workbench has a feature called Visual Explain. If you want to use, prepare to seeing this a lot:


In Workbench 6.1.4 you get it for (almost?) any query with subquery. In Workbench 6.1.6 (released last week), some subqueries work, but it’s still easy to hit a query whose EXPLAIN JSON output confuses workbench.

Looking at the source code, this seems to be just the start of it. The code in MySQL Server is not explicitly concerned with having output of EXPLAIN FORMAT=JSON conform to any convention. Workbench also has a rather ad-hoc “parser” that walks over JSON tree and has certain arbitrary expectations about what nodes should be in various parts of the JSON document. When these two meet, bugs are a certainty. I suspect the real fun will start after a few releases of the Server (fixing stuff and adding new features) and Workbench (trying to catch up with new server while supporting old ones).

My personal interest in all this is that we want to support EXPLAIN JSON in MariaDB. MariaDB optimizer has extra features, so we will have to extend EXPLAIN JSON. I was looking for a way to do it in a compatible way. However, current state of EXPLAIN JSON in MySQL doesn’t give one a chance.

Posted in EXPLAIN, mysql, mariadb on May 23rd, 2014 by spetrunia | | 1 Comments

An interesting case in ORDER BY LIMIT optimization

Recently, I was asked about an interesting case in ORDER BY … LIMIT optimization. Consider a table

create table tbl (
  KEY key1(col1, col2),
) engine=InnoDB;

Consider queries like:

  select * from tbl where col1=’foo’ and col2=123 order by pk limit 1;
  select * from tbl where col1=’bar’ and col2=123 order by pk limit 1;

These run nearly instantly. But, if one combines these two queries with col1='foo' and col1='bar' into one query with col1 IN ('foo','bar'):

  select * from tbl where col1 IN (’foo’,'bar’) and col2=123 order by pk limit 1;

then the query is be orders of magnitude slower than both of the queries with col1=const.

The first thing to note when doing investigation is to note that the table uses InnoDB engine, which has extended_keys feature. This means, the index

  KEY key1(col1, col2)

is actually

  KEY key1(col1, col2, pk)

Once you have that, and also you have col1='foo' AND col2=123 in the WHERE clause, the optimizer is able to see that index `key1` produces records ordered by the `pk` column, i.e. in the order required by the ORDER BY clause. This allows to satisfy the LIMIT 1 part by reading just one row.

Now, if we change col1='foo' into col1 IN('foo','bar'), we will still be able to use index `key1`, but the rows we read will not be ordered by `pk`. They will come in two ordered batches:

  'bar', 123, pkX
  'bar', 123, pkX+1
  'bar', 123, pkX+2
  'foo', 123, pkY
  'foo', 123, pkY+1
  'foo', 123, pkY+2

The query has ORDER BY pk LIMIT 1, but, since the rowset is not ordered by pk, the optimizer will have to read all of the rows, sort them, and find the row with the least value of `pk`.

Now, wouldn’t it be great if the optimizer was aware that the index scan returns two ordered batches? It would be able to read not more than #LIMIT rows from each batch. I can think of two possible execution strategies:

  1. Run something similar to index_merge strategy: start an index scan col1='foo' and an index scan on col1='bar'. Merge the two ordered streams until we’ve found #limit rows. This approach works well when you’re merging a few streams. If there are a lot of streams, the overhead of starting concurrent index scans will start to show up.
  2. Use the same index cursor to #LIMIT rows from the first batch, then from the second, and so forth. Merge these ordered streams using filesort’s merge pass or priority queue. This approach reads more rows than the first one, but we don’t have to create another index cursor.

Now, the question is whether this kind of queries is frequent enough to implement this optimization.

Posted in ideas, mysql, mariadb on May 14th, 2014 by spetrunia | | 0 Comments

Comparing query optimizer features in MariaDB 10.0 and MySQL 5.6

MariaDB 10.0 had a stable release last month. It is a good time to take a look and see how it compares to the stable version of MySQL, MySQL 5.6 (as for Percona Server, it doesn’t have its own optimizer features).
Changelogs and release notes have all the details, but it’s difficult to see the big picture. So I went for diagrams, and the result is a short article titled What is the difference between MySQL and MariaDB query optimizers. It should give one a clue about what are the recent developments in query optimizers in MySQL world.

In case you’re interested in details about optimizer features in MariaDB 10.0, I’ve shared slides from a talk about MariaDB 10.0 query optimizer.

Posted in mysql, slides, mariadb on May 13th, 2014 by spetrunia | | 0 Comments