Engineering behind EXPLAIN FORMAT=JSON (or lack thereof)

MySQL 5.6 added support for EXPLAIN FORMAT=JSON. The basic use case for the feature is that one can look at the JSON output and see more details about the query plan. More advanced/specific use cases are difficult, though. The problem is, you can’t predict what EXPLAIN FORMAT=JSON will produce. There is no documentation, or any kind of convention, regarding the contents of the JSON document you will get.
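To give an idea of what we’re dealing with, here is a minimal example (the table and the output are mine, and the output is abridged; as just noted, the exact set of JSON members is not documented and may change between versions):

  EXPLAIN FORMAT=JSON SELECT * FROM t1 WHERE a=1\G
  -- produces something like:
  -- {
  --   "query_block": {
  --     "select_id": 1,
  --     "table": {
  --       "table_name": "t1",
  --       "access_type": "ALL",
  --       "attached_condition": "(`test`.`t1`.`a` = 1)"
  --     }
  --   }
  -- }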

To make sure I’m not missing something, I looked at MySQL Workbench. MySQL Workbench has a feature called Visual Explain. If you want to use it, prepare to see this a lot:

[screenshot: visual-explain-error.png]

In Workbench 6.1.4 you get it for (almost?) any query with a subquery. In Workbench 6.1.6 (released last week), some subqueries work, but it’s still easy to hit a query whose EXPLAIN JSON output confuses Workbench.

Looking at the source code, this seems to be just the start of it. The code in MySQL Server is not explicitly concerned with making the output of EXPLAIN FORMAT=JSON conform to any convention. Workbench, in turn, has a rather ad-hoc “parser” that walks over the JSON tree and has certain arbitrary expectations about which nodes should appear in various parts of the JSON document. When these two meet, bugs are a certainty. I suspect the real fun will start after a few more releases of the Server (fixing stuff and adding new features) and Workbench (trying to catch up with the new server while supporting old ones).

My personal interest in all this is that we want to support EXPLAIN JSON in MariaDB. The MariaDB optimizer has extra features, so we will have to extend EXPLAIN JSON, and I was looking for a way to do that compatibly. However, the current state of EXPLAIN JSON in MySQL doesn’t give one a chance.

Posted in EXPLAIN, mysql, mariadb on May 23rd, 2014 by spetrunia

An interesting case in ORDER BY LIMIT optimization

Recently, I was asked about an interesting case in ORDER BY … LIMIT optimization. Consider a table

create table tbl (
  …
  KEY key1(col1, col2),
  PRIMARY KEY (pk)
) engine=InnoDB;

Consider queries like:

  select * from tbl where col1='foo' and col2=123 order by pk limit 1;
  select * from tbl where col1='bar' and col2=123 order by pk limit 1;

These run nearly instantly. But, if one combines these two queries with col1='foo' and col1='bar' into one query with col1 IN ('foo','bar'):

  select * from tbl where col1 IN ('foo','bar') and col2=123 order by pk limit 1;

then the combined query is orders of magnitude slower than both of the queries with col1=const.

The first thing to note when investigating this is that the table uses the InnoDB engine, which has the extended_keys feature. This means that the index

  KEY key1(col1, col2)

is actually

  KEY key1(col1, col2, pk)

Once you have that, and col1='foo' AND col2=123 in the WHERE clause, the optimizer is able to see that index `key1` produces records ordered by the `pk` column, i.e. in the order required by the ORDER BY clause. This allows it to satisfy the LIMIT 1 part by reading just one row.
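One can check this with a regular EXPLAIN. A sketch (the full table definition is hypothetical, filled in only to make the example self-contained):

  create table tbl (
    pk int not null,
    col1 varchar(32) not null,
    col2 int not null,
    KEY key1(col1, col2),
    PRIMARY KEY (pk)
  ) engine=InnoDB;

  -- For the col1=const form, EXPLAIN should show ref access on key1 and
  -- no "Using filesort" in the Extra column: the extended index
  -- (col1, col2, pk) already delivers rows in ORDER BY pk order.
  EXPLAIN select * from tbl where col1='foo' and col2=123 order by pk limit 1;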

Now, if we change col1='foo' into col1 IN('foo','bar'), we will still be able to use index `key1`, but the rows we read will not be ordered by `pk`. They will come in two ordered batches:

  'bar', 123, pkX
  'bar', 123, pkX+1
  'bar', 123, pkX+2
  ...
  'foo', 123, pkY
  'foo', 123, pkY+1
  'foo', 123, pkY+2

The query has ORDER BY pk LIMIT 1, but, since the rowset is not ordered by pk, the optimizer will have to read all of the rows, sort them, and find the row with the least value of `pk`.
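Regular EXPLAIN agrees with this (a sketch using the hypothetical table above; the exact Extra contents may differ between versions):

  EXPLAIN select * from tbl where col1 IN ('foo','bar') and col2=123 order by pk limit 1;
  -- expect something like: type=range, key=key1, Extra: ... Using filesort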

Now, wouldn’t it be great if the optimizer were aware that the index scan returns two ordered batches? It would be able to read no more than #LIMIT rows from each batch. I can think of two possible execution strategies:

  1. Run something similar to the index_merge strategy: start an index scan on col1='foo' and an index scan on col1='bar'. Merge the two ordered streams until we’ve found #limit rows. This approach works well when you’re merging a few streams. If there are a lot of streams, the overhead of starting concurrent index scans will start to show up. (One can emulate this strategy by hand at the SQL level; see the sketch after this list.)
  2. Use the same index cursor to read #LIMIT rows from the first batch, then from the second, and so forth. Merge these ordered streams using filesort’s merge pass or a priority queue. This approach reads more rows than the first one, but we don’t have to create another index cursor.
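The manual emulation of strategy #1 is a UNION ALL rewrite: each branch is one of the fast col1=const queries, and the outer ORDER BY … LIMIT merges the per-branch winners (for LIMIT N, each branch would need LIMIT N as well):

  select * from (
    (select * from tbl where col1='foo' and col2=123 order by pk limit 1)
    union all
    (select * from tbl where col1='bar' and col2=123 order by pk limit 1)
  ) as merged
  order by pk limit 1;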

Now, the question is whether this kind of query is frequent enough to justify implementing this optimization.

Posted in ideas, mysql, mariadb on May 14th, 2014 by spetrunia

Comparing query optimizer features in MariaDB 10.0 and MySQL 5.6

MariaDB 10.0 had a stable release last month. It is a good time to take a look and see how it compares to the stable version of MySQL, MySQL 5.6 (as for Percona Server, it doesn’t have optimizer features of its own).
Changelogs and release notes have all the details, but it’s difficult to see the big picture from them. So I went for diagrams, and the result is a short article titled What is the difference between MySQL and MariaDB query optimizers. It should give one a clue about the recent developments in query optimizers in the MySQL world.

In case you’re interested in details about optimizer features in MariaDB 10.0, I’ve shared slides from a talk about MariaDB 10.0 query optimizer.

Posted in mysql, slides, mariadb on May 13th, 2014 by spetrunia