Pager script for shrinking EXPLAIN output

Everyone who works with MySQL (or MariaDB) query optimizer has to spend a lot of time looking at EXPLAIN outputs. You typically first look at the tabular form, because it is easier to read. You can immediately see what the join order is, what numbers of records will be read, etc:

MariaDB [dbt3sf1]> explain select * from customer, orders where c_custkey= o_custkey;
+------+-------------+----------+------+---------------+-------------+---------+----------------------------+--------+-------+
| id   | select_type | table    | type | possible_keys | key         | key_len | ref                        | rows   | Extra |
+------+-------------+----------+------+---------------+-------------+---------+----------------------------+--------+-------+
|    1 | SIMPLE      | customer | ALL  | PRIMARY       | NULL        | NULL    | NULL                       | 150303 |       |
|    1 | SIMPLE      | orders   | ref  | i_o_custkey   | i_o_custkey | 5       | dbt3sf1.customer.c_custkey |      7 |       |
+------+-------------+----------+------+---------------+-------------+---------+----------------------------+--------+-------+

The only problem is that it quickly gets too wide and doesn’t fit even on wide screens. To relieve the pain, I wrote the script that shrinks EXPLAIN output by removing spaces and less useful information. You set the script as mysql client pager command:

MariaDB [dbt3sf1]> \P shrink_explain.pl
PAGER set to ’shrink_explain.pl’

and then the above EXPLAIN becomes shorter:

MariaDB [dbt3sf1]> explain select * from customer, orders where c_custkey= o_custkey;
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+
|id|select_type|table   |type|possible_keys|key        |key_len|ref               |rows  |Extra|
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+
|1 |SIMPLE     |customer|ALL |PRIMARY      |NULL       |NULL   |NULL              |150303|     |
|1 |SIMPLE     |orders  |ref |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7     |     |
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+

The worst horizontal space hog is the possible_keys column:

MariaDB [dbt3sf1]> explain select * from customer, orders where c_custkey= o_custkey and
    -> o_orderkey in (select l_orderkey from lineitem where l_shipdate='1995-01-01');
+------+--------------------+----------+----------------+-----------------------------------------------------------------------------------------------+-------------+---------+----------------------------+--------+-------------+
| id   | select_type        | table    | type           | possible_keys                                                                                 | key         | key_len | ref                        | rows   | Extra       |
+------+--------------------+----------+----------------+-----------------------------------------------------------------------------------------------+-------------+---------+----------------------------+--------+-------------+
|    1 | PRIMARY            | customer | ALL            | PRIMARY                                                                                       | NULL        | NULL    | NULL                       | 150303 |             |
|    1 | PRIMARY            | orders   | ref            | i_o_custkey                                                                                   | i_o_custkey | 5       | dbt3sf1.customer.c_custkey |      7 | Using where |
|    2 | DEPENDENT SUBQUERY | lineitem | index_subquery | PRIMARY,i_l_shipdate,i_l_orderkey,i_l_orderkey_quantity,l_shipdate_partkey,l_shipdate_suppkey | PRIMARY     | 4       | func                       |      2 | Using where |
+------+--------------------+----------+----------------+-----------------------------------------------------------------------------------------------+-------------+---------+----------------------------+--------+-------------+

I have added an option to aggressively shrink the possible_keys column:

MariaDB [dbt3sf1]> \P shrink_explain.pl –shrink-possible-keys
PAGER set to ’shrink_explain.pl –shrink-possible-keys’
MariaDB [dbt3sf1]> explain select * from customer, orders where c_custkey= o_custkey and
         ->  o_orderkey in (select l_orderkey from lineitem where l_shipdate=’1995-01-01′);
+--+---------------+--------+--------------+-------------+-----------+-------+------------------+------+-----------+
|id|select_type    |table   |type          |possible_keys|key        |key_len|ref               |rows  |Extra      |
+--+---------------+--------+--------------+-------------+-----------+-------+------------------+------+-----------+
|1 |PRIMARY        |customer|ALL           |PRIMARY      |NULL       |NULL   |NULL              |150303|           |
|1 |PRIMARY        |orders  |ref           |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7     |Using where|
|2 |DEPENDENT SUBQ.|lineitem|index_subquery|PRIMARY,i_...|PRIMARY    |4      |func              |2     |Using where|
+--+---------------+--------+--------------+-------------+-----------+-------+------------------+------+-----------+

and there is an option to remove possible_keys altogether:

MariaDB [dbt3sf1]> \P shrink_explain.pl –remove-possible-keys
PAGER set to ’shrink_explain.pl –remove-possible-keys’
MariaDB [dbt3sf1]> explain select * from customer, orders where c_custkey= o_custkey and
         -> o_orderkey in (select l_orderkey from lineitem where l_shipdate=’1995-01-01′);
+--+---------------+--------+--------------+---+-----------+-------+------------------+------+-----------+
|id|select_type    |table   |type          |...|key        |key_len|ref               |rows  |Extra      |
+--+---------------+--------+--------------+---+-----------+-------+------------------+------+-----------+
|1 |PRIMARY        |customer|ALL           |...|NULL       |NULL   |NULL              |150303|           |
|1 |PRIMARY        |orders  |ref           |...|i_o_custkey|5      |customer.c_custkey|7     |Using where|
|2 |DEPENDENT SUBQ.|lineitem|index_subquery|...|PRIMARY    |4      |func              |2     |Using where|
+--+---------------+--------+--------------+---+-----------+-------+------------------+------+-----------+

The script can be downloaded here: shrink_explain.pl.

Posted in Uncategorized, mysql on November 23rd, 2013 by spetrunia | | 0 Comments

SHOW EXPLAIN in MariaDB 10.0 vs EXPLAIN FOR CONNECTION in MySQL 5.7

MariaDB 10.0 has SHOW EXPLAIN feature. It was coded by yours truly and first introduced about a year ago. Last release of MySQL 5.7 introduced EXPLAIN FOR CONNECTION, which looks rather similar to MariaDB’s SHOW EXPLAIN. I was wondering how these two compare.

The basics

The usage scenarios are similar. In both cases, you first need a connection id of a running query. It is typically obtained by running SHOW PROCESSLIST:

MariaDB [(none)]> show processlist;
+----+------+-----------+----------+---------+------+--------------+-----------------------------+----------+
| Id | User | Host      | db       | Command | Time | State        | Info                        | Progress |
+----+------+-----------+----------+---------+------+--------------+-----------------------------+----------+
|  2 | root | localhost | dbt3sf10 | Query   |    2 | Sending data | select count(*) from orders |    0.000 |
|  3 | root | localhost | NULL     | Query   |    0 | init         | show processlist            |    0.000 |
+----+------+-----------+----------+---------+------+--------------+-----------------------------+----------+

Then, in MariaDB, you run:

MariaDB [(none)]> show explain for 2;
+------+-------------+--------+-------+---------------+---------------+---------+------+----------+-------------+
| id   | select_type | table  | type  | possible_keys | key           | key_len | ref  | rows     | Extra       |
+------+-------------+--------+-------+---------------+---------------+---------+------+----------+-------------+
|    1 | SIMPLE      | orders | index | NULL          | i_o_orderdate | 4       | NULL | 14856637 | Using index |
+------+-------------+--------+-------+---------------+---------------+---------+------+----------+-------------+

and you get the EXPLAIN for the query that connection #2 is running. SHOW EXPLAIN always generates a warning, the warning has the text of the query that the EXPLAIN is for:

MariaDB [(none)]> show warnings;
+-------+------+-----------------------------+
| Level | Code | Message                     |
+-------+------+-----------------------------+
| Note  | 1003 | select count(*) from orders |
+-------+------+-----------------------------+

The idea behind this was that without the warning, it will be difficult to tell which query this EXPLAIN belongs to. It could be that the query you saw in SHOW PROCESSLIST has finished and another one has started.

SHOW EXPLAIN works for any query that EXPLAIN works for (starting from MariaDB 10.0.5 EXPLAIN UPDATE/DELETE/INSERT are supported). If you attempt to run SHOW EXPLAIN on a connection that is not running a statement that has EXPLAIN, you will get:

MariaDB [(none)]> show explain for 2;
ERROR 1933 (HY000): Target is not running an EXPLAINable command

Now, let’s take a look at MySQL’s EXPLAIN FOR CONNECTION:

MySQL [(none)]> explain for connection 1;
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | orders | ALL  | NULL          | NULL | NULL    | NULL | 922880 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+

Looks very similar to SHOW EXPLAIN, but there is no warning with query text. The only way out I can see is to run SHOW PROCESSLIST again, find your query there, and look at the “Time” column. If the value is big enough, this means that the query you see in SHOW PROCESSLIST output was already running when you ran SHOW EXPLAIN.

If the target connection is not running a query, you will get nothing

MySQL [(none)]> explain  for connection 1;
Query OK, 0 rows affected (0.00 sec)

More details

Unlike SHOW EXPLAIN, EXPLAIN FOR CONNECTION supports flags. That is, you can run EXPLAIN [EXTENDED|PARTITIONS|FORMAT=JSON] FOR CONNECTION . However, EXPLAIN EXTENDED will not print the warning with query after rewrites, and EXPLAIN FORMAT=JSON will not print attached_condition entries. I think, these are caused by limitations of EXPLAIN FOR CONNECTION code.

There are cases where EXPLAIN FOR CONNECTION will produce outputs like:

MySQL [(none)]> explain for connection 1;
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------+
|  1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Plan isn’t ready yet |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------+

In some cases, a part of the plan is not ready:

MySQL [(none)]> explain for connection 1;
+----+-------------+----------+------+---------------+------+---------+------+--------+----------------------+
| id | select_type | table    | type | possible_keys | key  | key_len | ref  | rows   | Extra                |
+----+-------------+----------+------+---------------+------+---------+------+--------+----------------------+
|  1 | PRIMARY     | NULL     | NULL | NULL          | NULL | NULL    | NULL |   NULL | Plan isn’t ready yet |
|  2 | SUBQUERY    | lineitem | ALL  | NULL          | NULL | NULL    | NULL | 974084 | NULL                 |
+----+-------------+----------+------+---------------+------+---------+------+--------+----------------------+

SHOW EXPLAIN in MariaDB could produce similar partial query plans, but after release 10.0.5, there is always a full query plan. It would be interesting to discuss the reasons for this, but the discussion won’t fit into this blog post.

Another interesting question is whether MySQL 5.7’s EXPLAIN FOR CONNECTION allows for printing of query plan into slow query log. Before MariaDB 10.0.5, SHOW EXPLAIN code didn’t allow this. Changes in MariaDB 10.0.5 made printing EXPLAIN at the query end easy, and now MariaDB can save EXPLAINs in the Slow Query Log. Will MySQL 5.7 follow and also add such feature?

Posted in Uncategorized, EXPLAIN, mysql on November 14th, 2013 by spetrunia | | 0 Comments

A chance for Cassandra Storage Engine

Short version: It looks like DataStax has released an official C++ driver for their new CQL native protocol. This makes it possible to update MariaDB’s Cassandra Storage Engine to work with all column families in Cassandra 1.2 and up.

Long version: In case the above didn’t make any sense to you, here’s some background:

  • Cassandra Storage Engine was developed against Cassandra 1.1
  • Cassandra 1.1 had a client-server protocol based on Thrift API. It had a number of limitations, the most important was lack of support for streaming. These translated into limitations in Cassandra storage engine, for example, I could not support secondary indexes.
  • Cassandra 1.2 was released in February, and it had a new, native CQL protocol. The protocol supported streaming, but there were no C/C++ drivers.
  • Besides that, Cassandra has changed its data model in version 1.2. After this change, Thrift API clients were screwed could only access column families defined with WITH COMPACT STORAGE attribute. This applied to Cassandra Storage Engine as well.

Now, with official C/C++ driver for CQL protocol, it should be possible to update Cassandra Storage Engine to work with new versions of Cassandra. It will take some effort to move from using Thrift API to using CQL, but the required amount of effort just went down a lot.

Posted in Uncategorized, mysql on November 7th, 2013 by spetrunia | | 0 Comments

MySQL 5.6: no testcases for non-crashing bugs, either (and an incomplete bugfix)

I normally don’t write such posts, but this time I’ll do a little whining. I was looking at MySQL’s Bug#69581, which was also filed against MariaDB as MDEV-5102. With some effort (little for this particular bug), one can find out that Bug#69581’s counterpart in Oracle’s internal bug database is Bug#16862316, and the fix is in revision 5366 in MySQL 5.6.14.

There are two problems, though:

  • The fix doesn’t include a testcase. We know, Oracle doesn’t publish testcases for crashing bugs, with a justification that it’s for security reasons. However, this bug is not a crashing one. I’m wondering what was the justification for not including testcase for it?
  • Luckily, the bug entry at bugs.mysql.com has the testcase, so I was able to play with it. As a result, I have found that MySQL’s fix is incomplete, and filed another bug, Bug#70703.

I think, the poor bugfix made by Oracle is not a big deal. Although, this is not the first time we at MariaDB see a poor bugfix by Oracle (Can the forks keep up? Well, they would, if they could trust upstream’s patches and just copy them, without having to check whether the fixes are real fixes or band-aids).

The lack of testcases is a much bigger concern. Has Percona Server merged fix for MySQL Bug#69581 correctly? Well, they don’t have any features related to partitioned tables, so most likely, the answer is Yes. But won’t it be nicer if mysql-test test suite had a testcase for this, and they could check it by running tests? It’s not only about forks/branches. Everybody who changes MySQL/MariaDB/Percona source code benefits from mysql-test testsuite having a good test coverage.

Now, are there any volunteers who could run through MySQL 5.6 changelog and find out whether missing testcase for non-crashing bug was an isolated occurrence, or it is a new trend?

Posted in Uncategorized, mysql on October 24th, 2013 by spetrunia | | 0 Comments

MariaDB 10.0: EXPLAIN in the slow query log

MariaDB can now print query’s EXPLAIN into the slow query log! The feature is based on SHOW EXPLAIN technology and its output looks like this:

# Time: 131017 21:26:02
# User@Host: root[root] @ localhost []
# Thread_id: 2  Schema: dbt3sf1  QC_hit: No
# Query_time: 5.535819  Lock_time: 0.000350  Rows_sent: 1  Rows_examined: 65633
## <explain>
##   id select_type     table   type    possible_keys   key     key_len ref     rows    Extra
##   1  SIMPLE  nation  ref     PRIMARY,n_name  n_name  26      const   1       Using where; Using index
##   1  SIMPLE  customer        ref     PRIMARY,i_c_nationkey   i_c_nationkey   5       dbt3sf1.nation.n_nationkey      25124   Using index
##   1  SIMPLE  orders  ref     i_o_custkey     i_o_custkey     5       dbt3sf1.customer.c_custkey      7       Using index
## </explain>
SET timestamp=1382030762;
select count(*) from customer, orders, nation where c_custkey=o_custkey and c_nationkey=n_nationkey and n_name='GERMANY';

By default, EXPLAIN is not printed to the slow query log, one needs to enable it manually like this (either from SQL or from my.cnf):

set log_slow_verbosity='explain‘;

Now, I’d like to find time to hack pt-query-digest to make use of EXPLAINs. For start, let it show the number of different query plans. Then, show query plans on the response time distribution… so that one can tell which ones were fast or slow… The stopping factor here is that pt-query-digest is 500KB of unfamiliar Perl code.

Posted in mysql on October 18th, 2013 by spetrunia | | 2 Comments

EXPLAIN UPDATE/DELETE/INSERT in MySQL and MariaDB

MySQL 5.6 added support for EXPLAIN INSERT/UPDATE/DELETE. MySQL 5.7 made some improvements to EXPLAIN code. Also, 5.7 added support for EXPLAIN FOR CONNECTION, which looks very similar to MariaDB’s SHOW EXPLAIN. I was working on putting EXPLAIN INSERT/UPDATE/DELETE into MariaDB 10.0, so I’ve dug up some details and thought I’d share them:

EXPLAIN UPDATE/DELETE

EXPLAIN UPDATE looks like regular EXPLAIN SELECT:

MySQL [dbt3sf1]> explain update orders set o_orderpriority='1-URGENT' where o_orderDATE='1998-07-21';
+----+-------------+--------+-------+---------------+---------------+---------+-------+------+-------------+
| id | select_type | table  | type  | possible_keys | key           | key_len | ref   | rows | Extra       |
+----+-------------+--------+-------+---------------+---------------+---------+-------+------+-------------+
|  1 | SIMPLE      | orders | range | i_o_orderdate | i_o_orderdate | 4       | const |  628 | Using where |
+----+-------------+--------+-------+---------------+---------------+---------+-------+------+-------------+

EXPLAIN PARTITIONS shows partitions. EXPLAIN EXTENDED shows the “filtered” column, but doesn’t produce a warning with query after rewrites.

In MySQL 5.7, select_type changes from SIMPLE to UPDATE.

MySQL [dbt3sf1]> explain update orders set o_orderpriority='3-MEDIUM' where o_orderDATE='1998-07-21';
+----+-------------+--------+-------+---------------+---------------+---------+-------+------+-------------+
| id | select_type | table  | type  | possible_keys | key           | key_len | ref   | rows | Extra       |
+----+-------------+--------+-------+---------------+---------------+---------+-------+------+-------------+
|  1 | UPDATE      | orders | range | i_o_orderdate | i_o_orderdate | 4       | const |  628 | Using where |
+----+-------------+--------+-------+---------------+---------------+---------+-------+------+-------------+

for multi-table UPDATEs (or DELETEs), select_type=UPDATE (or DELETE) only for tables that are being updated.

Inside, there are some issues. Best-release-ever still suffers from an old MySQL design where execution path taken by EXPLAIN is different from execution path taken by the actual query. If you have something in two places, it goes out of sync in no time, and I have alreay found one example EXPLAIN will show “Using join buffer” where the execution doesn’t use it (filed as BUG#70553). A shameless plug: in MariaDB 10.0, we use one execution path, so bugs like this one are not possible.

Another peculiarity is Using temporary. You can run a query like this:

MySQL [dbt3sf1]> explain update orders set o_orderDATE='1998-07-22' where o_orderDATE='1998-07-21';
+----+-------------+--------+-------+---------------+---------------+---------+-------+------+------------------------------+
| id | select_type | table  | type  | possible_keys | key           | key_len | ref   | rows | Extra                        |
+----+-------------+--------+-------+---------------+---------------+---------+-------+------+------------------------------+
|  1 | UPDATE      | orders | range | i_o_orderdate | i_o_orderdate | 4       | const |  628 | Using where; Using temporary |
+----+-------------+--------+-------+---------------+---------------+---------+-------+------+------------------------------+

and see Using temporary. This looks weird, why would this query need a temporary table? Running the UPDATE and checking Created_tmp_tables shows that indeed no temporary tables were created. The reason for “Using temporary” is that the UPDATE modifies columns in the index that is used to find rows. When that happens, the optimizer needs to collect record ids in a buffer before doing any updates. But, records are collected in a buffer, not in a temporary table. We in MariaDB decided to denote this with Using buffer instead:

MariaDB [dbt3sf1]> explain update orders set o_orderDATE='1998-07-22' where o_orderDATE='1998-07-21';
+------+-------------+--------+-------+---------------+---------------+---------+------+------+---------------------------+
| id   | select_type | table  | type  | possible_keys | key           | key_len | ref  | rows | Extra                     |
+------+-------------+--------+-------+---------------+---------------+---------+------+------+---------------------------+
|    1 | SIMPLE      | orders | range | i_o_orderdate | i_o_orderdate | 4       | NULL |  628 | Using where; Using buffer |
+------+-------------+--------+-------+---------------+---------------+---------+------+------+---------------------------+

EXPLAIN INSERT

This one is weird - INSERT doesn’t have a query plan, why would one need EXPLAIN for it? One could argue that EXPLAIN INSERT ... SELECT makes sense, as the SELECT part has a query plan. However, using that logic, EXPLAIN CREATE TABLE … SELECT should have been supported also, but it isn’t.

In MySQL 5.6, EXPLAIN INSERT seems to be totally useless:

MySQL [dbt3sf1]> explain insert into orders (o_custkey) values ('32423');
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra          |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
|  1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | No tables used |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+

In MySQL 5.7, it shows the table that we insert into:

MySQL [dbt3sf1]> explain insert into orders (o_custkey) values ('32423');
+----+-------------+--------+------+---------------+------+---------+------+------+-------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+------+-------+
|  1 | INSERT      | orders | ALL  | NULL          | NULL | NULL    | NULL | NULL | NULL  |
+----+-------------+--------+------+---------------+------+---------+------+------+-------+

“type=ALL” is misleading, because no full table scan will take place. The only scenario when EXPLAIN INSERT can be useful is when one is doing an INSERT into partitioned table. In that case, EXPLAIN PARTITIONS will show the result of partition pruning. Note that partition pruning for INSERTs works in a rather peculiar way - depending on the number of rows inserted, it may choose not to perform partition pruning and will show that all partitions will be used.

I think this post is getting long, so I will talk about SHOW EXPLAIN and EXPLAIN FOR CONNECTION in a separate post.

Posted in Uncategorized on October 17th, 2013 by spetrunia | | 5 Comments

Webinar recording: MariaDB Query Optimizer - Improving query performance

Last week, we at SkySQL had a webinar titled MariaDB Query Optimizer - Improving Query Performance with yours truly as the author and presenter. The recording is now available online at the above link.

The webinar is an overview of all important parts of the MySQL/MariaDB query optimizer. Special attention is given to the new optimizer features and differences between MySQL and MariaDB.

Posted in Uncategorized, mysql on October 9th, 2013 by spetrunia | | 0 Comments

EXPLAIN FORMAT=JSON vs regular EXPLAIN

Valeriy has mentioned that MySQL manual has no documentation about EXPLAIN FORMAT=JSON feature. There was a bug filed for this since September, 2012. I had spent some time looking at EXPLAIN FORMAT=JSON, so I thought I would share my findings.

The first blog post about EXPLAIN FORMAT=JSON was made by Sheeri. She listed these features of FORMAT=JSON:

  • attached_condition and implicit casts
  • used_key_parts field

In addition to those, I was able to find

  • Pushed index condition information
  • More information about sorting/grouping
  • Subquery attachment points information

Let’s look at these in greater detail:

Pushed index condition information

This is very similar to conditions attached to tables - instead of “Using index condition” one can see the pushed condition itself:

explain format=json select * from Country where Name like 'p%' and Population > 3*1000*1000;
  "query_block": {
    "select_id": 1,
    "table": {
      "table_name": "Country",
      "access_type": "range",
      "possible_keys": [
        "Name",
        "Population"
      ],
      "key": "Name",
      "used_key_parts": [
        "Name"
      ],
      "key_length": "52",
      "rows": 12,
      "filtered": 100,
      "index_condition": "(`world`.`Country`.`Name` like 'p%')",
       “attached_condition”: “(`world`.`Country`.`Population` > (((3 * 1000) * 1000)))” 
    }
  }

This is very useful. I wish the condition was printed without the over-quoting of every identifier.

More information about sorting/grouping

The old “Using temporary; Using filesort” was replaced with ordering_operation and grouping_operation nodes. The nodes always have using_filesort attribute, and may optionally have using_temporary_table attribute.

explain format=json select * from City order by Population desc limit 1;
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    “ordering_operation”: {
      “using_filesort”: true,
      “table”: {
        “table_name”: “City”,
        “access_type”: “ALL”,
        “rows”: 4249,
        “filtered”: 100
      }
    }
  }
}

This provides slightly more information than the old “Using temporary; Using filesort” line. In the following example, one can see that “Using temporary” is done to handle GROUP BY, while filesort is used for GROUP BY:

explain format=json
select sum(population) as city_ppl, Country
from City
group by Country 
order by city_ppl desc;
*************************** 1. row ***************************
EXPLAIN: {
  “query_block”: {
    “select_id”: 1,
    “ordering_operation”: {
      “using_filesort”: true,
      “grouping_operation”: {
        “using_temporary_table”: true,
        “using_filesort”: false,
        “table”: {
          “table_name”: “City”,
          “access_type”: “ALL”,
          “rows”: 4409,
          “filtered”: 100
        }
      }
    }
  }
}

It is also possible to encounter ordering_operation with using_filesort:false, which means that the optimizer relies on the query plan to produce the data in required ordering.

Subquery attachment points

EXPLAIN FORMAT=JSON shows where the subqueries are attached to queries. In tabular form, you see something like this:

+----+--------------------+-----------+------+---------------+------+---------+------+------+-------------+
| id | select_type        | table     | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+--------------------+-----------+------+---------------+------+---------+------+------+-------------+
|  1 | PRIMARY            | Country   | ALL  | Name          | NULL | NULL    | NULL |  165 | Using where |
|  3 | SUBQUERY           | satellite | ALL  | NULL          | NULL | NULL    | NULL | 1041 | NULL        |
|  2 | DEPENDENT SUBQUERY | City      | ALL  | NULL          | NULL | NULL    | NULL | 4409 | Using where |
+----+--------------------+-----------+------+---------------+------+---------+------+------+-------------+

and it is not apparent how subqueries are attached to queries. For basic queries, this is not a problem - just look at the query. However, if the query has VIEWs or FROM clause subqueries, it may be difficult to recognize the query plan structure. In any case, with EXPLAIN FORMAT=JSON, subqueries look like this:

*************************** 1. row ***************************
EXPLAIN: {
  “query_block”: {
    “select_id”: 1,
    “table”: {
      “table_name”: “Country”,
      “access_type”: “range”,
      “possible_keys”: [
        “Name”
      ],
      “key”: “Name”,
      “used_key_parts”: [
        “Name”
      ],
      “key_length”: “52″,
      “rows”: 2,
      “filtered”: 100,
      “index_condition”: “(`world`.`Country`.`Name` in (’France’,'Germany’))”
    },
    “select_list_subqueries”: [
      {
        “dependent”: true,
        “cacheable”: false,
        “query_block”: {
          “select_id”: 2,
          “table”: {
            “table_name”: “City”,
            “access_type”: “ALL”,
            “rows”: 4409,
            “filtered”: 100,
            “attached_condition”: “(`world`.`City`.`Country` = `world`.`Country`.`Code`)”
          }
        }
      }
    ]
  }
}

which makes it more apparent where the subquery is attached and how often it is re-executed.

My personal impressions

EXPLAIN FORMAT=JSON looks like a project that was started with good intents but wasn’t finished. I’ve found a couple of bugs by just playing around (1, 2). A number of things look out of place:

  • Range checked for each record” still displays “index map: 0xNNNN”, leaving one to do binary arithmetic with index numbers to figure out which indexes will be checked
  • We’ve got used_key_parts. Is there any need to print key_length anymore?
  • index_merge is still described with notation like key=”intersect(key1,key2)”, and key_length=”5,5″ (where’s the used_key_parts for it?). This notation was the only way to fit the data into tabular format. With JSON, it would be natural to represent individual index scans as separate JSON objects
  • If a subquery is part of WHERE clause, attached_condition will print the whole subquery. That is, subquery’s WHERE condition is printed both in the parent query and in the subquery. This is redundant and annoying.
  • Items with default value are generally not printed. For example, possible_keys=NULL is not. However, using_filesort:false is printed, and so is filtered:100, which means no filtering happened. Why?

Overall, I don’t quite understand what EXPLAIN FORMAT=JSON was targeted at. If EXPLAIN FORMAT=JSON was targeted at humans, I would expect a more user-friendly output. Don’t print redundant information, don’t do redundant quoting, do a better job at indenting the JSON output.
If EXPLAIN FORMAT=JSON was targeted at machines, I would expect it to be fully documented. How can one expect somebody to write programs to process a data format that has no documentation?

Posted in Uncategorized on July 23rd, 2013 by spetrunia | | 0 Comments

MySQL 5.6, InnoDB statistics, and testcases for optimizer

One of the new features in MySQL 5.6 is persistent table statistics. The word “persistent” makes you think that table statistics will be very stable from now on.
This is generally true, but there are some exceptions. There is a particular case where InnoDB statistics may change with no warning, even when there is no user activity on the server. Consider the following pattern (which nearly all bug or feature testcases follow):

CREATE TABLE t1 (...) ENGINE=INNODB;
INSERT INTO t1 VALUES (1),(2),(3),(4),(5);
EXPLAIN SELECT * FROM t1;
EXPLAIN SELECT * FROM t1;

With MySQL 5.6, the following is possible

  • the first EXPLAIN will show that the optimizer expects table t1 to have 6 rows
  • the second EXPLAIN will show that the optimizer expects table t1 to have 5 rows

For me, this was a big surprise. After all, table t1 never had 6 rows. If there was some rounding which rounded up 5 to 6, why did the number go back down to 5 again?

I debugged it, and discovered that

  1. Multi-row INSERT will increment #records-in-table statistics as many times as many rows it has inserted
  2. There is also dict_stats_thread() in storage/innobase/dict/dict0stats_bg.cc. which re-calculates table statistics in the background.

And what happens is:

  1. INSERT inserts a record
  2. dict_stats_thread() comes and re-calculates the statistics (the new record is counted)
  3. INSERT increments the statistics by one (because it has inserted a record on step#1)

This way, right after INSERT has finished, the table statistics will report more records. The number will go back down within a second (because dict_stats_thread() will re-calculate statistics again). This slight and temporary inaccuracy in statistics should not be a problem for production environments. This is a big deal for test cases in mysql-test suite, though. Test cases are special:

  1. Test datasets are often small. Even an off-by-one difference may cause a different query plan to be chosen. For optimizer bugs, this means that the testcase might be no longer hitting the bug it was made for.
  2. Test cases often include EXPLAIN output in .result file. EXPLAIN output has “rows” column, and mysql-test-run framwork will interpret off-by-one difference as a test failure

I’ll repeat myself: the above two properties mean that with MySQL 5.6, it is not possible to write optimizer+InnoDB testcases the way we used to write them before. I’ve observed it in practice - testcases will fail sporadically, depending on how quickly your system is running.

But wait, what about Oracle? How did they manage to develop MySQL 5.6 with this? I was really interested, too, so I dug in the revision history. Oracle developers’ approach was to run ANALYZE TABLE after filling the table. It took them 33 patches to put ANALYZE TABLE everywhere where it was needed (examples: 1, 2, 3, 4, …) but they did it.

The take-away is: if you’re working on a testcase for an optimizer bug in MySQL 5.6 - put an ANALYZE TABLE statement after all the INSERTs but before the SELECTs. This will make your testcase stable.

For the record, we in MariaDB 10.0 did not take the 33 patches with ANALYZE TABLE statements. For now, we have decided to run mysql-test testsuite with --skip-innodb-stats-persistent option. Judging from the EXPLAIN outputs in the tests, that has achieved the same goal (same query plans), and the goal was achieved with just one patch. :-)

Posted in Uncategorized on July 17th, 2013 by spetrunia | | 0 Comments

Slides from Percona Live talks: optimizer tutorial and Cassandra Storage Engine

I’ve put online the slides for the two talks that I сo-presented at the Percona Live conference:

The tutorial tries to cover most areas of the optimizer, with focus being on optimization of complex parts of SQL like joins. It also shows how recent additions in MySQL and MariaDB (PERFORMANCE_SCHEMA, SHOW EXPLAIN, EXPLAIN FORMAT=JSON, optimizer_trace) can be useful when diagnosing optimizer problems.

The Cassandra talk starts with an introduction into Cassandra and Cassandra Storage Engine, and then proceeds to discuss the intended use cases for Cassandra Storage Engine. This is an important discussion, because there is a variety of reasons one might want to access Cassandra’s data from SQL, and Cassandra Storage Engine addresses only certain kinds of needs.

Posted in Uncategorized on May 1st, 2013 by spetrunia | | 0 Comments

Next Page »