No, Sheeri, MySQL 5.6 does not optimize subqueries away

Sheeri wrote a blog post that claims that “IN Subqueries in MySQL 5.6 Are Optimized Away” and uses that as a basis to conclude that subquery optimizations in MySQL 5.6 are superior to MariaDB’s.
The claim is incorrect (and so is the conclusion). As someone who coded both of the strategies in question, FirstMatch and semi-join materialization, I think I need to write about this.

Sheeri wrote:

  1. “So MariaDB recognizes the subquery and optimizes it. But it is still optimized as a subquery”
  2. “In MySQL 5.6, the subquery is actually optimized away”

The first statement is somewhat true. The second one is not. I’ll try to explain. The example subquery Sheeri used was:

SELECT title FROM film WHERE film_id IN (SELECT film_id FROM film_actor)

Its meaning is “find films that have actors in table film_actor”. It is not possible to “optimize away” the subquery here, any more than it is possible to take the expression “2*3” and optimize away the “*3” part of it. The subquery affects the result of the whole query. You can’t remove it.

What the optimizer (both in MariaDB and in MySQL 5.6) does here is convert the subquery into a semi-join. A semi-join is basically a “subquery in the WHERE clause” (read the link for more details), and the conversion gives the optimizer more possible choices.
The semi-join can be seen in EXPLAIN EXTENDED. In both MariaDB and MySQL, one can see:

... select `film`.`title` AS `title`
    from `sakila`.`film` semi join (`sakila`.`film_actor`) ...
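
To make the point concrete, here is a minimal sketch that runs the same query against a toy database. It uses Python's built-in sqlite3 module rather than MariaDB or MySQL, and the three films and three film_actor rows are invented for illustration, so it only demonstrates what the query means, not how either server executes it.

```python
import sqlite3

# Toy stand-in for the sakila tables; the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
    INSERT INTO film VALUES (1, 'ALPHA'), (2, 'BETA'), (3, 'GAMMA');
    INSERT INTO film_actor VALUES (10, 1), (11, 1), (12, 3);
""")

def titles(sql):
    """Run a query and return the selected titles as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# The query from the post: films that have at least one row in film_actor.
with_subquery = titles(
    "SELECT title FROM film WHERE film_id IN (SELECT film_id FROM film_actor)")

# The same query with the subquery simply removed: a different result.
without_subquery = titles("SELECT title FROM film")

# Semi-join semantics written as EXISTS: each film appears at most once.
as_exists = titles(
    "SELECT title FROM film WHERE EXISTS "
    "(SELECT 1 FROM film_actor fa WHERE fa.film_id = film.film_id)")

# A plain inner join is not equivalent: a film with two actors appears twice.
as_inner_join = titles("SELECT title FROM film JOIN film_actor USING (film_id)")

print(with_subquery, without_subquery, as_exists, as_inner_join)
```

On this data the IN and EXISTS forms agree, dropping the subquery lets the film with no actors back in, and the inner join returns the two-actor film twice. That last difference is why the optimizer needs a semi-join rather than an ordinary join.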

But what about the different query plans? They do not show the superiority of one optimizer over the other. As indicated in the documentation, MariaDB supports the FirstMatch strategy that was chosen by MySQL 5.6. Likewise, MySQL 5.6 supports the semi-join materialization strategy that was picked by MariaDB. I suspect the different query plans were chosen because MariaDB and MySQL use different cost formulas. It is not possible to tell whose formulas are better, because both query plans finish in 0.01 seconds, and the tables are very small.
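
For readers unfamiliar with the two strategies, the following is a rough Python sketch of the idea behind each one. It is not the server code: the function names and the toy rows are hypothetical, and the inner loop is a linear scan where a real plan would typically use an index lookup. It only illustrates that both strategies are different ways to compute the same semi-join.

```python
# Toy rows, invented for illustration: (film_id, title) and (actor_id, film_id).
film = [(1, "ALPHA"), (2, "BETA"), (3, "GAMMA")]
film_actor = [(10, 1), (11, 1), (12, 3)]

def first_match(film, film_actor):
    """FirstMatch idea: for each film, look for matching film_actor rows
    and stop as soon as the first one is found."""
    result = []
    for film_id, title in film:
        for _actor_id, fa_film_id in film_actor:
            if fa_film_id == film_id:
                result.append(title)
                break  # one match is enough; skip the remaining duplicates
    return result

def materialization(film, film_actor):
    """Materialization idea: run the subquery once into a de-duplicated
    temporary structure, then probe it for each film."""
    materialized = {fa_film_id for _actor_id, fa_film_id in film_actor}
    return [title for film_id, title in film if film_id in materialized]

fm_titles = first_match(film, film_actor)
mat_titles = materialization(film, film_actor)
print(fm_titles, mat_titles)
```

Both functions return the same films. Which one is cheaper depends on table sizes and available indexes, which is exactly what the cost formulas are supposed to estimate.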

This means the example doesn’t allow one to conclude that MySQL’s subquery optimizations are superior to MariaDB’s (or vice versa). QED.

Addendum. The query plan used by MySQL 5.6 will read *exactly* the same rows from the tables that MySQL 5.1/5.5 (which have no subquery optimizations) would read. The best you can deduce here is that MySQL 5.6’s subquery optimizations are not making things worse.

Posted in Uncategorized on January 30th, 2013 by spetrunia