Observations about subquery use cases

As I wrote earlier, we (me and Ranger) have done an assessment of the impact of new 6.0 subquery optimizations. First, we’ve done a search for performance issues in MySQL’s bug database. The raw list is here, and the here is a summary chart:

Search for customer issues in MySQL’s issue tracker system has produced a similar picture (raw list):

Search in DBT-{1,2,3,4} open source benchmark suites produced a radically different result though (raw data is here):

The samples are small but I think one can already conclude that the benchmarks and user cases have different kinds of subqueries. The cause of the discrepancy is not clear. Are people’s applications different from what the benchmarks simulate? If yes, what is the benchmark that simulates the apps our users are running?

A dumb search for subqueries in random open source applications using grep didn’t produce much. Again, no idea why, either I was looking at the wrong applications, or it could be that applications that use subqueries do not have SQL statements embedded in their code so one can’t find them with grep, or something else.

So far the outcome is that it’s nice that the new optimizations [will] capture a good fraction of real world cases, but there is no benchmark we could run to check or demonstrate this. Well, I’d still prefer this situation over the the opposite.

Posted in subqueries, benchmarks on February 23rd, 2008 by spetrunia | | 7 Comments

Evgen’s CONNECT BY patch is in… PostgreSQL (oops, actually not. more details inside)

Evgen Potemkin is a developer in query optimizer team at MySQL, we’ve been working together for about 3 years now. So, it was a bit unexpected to read in the PostgreSQL news that (originally) his CONNECT BY patch has made it into PostgreSQL. No, we don’t have people coding for two DBMSes at the same time (yet?), the patch has been developed years before Evgen has joined MySQL.

One of the questions raised by the prospect of becoming a part of Sun was whether/how we will handle being a part of company that develops other DBMS as well. I suppose this patch news sets a good start.

It turns out the patch is not in the PostgreSQL tree after all. It is just people at www.postgresql.at (which is NOT just a different name for www.postgresql.org, and that was my error) merged the patch into a 8.3 version of PostgreSQL and made the result available to download).

Sorry for creating confusion, and thanks to those who pointed this out in the comments.

Posted in Uncategorized on February 5th, 2008 by spetrunia | | 2 Comments

Subquery optimizations in MySQL 6.0

I’ve finally found time to put together a couple of wiki pages describing what work we’re doing for subquery optimizations in MySQL 6.0:

  • The Subquery_Works page has an overview of what we’ve done so far, we’re doing now and our future plans.
  • The 6.0 Subquery Optimization Cheatsheet has a short, easy-to-read description of what has been released in MySQL 6.0.3. That should be enough if you just want to get the new version and see if your subquery is/can be faster now or not. If you want to know more details, the nearest occasion is my MySQL University session on February, 28.
  • I’ve done a quick preliminary assessment of the impact of new optimizations. The raw results and first conclusions are here on the 6.0 Subquery Optimization Benchmarks page. We will be doing more analysis and I intend to post more observations based on the already collected data, stay tuned.

That’s everything I have on subqueries so far, but as I’ve already mentioned more is on the way.

Posted in subqueries on February 5th, 2008 by spetrunia | | 2 Comments