How MySQL executes ORDER BY

In last couple of weeks there has been a tide of ORDER/GROUP BY-related optimization bugs, where I was the fixer or the reviewer. This wasn’t an easy job because there is no sane description of how GROUP BY/ORDER BY handling is supposed to work.

To figure it out, I had to write an explanation of how it works. The first part is about ORDER BY. Hopefully there will be subsequent parts that will show how GROUP BY is related to ORDER BY and how it works.

Available means to produce ordered sequences

MySQL has two methods to produce ordered streams.

The first is to use a “range“, “ref” or “index” access method over an ordered index. For versions up to 5.1, those access methods naturally return records in the index order, so we get ordering for free (the exception is NDB engine which needs to do merge-sort when it gets data from several storage nodes). In MySQL 5.2, MyISAM and InnoDB have MultiRangeRead optimization which breaks the ordering. We have a draft of how to make it preserve the ordering, but at the moment MRR is simply disabled whenever ordering is required.

The second, catch-all method, is to use the filesort algorithm. In a nutshell, filesort() does quicksort on chunks of data that fit into its memory and then uses mergesort approach to merge the chunks. The amount of memory available to filesort() is controlled by @@sort_buffer_size variable. if the sorted data doesn’t fit into memory (i.e. there is more than one chunk), filesort uses a temporary file to store the chunks.

Source data for filesort() always comes from one table. If there is a need to sort data from several tables, MySQL will first collect the data into a temporary table and then invoke filesort() for that temporary table. I don’t know the true reason for this. Codewise, filesort() wants to pull its source data using something like source.get_next_record() function, while join and union runtime produce their using result.put_next_record()-type calls, so maybe the temporary table is there only to resolve this push/pull mismatch and will go away once we get decent cursors.

filesort() has two modes of operation:

  1. Mode 1: the sorted elements contain all required columns of the source table. The result of the sorting is a linear sequence of output tuples, there is no need to access the source table after the sort is done.
  2. Mode 2: sort <sort_key, rowid> pairs and produce a sequence of rowids which one can use to get source table’s rows in the required order (but this will be essentially hit the table in random order and is not very fast)

Mode 1 is used whenever possible. Mode is used when mode1 is not applicable. This is the case when the sorted tuples have blobs or variable-length columns (TODO: check w/ evgen). Unfortunately, the EXPLAIN output provides no clue about which mode is used, so you’ll have to manually check for blobs in the output columns list.

Executing join to produce ordered stream

At the moment MySQL has three ways to run a join and produce ordered output:

Method EXPLAIN output
Use index-based access method that produces ordered output no mention of filesort
Use filesort() on 1st non-constant table “Using filesort” in the first row
Put join result into a temporary table and use filesort() on it “Using temporary; Using filesort” in the first row

Now I’ll cover those three methods in more detail. The first method can be used when the first non-const table in the join order has an index that matches the ORDER BY list. In this case we can use the index, and the ordering will “spread over” other tables onto the output. Here is a swimlane diagram of this process, where different columns represent different values of the ORDER BY expression:

diagram

This method is preferred over the other two as it requires no additional sorting steps.

The second method can be used when all ORDER BY elements refer to the first table in the join order. In this case, we can filesort() the first table and then proceed to execute the join:

diagram

Here filesort() may operate either in Mode 1 or in Mode 2. One may wonder why this is limited to doing filesort() after the first table. After all, we could do it after the second table as well - produce (tbl1, tbl2) record combinations, put them into temporary table, sort, etc. The expectation is perfectly reasonable but alas, MySQL will not even consider such query plans.

The last, the catch-all method is to write the entire join output into the temporary table and then invoke filesort:

diagram

I have an easier time recalling those three strategies when there are pictures, hopefully they’ll help you too. That’s all for today, in next posts I’ll cover the topics of how ordering affects join optimization and interoperates with GROUP BY.

Posted in how-it-works on August 29th, 2007 by spetrunia | | 12 Comments