Pluggable storage engine interface needs to support table name resolution hooks

I’ve started some attempts at coding ha_trace storage engine I’ve mentioned earlier. One of the first things that became apparent was that I needed a way to put a hook into table name resolution code so I can wrap tables into ha_trace objects.

The need to intercept table name resolution and do something other than looking at the .frm files is not unique to call trace engine:

  • Remote storage engines would benefit also:
    • NDB has a whole chunk of code that ships .frm files from one connected mysqld instance to another. It doesn’t hook into name resolution; it ships table definitions proactively, which could be nuisance if you use a new mysqld node to just connect and run a few queries
    • Federated relies on the user to issue CREATE TABLE statement that matches remote table’s DDL. There are no checks that table defintions match, you can do “interesting” things like omitting columns or having index definitions that don’t match the indexes on the remote table. I think this has little or no value, I’d more appreciate an approach where I provide remote server/database/table name and the server figures out the DDL of the remote table itself.
  • Storage engines that have their own data dictionary could relieve MySQL from the need to use .frm files. I remember a discussion about somebody who had lots (tens of thousands iirc) of tables and was running into issues because of the need to open all those .frm files.

I think the above justifies adding name resolution hooks into storage engine interface. I don’t have enough time/cause to implement it, my intent is to do tracing, not name resolution. If you think you’ll also benefit from the name resolution hook, please post a note here or at internals@mysql.com, perhaps we could collect enough voices to get it implemented.

Posted in ideas on December 19th, 2007 by spetrunia | | 0 Comments

Bugfix: DATE vs DATETIME comparisons are now sargable again

We (me coding, Igor and Evgen reviewing) have fixed BUG#32198. According to Support/Bugs team, the bug was causing pain to a number of people, so I thought it deserved a post. The problem was as follows:

  • Before MySQL 5.0.42, comparisons in form
      date_column CMP datetime_const

    were interpreted as comparisons of DATEs (CMP is one of =, <, > , <=, or >=). The time part of datetime_const was ignored.

  • In 5.0.42 we’ve fixed it to perform in a way that is closer to the SQL standard: the comparisons are now resolved by comparing values as DATETIMEs. Our fault was that “date_column CMP datetime_const” stopped being sargable. It turned out there quite a few people who had queries like
      SELECT … WHERE date_column < NOW() … 

    and those queries became awfully slow.

  • In 5.0.54, We’ve fixed BUG#32198 and made “date_column CMP datetime_const” sargable again. Apologies to everyone who was affected.

What keeps me concerned is that both old and new interpretation of DATE/DATETIME comparisons are not compatible with PostgreSQL. PostgreSQL does something strange: comparison of DATE column with DATETIME constant is interpreted as DATETIME comparison, while comparison with NOW() seems to be interpreted as comparison of DATEs:

MySQL PostgreSQL
mysql> select now();
+---------------------+
| now()               |
+---------------------+
| 2007-12-16 18:24:12 |
+---------------------+
1 row in set (0.01 sec)
test=# select now();
              now
-------------------------------
 2007-12-16 18:23:47.197907+03
(1 row)

OK

mysql> select a from tbl where date_col=now();
Empty set (0.02 sec)
test=# select a from tbl where date_col=now();
 a
---
(0 rows)

OK

mysql> select a from tbl
    -> where date_col='2007-12-16 18:24:12';
Empty set (0.00 sec)
test=# select a from tbl
test-# where date_col='2007-12-16 18:24:12';
     a
------------
 2007-12-16
(1 row)

bummer. Why is explicitly specified DATETIME value handled differently from NOW()?

I don’t know what is the logic behind this (or is this a bug in PostgreSQL?) If you do, please let me know.

In my SQL code, I think I’m going to play it safe and always explicitly cast to either DATE or DATETIME.

Posted in bugfixes on December 16th, 2007 by spetrunia | | 0 Comments

Slides and audio: How MySQL Handles ORDER BY, GROUP BY, and DISTINCT

I haven’t posted anything for a while, but that doesn’t mean that nothing was happening around MySQL Optimizer. In fact, exactly the reverse was true - so many thing were happening that I didn’t have time to blog.

I’ll try to make up for the lost coverage. Let’s start from smaller things. On Nov, 1 I’ve had a MySQL University session titled How MySQL Handles ORDER BY, GROUP BY, and DISTINCT.

The session page has slides and audio. If you’re ready to spend 1 hour on MySQL’s ways of doing ordering, you can listen to the audio. Otherwise, you can just scroll through the slides. They are reasonably self-contained and explain MySQL’s ORDER BY and GROUP BY handling in easy-to-digest colorful pictures.

Posted in how-it-works on December 8th, 2007 by spetrunia | | 2 Comments