Performance-wise, the idea of Prepared Statements is that the server does certain pre-processing on PREPARE command, and then those steps are omitted when the statement is executed. If the statement is executed several times, we get an economy of
This makes one want to move the CPU and IO-intensive query optimization into the PREPARE phase. Unfortunately, this will make the optimizer work much worse - optimizer’s decisions are based on the external information, and there is much less available information at PREPARE phase. The most crucial differences are that
- The values of the ‘?’ parameter markers are not yet known
- The results of probes done in the the queried tables cannot be relied on because the table data may change before the EXECUTE
- [less important] Table and index statistics may change before the EXECUTE
Those limitations cause the most of MySQL’s optimizations to be re-done at every EXECUTE command. To be precise, here is a table of what is done when:
|Nested JOIN simplification||First EXECUTE|
|OUTER->INNER JOIN conversions||First EXECUTE|
|Partition pruning||Every EXECUTE|
|COUNT/MIN/MAX elimination||Every EXECUTE|
|Constant subexpression removal||Every EXECUTE|
|Equality propagation||Every EXECUTE|
|Constant table detection||Every EXECUTE|
|ref access analysis||Every EXECUTE|
|range/index_merge analysis and optimization||Every EXECUTE|
|Join optimization||Every EXECUTE|
Basically, the current rule is as follows:
Optimizations that depend only on query syntax are performed either at PREPARE or at first EXECUTE. Optimizations that also depend on something else are performed at every EXECUTE.
If you’re only interested in the current state of affairs, that’s all you need to know. If you want to know what [I think] is going happen in the future, read on.
The above quoted rule is not a DBMS law of nature. In fact, it is already getting in the way of subquery rewrites (see e.g. BUG#27430, WL#3740) and some other server subsystems. We’ve recently had a discussion about other possible solutions. The smartest and most elegant solution was suggested by Timour:
- Let every optimization record its “dependencies” - information that it depends on:
- ref access analyzer: set of table indexes
- constant table detector: the above + the set of tables that have 1 or 0 rows
- range optimizer: all of the above + used values of parameter markers(if any) + results of records_in_range calls
- … and so forth…
- On subsequent EXECUTE, check which dependencies have changed and re-do the correponding optimizations.
This approach will reduce the work done on EXECUTE step to doing a set of checks. In most cases, the checks will pass, and no re-optimization will be required. This approach is hard to implement, however. MySQL’s internal data structures do not naturally support “block undo”, so it is not easy to cancel and re-do an optimization.
For now we’ve settled for a less impressive solution:
- We’ll gradually move query rewrites that depend on table DDLs into PREPARE phase
- On EXECUTE we’ll check if DDLs of the used tables have changed. If they have, we’ll throw out the Prepared Statement and re-prepare it.
Re-preparing the statement is expensive but it should be a very rare occurence for most applications. This task is filed as WL#3813: Re-prepare the prepared statement if its underlying objects have changed.
The idea how to speed up nested-loops join a little I’ve mentioned earlier has now been promoted into a worklog entry. It is now known as WL#3724 “Short-Cutting Join Execution: Speeding up star queries” and its text is publicly available at MySQLForge.
At the moment there is only a short description, but hopefully Martin Hansson (the assigned developer) will add more content there.