<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.0.12-alpha" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
	<title>Comments on: Observations about subquery use cases</title>
	<link>http://s.petrunia.net/blog/?p=35</link>
	<description>Random observations made while working on MySQL / MariaDB query optimizer</description>
	<pubDate>Sat, 04 Sep 2010 18:28:57 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.0.12-alpha</generator>

	<item>
		<title>by: pixie5</title>
		<link>http://s.petrunia.net/blog/?p=35#comment-40538</link>
		<pubDate>Wed, 07 Jan 2009 23:43:34 +0000</pubDate>
		<guid>http://s.petrunia.net/blog/?p=35#comment-40538</guid>
					<description>Thanks for the link, it's nice to know that these things are in the pipeline!  I did look at VIEWs but in my case they didn't really fit: I think I decided that the volumes would be too large to be efficient.

Although this is slightly off topic, I feel that this is still much work to be done regarding such temporary tables and maybe that's where the problem ultimately lies.  In my opinion such changes like efficient subqueries, indexing temporary tables, joining temporary table to itself etc. should be given priority over "new exotic" functionality.</description>
		<content:encoded><![CDATA[<p>Thanks for the link, it&#8217;s nice to know that these things are in the pipeline!  I did look at VIEWs but in my case they didn&#8217;t really fit: I think I decided that the volumes would be too large to be efficient.</p>
<p>Although this is slightly off topic, I feel that this is still much work to be done regarding such temporary tables and maybe that&#8217;s where the problem ultimately lies.  In my opinion such changes like efficient subqueries, indexing temporary tables, joining temporary table to itself etc. should be given priority over &#8220;new exotic&#8221; functionality.
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: sergey</title>
		<link>http://s.petrunia.net/blog/?p=35#comment-40448</link>
		<pubDate>Wed, 07 Jan 2009 10:04:04 +0000</pubDate>
		<guid>http://s.petrunia.net/blog/?p=35#comment-40448</guid>
					<description>&gt; One question I’ve always pondered is when the optimiser puts the results of subquery into temp table for joining to parent query [when looking at EXPLAIN results] why doesn’t it automagically index the join fields?

There's really no good reason, that's a deficiency in the optimizer. At the moment you might get some speedups by using VIEWs instead of FROM subqueries. Certain VIEWs can be "merged" into the upper query. This doesn't give auto-indexing, but at least the optimizer is able to use the indexes that exist in the tables that the VIEW refers to.

We're now trying to address FROM subquery handling, see http://forge.mysql.com/worklog/task.php?id=3485. I can't yet tell which version this will go in, though.</description>
		<content:encoded><![CDATA[<p>> One question I’ve always pondered is when the optimiser puts the results of subquery into temp table for joining to parent query [when looking at EXPLAIN results] why doesn’t it automagically index the join fields?</p>
<p>There&#8217;s really no good reason, that&#8217;s a deficiency in the optimizer. At the moment you might get some speedups by using VIEWs instead of FROM subqueries. Certain VIEWs can be &#8220;merged&#8221; into the upper query. This doesn&#8217;t give auto-indexing, but at least the optimizer is able to use the indexes that exist in the tables that the VIEW refers to.</p>
<p>We&#8217;re now trying to address FROM subquery handling, see <a href="http://forge.mysql.com/worklog/task.php?id=3485." rel="nofollow">http://forge.mysql.com/worklog/task.php?id=3485.</a> I can&#8217;t yet tell which version this will go in, though.
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: pixie5</title>
		<link>http://s.petrunia.net/blog/?p=35#comment-40350</link>
		<pubDate>Tue, 06 Jan 2009 13:24:18 +0000</pubDate>
		<guid>http://s.petrunia.net/blog/?p=35#comment-40350</guid>
					<description>The site I've developed involves a lot of the following types of query:

SELECT
	 `s2`.`morestuff`
	,`s3`.`extrastuff`
	,...
FROM (
	SELECT id, morestuff, ...
	FROM stuff2
	INNER JOIN ...
) `s2`
LEFT JOIN (
	SELECT id, extrastuff, ...
	FROM stuff3
	INNER JOIN ...
) `s3`
	ON `s3`.`id` = `s2`.`id`
	AND `s3`.`xxx` = `s2`.`xxx`
WHERE ...

Performance under MySQL is "okay" not great but "okay", MSSQL just eats these for lunch!  Just goes to show how different the two optimisers are!

One question I've always pondered is when the optimiser puts the results of subquery into temp table for joining to parent query [when looking at EXPLAIN results] why doesn't it automagically index the join fields?  Surely indexing and optimising on that cheaper than a lumbering scan?  Also helps people who think in set theory terms which I thought was the point of RDBMS's!

In terms of other subqueries, personally I try to avoid using IN clauses at all, neither MySQL or MSSQL likes them.</description>
		<content:encoded><![CDATA[<p>The site I&#8217;ve developed involves a lot of the following types of query:</p>
<p>SELECT<br />
	 `s2`.`morestuff`<br />
	,`s3`.`extrastuff`<br />
	,&#8230;<br />
FROM (<br />
	SELECT id, morestuff, &#8230;<br />
	FROM stuff2<br />
	INNER JOIN &#8230;<br />
) `s2`<br />
LEFT JOIN (<br />
	SELECT id, extrastuff, &#8230;<br />
	FROM stuff3<br />
	INNER JOIN &#8230;<br />
) `s3`<br />
	ON `s3`.`id` = `s2`.`id`<br />
	AND `s3`.`xxx` = `s2`.`xxx`<br />
WHERE &#8230;</p>
<p>Performance under MySQL is &#8220;okay&#8221; not great but &#8220;okay&#8221;, MSSQL just eats these for lunch!  Just goes to show how different the two optimisers are!</p>
<p>One question I&#8217;ve always pondered is when the optimiser puts the results of subquery into temp table for joining to parent query [when looking at EXPLAIN results] why doesn&#8217;t it automagically index the join fields?  Surely indexing and optimising on that cheaper than a lumbering scan?  Also helps people who think in set theory terms which I thought was the point of RDBMS&#8217;s!</p>
<p>In terms of other subqueries, personally I try to avoid using IN clauses at all, neither MySQL or MSSQL likes them.
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: MySQL Optimizer team comments on TPC-H Results &#124; MySQL Performance Blog</title>
		<link>http://s.petrunia.net/blog/?p=35#comment-21291</link>
		<pubDate>Mon, 14 Apr 2008 19:29:51 +0000</pubDate>
		<guid>http://s.petrunia.net/blog/?p=35#comment-21291</guid>
					<description>[...] The reason is MySQL Optimizer Team did not have the goal of optimizing TPC-H queries or queries from any other benchmarks, rather they look at the queries which MySQL Users have problem with and first target the most common issues. I think this is a great approach to pick what needs to be done first. [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] The reason is MySQL Optimizer Team did not have the goal of optimizing TPC-H queries or queries from any other benchmarks, rather they look at the queries which MySQL Users have problem with and first target the most common issues. I think this is a great approach to pick what needs to be done first. [&#8230;]
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Sergey Petrunia&#8217;s blog &#187; Ccorrelated semi-join subqueries and PostgreSQL</title>
		<link>http://s.petrunia.net/blog/?p=35#comment-19935</link>
		<pubDate>Mon, 24 Mar 2008 00:57:40 +0000</pubDate>
		<guid>http://s.petrunia.net/blog/?p=35#comment-19935</guid>
					<description>[...] So the first impression is that MySQL kinda wins this round. But wait, if you look at user subquery case analysis here, you'll see that the vast majority of semi-join subqueries are uncorrelated. I wonder if PG folks at some point in the past have made a conscious decision not to bother optimizing correlated subqueries. [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] So the first impression is that MySQL kinda wins this round. But wait, if you look at user subquery case analysis here, you&#8217;ll see that the vast majority of semi-join subqueries are uncorrelated. I wonder if PG folks at some point in the past have made a conscious decision not to bother optimizing correlated subqueries. [&#8230;]
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: sergey</title>
		<link>http://s.petrunia.net/blog/?p=35#comment-18337</link>
		<pubDate>Mon, 03 Mar 2008 13:12:41 +0000</pubDate>
		<guid>http://s.petrunia.net/blog/?p=35#comment-18337</guid>
					<description>Olaf,

So yours are WHERE ... IN, too. Good to hear. It seems we're going to have this class of subqueries covered reasonably well in MySQL 6.0</description>
		<content:encoded><![CDATA[<p>Olaf,</p>
<p>So yours are WHERE &#8230; IN, too. Good to hear. It seems we&#8217;re going to have this class of subqueries covered reasonably well in MySQL 6.0
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Olaf van der Spek</title>
		<link>http://s.petrunia.net/blog/?p=35#comment-17658</link>
		<pubDate>Sun, 24 Feb 2008 19:42:21 +0000</pubDate>
		<guid>http://s.petrunia.net/blog/?p=35#comment-17658</guid>
					<description>&#62; A dumb search for subqueries in random open source applications using grep didn't produce much. Again, no idea why,

I tried to use subqueries (like select * from T where F in (select ...) but those queries were way too slow to be usable, so I just forgot about subqueries.</description>
		<content:encoded><![CDATA[<p>&gt; A dumb search for subqueries in random open source applications using grep didn&#8217;t produce much. Again, no idea why,</p>
<p>I tried to use subqueries (like select * from T where F in (select &#8230;) but those queries were way too slow to be usable, so I just forgot about subqueries.
</p>
]]></content:encoded>
				</item>
</channel>
</rss>
