HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

MyISAM Performance: Join Decomposition

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
joinperformancemyisamdecomposition

Problem

in High Performance MySQL on page 159 they talk about breaking up complex queries into simple ones:

Converting

SELECT * FROM tag
JOIN tag_post ON tag_post.tag_id=tag.id
JOIN post ON tag_post.post_id=post.id
WHERE tag.tag='mysql';


To

SELECT * FROM tag WHERE tag='mysql';
SELECT * FROM tag_post WHERE tag_id=1234;
SELECT * FROM post WHERE post.id in (123,456,567,9098,8904);


And sort of doing the actual join yourself in your application.

My Question is whether this is stil such a good idea when the final query has a where-clause with a few thousand IDs it needs to match (the actual table itself has about 500k entries).

What I mean is, will there be a big penalty for having a query like

SELECT * FROM post WHERE post.id in (123,456,567, ...  ... ,9098,8904);


instead of the join-statement above? Would it help to move this logic to Stored Procedures inside the Database (while considering how poorly stored procedures are implemented in MySQL)?

Solution

breaking up complex queries into simple ones

Poppycock. Why do extra effort when MySQL is quite willing to do it for you? As for performance -- there is probably no difference except that the broken up queries require more round trips to the server.

OTOH, there are cases where you can outsmart the optimizer. But your example was not one of those.

IN (thousands-of-ids) is possible, but painful, for the server. It will sort and de-dup them, then leave them in some kind of structure for repeated binary searching. I have seen lots of such queries, but only those over, say, 50K items raised any eyebrows.

There are times when this rewrite helps:

SELECT ... ORDER BY ... LIMIT ...


-->

SELECT b... 
FROM tbl b 
   JOIN ( SELECT id FROM TBL WHERE ... ORDER BY ... LIMIT ... ) a 
   ON a.id = b.id


But that is to avoid hauling around extra junk that will be thrown away by the LIMIT.

Code Snippets

SELECT ... ORDER BY ... LIMIT ...
SELECT b... 
FROM tbl b 
   JOIN ( SELECT id FROM TBL WHERE ... ORDER BY ... LIMIT ... ) a 
   ON a.id = b.id

Context

StackExchange Database Administrators Q#1834, answer score: 3

Revisions (0)

No revisions yet.