HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Is it better to separate a big query into multiple smaller queries?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
intoqueryseparatebetterbigsmallermultiplequeries

Problem

There are situations which require having really big query joining several tables together with sub select statements in them to produce the desired results.

My question is, should we consider using multiple smaller queries and bring the logical operations into the application layer by querying the DB in more than one calls or it's better to have them all in one go?

For example consider the following query:

SELECT *
FROM   `users`
WHERE  `user_id` IN (SELECT f2.`friend_user_id`
                     FROM   `friends` AS f1
                            INNER JOIN `friends` AS f2
                              ON f1.`friend_user_id` = f2.`user_id`
                     WHERE  f2.`is_page` = 0
                            AND f1.`user_id` = "%1$d"
                            AND f2.`friend_user_id` != "%1$d"
                            AND f2.`friend_user_id` NOT IN (SELECT `friend_user_id`
                                                            FROM   `friends`
                                                            WHERE  `user_id` = "%1$d"))
       AND `user_id` NOT IN (SELECT `user_id`
                             FROM   `friend_requests`
                             WHERE  `friend_user_id` = "%1$d")
       AND `user_image` IS NOT NULL
ORDER  BY RAND() 
LIMIT %2$d


What's the best way of doing it?

Solution

I am going to disagree on large and complicated queries with datagod here. I see these only as problems if they are disorganized. Performance-wise, these are almost always better because the planner has much more freedom in how to go about retrieving the information. However, large queries do need to be written with maintainability in mind. In general, I have found that simple, well-structured SQL to be easy to debug even when a single query goes on for 200+ lines. This is because usually you have a pretty good idea of what kind of problem you are dealing with so there are only a few areas in the query that you have to check.

The maintenance problems, IME, come in when the structure of SQL breaks down. Long, complex queries in subselects impairs readability and troubleshooting, as do inline views, and both of these should be avoided in long queries. Instead, use VIEWs if you can (note if you are on MySQL, views do not perform all that well, but on most other db's they do), and use common table expressions where those don't work (MySQL doesn't support these btw).

Long complex queries work pretty well both from a maintainability and performance case where you keep your where clauses simple, and where you do as much as you can with joins instead of subselects. The goal is to make it so that "records aren't showing up" gives you a few very specific places in the query to check (is it getting dropped in a join or filtered out in a where clause?) and so the maintenance team can actually maintain things.

Regarding scalability, keep in mind that the more flexibility the planner has, that's a good thing too....

Edit: You mention this is MySQL, so views are unlikely to perform that well and CTE's are out of the question. Additionally the example given is not particularly long or complex so that's no problem.

Context

StackExchange Database Administrators Q#35277, answer score: 16

Revisions (0)

No revisions yet.