patternsqlMinor
mySQL query optimisation — multiple joins or select … where not in (select distinct…)?
Viewed 0 times
distinctquerywhereoptimisationmysqlmultipleselectnotjoins
Problem
Background
I have a Drupal install accessing a large users database (~200k rows) and my "People finder" functionality needs to access all those rows (in a random order). I don't seem to be able to use
I need to exclude some rows based on data in another table ("include all users in rôle A who are not also in rôles B, C or D). The query Drupal is generating is effectively
(The reference to
Question
Now it strikes me that hand-rolling Drupal's filter criterion (within the constraints of Drupal's UI) might help — and I can hard-code almost all the
-
Change the
`FROM users INNER JOIN users_roles
ON users.uid = users_roles.uid AND users_ro
I have a Drupal install accessing a large users database (~200k rows) and my "People finder" functionality needs to access all those rows (in a random order). I don't seem to be able to use
LIMIT and OFFSET from within Drupal's UI (and I have a slightly different Drupal-specific question on Drupal.SE at Slow query with large dataset in Drupal views — better to process in SQL or PHP?, which addresses that issue, as well as parts of this one), but my mySQL-specific question is as follows:I need to exclude some rows based on data in another table ("include all users in rôle A who are not also in rôles B, C or D). The query Drupal is generating is effectively
SELECT
users.uid AS uid,
/ some columns /,
RAND() AS random_field
FROM
users users
INNER JOIN users_roles users_roles ON users.uid = users_roles.uid
LEFT JOIN users_roles users_roles2
ON users.uid = users_roles2.uid
AND (users_roles2.rid = :views_join_condition_0
OR users_roles2.rid = :views_join_condition_1
OR users_roles2.rid = :views_join_condition_2)
WHERE
(( (users.status <> :db_condition_placeholder_3) -- Active users only
AND (users_roles.rid = :db_condition_placeholder_4) -- Must be in rôle A
AND (users_roles2.rid IS NULL) -- Must not be in rôles B, C, D
AND (users.uid != :users_uid OR users.uid IS NULL) )) -- Must not be current user
ORDER BY random_field ASC
(The reference to
users.uid IS NULL is a red-herring; that should never be the case and is not germane to this query.)Question
Now it strikes me that hand-rolling Drupal's filter criterion (within the constraints of Drupal's UI) might help — and I can hard-code almost all the
:db_condition_placeholders — but I'm not sure if there's a significant performance difference between the following two options:-
Change the
FROM clause to`FROM users INNER JOIN users_roles
ON users.uid = users_roles.uid AND users_ro
Solution
-
I would add an index on
-
Try various rewritings of the query and the
-
About your suggestions, the first is not correct (it will not show same results). For the second suggestion:
Yes, that's better than
No. MySQL is known to have issues with
Yes, that is one way to rewrite. The
Yes, removing the
Moving the
The (rewrite) to
I would add an index on
users_roles (rid, uid). In a many-to-many table with two columns (a,b), you almost always will need both indexes: (a,b) and (b,a) in one query or the other. I think this index would help in this query.-
Try various rewritings of the query and the
EXPLAIN EXTENDED they produce.-
About your suggestions, the first is not correct (it will not show same results). For the second suggestion:
WHERE users.status = 1 -- Active users onlyYes, that's better than
users.status <> 0. This change may have a better effect if there is an index on users (status) (and even more if there are not many active users). Optimizing queries with boolean columns (or ones that act as boolean) is not easy with B-trees.AND users.uid IN
(SELECT DISTINCT uid FROM users_roles WHERE rid = 5) -- Must be in rôle ANo. MySQL is known to have issues with
column IN (SELECT ...), especially if the external table is big (and yours is 200K columns, so no, not good).AND users.uid NOT IN
(SELECT DISTINCT uid FROM users_roles WHERE rid IN (6,8,9)) -- Not rôles B, C, DYes, that is one way to rewrite. The
DISTINCT is redundant though.AND users.uid <> :users_uid -- Not current userYes, removing the
users.uid IS NOT NULL may help and does not change the result.- Other things you could try:
Moving the
rid = 5 condition to the ON clause:INNER JOIN users_roles users_roles
ON users.uid = users_roles.uid
AND users_roles.rid = 5The (rewrite) to
NOT IN can also be written with NOT EXISTS:AND NOT EXISTS
( SELECT *
FROM users_roles ur
WHERE ur.uid = users.uid
AND ur.rid IN (6,8,9)
)Code Snippets
WHERE users.status = 1 -- Active users onlyAND users.uid IN
(SELECT DISTINCT uid FROM users_roles WHERE rid = 5) -- Must be in rôle AAND users.uid NOT IN
(SELECT DISTINCT uid FROM users_roles WHERE rid IN (6,8,9)) -- Not rôles B, C, DAND users.uid <> :users_uid -- Not current userINNER JOIN users_roles users_roles
ON users.uid = users_roles.uid
AND users_roles.rid = 5Context
StackExchange Database Administrators Q#15794, answer score: 3
Revisions (0)
No revisions yet.