HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

mySQL query optimisation — multiple joins or select … where not in (select distinct…)?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
distinctquerywhereoptimisationmysqlmultipleselectnotjoins

Problem

Background

I have a Drupal install accessing a large users database (~200k rows) and my "People finder" functionality needs to access all those rows (in a random order). I don't seem to be able to use LIMIT and OFFSET from within Drupal's UI (and I have a slightly different Drupal-specific question on Drupal.SE at Slow query with large dataset in Drupal views — better to process in SQL or PHP?, which addresses that issue, as well as parts of this one), but my mySQL-specific question is as follows:

I need to exclude some rows based on data in another table ("include all users in rôle A who are not also in rôles B, C or D). The query Drupal is generating is effectively

SELECT
users.uid AS uid,
/ some columns /,
RAND() AS random_field
FROM
users users
INNER JOIN users_roles users_roles ON users.uid = users_roles.uid
LEFT JOIN users_roles users_roles2
ON users.uid = users_roles2.uid
AND (users_roles2.rid = :views_join_condition_0
OR users_roles2.rid = :views_join_condition_1
OR users_roles2.rid = :views_join_condition_2)
WHERE
(( (users.status <> :db_condition_placeholder_3) -- Active users only
AND (users_roles.rid = :db_condition_placeholder_4) -- Must be in rôle A
AND (users_roles2.rid IS NULL) -- Must not be in rôles B, C, D
AND (users.uid != :users_uid OR users.uid IS NULL) )) -- Must not be current user
ORDER BY random_field ASC


(The reference to users.uid IS NULL is a red-herring; that should never be the case and is not germane to this query.)

Question

Now it strikes me that hand-rolling Drupal's filter criterion (within the constraints of Drupal's UI) might help — and I can hard-code almost all the :db_condition_placeholders — but I'm not sure if there's a significant performance difference between the following two options:

-
Change the FROM clause to

`FROM users INNER JOIN users_roles
ON users.uid = users_roles.uid AND users_ro

Solution

-
I would add an index on users_roles (rid, uid). In a many-to-many table with two columns (a,b), you almost always will need both indexes: (a,b) and (b,a) in one query or the other. I think this index would help in this query.

-
Try various rewritings of the query and the EXPLAIN EXTENDED they produce.

-
About your suggestions, the first is not correct (it will not show same results). For the second suggestion:

WHERE users.status = 1                                           -- Active users only


Yes, that's better than users.status <> 0. This change may have a better effect if there is an index on users (status) (and even more if there are not many active users). Optimizing queries with boolean columns (or ones that act as boolean) is not easy with B-trees.

AND users.uid IN
     (SELECT DISTINCT uid FROM users_roles WHERE rid = 5)        -- Must be in rôle A


No. MySQL is known to have issues with column IN (SELECT ...), especially if the external table is big (and yours is 200K columns, so no, not good).

AND users.uid NOT IN
     (SELECT DISTINCT uid FROM users_roles WHERE rid IN (6,8,9)) -- Not rôles B, C, D


Yes, that is one way to rewrite. The DISTINCT is redundant though.

AND users.uid <> :users_uid                                    -- Not current user


Yes, removing the users.uid IS NOT NULL may help and does not change the result.

  • Other things you could try:



Moving the rid = 5 condition to the ON clause:

INNER JOIN users_roles users_roles 
  ON  users.uid = users_roles.uid
  AND users_roles.rid = 5


The (rewrite) to NOT IN can also be written with NOT EXISTS:

AND NOT EXISTS 
      ( SELECT * 
        FROM users_roles ur 
        WHERE ur.uid = users.uid 
          AND ur.rid IN (6,8,9)
      )

Code Snippets

WHERE users.status = 1                                           -- Active users only
AND users.uid IN
     (SELECT DISTINCT uid FROM users_roles WHERE rid = 5)        -- Must be in rôle A
AND users.uid NOT IN
     (SELECT DISTINCT uid FROM users_roles WHERE rid IN (6,8,9)) -- Not rôles B, C, D
AND users.uid <> :users_uid                                    -- Not current user
INNER JOIN users_roles users_roles 
  ON  users.uid = users_roles.uid
  AND users_roles.rid = 5

Context

StackExchange Database Administrators Q#15794, answer score: 3

Revisions (0)

No revisions yet.