patternsqlMinor
Why are [Seemingly] suitable indexes not used on a LEFT JOIN with OR
Viewed 0 times
leftwhyareusedwithindexesjoinseeminglysuitablenot
Problem
I have the following [fairly meaningless, just for the purpose of demonstration] query in the StackOverflow database:
The only index on the
The
The estimated plan for the query is here:
I can see the first thing the optimizer will do is perform a CI scan on the users table to filter only those users where
and retreiving results as such:
Then it will scan the comments CI and for every row, look to see if the row satisfies the predicate
Despite the two indexes, this CI scan is performed.
Wouldn't it be more efficient if the optimizer did a separate seek on each of the indexes in the Comments table above and join them together?
If I visualise what that would look like, in the screenshot above we can see the first result of the Users CI scan is ID 420
I can visualize what the
so if I seek to the rows for user ID 420 as an index seek would:
for every row where
So for the second part of our index seek, we can seek through our index
If I seek
SELECT *
FROM Users u
LEFT JOIN Comments c
ON u.Id = c.UserId OR
u.Id = c.PostId
WHERE u.DisplayName = 'alex'The only index on the
Users table is a clustered index on ID.The
Comments table has the following Non-Clustered indexes as well as Clustered Index on ID:CREATE INDEX IX_UserID ON Comments
(
UserID,
PostID
)
CREATE INDEX IX_PostID ON Comments
(
PostID,
UserID
)The estimated plan for the query is here:
I can see the first thing the optimizer will do is perform a CI scan on the users table to filter only those users where
DisplayName = Alex, effectively doing this:SELECT *
FROM Users u
WHERE u.DisplayName = 'alex'
ORDER BY Idand retreiving results as such:
Then it will scan the comments CI and for every row, look to see if the row satisfies the predicate
u.Id = c.UserId OR u.Id = c.PostIdDespite the two indexes, this CI scan is performed.
Wouldn't it be more efficient if the optimizer did a separate seek on each of the indexes in the Comments table above and join them together?
If I visualise what that would look like, in the screenshot above we can see the first result of the Users CI scan is ID 420
I can visualize what the
IX_UserID Index looks like using SELECT UserID,
PostID
FROM Comments
ORDER BY UserID,
PostIDso if I seek to the rows for user ID 420 as an index seek would:
for every row where
UserID = 420, I can look if u.Id = c.UserId OR u.Id = c.PostId of, course they all match the u.Id = c.UserId part of our predicate, So for the second part of our index seek, we can seek through our index
IX_PostID which can be visualised as follows:SELECT PostID,
UserID
FROM Comments
ORDER BY PostID,
UserIDIf I seek
Solution
Rather than focusing on how to improve a query like this, which is what the other answers are doing, I'm going to try to answer the question being asked: why doesn't the optimizer produce a plan like the one you've described (that scans the Users table, and then seeks into the two indexes on the Comments table).
Here's your original query again (note I'm using
And the plan:
One attempt to get the plan you want would be to try and force a seek on the
The plans looks like this:
So the answer is that the optimizer is definitely capable of producing such a plan. And it doesn't seem to be a cost-based decision (the seek plan looks much cheaper).
My best guess is that this is just some kind of limitation in the optimizer's exploration process - it doesn't seem to favor converting a left join with an or clause into an apply. This is really unfortunate in this particular case, as performance is dismal in the scan plan (the query takes 45 seconds on my machine) vs the apply plan (less than 1 second).
Side note: You can override the heuristic that disfavors index union plans with undocumented trace flag 8726. See https://dba.stackexchange.com/a/23779 for additional details on that front!
As Rob Farley helpfully pointed out, using
Here's your original query again (note I'm using
MAXDOP 2 just to simulate what I saw in your execution plans):SELECT *
FROM Users u
LEFT JOIN Comments c
ON u.Id = c.UserId OR
u.Id = c.PostId
WHERE u.DisplayName = 'alex'
OPTION (MAXDOP 2);And the plan:
- Scan of
dbo.Userswith residual predicate to get just the "alex" users
- For each of those users, scan the
dbo.Commentstable and filter matches in the join operator
- Estimated cost: 293.161 optimizer units
One attempt to get the plan you want would be to try and force a seek on the
dbo.Comments table:SELECT *
FROM Users u
LEFT JOIN Comments c WITH (FORCESEEK)
ON u.Id = c.UserId OR
u.Id = c.PostId
WHERE u.DisplayName = 'alex'
OPTION (MAXDOP 2);The plans looks like this:
- scan of the
dbo.Userstable (with a residual predicate to only get users named "alex"),
- seek into each of the two indexes to get the requested Id values (which are unioned together)
- followed up by a key lookup to get the rest of the columns (since we selected *)
- Estimated cost: 5.98731 optimizer units
So the answer is that the optimizer is definitely capable of producing such a plan. And it doesn't seem to be a cost-based decision (the seek plan looks much cheaper).
My best guess is that this is just some kind of limitation in the optimizer's exploration process - it doesn't seem to favor converting a left join with an or clause into an apply. This is really unfortunate in this particular case, as performance is dismal in the scan plan (the query takes 45 seconds on my machine) vs the apply plan (less than 1 second).
Side note: You can override the heuristic that disfavors index union plans with undocumented trace flag 8726. See https://dba.stackexchange.com/a/23779 for additional details on that front!
As Rob Farley helpfully pointed out, using
APPLY directly (potentially with a UNION as well) is a better approach to get the plan you're looking for - both of those produce the "better" version of this plan (the FORCESEEK version). I would say that "OR in a JOIN" is kind of a known anti-pattern, and should be avoided since it doesn't seem like the optimizer has great support for that type of query directly.Code Snippets
SELECT *
FROM Users u
LEFT JOIN Comments c
ON u.Id = c.UserId OR
u.Id = c.PostId
WHERE u.DisplayName = 'alex'
OPTION (MAXDOP 2);SELECT *
FROM Users u
LEFT JOIN Comments c WITH (FORCESEEK)
ON u.Id = c.UserId OR
u.Id = c.PostId
WHERE u.DisplayName = 'alex'
OPTION (MAXDOP 2);Context
StackExchange Database Administrators Q#260880, answer score: 6
Revisions (0)
No revisions yet.