principlesqlModerate
Join condition vs. WHERE...does it make a performance difference?
Viewed 0 times
conditionwheremakejoindifferenceperformancedoes
Problem
I've developed a habit of separating join conditions from other additional conditions. However I understand the logical execution order is:
Can it be detrimental to performance if I add additional conditions in the
Here are two simple sample queries which both return the same plan:
I'd like to add that if I write longer analytical statements which typically span more than 100 lines (formatted) I try to reduce the result set as soon as possible typically using derived tables and adding an additional where there before it gets to the join.
- FROM
- WHERE
Can it be detrimental to performance if I add additional conditions in the
where clause rather than the join clause; or is this a part which typically gets simplified and handled equally in the phase of query optimization?Here are two simple sample queries which both return the same plan:
USE StackOverflow2010;
-- additional filters in where clause
SELECT TOP 500 p.id
FROM dbo.Posts p
INNER JOIN dbo.Votes v ON p.id = v.PostId
WHERE
v.VoteTypeId = 2
ORDER BY p.id
;
-- all criteria in on clause
SELECT TOP 500 p.id
FROM dbo.Posts p
INNER JOIN dbo.Votes v ON p.id = v.PostId AND v.VoteTypeId = 2
ORDER BY p.id
;I'd like to add that if I write longer analytical statements which typically span more than 100 lines (formatted) I try to reduce the result set as soon as possible typically using derived tables and adding an additional where there before it gets to the join.
Solution
Pretty much any "does it matter if I write something x way or y way" question can only be answered "maybe".
Optimizers have to consider a lot of potential join paths and query plans and they don't have a lot of time to produce a plan. That means they end up using a lot of heuristics to prune potential plan trees early rather than fully considering every possible query plan. As a result, it is possible that small changes to the text of the query cause some change in the way the optimizer goes about pruning that plan tree and ultimately produces a different plan. Whether it is likely depends a lot on your queries-- systems that join together a small number of tables where statistics tell the optimizer that there is an obvious driving table and that should obviously do a nested loop join to the next table will probably behave differently than systems that join together dozens of large tables where there may not be an obvious driving table and the optimizer thinks many different join paths are plausible.
Ultimately, the only way to answer the question is to go query-by-query, write it both ways, and look to see if the plan changes. If the plan is identical, it doesn't matter. If the plan changes, then it matters. At least for that query. On that version of SQL Server. With the current set of runtime statistics and settings. Change any of those things and you'd have to reassess.
Practically, this means that you should write the query in a way that most clearly expresses your intention and let the optimizer do its job. When you profile the application, you can then look at the queries that are consuming the most time and consider whether there is a more efficient way to write those specific queries. The optimizer is going to do the right thing in the vast majority of cases. It's the minority of cases where the optimizer does something inefficient that you need to spend developer time thinking about.
Optimizers have to consider a lot of potential join paths and query plans and they don't have a lot of time to produce a plan. That means they end up using a lot of heuristics to prune potential plan trees early rather than fully considering every possible query plan. As a result, it is possible that small changes to the text of the query cause some change in the way the optimizer goes about pruning that plan tree and ultimately produces a different plan. Whether it is likely depends a lot on your queries-- systems that join together a small number of tables where statistics tell the optimizer that there is an obvious driving table and that should obviously do a nested loop join to the next table will probably behave differently than systems that join together dozens of large tables where there may not be an obvious driving table and the optimizer thinks many different join paths are plausible.
Ultimately, the only way to answer the question is to go query-by-query, write it both ways, and look to see if the plan changes. If the plan is identical, it doesn't matter. If the plan changes, then it matters. At least for that query. On that version of SQL Server. With the current set of runtime statistics and settings. Change any of those things and you'd have to reassess.
Practically, this means that you should write the query in a way that most clearly expresses your intention and let the optimizer do its job. When you profile the application, you can then look at the queries that are consuming the most time and consider whether there is a more efficient way to write those specific queries. The optimizer is going to do the right thing in the vast majority of cases. It's the minority of cases where the optimizer does something inefficient that you need to spend developer time thinking about.
Context
StackExchange Database Administrators Q#301051, answer score: 13
Revisions (0)
No revisions yet.