principlesqlMinor
Optimization: One time vs Staging
Viewed 0 times
stagingoptimizationonetime
Problem
I ran into a situation with two options:
-
Use a single query with a cost of ~0.07
-
Use two queries to get the same result -- the first cost ~0.05, the second cost ~0.3
I chose the first option - single query, so less concern of mutating data between queries and less overall cost. Was this the prudent choice?
This was for an operation that originally took ~5+ minutes, timing out in our application. The re-write got that down to 1.5 minutes consistently, using table variables with unique clustered indexes.
-
Use a single query with a cost of ~0.07
INSERT INTO @vals
SELECT ...
FROM ...
JOIN ...
JOIN ...
WHERE a.col = 1 OR b.col IS NOT NULL-
Use two queries to get the same result -- the first cost ~0.05, the second cost ~0.3
INSERT INTO @temp
SELECT ...
FROM ...
JOIN ...
INSERT INTO @vals
SELECT ...
FROM @temp
JOIN ...
WHERE a.col = 1
UNION ALL
SELECT ...
FROM @temp
JOIN ...
JOIN ...
WHERE b.col IS NOT NULLI chose the first option - single query, so less concern of mutating data between queries and less overall cost. Was this the prudent choice?
This was for an operation that originally took ~5+ minutes, timing out in our application. The re-write got that down to 1.5 minutes consistently, using table variables with unique clustered indexes.
Solution
The difference between one and two queries may not be as large as you'd think. For example, compare this query:
to this query:
For the second query, SQL Server could run the subquery first, and store the result in a temporary table. If it did that, that would be the EXACT SAME way it executes the first query!
So the effect of splitting a query in two is that it limits the query optimizer's choice. The first query demands that the subquery is executed before the updates. The second query leaves the optimizer free to fetch one row, then update the table, and then fetch the next row.
Now when you upgrade the hardware, SQL Server version, or if the database changes, you would ideally re-verify that your limitation is in fact beneficial. That's kind of expensive.
So I would prefer not split the query, or enforce a query plan, unless the expense is offset by a considerable advantage. Going from 0.07 to 0.3 seconds doesn't look like a big difference to me, unless the query is run more than once every minute or so.
declare @t table (id int)
insert into @t select id from Table1 where col1 = 7
update Table1 set col2 = 8 where id in (select id from @t)to this query:
update Table1 set col2 = 8 where id in (select id from Table1 where col1 = 7)For the second query, SQL Server could run the subquery first, and store the result in a temporary table. If it did that, that would be the EXACT SAME way it executes the first query!
So the effect of splitting a query in two is that it limits the query optimizer's choice. The first query demands that the subquery is executed before the updates. The second query leaves the optimizer free to fetch one row, then update the table, and then fetch the next row.
Now when you upgrade the hardware, SQL Server version, or if the database changes, you would ideally re-verify that your limitation is in fact beneficial. That's kind of expensive.
So I would prefer not split the query, or enforce a query plan, unless the expense is offset by a considerable advantage. Going from 0.07 to 0.3 seconds doesn't look like a big difference to me, unless the query is run more than once every minute or so.
Code Snippets
declare @t table (id int)
insert into @t select id from Table1 where col1 = 7
update Table1 set col2 = 8 where id in (select id from @t)update Table1 set col2 = 8 where id in (select id from Table1 where col1 = 7)Context
StackExchange Database Administrators Q#16192, answer score: 3
Revisions (0)
No revisions yet.