HiveBrain v1.2.0
Get Started
← Back to all entries
principlesqlMinor

Optimization: One time vs Staging

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
stagingoptimizationonetime

Problem

I ran into a situation with two options:

-
Use a single query with a cost of ~0.07

INSERT INTO @vals
SELECT ...
  FROM ...
  JOIN ... 
  JOIN ...
 WHERE a.col = 1 OR b.col IS NOT NULL


-
Use two queries to get the same result -- the first cost ~0.05, the second cost ~0.3

INSERT INTO @temp
SELECT ...
  FROM ...
  JOIN ...

INSERT INTO @vals
SELECT ...
  FROM @temp
  JOIN ...
 WHERE a.col = 1
UNION ALL
SELECT ...
  FROM @temp
  JOIN ...
  JOIN ...
 WHERE b.col IS NOT NULL


I chose the first option - single query, so less concern of mutating data between queries and less overall cost. Was this the prudent choice?

This was for an operation that originally took ~5+ minutes, timing out in our application. The re-write got that down to 1.5 minutes consistently, using table variables with unique clustered indexes.

Solution

The difference between one and two queries may not be as large as you'd think. For example, compare this query:

declare @t table (id int)
insert into @t select id from Table1 where col1 = 7
update Table1 set col2 = 8 where id in (select id from @t)


to this query:

update Table1 set col2 = 8 where id in (select id from Table1 where col1 = 7)


For the second query, SQL Server could run the subquery first, and store the result in a temporary table. If it did that, that would be the EXACT SAME way it executes the first query!

So the effect of splitting a query in two is that it limits the query optimizer's choice. The first query demands that the subquery is executed before the updates. The second query leaves the optimizer free to fetch one row, then update the table, and then fetch the next row.

Now when you upgrade the hardware, SQL Server version, or if the database changes, you would ideally re-verify that your limitation is in fact beneficial. That's kind of expensive.

So I would prefer not split the query, or enforce a query plan, unless the expense is offset by a considerable advantage. Going from 0.07 to 0.3 seconds doesn't look like a big difference to me, unless the query is run more than once every minute or so.

Code Snippets

declare @t table (id int)
insert into @t select id from Table1 where col1 = 7
update Table1 set col2 = 8 where id in (select id from @t)
update Table1 set col2 = 8 where id in (select id from Table1 where col1 = 7)

Context

StackExchange Database Administrators Q#16192, answer score: 3

Revisions (0)

No revisions yet.