principlesqlMinor

Three update query vs single update query performance

Submitted by: @import:stackexchange-dba·Mar 10, 2026·

Viewed 0 times

threeupdatequerysingleperformance

Problem

Am trying to optimize a procedure. There are 3 different update queries present in the procedure.

update #ResultSet
set MajorSector = case 
        when charindex('  ', Sector) > 2 then rtrim(ltrim(substring(Sector, 0, charindex('  ', Sector)))) 
            else ltrim(rtrim(sector)) 
        end

update #ResultSet
set MajorSector = substring(MajorSector, 5, len(MajorSector)-4)
where left(MajorSector,4) in ('(00)','(01)','(02)','(03)','(04)','(05)','(06)','(07)','(08)','(09)')

update #ResultSet
set MajorSector = substring(MajorSector, 4, len(MajorSector)-3)
where left(MajorSector,3) in ('(A)','(B)','(C)','(D)','(E)','(F)','(G)','(H)','(I)','(J)','(K)','(L)','(M)','(N)','(O)','(P)','(Q)','(R)','(S)','(T)','(U)','(V)','(W)','(X)','(Y)','(Z)')

To complete all three update queries it takes less than 10 seconds.

Execution plan for all three update queries.

https://www.brentozar.com/pastetheplan/?id=r11BLfq7b

What I planned is to change the three different update queries into one single update query, so that the I/O can be reduced.

```
;WITH ResultSet
AS (SELECT CASE
WHEN LEFT(temp_MajorSector, 4) IN ( '(00)', '(01)', '(02)', '(03)', '(04)', '(05)', '(06)', '(07)', '(08)', '(09)' )
THEN Substring(temp_MajorSector, 5, Len(temp_MajorSector) - 4)
WHEN LEFT(temp_MajorSector, 3) IN ( '(A)', '(B)', '(C)', '(D)','(E)', '(F)', '(G)', '(H)','(I)', '(J)', '(K)', '(L)','(M)', '(N)', '(O)', '(P)','(Q)', '(R)', '(S)', '(T)','(U)', '(V)', '(W)', '(X)','(Y)', '(Z)' )
THEN Substring(temp_MajorSector, 4, Len(temp_MajorSector) - 3)
ELSE temp_MajorSector
END AS temp_MajorSector,
MajorSector
FROM (SELECT temp_MajorSector = CASE
WHEN Charindex(' ', Sector) > 2 THEN Rtrim(Ltrim(Substring(Sector, 0, Charindex(' ', Sector))))
ELSE Ltrim(Rtri

Solution

The first single update reads and writes every row from the table. The second and third then re-read and re-write a sub-set of those rows. Look at the Actual Number of Rows. When the three statements are combined into one, the optimizer figures that if it has to read everything to satisfy the first change then it can piggy-back off that for the second and third change.

Have a look at the XML version of the query plans, specifically the ` operators and

 parts. In the original plan you'll see each is relatively simple and maps very closely to the SQL. For the all-in-one plan it's a monster. This is the optimizer re-writing the SQL into a logically equivalent form. A plan works1 by passing each row through the operators once. The optimizer's doing all the work it has to do to satisfy all three changes as that row passes through one time.

I'd expect the second query to be faster because the data is only read and written once whereas it is touched three times in the first.

As the second query has no predicates (no WHERE clause) the optimizer has no choice but read every single row and process it. I'm surprised the second form takes longer than the first. Are both starting from clean buffers? Is there other work happening on the system? Since it's reading and writing to a temp table the IO is happening in tempdb. Is there file growth or somesuch happening?

By one measure you have achieved your desired outcome. You say you want to make changes "so that the IO can be reduced." The all-in-one does less IO than the three separate statements do in total. I suspect what you really want, however, is reduced elapsed time, and this is obviously not happening.

1 more or less, lots of details omitted.

I ran your routine to generate test data then ran the three single-update statements and the all-in-one statement. Although there are some differences (no clustered index, no parallelism) I get more-or-less the same results. Specifically, the plans are about the same shape and the three individual queries complete in about two seconds and the one big query in about thirty to thirty five seconds.

I set

set nocount off;
set statistics io on;
set statistics time on;


With the plan in cache and the data in memory I get:

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
Table '#ResultSet...'. Scan count 1, logical reads 125223, physical reads 0

 SQL Server Execution Times:
   CPU time = 1422 ms,  elapsed time = 1417 ms.

(242906 row(s) affected)

Table '#ResultSet...'. Scan count 1, logical reads 125223, physical reads 0

 SQL Server Execution Times:
   CPU time = 344 ms,  elapsed time = 337 ms.

(0 row(s) affected)

Table '#ResultSet...'. Scan count 1, logical reads 125223, physical reads 0

 SQL Server Execution Times:
   CPU time = 734 ms,  elapsed time = 747 ms.

(0 row(s) affected)


I've removed some bits that aren't relevant. Since

physical reads is zero for all three the table fits in memory. logical reads

 is the same for all three which makes sense. As there are no indexes the only approach is to scan every row of the table. The second and third query affect zero rows because I'd run them a few times already. CPU time and elapsed work out as 2500ms.

For the bigger query it is

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
Table '#ResultSet...'. Scan count 1, logical reads 125223

 SQL Server Execution Times:
   CPU time = 33093 ms,  elapsed time = 33137 ms.

(242906 row(s) affected)


The same number of pages are read, the same number of rows are updated. The huge differences is the CPU time. This is reflected in casual observation of Task Manager which shows 30% utilisation for the duration of query execution. The question is, why does it take so much?

The individual queries separately have simple calculations and two of the statements have predicates that greatly reduce the number of rows touched. The optimizer has good heuristics for processing these and finds a quick plan. The all-in-one query applies the monster

Compute Scalar` against every single row. My suggestion is that, for whatever reason, the optimizer cannot unravel the logic into a plan that's quick to run and ends up using a lot of CPU. The optimizer has to work with what its given, which in the second case is complex, nested SQL. Perhaps by refactoring the SQL the optimizer will follow different heuristics and achieve a better outcome? Perhaps some (filtered) indexes or (filtered) statistics will convince it to write a different plan. Maybe persisted computed columns would help it along. Perhaps you just need to give the optimizer what it needs and your first attempt really is the best that can be achieved and you need to find a way to run those three in parallel. Sorry I can't be more scientific.

Code Snippets

set nocount off;
set statistics io on;
set statistics time on;

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
Table '#ResultSet...'. Scan count 1, logical reads 125223, physical reads 0

 SQL Server Execution Times:
   CPU time = 1422 ms,  elapsed time = 1417 ms.

(242906 row(s) affected)

Table '#ResultSet...'. Scan count 1, logical reads 125223, physical reads 0

 SQL Server Execution Times:
   CPU time = 344 ms,  elapsed time = 337 ms.

(0 row(s) affected)

Table '#ResultSet...'. Scan count 1, logical reads 125223, physical reads 0

 SQL Server Execution Times:
   CPU time = 734 ms,  elapsed time = 747 ms.

(0 row(s) affected)

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
Table '#ResultSet...'. Scan count 1, logical reads 125223

 SQL Server Execution Times:
   CPU time = 33093 ms,  elapsed time = 33137 ms.

(242906 row(s) affected)

Context

StackExchange Database Administrators Q#177083, answer score: 8

Revisions (0)

No revisions yet.