T-SQL - What's the most efficient way to loop through a table until a condition is met
Problem
I got a programming task in the area of T-SQL.

Task:
- People want to get inside an elevator. Every person has a certain weight.
- The order of the people waiting in line is determined by the column turn.
- The elevator has a max capacity of 1000.
- Return the last person's name that is able to enter the elevator before it gets too heavy!
- Return type should be a table.

Question:

What is the most efficient way to solve this problem? If looping is correct, is there any room for improvement?

I used a loop and # temp tables; here is my solution:

```
set rowcount 0
-- THE SOURCE TABLE "LINE" HAS THE SAME SCHEMA AS #RESULT AND #TEMP
use Northwind
go
declare @sum int
declare @curr int
set @sum = 0
declare @id int
IF OBJECT_ID('tempdb..#temp','u') IS NOT NULL
DROP TABLE #temp
IF OBJECT_ID('tempdb..#result','u') IS NOT NULL
DROP TABLE #result
create table #result(
id int not null,
[name] varchar(255) not null,
weight int not null,
turn int not null
)
create table #temp(
id int not null,
[name] varchar(255) not null,
weight int not null,
turn int not null
)
INSERT into #temp SELECT * FROM line order by turn
WHILE EXISTS (SELECT 1 FROM #temp)
BEGIN
    -- Get the top record
    SELECT TOP 1 @curr = r.weight FROM #temp r order by turn
    SELECT TOP 1 @id = r.id FROM #temp r order by turn
    --print @curr
    print @sum
    IF (@sum + @curr <= 1000)
    BEGIN
        print 'entering........ again'
        --print @curr
        set @sum = @sum + @curr
        --print @sum
        INSERT INTO #result SELECT * FROM #temp where [id] = @id --id, [name], turn
        DELETE FROM #temp WHERE id = @id
    END
    ELSE
    BEGIN
        print 'breaaaking.-----'
        BREAK
    END
END
SELECT TOP 1 [name] FROM #result r order by r.turn desc
```

Here is the create script for the table (I used Northwind for testing):

```
USE [Northwind]
GO
/* Object: Table [dbo].[line] Script Date: 28.05.2018 21:56:18 */
SET ANSI_NULLS ON
```
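The create script above is cut off. For readers without a SQL Server instance, here is a minimal sketch in Python/sqlite3 that rebuilds the line table (schema inferred from the #temp definition in the question) and mirrors the loop's logic in plain Python. The names gary, Jo, and thomas come from the answer's sample data; the weights and the extra "emma" row are invented for illustration only.

```python
import sqlite3

# Schema inferred from the #temp definition in the question;
# weights are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE line (
        id     INTEGER NOT NULL,
        name   VARCHAR(255) NOT NULL,
        weight INTEGER NOT NULL,
        turn   INTEGER NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO line (id, name, weight, turn) VALUES (?, ?, ?, ?)",
    [
        (1, "gary",   400, 1),
        (2, "Jo",     400, 2),
        (3, "thomas", 300, 3),  # 400 + 400 + 300 > 1000: thomas can't board
        (4, "emma",   100, 4),
    ],
)

def last_to_board(conn, capacity=1000):
    """Walk the queue in turn order, mimicking the WHILE loop above."""
    total, last_name = 0, None
    for name, weight in conn.execute(
        "SELECT name, weight FROM line ORDER BY turn"
    ):
        if total + weight > capacity:
            break  # elevator would get too heavy
        total += weight
        last_name = name
    return last_name

print(last_to_board(conn))  # Jo boards last with the weights above
```

With these assumed weights the running total reaches 800 after Jo, and thomas would push it to 1100, so the loop stops and "Jo" is returned.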
Solution
You should generally try to avoid loops. They are normally less efficient than set-based solutions, as well as less readable.
The below should be pretty efficient. It can scan the unique index in order of turn and calculate the running total of the Weight column, then use LEAD with the same ordering criteria to see what the running total in the next row will be. As soon as it finds the first row where this exceeds 1000 or is NULL (indicating there is no next row), it can stop the scan. It would be even more efficient if the name and weight columns could be INCLUDE-d in the index to avoid the key lookups.

```
WITH T1
     AS (SELECT *,
                SUM(Weight) OVER (ORDER BY turn ROWS UNBOUNDED PRECEDING) AS cume_weight
         FROM   [dbo].[line]),
     T2
     AS (SELECT LEAD(cume_weight) OVER (ORDER BY turn) AS next_cume_weight,
                *
         FROM   T1)
SELECT TOP 1 name
FROM   T2
WHERE  next_cume_weight > 1000
        OR next_cume_weight IS NULL
ORDER  BY turn
```

Execution Plan
In practice it seems to read a few rows ahead of what is strictly necessary - it looks like each window spool/stream aggregate pair causes two additional rows to be read.

For the sample data in the question, ideally it would only need to read two rows from the index scan, but in reality it reads 6. This is not a significant efficiency issue, however, and it does not degrade as more rows are added to the table (as in this demo).

For those interested in this issue, an image with the rows output by each operator (as shown by the query_trace_column_values extended event) is below. The rows are output in row_id order (starting at 47 for the first row read by the index scan and finishing at 113 for the TOP).

Click the image below to make it larger, or alternatively see the animated version to make the flow easier to follow.

Pausing the animation at the point where the right-hand stream aggregate has emitted its first row (for gary - turn = 1), it seems apparent that it was waiting to receive its first row with a different WindowCount (for Jo - turn = 2). And the window spool doesn't release the first "Jo" row until it has read the next row with a different turn (for thomas - turn = 3). So the window spool and stream aggregate each cause an additional row to be read, and there are four of these in the plan - hence 4 additional rows.

An explanation of the columns shown in the above follows (based on info here):
- NodeName: Index Scan, NodeId: 15, ColumnName: id - base table column covered by the index
- NodeName: Index Scan, NodeId: 15, ColumnName: turn - base table column covered by the index
- NodeName: Clustered Index Seek, NodeId: 17, ColumnName: weight - base table column retrieved from the lookup
- NodeName: Clustered Index Seek, NodeId: 17, ColumnName: name - base table column retrieved from the lookup
- NodeName: Segment, NodeId: 13, ColumnName: Segment1010 - returns 1 at the start of a new group, or NULL otherwise. As there is no Partition By in the SUM, only the first row gets 1
- NodeName: Sequence Project, NodeId: 12, ColumnName: RowNumber1009 - row_number() within the group indicated by the Segment1010 flag. As all rows are in the same group, this is ascending integers from 1 to 6. It would be used for filtering right-frame rows in cases like rows between 5 preceding and 2 following (or, as for the LEAD, later)
- NodeName: Segment, NodeId: 11, ColumnName: Segment1011 - returns 1 at the start of a new group, or NULL otherwise. As there is no Partition By in the SUM, only the first row gets 1 (same as Segment1010)
- NodeName: Window Spool, NodeId: 10, ColumnName: WindowCount1012 - attribute that groups together rows belonging to a window frame. This window spool uses the "fast track" case for UNBOUNDED PRECEDING, where it emits two rows per source row: one with the cumulative values and one with the detail values. Though there is no visible difference in the rows exposed by query_trace_column_values, I assume the cumulative columns are there in reality
- NodeName: Stream Aggregate, NodeId: 9, ColumnName: Expr1004 - Count(*) grouped by WindowCount1012 according to the plan, but actually a running count
- NodeName: Stream Aggregate, NodeId: 9, ColumnName: Expr1005 - SUM(weight) grouped by WindowCount1012 according to the plan, but actually the running sum of weight (i.e. cume_weight)
- NodeName: Segment, NodeId: 7, ColumnName: Expr1002 - CASE WHEN [Expr1004]=(0) THEN NULL ELSE [Expr1005] END. I don't see how the COUNT(*) can be 0, so this will always be the running sum (cume_weight)
- NodeName: Segment, NodeId: 7, ColumnName: Segment1013 - no partition by on the LEAD, so the first row gets 1 and all remaining get NULL
- NodeName: Sequence Project, NodeId: 6, ColumnName: RowNumber1006 - row_number() within the group indicated by the Segment1013 flag. As all rows are in the same group, this is ascending integers from 1 to 4
- NodeName: Segment, NodeId: 4, ColumnName: BottomRowNumber1008 - RowNumber1006 + 1, as the LEAD requires the single next row
- NodeName: Segment, NodeId: 4, Co
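The running-total-plus-LEAD shape of the set-based query is not SQL Server specific. As an illustration, here is a sketch of the same query running under Python's sqlite3 module (SQLite 3.25+ supports these window functions); the sample rows are invented, and T-SQL's TOP 1 becomes LIMIT 1 in SQLite's dialect:

```python
import sqlite3

# Invented sample rows for illustration (capacity 1000, as in the answer).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE line (id INT, name TEXT, weight INT, turn INT)")
conn.executemany(
    "INSERT INTO line VALUES (?, ?, ?, ?)",
    [(1, "gary", 400, 1), (2, "Jo", 400, 2),
     (3, "thomas", 300, 3), (4, "emma", 100, 4)],
)

# Same shape as the answer's query: a running total ordered by turn,
# then LEAD to peek at the next row's running total. TOP 1 -> LIMIT 1.
row = conn.execute("""
    WITH T1 AS (
        SELECT *,
               SUM(weight) OVER (ORDER BY turn
                                 ROWS UNBOUNDED PRECEDING) AS cume_weight
        FROM line),
    T2 AS (
        SELECT LEAD(cume_weight) OVER (ORDER BY turn) AS next_cume_weight,
               *
        FROM T1)
    SELECT name
    FROM T2
    WHERE next_cume_weight > 1000 OR next_cume_weight IS NULL
    ORDER BY turn
    LIMIT 1
""").fetchone()
print(row[0])  # Jo: the next person (thomas) would push the total past 1000
```

With these rows the running totals are 400, 800, 1100, 1200, so the first row whose next_cume_weight exceeds 1000 is Jo's, matching what the loop-based solution would return.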
Code Snippets
```
WITH T1
     AS (SELECT *,
                SUM(Weight) OVER (ORDER BY turn ROWS UNBOUNDED PRECEDING) AS cume_weight
         FROM   [dbo].[line]),
     T2
     AS (SELECT LEAD(cume_weight) OVER (ORDER BY turn) AS next_cume_weight,
                *
         FROM   T1)
SELECT TOP 1 name
FROM   T2
WHERE  next_cume_weight > 1000
        OR next_cume_weight IS NULL
ORDER  BY turn
```

Context
StackExchange Database Administrators Q#208178, answer score: 16