patternsqlModerate
Using TOP and getting different result sets
Viewed 0 times
resulttopgettingdifferentusingandsets
Problem
I am trying to write an efficient query for deleting chunks of data. To this end I hoped to avoid an index scan by using the primary key to get the oldest records. However I'm seeing some unexpected results returned.
I hoped this
Would give me the oldest 15 records because I could rely on the primary key incrementing and therefore the order of storage would be low to high in the table.
However, this returns a different result set
Which seems to be a more accurate but more expensive way of getting the result I need.
Confusingly, this
Gives a different set of OrderID s (PK) to this
I understand that http://msdn.microsoft.com/en-gb/library/ms189463.aspx explains that order cannot be guaranteed without an ORDER BY clause but expected the PK to order for me and can't explain the differences between the final two select clauses.
I hoped this
SELECT TOP 15 OrderID FROM [Order]Would give me the oldest 15 records because I could rely on the primary key incrementing and therefore the order of storage would be low to high in the table.
However, this returns a different result set
SELECT TOP 15 OrderID FROM [Order] ORDER BY DateCreated ASCWhich seems to be a more accurate but more expensive way of getting the result I need.
Confusingly, this
SELECT TOP 15 * FROM [Order]Gives a different set of OrderID s (PK) to this
SELECT TOP 15 OrderID FROM [Order]I understand that http://msdn.microsoft.com/en-gb/library/ms189463.aspx explains that order cannot be guaranteed without an ORDER BY clause but expected the PK to order for me and can't explain the differences between the final two select clauses.
Solution
Take a look at the plans. When you use
Don't "expect" a certain order. If you don't tell SQL Server how to order, then it will use the most efficient way possible, and this can change due to probably more than 20 factors.
If you want a certain order, SAY SO. Please read #3 here:
Also, this post by Michael Swart may be an interesting read:
If you want your second query to be more efficient, you can consider creating an index on
For your actual goal of deleting
In order to minimize the impact on the log, you may want to add some additional logic in there, from my blog post Break large delete operations into chunks. As for the
SELECT * it probably uses the clustered index, and when you only want one column, maybe there is a skinnier index to use.Don't "expect" a certain order. If you don't tell SQL Server how to order, then it will use the most efficient way possible, and this can change due to probably more than 20 factors.
If you want a certain order, SAY SO. Please read #3 here:
- T-SQL Tuesday #56 : SQL Server Assumptions
Also, this post by Michael Swart may be an interesting read:
- Without ORDER BY, You Can’t Depend On the Order of Results
If you want your second query to be more efficient, you can consider creating an index on
DateCreated (you may want to include OrderID - not sure of the present index structure).For your actual goal of deleting
n rows at a time, oldest first, and assuming OrderID is an IDENTITY column (so order create date should roughly align with that), why not use this approach (based on this great blog post, also by Michael Swart):-- pick a datetime for the newest row you want to delete
-- let's say you want to delete all orders before Jan 1 2014:
SELECT @MaxOrderID = MAX(OrderID)
FROM dbo.[Order] -- terrible table name, also always use dbo prefix
WHERE DateCreated 0)
BEGIN
SELECT TOP (@BatchSize) @NextBatchMax = OrderID
FROM dbo.[Order]
WHERE OrderID > @LargestOrderProcessed
AND OrderID @LargestOrderProcessed
AND OrderID <= @NextBatchMax;
SET @RC = @@ROWCOUNT;
SET @LargestOrderProcessed = @NextBatchMax;
ENDIn order to minimize the impact on the log, you may want to add some additional logic in there, from my blog post Break large delete operations into chunks. As for the
dbo prefix, see Bad habits to kick : Avoiding the schema prefix.Code Snippets
-- pick a datetime for the newest row you want to delete
-- let's say you want to delete all orders before Jan 1 2014:
SELECT @MaxOrderID = MAX(OrderID)
FROM dbo.[Order] -- terrible table name, also always use dbo prefix
WHERE DateCreated < '20140101';
DECLARE @BatchSize INT = 1000,
@LargestOrderProcessed INT = -1,
@NextBatchMax INT,
@RC INT = 1;
WHILE (@RC > 0)
BEGIN
SELECT TOP (@BatchSize) @NextBatchMax = OrderID
FROM dbo.[Order]
WHERE OrderID > @LargestOrderProcessed
AND OrderID <= @MaxOrderID
ORDER BY OrderID;
DELETE dbo.[Order]
WHERE OrderID > @LargestOrderProcessed
AND OrderID <= @NextBatchMax;
SET @RC = @@ROWCOUNT;
SET @LargestOrderProcessed = @NextBatchMax;
ENDContext
StackExchange Database Administrators Q#76220, answer score: 11
Revisions (0)
No revisions yet.