patternsqlModerate
Running parallel delete statements
Viewed 0 times
statementsrunningparalleldelete
Problem
Problem: I have a massive table - occupies a whopping 9.5 TBs. I need to delete around 5 mil rows from it. The best method I have come up with deletes 1 million records in 2 hours! Since I can not run this during peak time, I get only 2 hour window everyday and deleting 5 mil rows would take me 4 working days at this rate.
Question: I was wondering if SQL Server has a feature where I could run 2 or more deletes on a table simultaneously. Possibly something like virtually partitioning the table - not sure.
I also want to add that running my script induces page locks but keeps the transaction log from growing. My script is very similar to the one suggested by vonPryz:
How to Delete From Large Tables in Batches Without Growing the Transaction Log by Greg M Lucas.
Question: I was wondering if SQL Server has a feature where I could run 2 or more deletes on a table simultaneously. Possibly something like virtually partitioning the table - not sure.
I also want to add that running my script induces page locks but keeps the transaction log from growing. My script is very similar to the one suggested by vonPryz:
How to Delete From Large Tables in Batches Without Growing the Transaction Log by Greg M Lucas.
Solution
Let me try to summarise the answers as per the commenters above.
First of all, to achieve higher deletion speeds you can do three things:
Ad 1) Since deleting a large number of rows at a time hit the transaction log hard and causes locking - you likely want the number of deletes in a statement to be "small". I find that around 10K rows is a good number (and having table lock escalation turned off with
Ad 2) If your table is not already partitioned, this is not an option
Ad 3) DELETE in SQL Server is single threaded, so you need to run more than one at a time to get max speed. To run multiple deletes concurrently, you need a way to partition them so each parallel delete runs on its own set of keys and doesn't block with the others. Typically, you can use the primary key of the table to partition the delete.
For example, if you have an IDENTITY column called
Keep the Max - Min and Min somewhere (a table in tempdb) that you can quickly read from into variables
Let us say you decide that you can allocate 4 cores for running deletion. You now run 4 queries, either from 4 x SSMS new query or four command prompt via SQLCMD, each doing this:
Pick @n = 0,1,2,3 for each of the concurrently running DELETES.
First of all, to achieve higher deletion speeds you can do three things:
- Delete a higher number of rows at a time
- Use partitioning
- Run multiple deletes
Ad 1) Since deleting a large number of rows at a time hit the transaction log hard and causes locking - you likely want the number of deletes in a statement to be "small". I find that around 10K rows is a good number (and having table lock escalation turned off with
ALTER TABLE Foo SET LOCK_ESCALATION = DisabledAd 2) If your table is not already partitioned, this is not an option
Ad 3) DELETE in SQL Server is single threaded, so you need to run more than one at a time to get max speed. To run multiple deletes concurrently, you need a way to partition them so each parallel delete runs on its own set of keys and doesn't block with the others. Typically, you can use the primary key of the table to partition the delete.
For example, if you have an IDENTITY column called
key on the table, you can first:SELECT MAX(key) - MIN(key), MIN(key) FROM Foo WHERE Keep the Max - Min and Min somewhere (a table in tempdb) that you can quickly read from into variables
@IntervalSize, @MinKey respectively. Let us say you decide that you can allocate 4 cores for running deletion. You now run 4 queries, either from 4 x SSMS new query or four command prompt via SQLCMD, each doing this:
DECLARE @NumDone INT = 1
WHILE @NumDone > 0 BEGIN
SET ROWCOUNT 10000
DELETE FROM Foo WHERE Key BETWEEN @MinKey + @IntervalSize / 4 * @n
AND @MinKey + @IntervalSize / 4 (@n + 1)
SET @NumDone = @@ROWCOUNT
ENDPick @n = 0,1,2,3 for each of the concurrently running DELETES.
Code Snippets
SELECT MAX(key) - MIN(key), MIN(key) FROM Foo WHERE <rows that must be deleted>DECLARE @NumDone INT = 1
WHILE @NumDone > 0 BEGIN
SET ROWCOUNT 10000
DELETE FROM Foo WHERE Key BETWEEN @MinKey + @IntervalSize / 4 * @n
AND @MinKey + @IntervalSize / 4 (@n + 1)
SET @NumDone = @@ROWCOUNT
ENDContext
StackExchange Database Administrators Q#56432, answer score: 11
Revisions (0)
No revisions yet.