tuning huge delete operation on sql server table
Problem
I am performing a delete operation on a very large SQL Server table, based on the query below.

delete db.st_table_1
where service_date between (select min(service_date) from stg_table)
and (select max(service_date) from stg_table);

stg_table and stg_table_1 don't have indexes on service_date. Both of these tables are loaded with millions of rows of data, and the delete operation is taking a lot of time. Requesting your suggestions to improve the performance of this query.

I referred to the strategy described in the question below, but couldn't understand how to implement it:

How to delete large amount of data in sql server without data loss?

Update:

select * into db.temp_stg_table_1
from db.stg_table_1
where service_date not between (select min(service_date) from db.stg_table)
and (select max(service_date) from db.stg_table);

exec sp_rename 'stg_table_1', 'stg_table_1_old';
exec sp_rename 'temp_stg_table_1', 'test_table_1';

drop table stg_table_1_old;

How about going with the above logic to delete the millions of records? Any advantages or disadvantages with that?
Solution
Testing based on your comments
Tested on SQL Server 2014 SP3
stg_table and stg_table_1 doesn't have indexes on service_date.
both of these tables are loaded with million rows of data and delete
operation is taking lot of time.
DDL
CREATE TABLE dbo.st_table_1 (
    stg_table_1_ID INT IDENTITY(1,1) PRIMARY KEY NOT NULL,
    service_date datetime2,
    val int);

CREATE TABLE dbo.stg_table (
    stg_table_ID INT IDENTITY(1,1) PRIMARY KEY NOT NULL,
    service_date datetime2,
    val int);

PK's + clustered indexes on the identity fields.
DML
INSERT INTO dbo.stg_table WITH (TABLOCK)
(service_date, val)
SELECT -- 1M
    DATEADD(S, rownum, GETDATE()), rownum
FROM
    (SELECT TOP(1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) as rownum
     FROM master.dbo.spt_values spt1
     CROSS APPLY master.dbo.spt_values spt2) as sptvalues;

INSERT INTO dbo.st_table_1 WITH (TABLOCK)
(service_date, val)
SELECT -- 2.5M
    DATEADD(S, rownum, GETDATE()), rownum
FROM
    (SELECT TOP(2500000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) as rownum
     FROM master.dbo.spt_values spt1
     CROSS APPLY master.dbo.spt_values spt2) as sptvalues;

INSERT INTO dbo.stg_table WITH (TABLOCK)
(service_date, val)
SELECT -- 4M
    DATEADD(S, rownum, GETDATE()), rownum
FROM
    (SELECT TOP(4000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) as rownum
     FROM master.dbo.spt_values spt1
     CROSS APPLY master.dbo.spt_values spt2) as sptvalues;

2.5M rows in dbo.st_table_1 and 5M rows in dbo.stg_table. (Almost) all of these 2.5M rows will be deleted by the query, which is more than 10 times less than yours.
Running your query
The actual execution plan for your base delete statement:

As expected, dbo.stg_table is accessed twice to get the max & min values with a stream aggregate.

The CPU time & elapsed / execution time:
CPU time = 4906 ms, elapsed time = 4919 ms.

A missing index hint is added to the execution plan:

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[st_table_1] ([service_date])
INCLUDE ([stg_table_1_ID]);

However, when we add the index, an extra sort appears to delete the rows from this newly added index:
The plan
And the cpu time / elapsed time increases:
CPU time = 11156 ms, elapsed time = 11332 ms.

YMMV, but in my example, based on your comments about the data, it did not improve the query.
Creating an index on [dbo].[stg_table]:

CREATE NONCLUSTERED INDEX IX_service_date
ON [dbo].[stg_table] ([service_date]);

As a result, the MAX() and MIN() can leverage the newly created index to return only one row each, instead of doing a full clustered index scan. With the execution time improved:

SQL Server Execution Times:
CPU time = 2609 ms, elapsed time = 4028 ms.

And the execution plan:
But this is only based on indexing & my own example.
Proceed at your own risk.
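Since dbo.stg_table is scanned to evaluate both subqueries, another option is to capture the boundary values once into variables before the delete. This is only a sketch against the test tables above; whether it actually changes the plan for the better depends on your data and indexes.

```sql
-- Sketch: compute the date range once, then delete using the variables.
-- Uses the same test tables (dbo.stg_table, dbo.st_table_1) as above.
DECLARE @min_date datetime2, @max_date datetime2;

SELECT @min_date = MIN(service_date),
       @max_date = MAX(service_date)
FROM dbo.stg_table;

DELETE dbo.st_table_1
WHERE service_date BETWEEN @min_date AND @max_date;
```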
Extra Notes
You should look into splitting that delete into separate batches, so it won't fill up the log file and you don't have one big all-or-nothing block of failed / succeeded deletes.
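A minimal sketch of such batching, assuming the same test tables as above and an arbitrary batch size of 50000 (tune it to your environment):

```sql
-- Sketch: delete in batches until no rows in the range remain.
-- Each batch is its own implicitly committed transaction, which keeps
-- log usage bounded under the simple recovery model.
DECLARE @batch_size int = 50000;

WHILE 1 = 1
BEGIN
    DELETE TOP (@batch_size) dbo.st_table_1
    WHERE service_date BETWEEN (SELECT MIN(service_date) FROM dbo.stg_table)
                           AND (SELECT MAX(service_date) FROM dbo.stg_table);

    IF @@ROWCOUNT = 0 BREAK; -- no rows left in the range
END
```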
You could also consider using WITH (TABLOCK) so the entire table is locked from the very beginning:

SET STATISTICS IO, TIME ON;

delete dbo.st_table_1 WITH (TABLOCK)
where service_date between (select min(service_date) from stg_table)
and (select max(service_date) from stg_table);

Update:
SELECT INTO + sp_rename

select * into db.temp_stg_table_1
from db.stg_table_1
where service_date not between (select min(service_date) from db.stg_table)
and (select max(service_date) from db.stg_table);

exec sp_rename 'stg_table_1', 'stg_table_1_old';
exec sp_rename 'temp_stg_table_1', 'test_table_1';

drop table stg_table_1_old;

How about going with the above logic to delete the millions of records? Any advantages and disadvantages with that?
Apart from performance: sp_rename needs a Sch-M lock to complete, meaning that it has to wait for all other sessions to release their locks on the table before it can be modified. Also, any indexes/constraints on the original table will be gone and you will have to recreate them.

When I run the query on my own data:
select * into dbo.temp_stg_table_1
from dbo.st_table_1
where service_date not between( select min(service_date) from dbo.stg_table)
and (select max(service_date) from dbo.stg_table);

This does not represent your data; keep that in mind.
It reads all rows only to return 0 rows, which is not optimal.
With a high execution time:
SQL Server Execution Times:
CPU time = 27717 ms, elapsed time = 10657 ms.

But this is not really meaningful without more information regarding your data.
A query plan would be needed to give more accurate advice.
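If you do go the SELECT INTO + sp_rename route, here is a sketch of the swap step wrapped in a transaction. Note that your snippet renames the copy to 'test_table_1', which is presumably a typo for 'stg_table_1'; this sketch assumes that, and assumes the dbo schema used in my tests above.

```sql
-- Sketch: swap the filtered copy in place of the original table.
-- Assumes dbo.temp_stg_table_1 was already populated by the SELECT INTO.
BEGIN TRANSACTION;
    exec sp_rename 'dbo.stg_table_1', 'stg_table_1_old';
    exec sp_rename 'dbo.temp_stg_table_1', 'stg_table_1';
COMMIT TRANSACTION;

-- After verifying the data, drop the old table and recreate any
-- indexes/constraints that existed on the original.
drop table dbo.stg_table_1_old;
```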
Code Snippets
CREATE TABLE dbo.st_table_1( stg_table_1_ID INT IDENTITY(1,1) PRIMARY KEY NOT NULL,
service_date datetime2,
val int)
CREATE TABLE dbo.stg_table (stg_table_ID INT IDENTITY(1,1) PRIMARY KEY NOT NULL,
service_date datetime2,
val int)INSERT INTO dbo.stg_table WITH(TABLOCK)
(
service_date,val)
SELECT -- 1M
DATEADD(S,rownum,GETDATE()),rownum
FROM
(SELECT TOP(1000000) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) as rownum
FROM master.dbo.spt_values spt1
CROSS APPLY master.dbo.spt_values spt2) as sptvalues
INSERT INTO dbo.st_table_1 WITH(TABLOCK)
(
service_date,val)
SELECT -- 2.5M
DATEADD(S,rownum,GETDATE()),rownum
FROM
(SELECT TOP(2500000) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) as rownum
FROM master.dbo.spt_values spt1
CROSS APPLY master.dbo.spt_values spt2) as sptvalues
INSERT INTO dbo.stg_table WITH(TABLOCK)
(
service_date,val)
SELECT -- 4M
DATEADD(S,rownum,GETDATE()),rownum
FROM
(SELECT TOP(4000000) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) as rownum
FROM master.dbo.spt_values spt1
CROSS APPLY master.dbo.spt_values spt2) as sptvaluesCPU time = 4906 ms, elapsed time = 4919 ms.CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[st_table_1] ([service_date])
INCLUDE ([stg_table_1_ID])CPU time = 11156 ms, elapsed time = 11332 ms.Context
StackExchange Database Administrators Q#243432, answer score: 6