patternsqlModerate
Efficiently remove 90% of a table's data
Viewed 0 times
removeefficientlydatatable
Problem
I am working on a process to remove 90% of a table's data because only 10% is needed for testing.
The best way I have found includes storing 10% of the table's rows into a temporary table.
Current method
This method is filling up tempdb and causing the disk to fill up as well.
Questions
Is there a more efficient way to delete 90% of a table's data
ex (
or
Is there a way to insert 10% of some_table's data into a temporary table using batches? Something like this:
Possible solution
How about this?
The best way I have found includes storing 10% of the table's rows into a temporary table.
Current method
SELECT TOP 10 PERCENT *
INTO #temp_some_table
FROM some_table (nolock)
ORDER BY some_column DESC
TRUNCATE TABLE some_table
INSERT INTO some_table
SELECT *
FROM #temp_some_table
DROP TABLE #temp_some_tableThis method is filling up tempdb and causing the disk to fill up as well.
Questions
Is there a more efficient way to delete 90% of a table's data
ex (
DELETE TOP 90 PERCENT FROM sometable)or
Is there a way to insert 10% of some_table's data into a temporary table using batches? Something like this:
DECLARE @r INT;
WHILE @r > 0
BEGIN
BEGIN TRANSACTION;
INSERT INTO [dbo].[##temp_cds_Basket]
SELECT TOP 10 PERCENT *
FROM [dbo].[cds_basket] s
SET @r = @@ROWCOUNT;
print @r
COMMIT TRANSACTION
ENDPossible solution
How about this?
SET NOCOUNT ON;
DECLARE @r INT;
DECLARE @TenPercentDate datetime
with cte (some_column) as (
select top 10 percent some_column from some_table (nolock) order by some_column desc
)
select @TenPercentDate = min(some_column)
from cte
select @TenPercentDate
SET @r = 1;
WHILE @r > 0
BEGIN
BEGIN TRANSACTION;
DELETE TOP (10000) from
some_table
WHERE some_column < @TenPercentDate
SET @r = @@ROWCOUNT;
print @r
COMMIT TRANSACTION;
--CHECKPOINT; -- if simple
END
--rollbackSolution
I have three different ideas that I mentioned in a comment above. Here is a little elaboration on at least one of them (which you are stuck on due to self-diagnosed tunnel vision).
I was thinking about this but we don't really need to do the compare - we can just figure out what datetime value is at the edge of what we want to keep, and delete chunks of older rows until there are none left. For example:
You can see this post for some additional ways to enhance this.
I think this would be a valid option and should cause less pain than the above.
This one is also less painful, but feel free to inject some debugging in there to be sure that you got the data you wanted before you perform the drop.
Note that in both of these latter cases there may be constraints, schema-bound views, etc. that prevent you from just dropping the table (obviously an inbound foreign key is not an issue because you are able to truncate, but outbound may still need to be dealt with). There will also be ramifications to existing plans (well deleting 90% of the data should also cause stats to update and invalidate plans, so really no different), and as @Kenneth points out below, you will need to re-establish permissions on the new table (so you might want to be sure you can script those in advance).
- Well, you could calculate the number of rows that make up 10% beforehand, and then compare that in your batch.
I was thinking about this but we don't really need to do the compare - we can just figure out what datetime value is at the edge of what we want to keep, and delete chunks of older rows until there are none left. For example:
SET NOCOUNT ON;
DECLARE
@rc INT = 1,
@cutoff DATETIME,
@batchsize INT = 10000;
;WITH x(dt) AS
(
SELECT TOP (10) PERCENT datetime_column
FROM dbo.mytable
ORDER BY datetime_column DESC
)
SELECT TOP (1) @cutoff = dt -- earliest row we want to keep
FROM x
ORDER BY dt;
WHILE @rc > 0
BEGIN
DELETE TOP (@batchsize) dbo.mytable
WHERE datetime_column < @cutoff
SET @rc = @@ROWCOUNT;
ENDYou can see this post for some additional ways to enhance this.
- You could consider drop / select into instead of truncate / insert, and also think about using bulk-logged recovery during this operation.
I think this would be a valid option and should cause less pain than the above.
- You could also just perform one data movement operation:
SELECT TOP 10 PERCENT cols
INTO dbo.newtable
FROM dbo.oldtable
ORDER BY datetime_column;
DROP TABLE dbo.oldtable;
EXEC sys.sp_rename N'dbo.newtable', N'dbo.oldtable', N'OBJECT';This one is also less painful, but feel free to inject some debugging in there to be sure that you got the data you wanted before you perform the drop.
Note that in both of these latter cases there may be constraints, schema-bound views, etc. that prevent you from just dropping the table (obviously an inbound foreign key is not an issue because you are able to truncate, but outbound may still need to be dealt with). There will also be ramifications to existing plans (well deleting 90% of the data should also cause stats to update and invalidate plans, so really no different), and as @Kenneth points out below, you will need to re-establish permissions on the new table (so you might want to be sure you can script those in advance).
Code Snippets
SET NOCOUNT ON;
DECLARE
@rc INT = 1,
@cutoff DATETIME,
@batchsize INT = 10000;
;WITH x(dt) AS
(
SELECT TOP (10) PERCENT datetime_column
FROM dbo.mytable
ORDER BY datetime_column DESC
)
SELECT TOP (1) @cutoff = dt -- earliest row we want to keep
FROM x
ORDER BY dt;
WHILE @rc > 0
BEGIN
DELETE TOP (@batchsize) dbo.mytable
WHERE datetime_column < @cutoff
SET @rc = @@ROWCOUNT;
ENDSELECT TOP 10 PERCENT cols
INTO dbo.newtable
FROM dbo.oldtable
ORDER BY datetime_column;
DROP TABLE dbo.oldtable;
EXEC sys.sp_rename N'dbo.newtable', N'dbo.oldtable', N'OBJECT';Context
StackExchange Database Administrators Q#95085, answer score: 10
Revisions (0)
No revisions yet.