What is the quickest way to purge data?
Problem
Scenario:
We have two tables, Tbl1 & Tbl2, on the Subscriber Server. Tbl1 is replicated from Publisher Server A, and it has two triggers - insert and update. The triggers insert and update the data into Tbl2. Now, we have to purge approx. 900 million records from Tbl2, which has 1000+ million records in total. Below is the data distribution, from one month down to one minute:
- One month - 14,986,826 rows
- One day - 483,446 rows
- One hour - 20,143 rows
- One minute - 335 rows
What I am looking for:
The fastest way to purge that data without causing production issues or data inconsistency, and ideally with no downtime. So I am thinking of following the steps below, but I'm stuck :(
Steps:
- BCP out the required data from the existing table Tbl2 (around 100 million records; it may take approx. 30 mins).
- Let's assume I start the activity on 1Feb2018 at 10:00 PM and it finishes at 10:30 PM. By the time the activity completes, table Tbl2 will have received new records, which become the delta.
- Create a new table in the database with the name Tbl3.
- BCP in the exported data into the newly created table Tbl3 (around 100 million records; it may take approx. 30 mins).
- Stop the replication job.
- Once the BCP-in completes, use a T-SQL script to insert the new delta data.
- The challenge is - how to deal with delta "update" statements?
- Start the replication.
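The BCP out/in steps above would typically be driven by the bcp command-line utility. A rough sketch - the server, database, and file names are placeholders, and the WHERE clause is purely illustrative of "the data to keep":

```shell
# Export the rows to keep in native format (-n) over a trusted
# connection (-T). All names here are placeholders.
bcp "SELECT * FROM MyDb.dbo.Tbl2 WHERE rowdate >= '2018-01-01'" \
    queryout tbl2_keep.dat -S MyServer -T -n

# Load the exported rows into the new table, committing in batches (-b)
# to keep transaction log growth manageable.
bcp MyDb.dbo.Tbl3 in tbl2_keep.dat -S MyServer -T -n -b 50000
```

Native format keeps the export/import fast and avoids character-conversion issues between SQL Server instances.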
Additional Question:
What is the best way to deal with this scenario?
Solution
Since you are deleting 90% of the rows, I'd recommend copying the rows you need to keep into a new table with the same structure, then using ALTER TABLE ... SWITCH to replace the existing table with the new table, and then simply dropping the old table. See this Microsoft Docs page for the syntax.

A simple test-bed, without replication, which shows the general principle:
First, we'll create a database for our test:
USE master;
IF (SELECT 1 FROM sys.databases d WHERE d.name = 'SwitchTest') IS NOT NULL
BEGIN
ALTER DATABASE SwitchTest SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DROP DATABASE SwitchTest;
END
CREATE DATABASE SwitchTest;
ALTER DATABASE SwitchTest SET RECOVERY FULL;
BACKUP DATABASE SwitchTest TO DISK = 'NUL:';
GO

Here, we create a couple of tables, with a trigger to move rows from table "A" to "B", approximating your setup.
USE SwitchTest;
GO
CREATE TABLE dbo.A
(
i int NOT NULL
CONSTRAINT PK_A
PRIMARY KEY CLUSTERED
IDENTITY(1,1)
, d varchar(300) NOT NULL
, rowdate datetime NOT NULL
) ON [PRIMARY]
WITH (DATA_COMPRESSION = PAGE);
CREATE TABLE dbo.B
(
i int NOT NULL
CONSTRAINT PK_B
PRIMARY KEY CLUSTERED
, d varchar(300) NOT NULL
, rowdate datetime NOT NULL
) ON [PRIMARY]
WITH (DATA_COMPRESSION = PAGE);
GO
CREATE TRIGGER t_a
ON dbo.A
AFTER INSERT, UPDATE
AS
BEGIN
SET NOCOUNT ON;
DELETE b
FROM dbo.B b
INNER JOIN deleted d ON b.i = d.i;
INSERT INTO dbo.B (i, d, rowdate)
SELECT i.i
, i.d
, i.rowdate
FROM inserted i;
END
GO

Here, we insert 1,000,000 rows into "A", and because of the trigger, those rows will also be inserted into "B".
;WITH src AS (
SELECT i.n
FROM (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9))i(n)
)
INSERT INTO dbo.A (d, rowdate)
SELECT d = CRYPT_GEN_RANDOM(300), DATEADD(SECOND, s6.n + (s5.n * 100000) + (s4.n * 10000) + (s3.n * 1000) + (s2.n * 100) + (s1.n * 10), '2017-01-01T00:00:00.000')
FROM src s1
CROSS JOIN src s2
CROSS JOIN src s3
CROSS JOIN src s4
CROSS JOIN src s5
CROSS JOIN src s6;

Clear the transaction log, to avoid running out of room. DO NOT RUN this in production, since it sends the transaction log data to the "NUL" device.
BACKUP LOG SwitchTest TO DISK = 'NUL:';
GO

This code creates a transaction to ensure none of the affected tables can be written to while we're migrating rows:
BEGIN TRANSACTION
EXEC sys.sp_getapplock @Resource = N'TableSwitcher', @LockMode = 'Exclusive', @LockOwner = 'Transaction', @LockTimeout = '1000', @DbPrincipal = N'dbo';
BEGIN TRY
-- create a table to hold the rows we want to keep
CREATE TABLE dbo.C
(
i int NOT NULL
CONSTRAINT PK_C
PRIMARY KEY CLUSTERED
, d varchar(300) NOT NULL
, rowdate datetime NOT NULL
) ON [PRIMARY]
WITH (DATA_COMPRESSION = PAGE);
--copy the rows we want to keep into "C"
INSERT INTO dbo.C (i, d, rowdate)
SELECT b.i
, b.d
, b.rowdate
FROM dbo.B b
WHERE b.rowdate >= '2017-01-11T10:00:00';
--truncate the entire "B" table
TRUNCATE TABLE dbo.B;
--"switch" table "C" into "B"
ALTER TABLE dbo.C SWITCH TO dbo.B;
--drop table "C", since we no longer need it
DROP TABLE dbo.C;
--shows the count of rows in "B" which were retained.
SELECT COUNT(1)
FROM dbo.B b
WHERE b.rowdate >= '2017-01-11T10:00:00';
--look for rows in "B" that should no longer exist.
SELECT COUNT(1)
FROM dbo.B b
WHERE b.rowdate < '2017-01-11T10:00:00';
--release the applock and commit the transaction
EXEC sys.sp_releaseapplock @Resource = N'TableSwitcher', @LockOwner = 'Transaction', @DbPrincipal = N'dbo';
COMMIT TRANSACTION;
END TRY
BEGIN CATCH
DECLARE @message nvarchar(1000) = ERROR_MESSAGE();
DECLARE @severity int = ERROR_SEVERITY();
DECLARE @state int = ERROR_STATE();
RAISERROR (@message, @severity, @state);
EXEC sys.sp_releaseapplock @Resource = N'TableSwitcher', @LockOwner = 'Transaction', @DbPrincipal = N'dbo';
ROLLBACK TRANSACTION;
END CATCH
GO

The sp_getapplock and sp_releaseapplock calls prevent multiple instances of this code from running at the same time. This would be helpful if you enable this code to be re-used through a GUI.

(Note that app locks are only effective if every process accessing the resource implements the same manual resource-locking logic explicitly - there is no magic that "locks" the table in the same way that SQL Server automatically locks rows, pages, etc. during an insert/update operation.)
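To illustrate that cooperative requirement, any other process that writes to these tables would need to request the same app lock first. A sketch of such a hypothetical writer - the resource name must match 'TableSwitcher' exactly, and the insert shown is just an example:

```sql
-- Hypothetical cooperating writer. It requests the SAME app lock resource
-- before writing; while the migration holds the Exclusive lock, this
-- call waits (up to the timeout) instead of writing mid-switch.
BEGIN TRANSACTION;

EXEC sys.sp_getapplock @Resource = N'TableSwitcher', @LockMode = 'Shared',
     @LockOwner = 'Transaction', @LockTimeout = 5000, @DbPrincipal = N'dbo';

INSERT INTO dbo.A (d, rowdate)
VALUES ('cooperativeWrite', GETDATE());

-- A transaction-owned app lock is released automatically at commit/rollback.
COMMIT TRANSACTION;
```

A writer that skips the sp_getapplock call is not blocked by the app lock at all, which is exactly the caveat noted above.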
Now, we test the process of inserting rows into "A", to ensure they're inserted into "B" by the trigger.
INSERT INTO dbo.A (d, rowdate)
VALUES ('testRow', GETDATE());
SELECT *
FROM dbo.B
WHERE B.d = 'testRow';

The query returns the single 'testRow' row (columns i, d, rowdate), confirming the trigger still fires after the switch.
Context
StackExchange Database Administrators Q#203957, answer score: 26