HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

SQL procedure for Archiving and Deleting

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
deletingsqlarchivingprocedureforand

Problem

I have a requirement for archiving 5 huge tables from PROD to ARCHIVE server without losing the integrity of the tables.

The query makes use of the Linked Server functionality and current idea is to host it on the PROD server. It is a distributed transaction which makes use of Microsoft's Distributed Transaction Coordinator service.

The key requirement is that the PROD server will be live all the time and the performance of the server should not be affected by this procedure being executed.

Tables are as follows:

  • header0 - Header



  • detail0 - Detail



  • email0 - Emails



  • overboard0 - Overboard



  • references0 - References



Only including the header and detail table in the code for better understanding. Other 3 tables refer the same header tables

1 header may have around 2-4 details detail, 8 messages records and may or may not have records in other tables.

The tables don't have a Primary and Foreign Key Relation in DB as such but have columns that are dependent on header table.

Following is the procedure code currently implemented which is working fine for 18k records and processing 180k+ records in 3.5 minutes of time. Please suggest if this can be improved because going further this table may have 2.5 - 3 million records in a single day.

```
ALTER PROCEDURE [dbo].[usp_Compliane_Archive_And_Delete]
@client nvarchar(8), -- CLIENT field value
@verbose bit, -- Set to 0/1 for BASIC/VERBOSE logging
@chunkSize int, -- Change as per requirement
@historyDays int -- History Days for deletion
AS
BEGIN

SET XACT_ABORT ON

-- DECLARE A TABLE TO HOLD THE KEY VALUES FOR EVERY CHUNK
DECLARE @NextIDs TABLE(
cClient nvarchar(8),
iRunNo int,
UNIQUE NONCLUSTERED (cClient, iRunNo)
);

-- Hold history days value
DECLARE @Xdaysago datetime
SELECT @Xdaysago = DATEADD(DAY, -(@historyDays), GETDATE())

DECLARE @chunkCount int
SET @chunkCount = 0

DECLARE @procRunDate datetime
SET

Solution

Nitpicks

You use separate statements for DECLARE and SET quite a bit. While there is nothing wrong with that, it's often just easier to read if you combine them:

DECLARE @Xdaysago datetime = DATEADD(DAY, -(@historyDays), GETDATE());
DECLARE @chunkCount int = 0;
DECLARE @procRunDate datetime = GETDATE();


You made this comment in your post:

This will run in a chunk of 5000 records which can be controlled from the input parameter.

If you know that your default value is 5000, consider setting it as default value in your procedue signature (hence making the parameter optional). You could also do this for @verbose if you expect to run it with one of the two settings more often than not. In both cases, the params would only then need to be provided if the caller decided to not use the provided default values.

ALTER PROCEDURE [dbo].[usp_Compliane_Archive_And_Delete] 
    @client nvarchar(8),    -- CLIENT field value
    @historyDays int,       -- History Days for deletion
    @chunkSize int = 5000,  -- Change as per requirement
    @verbose bit = 0        -- Set to 0/1 for BASIC/VERBOSE logging
AS ...


Note I changed the order a bit to make the optional parameters last.

It's a good habit to terminate your statements with semicolon ; even though Microsoft T-SQL is not strict at the moment. It is the ANSI standard and as far as I am aware Microsoft is the only RDBMS that does not enforce this yet.

According to MSDN (emphasis mine):

Transact-SQL statement terminator.Although the semicolon is not required for most statements in this version of SQL Server, it will be required in a future version.

References:

-
Transact-SQL Syntax Conventions - MSDN

-
Always Use Semicolon Statement Terminators - Dan Guzman's Blog

Lines like these should be shortened to make reading easier (and avoid making mistakes like mixing up column counts):

INSERT INTO [ARCHIVE].[PS_902mssqldev].[dbo].[header0]
([client] ,[ckhrunn] ,[ckhsrc] ,[ckhtype] ,[ckhstat] ,[ckhrclnt] ,[ckhrdte] ,[ckhrtme] ,[ckhrusr] ,[ckhcver] ,[ckhprun] ,[ckhcdt1] ,[ckhusr1] ,[ckhusr2] ,[ckhusr3] ,[ckhusr4] ,[ckhactv] ,[ckhuser] ,[ckhdate] ,[ckhtime] ,[ckhwsid] ,[ckhupid] ,[ckhpsl01] ,[ckhpsl02] ,[ckhpsl03] ,[ckhpsl04] ,[ckhpsl05] ,[ckhpslvnum] ,[ckhpslvdte])
SELECT header.client, header.ckhrunn, header.ckhsrc, header.ckhtype, header.ckhstat, header.ckhrclnt, header.ckhrdte, header.ckhrtme, header.ckhrusr, header.ckhcver, header.ckhprun, header.ckhcdt1, header.ckhusr1, header.ckhusr2, header.ckhusr3, header.ckhusr4, header.ckhactv, header.ckhuser, header.ckhdate, header.ckhtime, header.ckhwsid, header.ckhupid, header.ckhpsl01, header.ckhpsl02, header.ckhpsl03, header.ckhpsl04, header.ckhpsl05, header.ckhpslvnum, header.ckhpslvdte
FROM [PS_902mssqldev_prod].[dbo].[header0] AS header


Consider adding line breaks every N columns on both the INSERT INTO and SELECT statements to make it more legible and having to scroll horizontally to see the whole statement.

You can use += and -= operators in SET statements (like increments within loops, etc.) as you can in many other languages:

SET @TotalRowInserted = @TotalRowInserted + @rowCount;
...
SET @chunkCount = @chunkCount + 1;


Become simply:

SET @TotalRowInserted += @rowCount;
...
SET @chunkCount += 1;


Performance

For the most part, this is pretty straightforward and there may not be huge performance gains to be made, but I spotted a few things that potentially could help some.*

* Your mileage may vary.

Transaction isolation level

There is no declared isolation level in your procedure. How could it impact performance? Well, it depends. The database probably has a default isolation level for read operations, you should definitely check that.

You may or may not be worried about uncommitted reads (a.k.a. "ghost reads"), and this would primarily be for the INSERT INTO @NextIDs as far as I can tell, but it could alleviate potential locks on the production database.

Reference: SET TRANSACTION ISOLATION LEVEL - MSDN

Another small improvement might be to SET NOCOUNT ON for the whole procedure, rather than just at the end in your CATCH block.

"Physical" temporary table

This:

DECLARE @NextIDs TABLE(
    cClient nvarchar(8),
    iRunNo  int,
    UNIQUE NONCLUSTERED (cClient, iRunNo)
    );


... might be better off as a "physical" #NextIDs temp table on disk, instead of an in-memory table. This of course can vary depending on the memory and I/O capabilities of the server(s), as well as which version of SQL Server you are using and so on. I can just say that from my own experience at least, #TempTable are almost always faster than @TempTable when used for any non-trivial amount of records. This also lets you control indexing and things like that, while the temp table exists.

Some references on the topic:

  • Special Table Types - Microsoft TechNet



This one is really good:

-
Are Table Variables as Good as Temporary Tables in SQL 2014? -

Code Snippets

DECLARE @Xdaysago datetime = DATEADD(DAY, -(@historyDays), GETDATE());
DECLARE @chunkCount int = 0;
DECLARE @procRunDate datetime = GETDATE();
ALTER PROCEDURE [dbo].[usp_Compliane_Archive_And_Delete] 
    @client nvarchar(8),    -- CLIENT field value
    @historyDays int,       -- History Days for deletion
    @chunkSize int = 5000,  -- Change as per requirement
    @verbose bit = 0        -- Set to 0/1 for BASIC/VERBOSE logging
AS ...
INSERT INTO [ARCHIVE].[PS_902mssqldev].[dbo].[header0]
([client] ,[ckhrunn] ,[ckhsrc] ,[ckhtype] ,[ckhstat] ,[ckhrclnt] ,[ckhrdte] ,[ckhrtme] ,[ckhrusr] ,[ckhcver] ,[ckhprun] ,[ckhcdt1] ,[ckhusr1] ,[ckhusr2] ,[ckhusr3] ,[ckhusr4] ,[ckhactv] ,[ckhuser] ,[ckhdate] ,[ckhtime] ,[ckhwsid] ,[ckhupid] ,[ckhpsl01] ,[ckhpsl02] ,[ckhpsl03] ,[ckhpsl04] ,[ckhpsl05] ,[ckhpslvnum] ,[ckhpslvdte])
SELECT header.client, header.ckhrunn, header.ckhsrc, header.ckhtype, header.ckhstat, header.ckhrclnt, header.ckhrdte, header.ckhrtme, header.ckhrusr, header.ckhcver, header.ckhprun, header.ckhcdt1, header.ckhusr1, header.ckhusr2, header.ckhusr3, header.ckhusr4, header.ckhactv, header.ckhuser, header.ckhdate, header.ckhtime, header.ckhwsid, header.ckhupid, header.ckhpsl01, header.ckhpsl02, header.ckhpsl03, header.ckhpsl04, header.ckhpsl05, header.ckhpslvnum, header.ckhpslvdte
FROM [PS_902mssqldev_prod].[dbo].[header0] AS header
SET @TotalRowInserted = @TotalRowInserted + @rowCount;
...
SET @chunkCount = @chunkCount + 1;
SET @TotalRowInserted += @rowCount;
...
SET @chunkCount += 1;

Context

StackExchange Code Review Q#122464, answer score: 3

Revisions (0)

No revisions yet.