HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Find & Delete Duplicate Records on All Tables in current MSSQL Database

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
tablesdeleteallduplicaterecordsmssqldatabasecurrentfind

Problem

The purpose of this code is to search the entire database for duplicate records and produce a script that a user could then run to delete all of the duplicates.

The stored procedure takes the following parameters:

@DiscludeTables varchar(max): a string of Table Names separated by semicolons. These tables are not included in the search for duplicate records and therefore will not be affected in any way by the resulting script when it is run.

@DiscludeColumns varchar(max): a string of Column Names separated by semicolons. These columns are not used to determine whether records are duplicates or not. Including any column here will effectively identify that records can have varying information within this column and still be identified as a duplicate.

@Output varchar(max): this is the string that the output will be passed to.

The output string defines the use database, a cte of duplicates and delete statement for each table in the database.

I've just started to get my hands dirty with SQL and could use some enlightenment on how to improve this script in any way.

```
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[sp_getDatabaseWideDuplicateDeleteScript] (@DiscludeTables varchar(max), @DiscludeColumns varchar(max), @Output varchar(max) output )
AS

--START SETUP DISCLUDE TABLES AND DISCLUDE COLUMNS
--*
--this section basically just splits the strings @discludetables and @discludecolumns by ';' occurance
-- and sticks the resulting strings in to tables @@discludecolumns and @@discludeTables
--Debug Variables
--DECLARE @DiscludeTables varchar(max)
--SET @DiscludeTables = '__MigrationHistory'
--DECLARE @DiscludeColumns varchar(max)
--SET @DiscludeColumns = 'Name'
--DECLARE @Output varchar(max)

DECLARE @pos INT
DECLARE @string varchar(max)
DECLARE @@DiscludeTables TABLE
(
tableName varchar(max)
)

DECLARE @@DiscludeColumns TABLE
(
columnName varchar(max)
)

DECLARE @stringT

Solution

You started out with all your keywords being capitalized and then you abandoned your capitalization haphazardly throughout the script, this makes it a little hard to read because your brain says that keywords are capitalized and then all of a sudden you see a begin and your brain says,


"hold up a second, no that's a keyword, never mind"

make sure that all your keywords are capitalized.

Here is a good example

if @columnname != '['+@pkColumnName+']'
    BEGIN
        if @columns != ' '
        begin
            set @columns = @columns + ', '
        END 
        set @columns = @columns + 't.'+@ColumnName
    END


This also brings up something else that can really go either way, but I will explain my reasoning.

Your if statements should be indented, and this is how I would have written it,

IF @columnname != '['+@pkColumnName+']'
    BEGIN
        IF @columns != ' '
            BEGIN
                SET @columns = @columns + ', '
            END 
        SET @columns = @columns + 't.'+@ColumnName
    END


I know that is a lot of indentation for an if statement nested inside of an if statement, but I always want to make certain I know exactly what my code is doing, especially when writing SQL Stored Procedures that are going to be creating DELETE scripts.

I may also consider doing it this way to save on indentation

IF @columnname != '['+@pkColumnName+']'
BEGIN
    IF @columns != ' '
    BEGIN
        SET @columns = @columns + ', '
    END 
    SET @columns = @columns + 't.'+@ColumnName
END


I also indent the SELECT statements a little differently as well, and if I recall correctly, I think that many other SQL writers do it the same way that I do.

I would take this,

SELECT MIN(QUOTENAME(COLUMN_NAME))
FROM    INFORMATION_SCHEMA.COLUMNS
WHERE       TABLE_SCHEMA    = PARSENAME(@TableName, 2)
    AND TABLE_NAME  = PARSENAME(@TableName, 1)
    AND QUOTENAME(COLUMN_NAME) > @ColumnName
    AND COLUMN_NAME NOT in (SELECT columnname FROM @@DiscludeColumns)


and indent everything after the SELECT like this,

SELECT MIN(QUOTENAME(COLUMN_NAME))
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_SCHEMA = PARSENAME(@TableName, 2)
        AND TABLE_NAME = PARSENAME(@TableName, 1)
        AND QUOTENAME(COLUMN_NAME) > @ColumnName
        AND COLUMN_NAME NOT IN (SELECT columnname 
                                    FROM @@DiscludeColumns)


And the last and most important thing that I could advise is that you put intentional commenting on the DELETE portion of the script and uncomment the SELECT portion of the rendered script, so that you always run a SELECT to see what you are planning on deleting, before you delete it.

Code Snippets

if @columnname != '['+@pkColumnName+']'
    BEGIN
        if @columns != ' '
        begin
            set @columns = @columns + ', '
        END 
        set @columns = @columns + 't.'+@ColumnName
    END
IF @columnname != '['+@pkColumnName+']'
    BEGIN
        IF @columns != ' '
            BEGIN
                SET @columns = @columns + ', '
            END 
        SET @columns = @columns + 't.'+@ColumnName
    END
IF @columnname != '['+@pkColumnName+']'
BEGIN
    IF @columns != ' '
    BEGIN
        SET @columns = @columns + ', '
    END 
    SET @columns = @columns + 't.'+@ColumnName
END
SELECT MIN(QUOTENAME(COLUMN_NAME))
FROM    INFORMATION_SCHEMA.COLUMNS
WHERE       TABLE_SCHEMA    = PARSENAME(@TableName, 2)
    AND TABLE_NAME  = PARSENAME(@TableName, 1)
    AND QUOTENAME(COLUMN_NAME) > @ColumnName
    AND COLUMN_NAME NOT in (SELECT columnname FROM @@DiscludeColumns)
SELECT MIN(QUOTENAME(COLUMN_NAME))
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_SCHEMA = PARSENAME(@TableName, 2)
        AND TABLE_NAME = PARSENAME(@TableName, 1)
        AND QUOTENAME(COLUMN_NAME) > @ColumnName
        AND COLUMN_NAME NOT IN (SELECT columnname 
                                    FROM @@DiscludeColumns)

Context

StackExchange Code Review Q#102517, answer score: 8

Revisions (0)

No revisions yet.