patternsqlMinor
Find & Delete Duplicate Records on All Tables in current MSSQL Database
Viewed 0 times
tablesdeleteallduplicaterecordsmssqldatabasecurrentfind
Problem
The purpose of this code is to search the entire database for duplicate records and produce a script that a user could then run to delete all of the duplicates.
The stored procedure takes the following parameters:
The output string defines the use database, a cte of duplicates and delete statement for each table in the database.
I've just started to get my hands dirty with SQL and could use some enlightenment on how to improve this script in any way.
```
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[sp_getDatabaseWideDuplicateDeleteScript] (@DiscludeTables varchar(max), @DiscludeColumns varchar(max), @Output varchar(max) output )
AS
--START SETUP DISCLUDE TABLES AND DISCLUDE COLUMNS
--*
--this section basically just splits the strings @discludetables and @discludecolumns by ';' occurance
-- and sticks the resulting strings in to tables @@discludecolumns and @@discludeTables
--Debug Variables
--DECLARE @DiscludeTables varchar(max)
--SET @DiscludeTables = '__MigrationHistory'
--DECLARE @DiscludeColumns varchar(max)
--SET @DiscludeColumns = 'Name'
--DECLARE @Output varchar(max)
DECLARE @pos INT
DECLARE @string varchar(max)
DECLARE @@DiscludeTables TABLE
(
tableName varchar(max)
)
DECLARE @@DiscludeColumns TABLE
(
columnName varchar(max)
)
DECLARE @stringT
The stored procedure takes the following parameters:
@DiscludeTables varchar(max): a string of Table Names separated by semicolons. These tables are not included in the search for duplicate records and therefore will not be affected in any way by the resulting script when it is run.@DiscludeColumns varchar(max): a string of Column Names separated by semicolons. These columns are not used to determine whether records are duplicates or not. Including any column here will effectively identify that records can have varying information within this column and still be identified as a duplicate.@Output varchar(max): this is the string that the output will be passed to.The output string defines the use database, a cte of duplicates and delete statement for each table in the database.
I've just started to get my hands dirty with SQL and could use some enlightenment on how to improve this script in any way.
```
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[sp_getDatabaseWideDuplicateDeleteScript] (@DiscludeTables varchar(max), @DiscludeColumns varchar(max), @Output varchar(max) output )
AS
--START SETUP DISCLUDE TABLES AND DISCLUDE COLUMNS
--*
--this section basically just splits the strings @discludetables and @discludecolumns by ';' occurance
-- and sticks the resulting strings in to tables @@discludecolumns and @@discludeTables
--Debug Variables
--DECLARE @DiscludeTables varchar(max)
--SET @DiscludeTables = '__MigrationHistory'
--DECLARE @DiscludeColumns varchar(max)
--SET @DiscludeColumns = 'Name'
--DECLARE @Output varchar(max)
DECLARE @pos INT
DECLARE @string varchar(max)
DECLARE @@DiscludeTables TABLE
(
tableName varchar(max)
)
DECLARE @@DiscludeColumns TABLE
(
columnName varchar(max)
)
DECLARE @stringT
Solution
You started out with all your keywords being capitalized and then you abandoned your capitalization haphazardly throughout the script, this makes it a little hard to read because your brain says that keywords are capitalized and then all of a sudden you see a
"hold up a second, no that's a keyword, never mind"
make sure that all your keywords are capitalized.
Here is a good example
This also brings up something else that can really go either way, but I will explain my reasoning.
Your if statements should be indented, and this is how I would have written it,
I know that is a lot of indentation for an if statement nested inside of an if statement, but I always want to make certain I know exactly what my code is doing, especially when writing SQL Stored Procedures that are going to be creating
I may also consider doing it this way to save on indentation
I also indent the
I would take this,
and indent everything after the
And the last and most important thing that I could advise is that you put intentional commenting on the
begin and your brain says, "hold up a second, no that's a keyword, never mind"
make sure that all your keywords are capitalized.
Here is a good example
if @columnname != '['+@pkColumnName+']'
BEGIN
if @columns != ' '
begin
set @columns = @columns + ', '
END
set @columns = @columns + 't.'+@ColumnName
ENDThis also brings up something else that can really go either way, but I will explain my reasoning.
Your if statements should be indented, and this is how I would have written it,
IF @columnname != '['+@pkColumnName+']'
BEGIN
IF @columns != ' '
BEGIN
SET @columns = @columns + ', '
END
SET @columns = @columns + 't.'+@ColumnName
ENDI know that is a lot of indentation for an if statement nested inside of an if statement, but I always want to make certain I know exactly what my code is doing, especially when writing SQL Stored Procedures that are going to be creating
DELETE scripts. I may also consider doing it this way to save on indentation
IF @columnname != '['+@pkColumnName+']'
BEGIN
IF @columns != ' '
BEGIN
SET @columns = @columns + ', '
END
SET @columns = @columns + 't.'+@ColumnName
ENDI also indent the
SELECT statements a little differently as well, and if I recall correctly, I think that many other SQL writers do it the same way that I do.I would take this,
SELECT MIN(QUOTENAME(COLUMN_NAME))
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = PARSENAME(@TableName, 2)
AND TABLE_NAME = PARSENAME(@TableName, 1)
AND QUOTENAME(COLUMN_NAME) > @ColumnName
AND COLUMN_NAME NOT in (SELECT columnname FROM @@DiscludeColumns)and indent everything after the
SELECT like this,SELECT MIN(QUOTENAME(COLUMN_NAME))
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = PARSENAME(@TableName, 2)
AND TABLE_NAME = PARSENAME(@TableName, 1)
AND QUOTENAME(COLUMN_NAME) > @ColumnName
AND COLUMN_NAME NOT IN (SELECT columnname
FROM @@DiscludeColumns)And the last and most important thing that I could advise is that you put intentional commenting on the
DELETE portion of the script and uncomment the SELECT portion of the rendered script, so that you always run a SELECT to see what you are planning on deleting, before you delete it.Code Snippets
if @columnname != '['+@pkColumnName+']'
BEGIN
if @columns != ' '
begin
set @columns = @columns + ', '
END
set @columns = @columns + 't.'+@ColumnName
ENDIF @columnname != '['+@pkColumnName+']'
BEGIN
IF @columns != ' '
BEGIN
SET @columns = @columns + ', '
END
SET @columns = @columns + 't.'+@ColumnName
ENDIF @columnname != '['+@pkColumnName+']'
BEGIN
IF @columns != ' '
BEGIN
SET @columns = @columns + ', '
END
SET @columns = @columns + 't.'+@ColumnName
ENDSELECT MIN(QUOTENAME(COLUMN_NAME))
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = PARSENAME(@TableName, 2)
AND TABLE_NAME = PARSENAME(@TableName, 1)
AND QUOTENAME(COLUMN_NAME) > @ColumnName
AND COLUMN_NAME NOT in (SELECT columnname FROM @@DiscludeColumns)SELECT MIN(QUOTENAME(COLUMN_NAME))
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = PARSENAME(@TableName, 2)
AND TABLE_NAME = PARSENAME(@TableName, 1)
AND QUOTENAME(COLUMN_NAME) > @ColumnName
AND COLUMN_NAME NOT IN (SELECT columnname
FROM @@DiscludeColumns)Context
StackExchange Code Review Q#102517, answer score: 8
Revisions (0)
No revisions yet.