patternsqlModerate
SQL Server - DELETE from subquery/derived table
Viewed 0 times
deletesqlderivedsubqueryserverfromtable
Problem
Is there a way I can turn the below
I would like to delete the corresponding returned records from the
Since I used derived tables I can't reference the
To summarize, the 2 physical tables used in the below query have identical structures. Only difference is
I would like to remove the duplicate records from
The below query I have runs in about 15 seconds and successfully identifies only the duplicate records (not including the original unique record we want to keep) but I have not been able to figure out how to delete the records that are returned from SC.
Is it possible or should I just take a different approach?
SELECT statement into a DELETE? I would like to delete the corresponding returned records from the
[ETL].[Stage_Claims] table.Since I used derived tables I can't reference the
Stage_Claims table.To summarize, the 2 physical tables used in the below query have identical structures. Only difference is
DUPS_Claims is a subset of Stage_Claims.DUPS_Claims contains duplicate records found in Stage_Claims. If a record exists 3 times in Stage_Claims, we will have that record 3 times in DUPS_Claims as well.Stage_Claims contains all records including the duplicate records in DUPS_Claims.I would like to remove the duplicate records from
Stage_Claims leaving only 1 unique record for every duplicated record.Stage_Claims has just short of 1 million rows so I do not want to use Row_Number / Partition on the entire table as it takes over 2 minutes to run.The below query I have runs in about 15 seconds and successfully identifies only the duplicate records (not including the original unique record we want to keep) but I have not been able to figure out how to delete the records that are returned from SC.
Is it possible or should I just take a different approach?
SELECT *
FROM (
SELECT RN = ROW_NUMBER() OVER (
PARTITION BY SC.ID ORDER BY SC.id
)
,SC.*
FROM [ETL].[Stage_Claims] SC
WHERE ID IN (
SELECT ID
FROM (
SELECT RN = ROW_NUMBER() OVER (
PARTITION BY ID ORDER BY id
)
,ID
FROM [ETL].[DUPS_Claims]
) AS t1
WHERE RN > 1
)
) AS t2
WHERE RN > 1Solution
Convert your select statement into a CTE, and
Standard warning: You should test this in a non-production environment.
You can simplify your query quite a bit, and likely get better performance by using the below query, which does not make use of the intermediate table,
I created a non-clustered, non-unique index on both tables, then looked at the execution plans for both variations.
The first variant:
The second variant:
The first variant scans the index twice, whereas the second variant clearly only needs to scan the index a single time, and doesn't require a relatively expensive merge join in my somewhat contrived example. My sample
In case CTEs aren't your thing, you can use this approach to delete from a derived table, instead:
The query plan for the above
DELETE FROM the CTE, as in:;WITH del AS
(
SELECT *
FROM (
SELECT RN = ROW_NUMBER() OVER (
PARTITION BY SC.ID ORDER BY SC.id
)
,SC.*
FROM [ETL].[Stage_Claims] SC
WHERE ID IN (
SELECT ID
FROM (
SELECT RN = ROW_NUMBER() OVER (
PARTITION BY ID ORDER BY id
)
,ID
FROM [ETL].[DUPS_Claims]
) AS t1
WHERE RN > 1
)
) AS t2
WHERE RN > 1
)
DELETE FROM del;Standard warning: You should test this in a non-production environment.
You can simplify your query quite a bit, and likely get better performance by using the below query, which does not make use of the intermediate table,
DUPS_Claims, since it is absolutely unnecessary:;WITH cte AS
(
SELECT sc.ID
, rn = ROW_NUMBER() OVER (PARTITION BY sc.ID ORDER BY sc.ID)
FROM ETL.Stage_Claims sc
)
DELETE
FROM cte
WHERE rn > 1;I created a non-clustered, non-unique index on both tables, then looked at the execution plans for both variations.
The first variant:
The second variant:
The first variant scans the index twice, whereas the second variant clearly only needs to scan the index a single time, and doesn't require a relatively expensive merge join in my somewhat contrived example. My sample
ETL.Stage_Claims table contains 89 unique ID values, each duplicated 89 times, for a total of 7921 rows. In case CTEs aren't your thing, you can use this approach to delete from a derived table, instead:
DELETE c
FROM (
SELECT sc.ID
, rn = ROW_NUMBER() OVER (PARTITION BY sc.ID ORDER BY sc.ID)
FROM ETL.Stage_Claims sc
) c
WHERE rn > 1;The query plan for the above
DELETE from the derived table:Code Snippets
;WITH del AS
(
SELECT *
FROM (
SELECT RN = ROW_NUMBER() OVER (
PARTITION BY SC.ID ORDER BY SC.id
)
,SC.*
FROM [ETL].[Stage_Claims] SC
WHERE ID IN (
SELECT ID
FROM (
SELECT RN = ROW_NUMBER() OVER (
PARTITION BY ID ORDER BY id
)
,ID
FROM [ETL].[DUPS_Claims]
) AS t1
WHERE RN > 1
)
) AS t2
WHERE RN > 1
)
DELETE FROM del;;WITH cte AS
(
SELECT sc.ID
, rn = ROW_NUMBER() OVER (PARTITION BY sc.ID ORDER BY sc.ID)
FROM ETL.Stage_Claims sc
)
DELETE
FROM cte
WHERE rn > 1;DELETE c
FROM (
SELECT sc.ID
, rn = ROW_NUMBER() OVER (PARTITION BY sc.ID ORDER BY sc.ID)
FROM ETL.Stage_Claims sc
) c
WHERE rn > 1;Context
StackExchange Database Administrators Q#120233, answer score: 13
Revisions (0)
No revisions yet.