patternsqlMinor
Optimizing removal of duplicate rows from recursive CTE query
Viewed 0 times
removalrowsduplicatequerycterecursiveoptimizingfrom
Problem
I need to write a recursive SQL query to fetch multiple projects together with their lineage from the following table:
(this is just a toy example, so please ignore table design issues and the lack of indexing)
So, given multiple project names, I need to fetch a table that contains the projects as well as all their ancestors, because the result table will have to support lineage queries for a given Project name
In order to do this, I came up with the following recursive CTE query:
And, here is the result of the query:
As expected, this returns projects C1 and C2, as well as their parents, which will be duplicated for each child, unfortunately.
In order to get rid of the duplicated parents, I added
Which returns:
Even though this produces the desired result, I am concerned about the performance of this query. The Lineage CTE will surely contain a lot of extra rows if I let it return these duplicates, which I remove at a later step in the query.
Is there any way to rewrite the CTE such that it removes these duplicates directly?
UPDATE: Just in case my example wasn't clear enough, I need to somehow specify the project names for which to fetch the lineag
CREATE TABLE #Project
(
Name VARCHAR(10),
ParentName VARCHAR(10)
)
INSERT INTO #Project(Name, ParentName) VALUES ('P', NULL);
INSERT INTO #Project(Name, ParentName) VALUES ('C1', 'P');
INSERT INTO #Project(Name, ParentName) VALUES ('C2', 'P');
SELECT Name, ParentName FROM #Project(this is just a toy example, so please ignore table design issues and the lack of indexing)
So, given multiple project names, I need to fetch a table that contains the projects as well as all their ancestors, because the result table will have to support lineage queries for a given Project name
In order to do this, I came up with the following recursive CTE query:
WITH Lineage (Name, ParentName)
AS
(
SELECT Name, ParentName
FROM #Project WHERE Name = 'C1' OR Name = 'C2'
UNION ALL
SELECT #Project.Name, #Project.ParentName
FROM #Project INNER JOIN Lineage ON #Project.Name = Lineage.ParentName
)
SELECT Name, ParentName FROM LineageAnd, here is the result of the query:
Name | ParentName
C1 | P
C2 | P
P | NULL
P | NULLAs expected, this returns projects C1 and C2, as well as their parents, which will be duplicated for each child, unfortunately.
In order to get rid of the duplicated parents, I added
DISTINCT to the final SELECT:SELECT DISTINCT Name, ParentName FROM LineageWhich returns:
Name | ParentName
P | NULL
C1 | P
C2 | PEven though this produces the desired result, I am concerned about the performance of this query. The Lineage CTE will surely contain a lot of extra rows if I let it return these duplicates, which I remove at a later step in the query.
Is there any way to rewrite the CTE such that it removes these duplicates directly?
UPDATE: Just in case my example wasn't clear enough, I need to somehow specify the project names for which to fetch the lineag
Solution
How about this? Kind of a hack - but row_number should be quicker than distinct.
;WITH Lineage AS
(
SELECT
Name
,CASE WHEN ROW_NUMBER() OVER (PARTITION BY ParentName ORDER BY ParentName) > 1
THEN NULL
ELSE ParentName END AS ParentNameToJoin
,ParentName AS ParentNameToDisplay
FROM
#Project
WHERE
Name in ('C1','C2')
UNION ALL
SELECT
p.Name
,CASE WHEN ROW_NUMBER() OVER (PARTITION BY p.ParentName ORDER BY p.ParentName) > 1
THEN NULL
ELSE ParentName END AS ParentNameToJoin
,p.ParentName
FROM
#Project as p
INNER JOIN
Lineage ON p.Name = Lineage.ParentNameToJoin
)
SELECT
Name
,ParentNameToDisplay
FROM
lineageCode Snippets
;WITH Lineage AS
(
SELECT
Name
,CASE WHEN ROW_NUMBER() OVER (PARTITION BY ParentName ORDER BY ParentName) > 1
THEN NULL
ELSE ParentName END AS ParentNameToJoin
,ParentName AS ParentNameToDisplay
FROM
#Project
WHERE
Name in ('C1','C2')
UNION ALL
SELECT
p.Name
,CASE WHEN ROW_NUMBER() OVER (PARTITION BY p.ParentName ORDER BY p.ParentName) > 1
THEN NULL
ELSE ParentName END AS ParentNameToJoin
,p.ParentName
FROM
#Project as p
INNER JOIN
Lineage ON p.Name = Lineage.ParentNameToJoin
)
SELECT
Name
,ParentNameToDisplay
FROM
lineageContext
StackExchange Database Administrators Q#51900, answer score: 4
Revisions (0)
No revisions yet.