patternsqlMinor
Optimizing a CTE hierarchy
Viewed 0 times
hierarchycteoptimizing
Problem
Update below
I have a table of accounts with a typical acct/parent account architecture to represent a hierarchy of accounts (SQL Server 2012). I created a VIEW using a CTE to hash out the hierarchy, and on the whole it works beautifully, and as intended. I can query the hierarchy at any level, and see the branches easily.
There is one business logic field that needs to be returned as a function of the hierarchy. A field in each account record describes the size of the business (we'll call it CustomerCount). The logic I need to report on needs to roll up the CustomerCount from the whole branch. In other words, given an account, I need to sum up the customercount values for that account along with every child in every branch below the account along the hierarchy.
I successfully calculated the field using a hierarchy field built within the CTE, which looks like acct4.acct3.acct2.acct1. The problem I'm running into is simply making it run fast. Without this one calculated field, the query runs in ~3 seconds. When I add in the calculated field, it turns into a 4 minute query.
Here is the best version I've been able to come up with that returns the correct results. I'm looking for ideas on how I can restructure this AS A VIEW without such huge sacrifices to performance.
I understand the reason this one goes slow (requires calculating a predicate in the where clause), but I can't think of another way to structure it and still get same results.
Here's some sample code to build a table and do the CTE pretty much exactly as it works in my environment.
```
Use Tempdb
go
CREATE TABLE dbo.Account
(
Acctid varchar(1) NOT NULL
, Name varchar(30) NULL
, ParentId varchar(1) NULL
, CustomerCount int NULL
);
INSERT Account
SELECT 'A','Best Bet',NULL,21 UNION ALL
SELECT 'B','eStore','A',30 UNION ALL
SELECT 'C','Big Bens','B',75 UNION ALL
SELECT 'D','Mr. Jimbo','B',50 UNION ALL
SELECT 'E','Dr. John','C',100 UNION ALL
SELECT 'F','Brick','A',222 UNION ALL
SE
I have a table of accounts with a typical acct/parent account architecture to represent a hierarchy of accounts (SQL Server 2012). I created a VIEW using a CTE to hash out the hierarchy, and on the whole it works beautifully, and as intended. I can query the hierarchy at any level, and see the branches easily.
There is one business logic field that needs to be returned as a function of the hierarchy. A field in each account record describes the size of the business (we'll call it CustomerCount). The logic I need to report on needs to roll up the CustomerCount from the whole branch. In other words, given an account, I need to sum up the customercount values for that account along with every child in every branch below the account along the hierarchy.
I successfully calculated the field using a hierarchy field built within the CTE, which looks like acct4.acct3.acct2.acct1. The problem I'm running into is simply making it run fast. Without this one calculated field, the query runs in ~3 seconds. When I add in the calculated field, it turns into a 4 minute query.
Here is the best version I've been able to come up with that returns the correct results. I'm looking for ideas on how I can restructure this AS A VIEW without such huge sacrifices to performance.
I understand the reason this one goes slow (requires calculating a predicate in the where clause), but I can't think of another way to structure it and still get same results.
Here's some sample code to build a table and do the CTE pretty much exactly as it works in my environment.
```
Use Tempdb
go
CREATE TABLE dbo.Account
(
Acctid varchar(1) NOT NULL
, Name varchar(30) NULL
, ParentId varchar(1) NULL
, CustomerCount int NULL
);
INSERT Account
SELECT 'A','Best Bet',NULL,21 UNION ALL
SELECT 'B','eStore','A',30 UNION ALL
SELECT 'C','Big Bens','B',75 UNION ALL
SELECT 'D','Mr. Jimbo','B',50 UNION ALL
SELECT 'E','Dr. John','C',100 UNION ALL
SELECT 'F','Brick','A',222 UNION ALL
SE
Solution
Edit: this is second attempt
Based on @Hannah Vernon's answer, here is a way to bypass the use of CTE inside an inline subquery, which is like self-joining the CTE and I presume is the reason for the poor efficiency. It uses analytic functions available only in the 2012 version of SQL-Server. Tested at SQL-Fiddle
This part can be skipped from reading, it's a copy-paste from Hannah's answer:
Here we order the rows of the CTE using the
Then we have one more intermediate CTE where we use the previous running totals and row numbers - basically to find where the end points for the branches of the tree structure:
```
, cte2 AS
(
SELECT
cte1.*
, rn3 = LAST_VALUE(Rn) OVER
(PARTITION BY Acct1Id, Acct2Id, Acct3Id
ORDER BY Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, rn2 = LAST_VALUE(Rn) OVER
(PARTITION BY Acct1Id, Acct2Id
ORDER BY Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, rn1 = LAST_VALUE(Rn) OVER
(PARTITION BY Acct1Id
ORDER BY Acct2Id, Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, rcc3 = LAST_VALUE(RunningCustomerCount) OVER
(PARTITION BY Acct1Id, Acct2Id, Acct3Id
ORDER BY Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, rcc2 = LAST_VALUE(RunningCustomerCount) OVER
(PARTITION BY Acct1Id, Acct2Id
ORDER BY Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, rcc1 = LAST_VALUE(RunningCustomerCount) OVER
(PARTITION BY Acct1Id
ORDER BY Acct2Id, Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOW
Based on @Hannah Vernon's answer, here is a way to bypass the use of CTE inside an inline subquery, which is like self-joining the CTE and I presume is the reason for the poor efficiency. It uses analytic functions available only in the 2012 version of SQL-Server. Tested at SQL-Fiddle
This part can be skipped from reading, it's a copy-paste from Hannah's answer:
;With AccountHierarchy AS
(
SELECT
Root.AcctId AccountId
, Root.Name AccountName
, Root.ParentId ParentId
, 1 HierarchyLevel
, cast(Root.Acctid as varchar(4000)) IdHierarchyMatch
, cast(Root.Acctid as varchar(4000)) IdHierarchy
, cast(replace(Root.Name,'.','') as varchar(4000)) NameHierarchy
, cast(Root.Acctid as varchar(4000)) HierarchySort
, cast(Root.Name as varchar(4000)) HierarchyLabel ,
Root.CustomerCount CustomerCount
FROM
account Root
WHERE
Root.ParentID is null
UNION ALL
SELECT
Recurse.Acctid AccountId
, Recurse.Name AccountName
, Recurse.ParentId ParentId
, Root.HierarchyLevel + 1 HierarchyLevel
, CAST(CAST(Root.IdHierarchyMatch as varchar(40)) + '.'
+ cast(recurse.Acctid as varchar(40)) as varchar(4000)) IdHierarchyMatch
, cast(cast(recurse.Acctid as varchar(40)) + '.'
+ Root.IdHierarchy as varchar(4000)) IdHierarchy
, cast(replace(recurse.Name,'.','') + '.'
+ Root.NameHierarchy as varchar(4000)) NameHierarchy
, cast(Root.AccountName + '.'
+ Recurse.Name as varchar(4000)) HierarchySort
, cast(space(root.HierarchyLevel * 4)
+ Recurse.Name as varchar(4000)) HierarchyLabel
, Recurse.CustomerCount CustomerCount
FROM
account Recurse INNER JOIN
AccountHierarchy Root on Root.AccountId = Recurse.ParentId
)Here we order the rows of the CTE using the
IdHierarchyMatch and we calculate row numbers and a running total (from the next row up to the end.), cte1 AS
(
SELECT
h.AccountId
, h.AccountName
, h.ParentId
, h.HierarchyLevel
, h.IdHierarchy
, h.NameHierarchy
, h.HierarchyLabel
, parsename(h.IdHierarchy,1) Acct1Id
, parsename(h.NameHierarchy,1) Acct1Name
, parsename(h.IdHierarchy,2) Acct2Id
, parsename(h.NameHierarchy,2) Acct2Name
, parsename(h.IdHierarchy,3) Acct3Id
, parsename(h.NameHierarchy,3) Acct3Name
, parsename(h.IdHierarchy,4) Acct4Id
, parsename(h.NameHierarchy,4) Acct4Name
, h.CustomerCount
, h.HierarchySort
, h.IdHierarchyMatch
, Rn = ROW_NUMBER() OVER
(ORDER BY h.IdHierarchyMatch)
, RunningCustomerCount = COALESCE(
SUM(h.CustomerCount)
OVER
(ORDER BY h.IdHierarchyMatch
ROWS BETWEEN 1 FOLLOWING
AND UNBOUNDED FOLLOWING)
, 0)
FROM
AccountHierarchy AS h
)Then we have one more intermediate CTE where we use the previous running totals and row numbers - basically to find where the end points for the branches of the tree structure:
```
, cte2 AS
(
SELECT
cte1.*
, rn3 = LAST_VALUE(Rn) OVER
(PARTITION BY Acct1Id, Acct2Id, Acct3Id
ORDER BY Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, rn2 = LAST_VALUE(Rn) OVER
(PARTITION BY Acct1Id, Acct2Id
ORDER BY Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, rn1 = LAST_VALUE(Rn) OVER
(PARTITION BY Acct1Id
ORDER BY Acct2Id, Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, rcc3 = LAST_VALUE(RunningCustomerCount) OVER
(PARTITION BY Acct1Id, Acct2Id, Acct3Id
ORDER BY Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, rcc2 = LAST_VALUE(RunningCustomerCount) OVER
(PARTITION BY Acct1Id, Acct2Id
ORDER BY Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, rcc1 = LAST_VALUE(RunningCustomerCount) OVER
(PARTITION BY Acct1Id
ORDER BY Acct2Id, Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOW
Code Snippets
;With AccountHierarchy AS
(
SELECT
Root.AcctId AccountId
, Root.Name AccountName
, Root.ParentId ParentId
, 1 HierarchyLevel
, cast(Root.Acctid as varchar(4000)) IdHierarchyMatch
, cast(Root.Acctid as varchar(4000)) IdHierarchy
, cast(replace(Root.Name,'.','') as varchar(4000)) NameHierarchy
, cast(Root.Acctid as varchar(4000)) HierarchySort
, cast(Root.Name as varchar(4000)) HierarchyLabel ,
Root.CustomerCount CustomerCount
FROM
account Root
WHERE
Root.ParentID is null
UNION ALL
SELECT
Recurse.Acctid AccountId
, Recurse.Name AccountName
, Recurse.ParentId ParentId
, Root.HierarchyLevel + 1 HierarchyLevel
, CAST(CAST(Root.IdHierarchyMatch as varchar(40)) + '.'
+ cast(recurse.Acctid as varchar(40)) as varchar(4000)) IdHierarchyMatch
, cast(cast(recurse.Acctid as varchar(40)) + '.'
+ Root.IdHierarchy as varchar(4000)) IdHierarchy
, cast(replace(recurse.Name,'.','') + '.'
+ Root.NameHierarchy as varchar(4000)) NameHierarchy
, cast(Root.AccountName + '.'
+ Recurse.Name as varchar(4000)) HierarchySort
, cast(space(root.HierarchyLevel * 4)
+ Recurse.Name as varchar(4000)) HierarchyLabel
, Recurse.CustomerCount CustomerCount
FROM
account Recurse INNER JOIN
AccountHierarchy Root on Root.AccountId = Recurse.ParentId
), cte1 AS
(
SELECT
h.AccountId
, h.AccountName
, h.ParentId
, h.HierarchyLevel
, h.IdHierarchy
, h.NameHierarchy
, h.HierarchyLabel
, parsename(h.IdHierarchy,1) Acct1Id
, parsename(h.NameHierarchy,1) Acct1Name
, parsename(h.IdHierarchy,2) Acct2Id
, parsename(h.NameHierarchy,2) Acct2Name
, parsename(h.IdHierarchy,3) Acct3Id
, parsename(h.NameHierarchy,3) Acct3Name
, parsename(h.IdHierarchy,4) Acct4Id
, parsename(h.NameHierarchy,4) Acct4Name
, h.CustomerCount
, h.HierarchySort
, h.IdHierarchyMatch
, Rn = ROW_NUMBER() OVER
(ORDER BY h.IdHierarchyMatch)
, RunningCustomerCount = COALESCE(
SUM(h.CustomerCount)
OVER
(ORDER BY h.IdHierarchyMatch
ROWS BETWEEN 1 FOLLOWING
AND UNBOUNDED FOLLOWING)
, 0)
FROM
AccountHierarchy AS h
), cte2 AS
(
SELECT
cte1.*
, rn3 = LAST_VALUE(Rn) OVER
(PARTITION BY Acct1Id, Acct2Id, Acct3Id
ORDER BY Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, rn2 = LAST_VALUE(Rn) OVER
(PARTITION BY Acct1Id, Acct2Id
ORDER BY Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, rn1 = LAST_VALUE(Rn) OVER
(PARTITION BY Acct1Id
ORDER BY Acct2Id, Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, rcc3 = LAST_VALUE(RunningCustomerCount) OVER
(PARTITION BY Acct1Id, Acct2Id, Acct3Id
ORDER BY Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, rcc2 = LAST_VALUE(RunningCustomerCount) OVER
(PARTITION BY Acct1Id, Acct2Id
ORDER BY Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, rcc1 = LAST_VALUE(RunningCustomerCount) OVER
(PARTITION BY Acct1Id
ORDER BY Acct2Id, Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM
cte1
)SELECT
hier.AccountId
, hier.AccountName
--- -- columns skipped
, hier.CustomerCount
, TotalCustomerCount = hier.CustomerCount
+ hier.RunningCustomerCount
- ca.LastRunningCustomerCount
, hier.HierarchySort
, hier.IdHierarchyMatch
FROM
cte2 hier
OUTER APPLY
( SELECT LastRunningCustomerCount, Rn
FROM
( SELECT LastRunningCustomerCount
= RunningCustomerCount, Rn
FROM (SELECT NULL a) x WHERE 4 <= HierarchyLevel
UNION ALL
SELECT rcc3, Rn3
FROM (SELECT NULL a) x WHERE 3 <= HierarchyLevel
UNION ALL
SELECT rcc2, Rn2
FROM (SELECT NULL a) x WHERE 2 <= HierarchyLevel
UNION ALL
SELECT rcc1, Rn1
FROM (SELECT NULL a) x WHERE 1 <= HierarchyLevel
) x
ORDER BY Rn
OFFSET 0 ROWS
FETCH NEXT 1 ROWS ONLY
) ca
ORDER BY
hier.HierarchySort ;SELECT
hier.AccountId
--- -- skipping rows
, hier.CustomerCount
, TotalCustomerCount = CustomerCount
+ RunningCustomerCount
- CASE HierarchyLevel
WHEN 4 THEN RunningCustomerCount
WHEN 3 THEN LAST_VALUE(RunningCustomerCount) OVER
(PARTITION BY Acct1Id, Acct2Id, Acct3Id
ORDER BY Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
WHEN 2 THEN LAST_VALUE(RunningCustomerCount) OVER
(PARTITION BY Acct1Id, Acct2Id
ORDER BY Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
WHEN 1 THEN LAST_VALUE(RunningCustomerCount) OVER
(PARTITION BY Acct1Id
ORDER BY Acct2Id, Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
END
, hier.HierarchySort
, hier.IdHierarchyMatch
FROM cte1 AS hier
ORDER BY
hier.HierarchySort ;Context
StackExchange Database Administrators Q#53348, answer score: 6
Revisions (0)
No revisions yet.