HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Optimisation of complex query - 100% cost index seek?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
indexseekqueryoptimisation100costcomplex

Problem

I'm a c# developer, however I'm quite experienced with SQL. I have had a bit of experience with optimising queries but I've come up across one that has me stumped.

The query takes about 3 minutes to run for approximately 20,000 rows.
The rows start streaming immediately but for the 3 minute period, the SQL server cpu is pegged at 100%.

In this particular case, my queries performed well until the client requested I test them with 20k rows. These 20k rows (Clients) were distributed amongst 20 Practitioners (so roughly 1000 each)

Here's my database schema for the relevant tables:

The query uses a Recursive CTE to gather, given a source practitioner, the (n-level) tree of clients below them. It then joins to some FK type tables to collect additional information.

It appears the slow-down is coming from the join to the authority table, which represents which practitioners manage which clients.



pooling.ClientAccountInheritedRateGroup, pooling.ClientAccountFirstContact and pooling.PractitionerAccountFirstContact are all views which all come back with sub-second selects for the full data set.

Right, now for the actual query (sorry, it's a bit of a monster):

```
with h (distance, id, name, parentid, [path]) as (
select
0,
pa.id,
pa.name,
null,
convert(varchar(80), ltrim(str(pa.id))) as node_seq
from
pooling.practitioneraccount pa
left outer join pooling.practitioneraccountrelation as par
on pa.id = par.childid
where
pa.id = @PractitionerAccountId
or (@PractitionerAccountId is null and par.parentid is null)

union all

select
1,
-1,
'Unmanaged Clients',
-1,
'-1'
where @practitioneraccountid is null

union all

select
distance + 1,
pa.id,
pa.name,
par.parentid,
convert(varchar(80), h.path + ',' + ltrim(str(pa.id)))
from
pooling.practitioneraccount pa
inner join pooling.practitioneraccountrelation as par
on pa.id = par.childid
inner join h on

Solution

Here is your description of the problem: "The CTE portion of the query returns almost instantly, only when we start joining to the clients does the performance issue appear".

One possible, and likely, explanation is as follows: the optimizer fails to estimate the cardinality of your subtree, and chooses an inefficient plan. With your way of storing hierarchies, this is no surprise. How would you yourself estimate the size of a subtree without actually retrieving it?

Can you use materialized path? Getting a subtree using materialized path is essentially one range scan, fast and simple, and the optimizer can have a good cardinality estimate off the statistics on one index.

In my experience, your way of storing/reading hierarchies does not scale up. I have never been able to make it work fast and use resources efficiently.

Context

StackExchange Database Administrators Q#27140, answer score: 2

Revisions (0)

No revisions yet.