HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlModerate

Estimated vs Actual rows differences (actual much smaller than estimated) - sort

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
rowsmuchactualthandifferencessmallerestimatedsort

Problem

I'm running a query that is processing some nodes out of an XML document. My estimated subtree cost is in the millions and it seems it all comes from a sort operation sql server is performing on some data that I extract from xml columns via XPath. The Sort operation has an estimated number of rows to be around 19 million, whereas the actual row count is about 800. The query itself runs reasonably well (1 - 2 seconds), but the discrepancy has me wondering about the query performance and why this difference is so large?

Solution

There is no statistics generated on XML columns. The estimates is guessed based on the expressions used when querying the XML.

Using this table:

create table T(XMLCol xml not null)
insert into T values('')


And this rather simple XML query:

select X.N.value('@value', 'int')
from T
  cross apply T.XMLCol.nodes('root/item') as X(N)


Will give you one row returned but the estimated rows returned is 200. It will be 200 regardless of what XML or how much XML you stuff into the XML column for that one row.

This is the query plan with the estimated row count displayed.



A way to improve, or at least change, the estimates is to give the query optimizer some more information about the XML. In this case, because I know that root really is a root node in the XML, I can rewrite the query like this.

select X2.N.value('@value', 'int')
from T
  cross apply T.XMLCol.nodes('root[1]') as X1(N)
  cross apply X1.N.nodes('item') X2(N)


That will give me an estimate of 5 rows returned.

The rewrite of the query will probably not speed up the shredding of the XML but if the estimates are better, chances are that the query optimizer can make smarter decisions for the rest of the query.

I have not found any documentation on what the rules are for the estimates other than a presentation by Michael Rys where he says:


Base cardinality estimate is always 10’000 rows!

Some adjustment based on pushed path filters

Code Snippets

create table T(XMLCol xml not null)
insert into T values('<root><item value = "1" /></root>')
select X.N.value('@value', 'int')
from T
  cross apply T.XMLCol.nodes('root/item') as X(N)
select X2.N.value('@value', 'int')
from T
  cross apply T.XMLCol.nodes('root[1]') as X1(N)
  cross apply X1.N.nodes('item') X2(N)

Context

StackExchange Database Administrators Q#44959, answer score: 10

Revisions (0)

No revisions yet.