HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Xml index, slow count of rows

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
indexrowsslowxmlcount

Problem

I am not getting index seek in the following scenario.
Instead of inserting xxx in xml column like in this post Why is the secondary selective index not used when the where clause filters on value()?, insert 100k rows with this xml NiceText and similar amount of rows of this MoreText. Doesnt need to be 100k. Just needs to be many.
Then add index

create selective xml index SIX_T on dbo.T(XMLDoc) for
(
    pathXQUERY = '/SomeText' as xquery 'xs:string' maxlength(8) singleton
);


And secondary index

create xml index SIX_T_pathXQUERY on dbo.T(XMLDoc)
  using xml index SIX_T for (pathXQUERY);


Then do a count

select count(*)
from dbo.T as T
where T.XMLDoc.exist('/SomeText[. eq "MoreText"]') = 1;


Notice that it doesnt use index seek and is "slow". Can take several seconds with millions of rows.
If i insert same values in a standard column and add an index to it and do a

select count(id)
    from dbo.T as T where SomeTextColumn = 'MoreText'


i get results instantly.
All tests done on sql server 18.3.1

The question is, how can i make counting by xml as fast as counting by column?

Thanks

Solution

Data differences

With a low amount of records to seek on, the optimizer is able to use the SIX_T_pathXQUERY index:

and filter on moretext with a seek predicate:

An interesting part here is that it executes the key lookup to get the path_1_id values that are not null.

Since that is a filter definition on the nonclustered xml index...

...while not being present in the index itself.

As a result, for the optimizer to consider using the index it knows that it has to complete these steps:

  • Filter on the XML value with the secondary index on the internal


table

  • Match these returned values with the SIX_T clustered index on the


internal table and filter on path_1_id is not null since path_1_id is
not included in the secondary index

  • Match these values with the actual table, dbo.T to return the ID to count on



Tipping points

When the expected rows to be returned are higher, it favours using the selective clustered XML Index, to be able to run a merge join & no key lookup instead:

with a residual predicate:

To filter on the xml column and path_1_id

Comparing the plans

You could (but you shouldn't) use the USE PLAN hint to force the plan with the seek on the XML column and see what would happen if we where to search on these values.

With the execution time =

CPU time = 218 ms,  elapsed time = 215 ms.


And the execution time for the scan + merge join plan:

CPU time = 62 ms,  elapsed time = 58 ms.


In short, I believe that the scan with residual predicate + merge join choice was the right choice by the optimizer.

No way around it

While I might be wrong, I don't think there is a way to improve the count query with regular XML Indexes.
We also cannot change these internal tables or even query them:

SELECT * FROM 
[sys].[xml_sxi_table_1463676262_256000];


It even gives a missing index hint on the merge join query plan, which you cannot create:

CREATE NONCLUSTERED INDEX []
ON [sys].[xml_sxi_table_1463676262_256000] ([pathXQUERY_1_value])
INCLUDE ([path_1_id])


To improve the queries, you would have to look into non-xml index solutions.

Edit


How do i use USE PLAN hint? You have example code for this query?

I would advise against this, but for testing purposes it would be fine.

Step 1: You would have to get the actual execution plan of the query and get
the xml:

If you don't have the query with the low estimates, enter a value that does not exist like:

select count(*)
from dbo.T as T
where T.XMLDoc.exist('/SomeText[. eq "bbb"]') = 1


Step 2: Replace all 's with ''s in the execution plan xml, we will need this later.

Step 3: Paste the plan between OPTION( USE PLAN '')

SELECT count(*)
FROM dbo.T as T
WHERE T.XMLDoc.exist('/SomeText[. eq "MoreText"]') = 1
OPTION(USE PLAN 
'');


Step 4: I had to change utf-16 to utf-8 to get the use plan hint to work

From:

OPTION(USE PLAN
'


To:

OPTION(USE PLAN
'


Step 5: Run the query.

My query now looks like this:

```
select count(*)
from dbo.T as T
where T.XMLDoc.exist('/SomeText[. eq "MoreText"]') = 1
OPTION(USE PLAN
'

















































































Code Snippets

CPU time = 218 ms,  elapsed time = 215 ms.
CPU time = 62 ms,  elapsed time = 58 ms.
SELECT * FROM 
[sys].[xml_sxi_table_1463676262_256000];
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [sys].[xml_sxi_table_1463676262_256000] ([pathXQUERY_1_value])
INCLUDE ([path_1_id])
select count(*)
from dbo.T as T
where T.XMLDoc.exist('/SomeText[. eq "bbb"]') = 1

Context

StackExchange Database Administrators Q#252047, answer score: 7

Revisions (0)

No revisions yet.