patternsqlMinor
Xml index, slow count of rows
Viewed 0 times
indexrowsslowxmlcount
Problem
I am not getting index seek in the following scenario.
Instead of inserting
Then add index
And secondary index
Then do a count
Notice that it doesnt use index seek and is "slow". Can take several seconds with millions of rows.
If i insert same values in a standard column and add an index to it and do a
i get results instantly.
All tests done on sql server 18.3.1
The question is, how can i make counting by xml as fast as counting by column?
Thanks
Instead of inserting
xxx in xml column like in this post Why is the secondary selective index not used when the where clause filters on value()?, insert 100k rows with this xml NiceText and similar amount of rows of this MoreText. Doesnt need to be 100k. Just needs to be many.Then add index
create selective xml index SIX_T on dbo.T(XMLDoc) for
(
pathXQUERY = '/SomeText' as xquery 'xs:string' maxlength(8) singleton
);And secondary index
create xml index SIX_T_pathXQUERY on dbo.T(XMLDoc)
using xml index SIX_T for (pathXQUERY);Then do a count
select count(*)
from dbo.T as T
where T.XMLDoc.exist('/SomeText[. eq "MoreText"]') = 1;Notice that it doesnt use index seek and is "slow". Can take several seconds with millions of rows.
If i insert same values in a standard column and add an index to it and do a
select count(id)
from dbo.T as T where SomeTextColumn = 'MoreText'i get results instantly.
All tests done on sql server 18.3.1
The question is, how can i make counting by xml as fast as counting by column?
Thanks
Solution
Data differences
With a low amount of records to seek on, the optimizer is able to use the
and filter on
An interesting part here is that it executes the key lookup to get the path_1_id values that are not null.
Since that is a filter definition on the nonclustered xml index...
...while not being present in the index itself.
As a result, for the optimizer to consider using the index it knows that it has to complete these steps:
table
internal table and filter on
not included in the secondary index
Tipping points
When the expected rows to be returned are higher, it favours using the selective clustered XML Index, to be able to run a merge join & no key lookup instead:
with a residual predicate:
To filter on the xml column and
Comparing the plans
You could (but you shouldn't) use the
With the execution time =
And the execution time for the scan + merge join plan:
In short, I believe that the scan with residual predicate + merge join choice was the right choice by the optimizer.
No way around it
While I might be wrong, I don't think there is a way to improve the count query with regular XML Indexes.
We also cannot change these internal tables or even query them:
It even gives a missing index hint on the merge join query plan, which you cannot create:
To improve the queries, you would have to look into non-xml index solutions.
Edit
How do i use USE PLAN hint? You have example code for this query?
I would advise against this, but for testing purposes it would be fine.
Step 1: You would have to get the actual execution plan of the query and get
the xml:
If you don't have the query with the low estimates, enter a value that does not exist like:
Step 2: Replace all
Step 3: Paste the plan between
Step 4: I had to change utf-16 to utf-8 to get the use plan hint to work
From:
To:
Step 5: Run the query.
My query now looks like this:
```
select count(*)
from dbo.T as T
where T.XMLDoc.exist('/SomeText[. eq "MoreText"]') = 1
OPTION(USE PLAN
'
With a low amount of records to seek on, the optimizer is able to use the
SIX_T_pathXQUERY index:and filter on
moretext with a seek predicate:An interesting part here is that it executes the key lookup to get the path_1_id values that are not null.
Since that is a filter definition on the nonclustered xml index...
...while not being present in the index itself.
As a result, for the optimizer to consider using the index it knows that it has to complete these steps:
- Filter on the XML value with the secondary index on the internal
table
- Match these returned values with the
SIX_Tclustered index on the
internal table and filter on
path_1_id is not null since path_1_id isnot included in the secondary index
- Match these values with the actual table,
dbo.Tto return the ID to count on
Tipping points
When the expected rows to be returned are higher, it favours using the selective clustered XML Index, to be able to run a merge join & no key lookup instead:
with a residual predicate:
To filter on the xml column and
path_1_idComparing the plans
You could (but you shouldn't) use the
USE PLAN hint to force the plan with the seek on the XML column and see what would happen if we where to search on these values. With the execution time =
CPU time = 218 ms, elapsed time = 215 ms.And the execution time for the scan + merge join plan:
CPU time = 62 ms, elapsed time = 58 ms.In short, I believe that the scan with residual predicate + merge join choice was the right choice by the optimizer.
No way around it
While I might be wrong, I don't think there is a way to improve the count query with regular XML Indexes.
We also cannot change these internal tables or even query them:
SELECT * FROM
[sys].[xml_sxi_table_1463676262_256000];It even gives a missing index hint on the merge join query plan, which you cannot create:
CREATE NONCLUSTERED INDEX []
ON [sys].[xml_sxi_table_1463676262_256000] ([pathXQUERY_1_value])
INCLUDE ([path_1_id])To improve the queries, you would have to look into non-xml index solutions.
Edit
How do i use USE PLAN hint? You have example code for this query?
I would advise against this, but for testing purposes it would be fine.
Step 1: You would have to get the actual execution plan of the query and get
the xml:
If you don't have the query with the low estimates, enter a value that does not exist like:
select count(*)
from dbo.T as T
where T.XMLDoc.exist('/SomeText[. eq "bbb"]') = 1Step 2: Replace all
's with ''s in the execution plan xml, we will need this later. Step 3: Paste the plan between
OPTION( USE PLAN '')SELECT count(*)
FROM dbo.T as T
WHERE T.XMLDoc.exist('/SomeText[. eq "MoreText"]') = 1
OPTION(USE PLAN
'');Step 4: I had to change utf-16 to utf-8 to get the use plan hint to work
From:
OPTION(USE PLAN
'To:
OPTION(USE PLAN
'Step 5: Run the query.
My query now looks like this:
```
select count(*)
from dbo.T as T
where T.XMLDoc.exist('/SomeText[. eq "MoreText"]') = 1
OPTION(USE PLAN
'
Code Snippets
CPU time = 218 ms, elapsed time = 215 ms.CPU time = 62 ms, elapsed time = 58 ms.SELECT * FROM
[sys].[xml_sxi_table_1463676262_256000];CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [sys].[xml_sxi_table_1463676262_256000] ([pathXQUERY_1_value])
INCLUDE ([path_1_id])select count(*)
from dbo.T as T
where T.XMLDoc.exist('/SomeText[. eq "bbb"]') = 1Context
StackExchange Database Administrators Q#252047, answer score: 7
Revisions (0)
No revisions yet.