gotchasqlMajor
Very strange performance with an XML index
Viewed 0 times
withxmlstrangeperformanceveryindex
Problem
My question is based on this: https://stackoverflow.com/q/35575990/5089204
To give an answer there I did the following test-scenario.
Test scenario
First I create a test table and fill it with 100.000 rows. A random number (0 to 1000) should lead to ~100 rows for each random number. This number is put into a varchar col and as a value into your XML.
Then I do a call like the OP there needs it with .exist() and with .nodes() with a small advantage for the second, but both take 5 to 6 seconds. In fact I do the calls twice: a second time in swapped order and with slightly changed search params and with "//item" instead of the full path to avoid false positives via cached results or plans.
Then I create an XML index and do the same calls
Now - what really did surprise me! - the
Questions:
My own tests bring up in short: XML indexes can blow up a database extremely. They can speed up things extremely (s. edit 2), but can slow your queries down too. I'd like to understand how they work... When should one create an XML index? Why can
`CREATE TABLE #testTbl(ID INT IDENTITY PRIMARY KEY, SomeData VARCHAR(100),XmlColumn XML);
GO
DECLARE @RndNumber VARCHAR(100)=(SELECT CAST(CAST(RAND()*1000 AS INT) AS VARCHAR(100)));
INSERT INTO #testTbl VALUES('Data_' + @RndNumber,
'
');
GO 100000
DECLARE @d DATETIME=GETDATE()
SELECT #testTbl.*
FROM #testTbl
CROSS APPLY XmlColumn.nodes('/error/serverVariables/item[@name="name5" and value/@string="My test 600"]') AS a(b);
SELECT CAST(GETDATE()-@d AS TIME) AS NodesFullPath_no_index;
GO
DECLARE @d DAT
To give an answer there I did the following test-scenario.
Test scenario
First I create a test table and fill it with 100.000 rows. A random number (0 to 1000) should lead to ~100 rows for each random number. This number is put into a varchar col and as a value into your XML.
Then I do a call like the OP there needs it with .exist() and with .nodes() with a small advantage for the second, but both take 5 to 6 seconds. In fact I do the calls twice: a second time in swapped order and with slightly changed search params and with "//item" instead of the full path to avoid false positives via cached results or plans.
Then I create an XML index and do the same calls
Now - what really did surprise me! - the
.nodes with full path is much slower than before (9 secs) but the .exist() is down to half a second, with full path even down to about 0.10 sec. (while .nodes() with short path is better, but still far behind .exist())Questions:
My own tests bring up in short: XML indexes can blow up a database extremely. They can speed up things extremely (s. edit 2), but can slow your queries down too. I'd like to understand how they work... When should one create an XML index? Why can
.nodes() with an index be worse than without? How could one avoid the negativ impact?`CREATE TABLE #testTbl(ID INT IDENTITY PRIMARY KEY, SomeData VARCHAR(100),XmlColumn XML);
GO
DECLARE @RndNumber VARCHAR(100)=(SELECT CAST(CAST(RAND()*1000 AS INT) AS VARCHAR(100)));
INSERT INTO #testTbl VALUES('Data_' + @RndNumber,
'
');
GO 100000
DECLARE @d DATETIME=GETDATE()
SELECT #testTbl.*
FROM #testTbl
CROSS APPLY XmlColumn.nodes('/error/serverVariables/item[@name="name5" and value/@string="My test 600"]') AS a(b);
SELECT CAST(GETDATE()-@d AS TIME) AS NodesFullPath_no_index;
GO
DECLARE @d DAT
Solution
There sure is a lot going on here so we will just have to see where this leads.
First off, the difference in timing between SQL Server 2012 and SQL Server 2014 is due to the new cardinality estimator in SQL Server 2014. You can use a trace flag in SQL Server 2014 to force the old estimator and then you will see the same timing characteristics in SQL Server 2014 as in SQL Server 2012.
Comparing
We know the data but SQL Server does not and has to build a query plan that takes that into consideration.
To make the
With a query like that there is no difference between using
For me in SQL Server 2012 the queries without the XML index take 6 seconds using the modified version of the
Using SQL Server 2014 and the new cardinalty estimator, there is no difference in these queries when using an XML index. Without using the index the queries still take the same amount of time but it is 15 seconds. Clearly not an improvement here when using new stuff.
Not sure if I completely lost track of what your question is actually about since I modified the queries to be equivalent but here is what I believe it is now.
Why is the
slower then when an index is not used?
Well, the answer is that SQL Server query plan optimizer does something bad and that is introducing a spool operator. I don't know why but the good news is that it is not there anymore with the new cardinalty estimator in SQL Server 2014.
With no indexes in place the query takes about 7 seconds no matter what cardinality estimator is used. With the index it takes 15 seconds with the old estimator (SQL Server 2012) and about 2 seconds with the new estimator (SQL Server 2014).
Note: The findings above are valid with your test data. There can be a whole different story to tell if you change the size, shape or form of the XML. No way to know for sure without testing with the data you actually have in the tables.
How the XML indexes work
XML indexes in SQL Server are implemented as internal tables. The primary XML index creates the table with the primary key of the base table plus node id column, in total 12 columns. It will have one row per
Secondary XML indexes come in three types. When you create a secondary XML index, there is a non-clustered index created on the internal table and, depending on what type of secondary index you create, it will have different columns and column orders.
From CREATE XML INDEX (Transact-SQL):
VALUE
Creates a secondary XML index on columns where key columns are
(node value and path) of the primary XML index.
PATH
Creates a secondary XML index on columns built on path values
and node values in the primary XML index. In the PATH secondary index,
the path and node values are key columns that allow efficient seeks
when searching for paths.
PROPERTY
Creates a secondary XML index on columns (PK, path and node
value) of the primary XML index where PK is the primary key of the
base table.
So when you create a PATH index, the first column in that index is the path expression and the second column is the value in that node. Actually, the path is stored in a kind of compresse
First off, the difference in timing between SQL Server 2012 and SQL Server 2014 is due to the new cardinality estimator in SQL Server 2014. You can use a trace flag in SQL Server 2014 to force the old estimator and then you will see the same timing characteristics in SQL Server 2014 as in SQL Server 2012.
Comparing
nodes() vs exist() is not fair since they will not return the same result if there are more than one matched element in the XML for one row. exist() will return one row from the base table regardless, whereas nodes() can potentially give you more than one row returned for each row in the base table.We know the data but SQL Server does not and has to build a query plan that takes that into consideration.
To make the
nodes() query equivalent to the exist() query, you could do something like this.SELECT testTbl.*
FROM testTbl
WHERE EXISTS (
SELECT *
FROM XmlColumn.nodes('/error/serverVariables/item[@name="name5" and value/@string="My test 600"]') AS a(b)
)With a query like that there is no difference between using
nodes() or exist() and that is because SQL Server builds almost the same plan for the two versions not using an index and exactly the same plan when index is used. That is true both for SQL Server 2012 and SQL Server 2014.For me in SQL Server 2012 the queries without the XML index take 6 seconds using the modified version of the
nodes() query above. There is no difference between using the full path or the short path. With the XML index in place the full path version is the fastest and takes 5 ms and using the short path takes about 500 ms. Examining the query plans will tell you why there is a difference but the short version is that when you use a short path, SQL Server seeks in the index on the short path (a range seek using like) and returns 700000 rows before discarding the rows that do not match on the value. When using the full path, SQL Server can use the path expression directly together with the value of the node to do the seek and returns only 105 rows from scratch to work on.Using SQL Server 2014 and the new cardinalty estimator, there is no difference in these queries when using an XML index. Without using the index the queries still take the same amount of time but it is 15 seconds. Clearly not an improvement here when using new stuff.
Not sure if I completely lost track of what your question is actually about since I modified the queries to be equivalent but here is what I believe it is now.
Why is the
nodes() query (original version) with an XML index in place significantlyslower then when an index is not used?
Well, the answer is that SQL Server query plan optimizer does something bad and that is introducing a spool operator. I don't know why but the good news is that it is not there anymore with the new cardinalty estimator in SQL Server 2014.
With no indexes in place the query takes about 7 seconds no matter what cardinality estimator is used. With the index it takes 15 seconds with the old estimator (SQL Server 2012) and about 2 seconds with the new estimator (SQL Server 2014).
Note: The findings above are valid with your test data. There can be a whole different story to tell if you change the size, shape or form of the XML. No way to know for sure without testing with the data you actually have in the tables.
How the XML indexes work
XML indexes in SQL Server are implemented as internal tables. The primary XML index creates the table with the primary key of the base table plus node id column, in total 12 columns. It will have one row per
element/node/attribute etc. so that table can of course get really big depending on the size of the XML stored. With a primary XML index in place SQL Server can use the primary key of the internal table to locate XML nodes and values for each row in the base table.Secondary XML indexes come in three types. When you create a secondary XML index, there is a non-clustered index created on the internal table and, depending on what type of secondary index you create, it will have different columns and column orders.
From CREATE XML INDEX (Transact-SQL):
VALUE
Creates a secondary XML index on columns where key columns are
(node value and path) of the primary XML index.
PATH
Creates a secondary XML index on columns built on path values
and node values in the primary XML index. In the PATH secondary index,
the path and node values are key columns that allow efficient seeks
when searching for paths.
PROPERTY
Creates a secondary XML index on columns (PK, path and node
value) of the primary XML index where PK is the primary key of the
base table.
So when you create a PATH index, the first column in that index is the path expression and the second column is the value in that node. Actually, the path is stored in a kind of compresse
Code Snippets
SELECT testTbl.*
FROM testTbl
WHERE EXISTS (
SELECT *
FROM XmlColumn.nodes('/error/serverVariables/item[@name="name5" and value/@string="My test 600"]') AS a(b)
)Context
StackExchange Database Administrators Q#130969, answer score: 33
Revisions (0)
No revisions yet.