principlesqlMinor
SQL query to compare two XML columns
Viewed 0 times
columnssqlqueryxmltwocompare
Problem
I need to compare two XML columns. I currently have about 1000 rows of data that this is running against and it takes about 20min. Is there anyway I can improve this query to run faster.
Each xml string is fairly large about 1000 to 1500 lines.
This is the Cases table structure.
```
CREATE TABLE [dbo].Cases NOT NULL,
[LibraryId] [int] NOT NULL,
[CaseName] nvarchar NULL,
[ConfigId] nvarchar NOT NULL,
[CurrentConfigId] nvarchar NULL,
[PartNumber] nvarchar NOT NULL,
[Image] nvarchar NULL,
[Status] [int] NULL,
[Price] decimal NULL,
[Comments] nvarchar NULL,
[Error] nvarchar
SELECT
MAX(T.CaseName) AS CaseName,
T.PartNumber,
T.NodeName,
T.OldValue,
T.NewValue
FROM (SELECT
C.CaseName,
old.N.value('../../../MaterialName[1]', 'nvarchar(100)') AS PartNumber,
old.N.value('Name[1]', 'nvarchar(4000)') AS NodeName,
old.N.value('Value[1]', 'nvarchar(4000)') AS OldValue,
new.N.value('Value[1]', 'nvarchar(4000)') AS NewValue
FROM Cases c
CROSS APPLY BomDataCase.nodes('/TopBomComponents/TopBomComponents/TopBomComponent/BomMaterials/BomMaterial/BomMaterialAttributes/CustomBomMaterialAttributes/CustomBomMaterialAttribute') AS old (N)
CROSS APPLY BomDataChange.nodes('/TopBomComponents/TopBomComponents/TopBomComponent/BomMaterials/BomMaterial/BomMaterialAttributes/CustomBomMaterialAttributes/CustomBomMaterialAttribute') AS new (N)
INNER JOIN LibraryStatuses ls
ON c.[Status] = ls.StatusId
AND c.LibraryId = ls.LibraryId
WHERE old.N.value('../../../MaterialName[1]', 'nvarchar(100)') = new.N.value('../../../MaterialName[1]', 'nvarchar(100)')
AND old.n.value('Name[1]', 'nvarchar(100)') = new.n.value('Name[1]', 'nvarchar(100)')
AND old.n.value('Value[1]', 'nvarchar(100)') <> new.n.value('Value[1]', 'nvarchar(100)')
AND ls.name = 'Review') AS T
GROUP BY T.PartNumber,
T.NodeName,
T.OldValue,
T.NewValueEach xml string is fairly large about 1000 to 1500 lines.
This is the Cases table structure.
```
CREATE TABLE [dbo].Cases NOT NULL,
[LibraryId] [int] NOT NULL,
[CaseName] nvarchar NULL,
[ConfigId] nvarchar NOT NULL,
[CurrentConfigId] nvarchar NULL,
[PartNumber] nvarchar NOT NULL,
[Image] nvarchar NULL,
[Status] [int] NULL,
[Price] decimal NULL,
[Comments] nvarchar NULL,
[Error] nvarchar
Solution
Try this query instead on your data and tell how it performs. Not sure it will return exactly the same as your query but it should be close enough so you can adjust to what you want.
The parts that should make a difference on performance is that I have replaced the use of the parent axis
Update:
Using 1000 rows and the XML provided in the question for both XML columns I timed the above query to about 4 minutes on my machine. The execution plan has two branches responsible for shredding each of the two XML columns. The bad thing with this plan is that the second branch is executed once for each row coming out of the first branch. It would be much better if the shredding of the columns could be done independently and then joined together.
So I tried this rewrite of the query.
And it worked quite good. Execution time dropped to 62 seconds. The shredding of the XML is done only once per XML column and the result of the second branch is stored in a table spool and that spool is used for all rows coming out of the first branch.
How about if we could make the branches join together with a merge join instead. The input to a merge join must be sorted so we would expect to see two extra sorts in the execution plan.
Same query as above with a merge join hint.
`select C.CaseName,
T.PartNumber,
T.NodeName,
T.OldValue,
T.NewValue
from dbo.Cases as C
cross apply (
select O.PartNumber,
O.NodeName,
O.Value as OldValue,
N.Value as NewValue
from (
select M.X.value('(MaterialName/text())[1]', 'nvarchar(4000)') as PartNumber,
A.X.value('(Name/text())[1]', 'nvarchar(4000)') as NodeName,
A.X.value('(Value/text())[1]', 'nvarchar(4000)') as Value
from C.BomDataCase.nodes('/TopBomCompone
select C.CaseName,
O.PartNumber,
O.NodeName,
O.Value as OldValue,
N.Value as NewValue
from dbo.Cases as C
cross apply (
select M.X.value('(MaterialName/text())[1]', 'nvarchar(4000)') as PartNumber,
A.X.value('(Name/text())[1]', 'nvarchar(4000)') as NodeName,
A.X.value('(Value/text())[1]', 'nvarchar(4000)') as Value
from C.BomDataCase.nodes('/TopBomComponents/TopBomComponents/TopBomComponent/BomMaterials/BomMaterial') as M(X)
cross apply M.X.nodes('BomMaterialAttributes/CustomBomMaterialAttributes/CustomBomMaterialAttribute') as A(X)
) as O
cross apply (
select M.X.value('(MaterialName/text())[1]', 'nvarchar(4000)') as PartNumber,
A.X.value('(Name/text())[1]', 'nvarchar(4000)') as NodeName,
A.X.value('(Value/text())[1]', 'nvarchar(4000)') as Value
from C.BomDataChange.nodes('/TopBomComponents/TopBomComponents/TopBomComponent/BomMaterials/BomMaterial') as M(X)
cross apply M.X.nodes('BomMaterialAttributes/CustomBomMaterialAttributes/CustomBomMaterialAttribute') as A(X)
) as N
where O.PartNumber = N.PartNumber and
O.NodeName = N.NodeName and
O.Value <> N.Value;
The parts that should make a difference on performance is that I have replaced the use of the parent axis
../ with an extra call to cross apply nodes() and I specify the text() node in the values() function.Update:
Using 1000 rows and the XML provided in the question for both XML columns I timed the above query to about 4 minutes on my machine. The execution plan has two branches responsible for shredding each of the two XML columns. The bad thing with this plan is that the second branch is executed once for each row coming out of the first branch. It would be much better if the shredding of the columns could be done independently and then joined together.
So I tried this rewrite of the query.
select C.CaseName,
T.PartNumber,
T.NodeName,
T.OldValue,
T.NewValue
from dbo.Cases as C
cross apply (
select O.PartNumber,
O.NodeName,
O.Value as OldValue,
N.Value as NewValue
from (
select M.X.value('(MaterialName/text())[1]', 'nvarchar(4000)') as PartNumber,
A.X.value('(Name/text())[1]', 'nvarchar(4000)') as NodeName,
A.X.value('(Value/text())[1]', 'nvarchar(4000)') as Value
from C.BomDataCase.nodes('/TopBomComponents/TopBomComponents/TopBomComponent/BomMaterials/BomMaterial') as M(X)
cross apply M.X.nodes('BomMaterialAttributes/CustomBomMaterialAttributes/CustomBomMaterialAttribute') as A(X)
) as O
inner join (
select M.X.value('(MaterialName/text())[1]', 'nvarchar(4000)') as PartNumber,
A.X.value('(Name/text())[1]', 'nvarchar(4000)') as NodeName,
A.X.value('(Value/text())[1]', 'nvarchar(4000)') as Value
from C.BomDataChange.nodes('/TopBomComponents/TopBomComponents/TopBomComponent/BomMaterials/BomMaterial') as M(X)
cross apply M.X.nodes('BomMaterialAttributes/CustomBomMaterialAttributes/CustomBomMaterialAttribute') as A(X)
) as N
on O.PartNumber = N.PartNumber and
O.NodeName = N.NodeName
where O.Value <> N.Value
) as T;
And it worked quite good. Execution time dropped to 62 seconds. The shredding of the XML is done only once per XML column and the result of the second branch is stored in a table spool and that spool is used for all rows coming out of the first branch.
How about if we could make the branches join together with a merge join instead. The input to a merge join must be sorted so we would expect to see two extra sorts in the execution plan.
Same query as above with a merge join hint.
`select C.CaseName,
T.PartNumber,
T.NodeName,
T.OldValue,
T.NewValue
from dbo.Cases as C
cross apply (
select O.PartNumber,
O.NodeName,
O.Value as OldValue,
N.Value as NewValue
from (
select M.X.value('(MaterialName/text())[1]', 'nvarchar(4000)') as PartNumber,
A.X.value('(Name/text())[1]', 'nvarchar(4000)') as NodeName,
A.X.value('(Value/text())[1]', 'nvarchar(4000)') as Value
from C.BomDataCase.nodes('/TopBomCompone
Context
StackExchange Code Review Q#141659, answer score: 9
Revisions (0)
No revisions yet.