HiveBrain v1.2.0
Get Started
← Back to all entries
principlesqlMinor

SQL query to compare two XML columns

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
columnssqlqueryxmltwocompare

Problem

I need to compare two XML columns. I currently have about 1000 rows of data that this is running against and it takes about 20min. Is there anyway I can improve this query to run faster.

SELECT
  MAX(T.CaseName) AS CaseName,
  T.PartNumber,
  T.NodeName,
  T.OldValue,
  T.NewValue
FROM (SELECT
  C.CaseName,
  old.N.value('../../../MaterialName[1]', 'nvarchar(100)') AS PartNumber,
  old.N.value('Name[1]', 'nvarchar(4000)') AS NodeName,
  old.N.value('Value[1]', 'nvarchar(4000)') AS OldValue,
  new.N.value('Value[1]', 'nvarchar(4000)') AS NewValue
FROM Cases c
CROSS APPLY BomDataCase.nodes('/TopBomComponents/TopBomComponents/TopBomComponent/BomMaterials/BomMaterial/BomMaterialAttributes/CustomBomMaterialAttributes/CustomBomMaterialAttribute') AS old (N)
CROSS APPLY BomDataChange.nodes('/TopBomComponents/TopBomComponents/TopBomComponent/BomMaterials/BomMaterial/BomMaterialAttributes/CustomBomMaterialAttributes/CustomBomMaterialAttribute') AS new (N)
INNER JOIN LibraryStatuses ls
  ON c.[Status] = ls.StatusId
  AND c.LibraryId = ls.LibraryId
WHERE old.N.value('../../../MaterialName[1]', 'nvarchar(100)') = new.N.value('../../../MaterialName[1]', 'nvarchar(100)')
AND old.n.value('Name[1]', 'nvarchar(100)') = new.n.value('Name[1]', 'nvarchar(100)')
AND old.n.value('Value[1]', 'nvarchar(100)') <> new.n.value('Value[1]', 'nvarchar(100)')
AND ls.name = 'Review') AS T
GROUP BY T.PartNumber,
         T.NodeName,
         T.OldValue,
         T.NewValue


Each xml string is fairly large about 1000 to 1500 lines.

This is the Cases table structure.

```
CREATE TABLE [dbo].Cases NOT NULL,
[LibraryId] [int] NOT NULL,
[CaseName] nvarchar NULL,
[ConfigId] nvarchar NOT NULL,
[CurrentConfigId] nvarchar NULL,
[PartNumber] nvarchar NOT NULL,
[Image] nvarchar NULL,
[Status] [int] NULL,
[Price] decimal NULL,
[Comments] nvarchar NULL,
[Error] nvarchar

Solution

Try this query instead on your data and tell how it performs. Not sure it will return exactly the same as your query but it should be close enough so you can adjust to what you want.

select C.CaseName,
O.PartNumber,
O.NodeName,
O.Value as OldValue,
N.Value as NewValue
from dbo.Cases as C
cross apply (
select M.X.value('(MaterialName/text())[1]', 'nvarchar(4000)') as PartNumber,
A.X.value('(Name/text())[1]', 'nvarchar(4000)') as NodeName,
A.X.value('(Value/text())[1]', 'nvarchar(4000)') as Value
from C.BomDataCase.nodes('/TopBomComponents/TopBomComponents/TopBomComponent/BomMaterials/BomMaterial') as M(X)
cross apply M.X.nodes('BomMaterialAttributes/CustomBomMaterialAttributes/CustomBomMaterialAttribute') as A(X)
) as O
cross apply (
select M.X.value('(MaterialName/text())[1]', 'nvarchar(4000)') as PartNumber,
A.X.value('(Name/text())[1]', 'nvarchar(4000)') as NodeName,
A.X.value('(Value/text())[1]', 'nvarchar(4000)') as Value
from C.BomDataChange.nodes('/TopBomComponents/TopBomComponents/TopBomComponent/BomMaterials/BomMaterial') as M(X)
cross apply M.X.nodes('BomMaterialAttributes/CustomBomMaterialAttributes/CustomBomMaterialAttribute') as A(X)
) as N
where O.PartNumber = N.PartNumber and
O.NodeName = N.NodeName and
O.Value <> N.Value;


The parts that should make a difference on performance is that I have replaced the use of the parent axis ../ with an extra call to cross apply nodes() and I specify the text() node in the values() function.

Update:

Using 1000 rows and the XML provided in the question for both XML columns I timed the above query to about 4 minutes on my machine. The execution plan has two branches responsible for shredding each of the two XML columns. The bad thing with this plan is that the second branch is executed once for each row coming out of the first branch. It would be much better if the shredding of the columns could be done independently and then joined together.

So I tried this rewrite of the query.

select C.CaseName,
T.PartNumber,
T.NodeName,
T.OldValue,
T.NewValue
from dbo.Cases as C
cross apply (
select O.PartNumber,
O.NodeName,
O.Value as OldValue,
N.Value as NewValue
from (
select M.X.value('(MaterialName/text())[1]', 'nvarchar(4000)') as PartNumber,
A.X.value('(Name/text())[1]', 'nvarchar(4000)') as NodeName,
A.X.value('(Value/text())[1]', 'nvarchar(4000)') as Value
from C.BomDataCase.nodes('/TopBomComponents/TopBomComponents/TopBomComponent/BomMaterials/BomMaterial') as M(X)
cross apply M.X.nodes('BomMaterialAttributes/CustomBomMaterialAttributes/CustomBomMaterialAttribute') as A(X)
) as O
inner join (
select M.X.value('(MaterialName/text())[1]', 'nvarchar(4000)') as PartNumber,
A.X.value('(Name/text())[1]', 'nvarchar(4000)') as NodeName,
A.X.value('(Value/text())[1]', 'nvarchar(4000)') as Value
from C.BomDataChange.nodes('/TopBomComponents/TopBomComponents/TopBomComponent/BomMaterials/BomMaterial') as M(X)
cross apply M.X.nodes('BomMaterialAttributes/CustomBomMaterialAttributes/CustomBomMaterialAttribute') as A(X)
) as N
on O.PartNumber = N.PartNumber and
O.NodeName = N.NodeName
where O.Value <> N.Value
) as T;


And it worked quite good. Execution time dropped to 62 seconds. The shredding of the XML is done only once per XML column and the result of the second branch is stored in a table spool and that spool is used for all rows coming out of the first branch.

How about if we could make the branches join together with a merge join instead. The input to a merge join must be sorted so we would expect to see two extra sorts in the execution plan.

Same query as above with a merge join hint.

`select C.CaseName,
T.PartNumber,
T.NodeName,
T.OldValue,
T.NewValue
from dbo.Cases as C
cross apply (
select O.PartNumber,
O.NodeName,
O.Value as OldValue,
N.Value as NewValue
from (
select M.X.value('(MaterialName/text())[1]', 'nvarchar(4000)') as PartNumber,
A.X.value('(Name/text())[1]', 'nvarchar(4000)') as NodeName,
A.X.value('(Value/text())[1]', 'nvarchar(4000)') as Value
from C.BomDataCase.nodes('/TopBomCompone

Context

StackExchange Code Review Q#141659, answer score: 9

Revisions (0)

No revisions yet.