HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

XML splitting query very slow

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
querysplittingslowxmlvery

Problem

I have a query to split a delimited string into to multiple columns. The query is extremely slow and takes more than 5 minutes to handle 80444 rows. How can I improve execution time?

WITH cte (Execution_ID,Intake_Generic_ID, [File_Name],F1,Report_Attributes) AS ( SELECT 
    Execution_ID,  Intake_Generic_ID,
    [File_Name],  F1,
    CONVERT(XML,'' 
        + REPLACE(F1,'~', '') 
        + '') AS Report_Attributes FROM [dbo].[Intake_Generic] )

SELECT 
    Execution_ID,  Intake_Generic_ID,
    [File_Name],
    LTRIM(RTRIM(Report.value('Attribute[1]','varchar(50)'))) AS [TransactionDate],
    LTRIM(RTRIM(Report.value('Attribute[2]','varchar(50)'))) AS [TransactionNum],
    LTRIM(RTRIM(Report.value('Attribute[3]','varchar(50)'))) AS [Interchange],
    LTRIM(RTRIM(Report.value('Attribute[4]','varchar(50)'))) AS [Gantry],  
    LTRIM(RTRIM(Report.value('Attribute[6]','varchar(50)'))) AS [GantryDirection],  
    LTRIM(RTRIM(Report.value('Attribute[7]','varchar(50)'))) AS [GantryEntryExit],  
    LTRIM(RTRIM(Report.value('Attribute[8]','varchar(50)'))) AS [RTCID],  
    LTRIM(RTRIM(Report.value('Attribute[9]','varchar(100)'))) AS [Vehicle_Classification_Cd],  
    LTRIM(RTRIM(Report.value('Attribute[10]','varchar(100)'))) AS [VehicleClassification],  
    LTRIM(RTRIM(Report.value('Attribute[11]','varchar(100)'))) AS [Txn_Iden_Cd],  
    LTRIM(RTRIM(Report.value('Attribute[12]','varchar(100)'))) AS [TransactionIdentification] FROM  cte S 
        CROSS APPLY Report_Attributes.nodes('/Report') AS Tbl(Report)  order by Intake_Generic_ID

Solution

This won't do much for the performance, but maintaining this kind of T-SQL is painful, so proper formatting is essential.

The FROM clause is hidden, and one has to pay very close attention to notice the ORDER BY clause as well. Consider:

WITH cte (Execution_ID,Intake_Generic_ID, [File_Name],F1,Report_Attributes) AS ( 
    SELECT 
         Execution_ID
        ,Intake_Generic_ID
        ,[File_Name]
        ,F1
        ,CONVERT(XML,
            '' + 
            REPLACE(F1,'~', '') + 
            '') AS Report_Attributes 
    FROM [dbo].[Intake_Generic]
)
SELECT 
     Execution_ID
    ,Intake_Generic_ID
    ,[File_Name]
    ,LTRIM(RTRIM(Report.value('Attribute[1]','varchar(50)'))) AS [TransactionDate]
    ,LTRIM(RTRIM(Report.value('Attribute[2]','varchar(50)'))) AS [TransactionNum]
    ,LTRIM(RTRIM(Report.value('Attribute[3]','varchar(50)'))) AS [Interchange]
    ,LTRIM(RTRIM(Report.value('Attribute[4]','varchar(50)'))) AS [Gantry]
    ,LTRIM(RTRIM(Report.value('Attribute[6]','varchar(50)'))) AS [GantryDirection]
    ,LTRIM(RTRIM(Report.value('Attribute[7]','varchar(50)'))) AS [GantryEntryExit]
    ,LTRIM(RTRIM(Report.value('Attribute[8]','varchar(50)'))) AS [RTCID]
    ,LTRIM(RTRIM(Report.value('Attribute[9]','varchar(100)'))) AS [Vehicle_Classification_Cd]
    ,LTRIM(RTRIM(Report.value('Attribute[10]','varchar(100)'))) AS [VehicleClassification]
    ,LTRIM(RTRIM(Report.value('Attribute[11]','varchar(100)'))) AS [Txn_Iden_Cd]
    ,LTRIM(RTRIM(Report.value('Attribute[12]','varchar(100)'))) AS [TransactionIdentification] 
FROM  cte S 
    CROSS APPLY Report_Attributes.nodes('/Report') AS Tbl(Report)
ORDER BY Intake_Generic_ID


XQuery is slower than "normal" T-SQL querying, pretty much by definition.

If the results don't need to be sorted, or if they can be sorted by whoever is consuming this data, then removing the ORDER BY clause should give you a little boost, but I suspect the CROSS APPLY is much more expensive than the sorting.

Looking more closely at what the query is doing, I'd say the XQuery isn't required for this to work - it was clever, but as you saw, a total performance killer.

,F1
    ,CONVERT(XML,
        '' + 
        REPLACE(F1,'~', '') + 
        '') AS Report_Attributes


You're building the XML from the contents of F1, by replacing occurrences of ~ within the varchar value.

If the data is shaped anything like a list of comma tilde-separated values, you could write a table-valued function that returns a table with 12 records given that F1 string:

declare @result as table (
     AttributeIndex int
    ,AttributeValue varchar(50)
);


Your CTE could be CROSS APPLY-ing that function with the contents of Intake_Generic, and the actual SELECT wouldn't have much work left to do. And without XML involved, I'm willing to bet that you'd get the results much, much faster.

Code Snippets

WITH cte (Execution_ID,Intake_Generic_ID, [File_Name],F1,Report_Attributes) AS ( 
    SELECT 
         Execution_ID
        ,Intake_Generic_ID
        ,[File_Name]
        ,F1
        ,CONVERT(XML,
            '<Report><Attribute>' + 
            REPLACE(F1,'~', '</Attribute <Attribute>') + 
            '</Attribute></Report>') AS Report_Attributes 
    FROM [dbo].[Intake_Generic]
)
SELECT 
     Execution_ID
    ,Intake_Generic_ID
    ,[File_Name]
    ,LTRIM(RTRIM(Report.value('Attribute[1]','varchar(50)'))) AS [TransactionDate]
    ,LTRIM(RTRIM(Report.value('Attribute[2]','varchar(50)'))) AS [TransactionNum]
    ,LTRIM(RTRIM(Report.value('Attribute[3]','varchar(50)'))) AS [Interchange]
    ,LTRIM(RTRIM(Report.value('Attribute[4]','varchar(50)'))) AS [Gantry]
    ,LTRIM(RTRIM(Report.value('Attribute[6]','varchar(50)'))) AS [GantryDirection]
    ,LTRIM(RTRIM(Report.value('Attribute[7]','varchar(50)'))) AS [GantryEntryExit]
    ,LTRIM(RTRIM(Report.value('Attribute[8]','varchar(50)'))) AS [RTCID]
    ,LTRIM(RTRIM(Report.value('Attribute[9]','varchar(100)'))) AS [Vehicle_Classification_Cd]
    ,LTRIM(RTRIM(Report.value('Attribute[10]','varchar(100)'))) AS [VehicleClassification]
    ,LTRIM(RTRIM(Report.value('Attribute[11]','varchar(100)'))) AS [Txn_Iden_Cd]
    ,LTRIM(RTRIM(Report.value('Attribute[12]','varchar(100)'))) AS [TransactionIdentification] 
FROM  cte S 
    CROSS APPLY Report_Attributes.nodes('/Report') AS Tbl(Report)
ORDER BY Intake_Generic_ID
,F1
    ,CONVERT(XML,
        '<Report><Attribute>' + 
        REPLACE(F1,'~', '</Attribute <Attribute>') + 
        '</Attribute></Report>') AS Report_Attributes
declare @result as table (
     AttributeIndex int
    ,AttributeValue varchar(50)
);

Context

StackExchange Code Review Q#114056, answer score: 4

Revisions (0)

No revisions yet.