HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

TSQL (2014) - Importing XML with accented and punctuation Characters

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
accentedtsqlpunctuationwithxmlcharactersandimporting2014

Problem

Using this solution from Forrest:

DECLARE @XML xml = 
'
   
    
        334
        17.10
        marc58
        
            Marc Juppé
            Rue  garçonneé III° arrondissement
            Paris
            
            FR
            333333333
            22222
         
         4.50
         
            
               
                    3664
                    47
                    MCPU DDA010
                    mmx
                
                1
                6.2
            
            
               
                    3665
                    45
                    MCPU DFZ42
                    mmy
                
                2
                3.2
            
        
    
'

SELECT 
    x.value('./ItemID[1]','int') AS ItemID,
    x.value('./Store[1]','int') AS Store,
    x.value('./Title[1]','nvarchar(100)') AS Title,
    x.value('./SKU[1]','nvarchar(100)') AS SKU,
    x.value('../Quantity[1]','int') AS Qty,
    x.value('../Price[1]','decimal(11,2)') AS Price,
    x.query('//OrderID[1]').value('.','int') AS OrderID,
    x.query('//AmountPaid[1]').value('.','decimal(11,2)') AS AmountPaid,
    x.query('//UserID[1]').value('.','nvarchar(100)') AS UserID,
    x.query('//ShippingCosts[1]').value('.','decimal(11,2)') AS ShippingCosts
FROM @XML.nodes('//Item/Details') i(x)


I receive an error like


XML parsing: line 162, character 34, illegal xml character

when need to parse fields containing accented characters (àèòìùé) or °

I tried converting é int è but get this error:


XML parsing: line 162, character 41, well formed check: undeclared entity

while if I convert to è it works.

The problem is I cannot HTMLEncode the XML File since I expect it will convert also all the "" of the XML Structure, and probably also something else.

Can suggest the best solution to HTMLEncode the XML before parsing?

Solution

The XML you have is invalid in UTF-8 encoding. The accented characters needs to be encoded. For instance ° should be encoded as °.

Here is a shorter version that also fails.

declare @X xml = '°';
select @X.value('text()[1]', 'nchar(1)');



Msg 9420, Level 16, State 1, Line 1 XML parsing: line 1, character 39,
illegal xml character

Should be like this instead if you use UTF-8.

declare @X xml = '°';
select @X.value('text()[1]', 'nchar(1)');


You need to trace the source of the data backwards to the producer to see where you end up with invalid XML.

Code Snippets

declare @X xml = '<?xml version="1.0" encoding="UTF-8"?>°';
select @X.value('text()[1]', 'nchar(1)');
declare @X xml = '<?xml version="1.0" encoding="UTF-8"?>°';
select @X.value('text()[1]', 'nchar(1)');

Context

StackExchange Database Administrators Q#187492, answer score: 5

Revisions (0)

No revisions yet.