Which datatype to store XML data in: VARCHAR(MAX) or XML
Problem
I am defining a schema for a new set of resources using SQL Server 2008... In this case, each record (i.e. row) will need to store XML fragments. From time to time, although not frequently, I'll need to query the XML to find element and attribute values. Left to my own devices, I would tend to use the XML data type, although I've been led to believe this is fraught with issues. So that leads me to my question.
Given this scenario, what factors should I be considering when trying to decide between storing XML in an XML column vs. a varchar(MAX) column?
If it helps, here are some additional details:
- No decision has been made regarding the use of schemas for these fragments (e.g. XSDs)
- Sizes of the fragments will range from small to very large
- All XML will be well-formed
- Over the course of a day, there will be up to ~10,000 fragments collected with online query support needed for ~3 months
- Queries against the XML will happen throughout the day but should remain light with few concurrent queries of this type
Solution
"what factors should I be considering when trying to decide between storing XML in an xml column vs. a varchar(MAX) column"
The factors are:
- The XML type is queryable / parseable through XQuery expressions, including FLWOR statements and iteration.
- Data in XML variables and columns can be modified inline using XQuery expressions via XML DML.
- XML data is stored as UTF-16 LE (Little Endian), so VARCHAR(MAX) would be a poor choice as it could result in data loss. Hence, the true decision should be between XML and NVARCHAR(MAX), given that NCHAR / NVARCHAR is also UTF-16 LE.
- XML data can be validated against an XSD / XML SCHEMA COLLECTION. If no XML Schema Collection is specified, no validation is done beyond ensuring well-formedness; schema validation is not available at all when using NVARCHAR(MAX).
- One major benefit of the XML type is that it is stored in a highly optimized format (not VARBINARY(MAX) as stated in @Oleg's answer) that does not store the exact string representation that you see, but instead has a dictionary of Element and Attribute names and refers to them by their ID. It also removes whitespace. Try the following:
DECLARE @Test1 XML = N'<Test><TagName>1</TagName><TagName>2</TagName></Test>';
DECLARE @String1 NVARCHAR(MAX) = CONVERT(NVARCHAR(MAX), @Test1);
SELECT DATALENGTH(@Test1) AS [XmlBytes],
LEN(@String1) AS [StringCharacters],
DATALENGTH(@String1) AS [StringBytes];
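SET @Test1 = N'<Test><TagName>1</TagName><TagName>2</TagName><TagName>3</TagName>
<TagName>4</TagName><TagName>5</TagName><TagName>6</TagName></Test>';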
SET @String1 = CONVERT(NVARCHAR(MAX), @Test1);
SELECT DATALENGTH(@Test1) AS [XmlBytes],
LEN(@String1) AS [StringCharacters],
DATALENGTH(@String1) AS [StringBytes];
Returns:
XmlBytes StringCharacters StringBytes
56 53 106
XmlBytes StringCharacters StringBytes
84 133 266
As the example output above shows, adding four elements (#s 3, 4, 5, and 6) added 80 characters (hence 80 bytes if using VARCHAR) and 160 bytes to the NVARCHAR variable. Yet it added only 28 bytes to the XML variable, which is less than it would have added for VARCHAR (just in case someone was going to argue in favor of VARCHAR over XML because XML is UTF-16, which is [mostly] double-byte). This optimization can save a great deal of space, and is reason enough by itself to use the XML datatype.
- XML data can be indexed via specialized XML indexes.
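The querying, XML DML, and indexing factors listed above could look roughly like the following. This is only a sketch under stated assumptions: the table dbo.ResourceFragment, its columns, the index name, and the sample <Order> fragment are all hypothetical and not taken from the question, and the column is left untyped (no XML Schema Collection), matching the "no decision on XSDs yet" detail.

```sql
-- Hypothetical table; all object names here are illustrative only.
CREATE TABLE dbo.ResourceFragment
(
    ResourceFragmentID INT IDENTITY(1, 1) NOT NULL
        CONSTRAINT PK_ResourceFragment PRIMARY KEY CLUSTERED,
    CollectedOn        DATETIME NOT NULL
        CONSTRAINT DF_ResourceFragment_CollectedOn DEFAULT (GETDATE()),
    Fragment           XML NOT NULL  -- untyped: only well-formedness is checked
);

-- A primary XML index requires a clustered primary key on the base table.
CREATE PRIMARY XML INDEX PXML_ResourceFragment_Fragment
    ON dbo.ResourceFragment (Fragment);

-- Made-up sample fragment.
INSERT INTO dbo.ResourceFragment (Fragment)
VALUES (N'<Order id="42"><Item sku="A1">Widget</Item><Item sku="B2">Gadget</Item></Order>');

-- Query: extract an attribute with value(), filter rows with exist().
SELECT rf.ResourceFragmentID,
       rf.Fragment.value('(/Order/@id)[1]', 'INT') AS [OrderId]
FROM   dbo.ResourceFragment rf
WHERE  rf.Fragment.exist('/Order/Item[@sku = "B2"]') = 1;

-- Shred repeating elements into one row per <Item> with nodes().
SELECT itm.col.value('@sku', 'VARCHAR(10)')           AS [Sku],
       itm.col.value('(./text())[1]', 'NVARCHAR(50)') AS [ItemName]
FROM   dbo.ResourceFragment rf
CROSS APPLY rf.Fragment.nodes('/Order/Item') itm(col);

-- XML DML: change one element's text in place with modify().
UPDATE dbo.ResourceFragment
SET    Fragment.modify('replace value of (/Order/Item[@sku = "A1"]/text())[1] with "Sprocket"')
WHERE  Fragment.exist('/Order[@id = "42"]') = 1;
```

None of these methods is available on an NVARCHAR(MAX) column without first converting each value to XML, so even light, occasional querying tends to favor the XML type. Whether the primary XML index is worth its storage and insert cost at ~10,000 fragments a day is something to measure rather than assume.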
Context
StackExchange Database Administrators Q#11341, answer score: 19