HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlModerate

Convert Byte Array From XML to VARBINARY

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
convertbytearrayvarbinaryxmlfrom

Problem

I receive image files as XML data, each byte of the image being a node with its decimal value, e.g. for this example .png file, , the xml I get is:

DECLARE @xml XML = N'


Test

13780787113102610000137372688200020000208600014113729130004103657765001771431125297500091127289115001419300141931184145107237000241166988116831111021161199711410101129710511011646110101116325246484654252140992230001087368658456799924518419211432254141963096302119212632254151969963021192485580713629208240121326252161423661953464140231289819164153886027136241130213641224234501261519632205215145196965631367363214132371147132491282346250011351643128121163012424012760481971521022431300000736978681746696130


'


In binary:

SELECT * FROM OPENROWSET(BULK 'C:\test.png', SINGLE_BLOB) AS q;
==========
BulkColumn
----------
0x89504E470D0A1A0A0000000D49484452000000140000001408060000008D891D0D0000000467414D410000B18F0BFC6105000000097048597300000EC100000EC101B8916BED0000001874455874536F667477617265007061696E742E6E657420342E302E36FC8C63DF0000006C49444154384F631805B8C07220FE0EC41E601E15C07E20FE0FC409601E15C030375007881DD0F079200619D88E2406C322408C17806213A499583C1B88F182D5400CF222327E0FC420CDD791C4603803884906208D200347930DF980EA06FA007105102B8079A300011818007F3C30C59866F3820000000049454E44AE426082


How to I retrieve the image file from the xml as varbinary?

I've asked a similar question not long ago, so I tried the following query, but the resulting binary data is incorrect:

```
SELECT r.c.value('id[1]', 'varchar(50)') AS id,
CONVERT(VARBINARY(MAX), (SELECT (t.u.value('.','tinyint')) FROM r.c.nodes('image/Element') AS t(u) FOR XML PATH(''))) AS image
FROM @xml.nodes('/XmlData/Element') AS r(c);
=============
id image
-------------
Test 0x31003300370038003000370038003700310031003300310030003200360031003000300030003000310033003700330037003200360038003800320030003000300032003000300030003000320030003800360030003000300031003400310031003300370032

Solution

This is close but missing a few pieces. You extract into rows of TINYINT the decimal value from each ` in the XML (e.g. 137, 80, 78, etc), but then the FOR XML PATH('') converts them back into strings and concatenates them, leaving you with a UTF-16 encoded string of "1378078...". Converting that into VARBINARY just turns each string digit -- "1", "3", "7", "8", etc -- into its binary / hex Code Point:

SELECT CONVERT(VARBINARY(20), N'1378078');
-- 0x3100330037003800300037003800
-- 0x 3100 3300 3700 3800 3000 3700 3800 -- each character separated for readability

-- XML in SQL Server is encoded as UTF-16, same as NCHAR / NVARCHAR.
-- Each of these characters in UTF-16 is two bytes: 0x31 + 0x00, 0x33 + 0x00, etc.
-- Each pair of bytes is in reverse order due to "endianness". 0x3100 is really 0x0031.

SELECT NCHAR(0x0031), NCHAR(0x0033), NCHAR(0x0037), NCHAR(0x0038), NCHAR(0x0030),
NCHAR(0x0037), NCHAR(0x0038);
-- 1 3 7 8 0 7 8


Instead, you need to do the following:

  • Convert the decimal / TINYINT "137" into hex / BINARY "0x89"



  • Convert the hex/binary "0x89" into a string / VARCHAR, but without the leading "0x" (this requires using the CONVERT function, not CAST, so that you can specify the "style" of 2)



  • Once the FOR XML PATH('') puts everything together in a string in the form of 89504E470D0A1A0A0000..., then you need to apply the "style" of 2 again in the CONVERT(VARBINARY(MAX), ... so that it knows that 89504E470D0A1A0A0000... is merely 0x89504E470D0A1A0A0000... without the leading "0x".



Putting those pieces into your query, we get the following:

SELECT r.c.value('id[1]', 'varchar(50)') AS [id],
CONVERT(VARBINARY(MAX),
(SELECT CONVERT(VARCHAR(3),
CONVERT(BINARY(1),
t.u.value('.', 'tinyint')
),
2 -- style creates binary string without the leading "0x"
)
FROM r.c.nodes('image/Element') AS t(u)
FOR XML PATH('')
),
2 -- style creates binary string without the leading "0x"
) AS [image]
FROM @xml.nodes('/XmlData/Element') AS r(c);


And that returns the following for the
image field:


0x89504E470D0A1A0A0000000D49484452000000140000001408060000008D891D0D0000000467414D410000B18F0BFC6105000000097048597300000EC100000EC101B8916BED0000001874455874536F667477617265007061696E742E6E657420342E302E36FC8C63DF0000006C49444154384F631805B8C07220FE0EC41E601E15C07E20FE0FC409601E15C030375007881DD0F079200619D88E2406C322408C17806213A499583C1B88F182D5400CF222327E0FC420CDD791C4603803884906208D200347930DF980EA06FA007105102B8079A300011818007F3C30C59866F3820000000049454E44AE426082

Something to consider:

The method of sending binary data via

13780...


is probably the worst / least-efficient method possible. I realize that you said that you are receiving this info in this format and so probably are not to blame for this and have no control over it. However, just so everyone understands what is going on (please see UPDATE section below), each byte of the binary data is:

  • first turned into its decimal equivalent ( values 0 - 255 )



  • then converted into a string ( taking up 1 - 3 characters )



  • a string which is stored as UTF-16 which is 2 bytes per characters for these characters ( 2 - 6 bytes )



  • and wrapped in and tags { why not ? } ( 19 characters )



  • which, again, is stored in UTF-16 which is 2 bytes per character for these characters ( 38 bytes )



  • ultimately taking up 40 - 46 bytes ( 38 + 2 on the low-end, 38 + 6 on the high-end ) per each byte of the original binary value.



How does this work out? Well, only 10 decimal values ( 0 - 9 ) are 1 character / 2 bytes. Another 90 ( 10 - 99 ) are 2 characters / 4 bytes, while the remaining 156 values ( 100 - 255 ) are 3 characters / 6 bytes. So the majority of possible values take up the full 6 bytes, and only a small minority take up the minimum 2 bytes. This means that the average space taken up per each original byte is probably between 2 and 3 characters / 4 - 6 bytes (I guess they call that "5" in some places ;-) ? ).

For your particular example data, you can run the following query to see the breakdown:

;WITH cte AS
( SELECT LEN(CONVERT(VARCHAR(3), r.c.value('.', 'tinyint'))) AS [Length]
FROM @xml.nodes('/XmlData/Element/image/Element') AS r(c)
)
SELECT cte.[Length] AS [ElementLength], COUNT(*) AS [ElementCount]
FROM cte
GROUP BY cte.[Length];


That returns:

ElementLength ElementCount
------------- ------------
1 55
2 98
3 85


Now we can multiply each
ElementCount by (ElementLength * 2) to get the number of bytes. And we need to factor in the tags, which is again 38 bytes per each of the 238 original bytes:

SELECT (55 2) + (98 4) + (85 6) + (238 38)
--

Context

StackExchange Database Administrators Q#119884, answer score: 16

Revisions (0)

No revisions yet.