Comparing binary 0x and 0x00 turns out to be equal on SQL Server
Problem
It seems that SQL Server considers 0x and 0x00 equal values:

SELECT CASE WHEN 0x = 0x00 THEN 1 ELSE 0 END

This outputs 1.

How can I get true binary bit-for-bit comparison behavior? Also, what are the exact rules under which two (var)binary values are considered equal?

Also note the following behavior:

--prints just one of the values
SELECT DISTINCT [Data]
FROM (VALUES (0x), (0x00), (0x0000)) x([Data])

--prints the obvious length values 0, 1 and 2
SELECT DATALENGTH([Data]) AS [DATALENGTH], LEN([Data]) AS [LEN]
FROM (VALUES (0x), (0x00), (0x0000)) x([Data])

The background to this question is that I'm trying to deduplicate binary data. I need to GROUP BY binary data, not just compare two values. I'm glad I even noticed this problem.

Note that HASHBYTES does not support LOBs. I'd also like to find a simpler solution.

Solution
I couldn't find this comparison behaviour specified anywhere in BOL.
But the Connect Item "Invalid equality comparison for varbinary data with right padded zeros" states that

"Basically, the standard leaves it up to implementation to treat strings that differ only by [trailing] 00 as equal or less. We treat it as equal."

The Connect Item also states that the presence of trailing zeroes is the only case in which SQL Server differs from byte-by-byte comparison behavior.

In order to distinguish between two binary values in SQL Server that differ only by trailing 0x00 bytes, you can also add DATALENGTH into the comparison, as indicated in your question.

The reason for generally preferring DATALENGTH to LEN here is that the latter does an implicit cast to varchar, and then you get the problem with trailing spaces.

+-------------+--------------------+
| LEN(0x2020) | DATALENGTH(0x2020) |
+-------------+--------------------+
|           0 |                  2 |
+-------------+--------------------+

Though either would work in your use case.
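
As a rough illustration of adding DATALENGTH into the comparison, here is a minimal T-SQL sketch. It reuses the VALUES list from the question with one extra duplicate row. The column aliases (ByteLength, Occurrences, IsSame) are made up for this example, and nothing here has been checked against a particular SQL Server version.

```sql
-- Sketch only: sample values and aliases are illustrative assumptions.
-- Grouping on both the value and its byte length keeps values that differ
-- only by trailing 0x00 bytes in separate groups.
SELECT [Data],
       DATALENGTH([Data]) AS [ByteLength],
       COUNT(*)           AS [Occurrences]
FROM (VALUES (0x), (0x00), (0x0000), (0x00)) x([Data])
GROUP BY [Data], DATALENGTH([Data]);

-- Pairwise form of the same check: the values compare as equal,
-- but their byte lengths differ, so the combined test fails.
SELECT CASE WHEN 0x = 0x00 AND DATALENGTH(0x) = DATALENGTH(0x00)
            THEN 1 ELSE 0 END AS [IsSame];
```

Given the behavior described above, the first query should return one group per distinct byte length rather than a single group, and the second should treat 0x and 0x00 as different. The same pair of GROUP BY expressions can be applied to a real table column in place of the VALUES list.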
Context
StackExchange Database Administrators Q#48660, answer score: 6