HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Best Data Type to Store Result of HASHBYTES('MD5', ...)

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
resultmd5storetypehashbytesdatabest

Problem

What would be the best data type to store the results of the HASHBYTES('MD5', ...)?

It outputs 16 bytes of binary as follows: e.g.

0x5CFCD77F9FF836189D2F647EBCEA183E


I could store it in the following data types:

  • char(34)



  • binary(16) (I think - I read here (https://stackoverflow.com/questions/14722305/what-kind-of-datatype-should-one-use-to-store-hashes#16680423) that using the same algo should return the same number of bytes every time regardless of the input string)



  • other?



Every row will have a value (no nulls), and the column will be used for comparison against a similar column in another table.

Which is the best data type to store HASHBYTES output for use as described above?

I was thinking that since fixed length data types can sometimes be more efficient on joins, etc. binary(16) vs varbinary(8000) (the default output of HASHBYTES) seems best, and binary(16) vs a varchar(34) is better since it would use less storage space.

Solution

According to Books Online:


The output conforms to the algorithm standard: 128 bits (16 bytes) for
MD2, MD4, and MD5; 160 bits (20 bytes) for SHA and SHA1; 256 bits (32
bytes) for SHA2_256, and 512 bits (64 bytes) for SHA2_512.

So despite the fact that the function technically returns varbinary(8000), you'll only ever get 16 bytes with MD5. If you are completely certain you only need MD5, then a binary(16) will do it. If you want to play it safe, and allow any possible algorithm (current or future) that HASHBYTES can handle, go with varbinary(8000).

I definitely wouldn't recommend a char type, since that will involve string conversion overhead, and unnecessary storage overhead.

Context

StackExchange Database Administrators Q#191979, answer score: 10

Revisions (0)

No revisions yet.