snippetsqlModerate
What is the fastest way to convert many nullable ints 1:1 to a binary string?
Viewed 0 times
thewhatconvertwaynullablebinaryfastestmanystringints
Problem
Part of my workload uses a CLR function that implements the spooky hash algorithm to compare rows to see if any column values have changed. The CLR function takes a binary string as an input, so I need a fast way to convert rows to a binary string. I expect to hash around 10 billion rows during the full workload so I'd like this code to be as fast as possible.
I have about 300 tables with different schemas. For the purposes of this question please assume a simple table structure of 32 nullable
Rows must be converted to the same binary string if all column values are the same. Rows must be converted to different binary strings if any column value is different. For example, code as simple as the following will not work:
It does not handle NULLs correctly. If
To preempt some questions:
What is the fastest way to convert 32 nullable
Sample data and code as promised:
```
-- create sample data
DROP TABLE IF EXISTS dbo.TABLE_OF_32_INTS;
CREATE TABLE dbo.TABLE_OF_32_INTS (
COL1 INT NULL,
COL2 INT NULL,
COL3 INT NULL,
COL4 INT NULL,
COL5 INT NULL,
COL6 INT NULL,
COL7 INT NULL,
COL8 INT NULL,
COL9 INT NULL,
COL10 INT NULL,
COL11 INT NULL,
COL12 INT NULL,
COL13 IN
I have about 300 tables with different schemas. For the purposes of this question please assume a simple table structure of 32 nullable
INT columns. I've provided sample data as well as a way to benchmark results at the bottom of this question.Rows must be converted to the same binary string if all column values are the same. Rows must be converted to different binary strings if any column value is different. For example, code as simple as the following will not work:
CAST(COL1 AS BINARY(4)) + CAST(COL2 AS BINARY(4)) + ..It does not handle NULLs correctly. If
COL1 is NULL for row 1 and COL2 is NULL for row 2 then both rows will be converted to a NULL string. I believe that correct handling of NULLs is the hardest part of converting the entire row correctly. All allowed values for the INT columns are possible.To preempt some questions:
- If it matters, a significant majority of the time (90%+) the columns won't be NULL.
- I have to use the CLR.
- I have to hash this many rows. I cannot persist the hashes.
- I believe that I cannot use batch mode for the conversion due to the presence of the CLR function.
What is the fastest way to convert 32 nullable
INT columns to a BINARY(X) or VARBINARY(X) string?Sample data and code as promised:
```
-- create sample data
DROP TABLE IF EXISTS dbo.TABLE_OF_32_INTS;
CREATE TABLE dbo.TABLE_OF_32_INTS (
COL1 INT NULL,
COL2 INT NULL,
COL3 INT NULL,
COL4 INT NULL,
COL5 INT NULL,
COL6 INT NULL,
COL7 INT NULL,
COL8 INT NULL,
COL9 INT NULL,
COL10 INT NULL,
COL11 INT NULL,
COL12 INT NULL,
COL13 IN
Solution
On my machine (SQL Server 2017) the following C# SQLCLR function runs about 30% faster than the
It requires
For testing purposes, the easiest way to get this unsafe assembly working is to set the database to
Compiled code
For convenience the
T-SQL Function Stub
Source code
The C# source is at https://gist.github.com/SQLKiwi/64f320fe7fd802a68a3a644aa8b8af9f
If you compile this for yourself, you must use a Class Library (.dll) as the target project type and check the Allow Unsafe Code build option.
Combined solution
Since you ultimately want to compute the SpookyHash of the binary data returned above, you can call SpookyHash within the CLR function and return the 16-byte hash.
An example implementation based on a table with a mixture of column data types is at https://gist.github.com/SQLKiwi/6f82582a4ad1920c372fac118ec82460. This includes an unsafe inlined version of the Spooky Hash algorithm derived from Jon Hanna's SpookilySharp and the original public domain C source code by Bob Jenkins.
binary(5) idea, 35% faster than CONCAT_WS, and in half the time of the self-answer.It requires
UNSAFE permission and uses pointers. The implementation is very specifically tied to the test data.For testing purposes, the easiest way to get this unsafe assembly working is to set the database to
TRUSTWORTHY and disable the clr strict security configuration option if necessary.Compiled code
For convenience the
CREATE ASSEMBLY compiled bits are at https://gist.github.com/SQLKiwi/72d01b661c74485900e7ebcfdc63ab8eT-SQL Function Stub
CREATE FUNCTION dbo.NullableIntsToBinary
(
@Col01 int, @Col02 int, @Col03 int, @Col04 int, @Col05 int, @Col06 int, @Col07 int, @Col08 int,
@Col09 int, @Col10 int, @Col11 int, @Col12 int, @Col13 int, @Col14 int, @Col15 int, @Col16 int,
@Col17 int, @Col18 int, @Col19 int, @Col20 int, @Col21 int, @Col22 int, @Col23 int, @Col24 int,
@Col25 int, @Col26 int, @Col27 int, @Col28 int, @Col29 int, @Col30 int, @Col31 int, @Col32 int
)
RETURNS binary(132)
WITH EXECUTE AS CALLER
AS EXTERNAL NAME Obbish.UserDefinedFunctions.NullableIntsToBinary;Source code
The C# source is at https://gist.github.com/SQLKiwi/64f320fe7fd802a68a3a644aa8b8af9f
If you compile this for yourself, you must use a Class Library (.dll) as the target project type and check the Allow Unsafe Code build option.
Combined solution
Since you ultimately want to compute the SpookyHash of the binary data returned above, you can call SpookyHash within the CLR function and return the 16-byte hash.
An example implementation based on a table with a mixture of column data types is at https://gist.github.com/SQLKiwi/6f82582a4ad1920c372fac118ec82460. This includes an unsafe inlined version of the Spooky Hash algorithm derived from Jon Hanna's SpookilySharp and the original public domain C source code by Bob Jenkins.
Code Snippets
CREATE FUNCTION dbo.NullableIntsToBinary
(
@Col01 int, @Col02 int, @Col03 int, @Col04 int, @Col05 int, @Col06 int, @Col07 int, @Col08 int,
@Col09 int, @Col10 int, @Col11 int, @Col12 int, @Col13 int, @Col14 int, @Col15 int, @Col16 int,
@Col17 int, @Col18 int, @Col19 int, @Col20 int, @Col21 int, @Col22 int, @Col23 int, @Col24 int,
@Col25 int, @Col26 int, @Col27 int, @Col28 int, @Col29 int, @Col30 int, @Col31 int, @Col32 int
)
RETURNS binary(132)
WITH EXECUTE AS CALLER
AS EXTERNAL NAME Obbish.UserDefinedFunctions.NullableIntsToBinary;Context
StackExchange Database Administrators Q#237684, answer score: 11
Revisions (0)
No revisions yet.