snippetsqlMinor
How to delay convert() evaluation until after join
Viewed 0 times
afterevaluationdelayuntilconvertjoinhow
Problem
The following query converts a single string of packed CSV that represents 13k rows by 2 columns. Column A is a bigint. Column B is a smallint.
Execution Plan: https://www.brentozar.com/pastetheplan/?id=By0hYPmd6
As shown in the plan, the evaluation of the convert() is occurring before the join so sometimes (depending on length of input), it fails with
The conversion of the nvarchar value '37645' overflowed an INT2
column. Use a larger integer column.
Temporarily changing the conversion from smallint to int allows the query to complete and inspecting the BrandId column output shows that it always just contains the value '36' for this example.
Is there a simple way to delay the convert(smallint, i2.Value) until the after the join so that only those expected CSV positions are converted?
I realize there are other ways to pack a string stream (such as using multiple variables, or interlacing different split chars, etc) but am not interested in solving this example that way for purposes of this question. Thanks!
declare
@dataCsv nvarchar(max) = '29653,36,19603,36,19604,36,29654,36'; -- abbreviated, actually 13k rows
with Input as
(
select Value,
Row = row_number() over (order by (select null)) - 1
from string_split(@dataCsv, ',') o
)
--insert into PubCache.TableName
select 78064 as CacheId,
convert(bigint, i.Value) as ObjectId,
convert(smallint, i2.Value) as BrandId
from Input i
inner hash join Input i2 -- hash to encourage string_split() only once per column
on i2.Row = i.Row + 1
where i.Row % 2 = 0
order by i.RowExecution Plan: https://www.brentozar.com/pastetheplan/?id=By0hYPmd6
As shown in the plan, the evaluation of the convert() is occurring before the join so sometimes (depending on length of input), it fails with
The conversion of the nvarchar value '37645' overflowed an INT2
column. Use a larger integer column.
Temporarily changing the conversion from smallint to int allows the query to complete and inspecting the BrandId column output shows that it always just contains the value '36' for this example.
Is there a simple way to delay the convert(smallint, i2.Value) until the after the join so that only those expected CSV positions are converted?
I realize there are other ways to pack a string stream (such as using multiple variables, or interlacing different split chars, etc) but am not interested in solving this example that way for purposes of this question. Thanks!
Solution
If you want to use this self join approach I'd just use
Also your approach to calculate an ordinal from
The output rows might be in any order. The order isn't guaranteed to
match the order of the substrings in the input string.
And even if that wasn't the case
Assuming you aren't on SQL Server 2022 (so the
But my suggestion would be to drop the self join entirely and use conditional aggregation.
In this case if you are certain that every second element in the string will be a valid
try_convert so you aren't dependent on where the convert gets evaluated.TRY_CONVERT(smallint, '37645') returns NULL rather than error so it doesn't matter if that happens in a row subsequently filtered out.Also your approach to calculate an ordinal from
string_split is not reliable.The output rows might be in any order. The order isn't guaranteed to
match the order of the substrings in the input string.
And even if that wasn't the case
order by (select null) would still not guarantee anything.Assuming you aren't on SQL Server 2022 (so the
enable_ordinal argument isn't available to you) you can tweak the CSV to get it into JSON array format and get supplied an ordinal that way.declare
@dataCsv nvarchar(max) = '29653,36,19603,36,19604,36,29654,36'; -- abbreviated, actually 13k rows
with Input as
(
SELECT Value,
Row = CAST([key] AS INT)
FROM OPENJSON(CONCAT('[', @dataCsv, ']'))
)
--insert into PubCache.TableName
select 78064 as CacheId,
try_convert(bigint, i.Value) as ObjectId,
try_convert(smallint, i2.Value) as BrandId
from Input i
inner hash join Input i2 -- hash to encourage string_split() only once per column
on i2.Row = i.Row + 1
where i.Row % 2 = 0
order by i.RowBut my suggestion would be to drop the self join entirely and use conditional aggregation.
In this case if you are certain that every second element in the string will be a valid
smallint you can revert to convert. The CASE expression should prevent it being evaluated on the values where ordinal %2 does not evaluate to 1.DECLARE @dataCsv NVARCHAR(max) = '29653,36,19603,36,19604,36,29654,36'; -- abbreviated, actually 13k rows
WITH Input
AS (SELECT Value,
Row = [key] / 2,
Col = [key] % 2
FROM OPENJSON(CONCAT('[', @dataCsv, ']')))
--insert into PubCache.TableName
SELECT CacheId = 78064,
ObjectId = MAX(CONVERT(BIGINT, CASE WHEN Col = 0 THEN Value END)),
BrandId = MAX(CONVERT(SMALLINT, CASE WHEN Col = 1 THEN Value END))
FROM Input i
GROUP BY Row
ORDER BY RowCode Snippets
declare
@dataCsv nvarchar(max) = '29653,36,19603,36,19604,36,29654,36'; -- abbreviated, actually 13k rows
with Input as
(
SELECT Value,
Row = CAST([key] AS INT)
FROM OPENJSON(CONCAT('[', @dataCsv, ']'))
)
--insert into PubCache.TableName
select 78064 as CacheId,
try_convert(bigint, i.Value) as ObjectId,
try_convert(smallint, i2.Value) as BrandId
from Input i
inner hash join Input i2 -- hash to encourage string_split() only once per column
on i2.Row = i.Row + 1
where i.Row % 2 = 0
order by i.RowDECLARE @dataCsv NVARCHAR(max) = '29653,36,19603,36,19604,36,29654,36'; -- abbreviated, actually 13k rows
WITH Input
AS (SELECT Value,
Row = [key] / 2,
Col = [key] % 2
FROM OPENJSON(CONCAT('[', @dataCsv, ']')))
--insert into PubCache.TableName
SELECT CacheId = 78064,
ObjectId = MAX(CONVERT(BIGINT, CASE WHEN Col = 0 THEN Value END)),
BrandId = MAX(CONVERT(SMALLINT, CASE WHEN Col = 1 THEN Value END))
FROM Input i
GROUP BY Row
ORDER BY RowContext
StackExchange Database Administrators Q#334533, answer score: 6
Revisions (0)
No revisions yet.