HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

How to delay convert() evaluation until after join

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
afterevaluationdelayuntilconvertjoinhow

Problem

The following query converts a single string of packed CSV that represents 13k rows by 2 columns. Column A is a bigint. Column B is a smallint.

declare
    @dataCsv nvarchar(max) = '29653,36,19603,36,19604,36,29654,36';  -- abbreviated, actually 13k rows

  with Input as
  (
      select Value,
             Row = row_number() over (order by (select null)) - 1
      from string_split(@dataCsv, ',') o
  )
  --insert into PubCache.TableName
  select 78064 as CacheId,
         convert(bigint, i.Value) as ObjectId,
         convert(smallint, i2.Value) as BrandId
    from Input i
    inner hash join Input i2    -- hash to encourage string_split() only once per column
      on i2.Row = i.Row + 1
    where i.Row % 2 = 0
    order by i.Row


Execution Plan: https://www.brentozar.com/pastetheplan/?id=By0hYPmd6

As shown in the plan, the evaluation of the convert() is occurring before the join so sometimes (depending on length of input), it fails with

The conversion of the nvarchar value '37645' overflowed an INT2
column. Use a larger integer column.

Temporarily changing the conversion from smallint to int allows the query to complete and inspecting the BrandId column output shows that it always just contains the value '36' for this example.

Is there a simple way to delay the convert(smallint, i2.Value) until the after the join so that only those expected CSV positions are converted?

I realize there are other ways to pack a string stream (such as using multiple variables, or interlacing different split chars, etc) but am not interested in solving this example that way for purposes of this question. Thanks!

Solution

If you want to use this self join approach I'd just use try_convert so you aren't dependent on where the convert gets evaluated.

TRY_CONVERT(smallint, '37645') returns NULL rather than error so it doesn't matter if that happens in a row subsequently filtered out.

Also your approach to calculate an ordinal from string_split is not reliable.

The output rows might be in any order. The order isn't guaranteed to
match the order of the substrings in the input string.

And even if that wasn't the case order by (select null) would still not guarantee anything.

Assuming you aren't on SQL Server 2022 (so the enable_ordinal argument isn't available to you) you can tweak the CSV to get it into JSON array format and get supplied an ordinal that way.

declare
    @dataCsv nvarchar(max) = '29653,36,19603,36,19604,36,29654,36';  -- abbreviated, actually 13k rows

  with Input as
  (
    SELECT Value,
           Row = CAST([key] AS INT)
    FROM OPENJSON(CONCAT('[', @dataCsv, ']'))
  )
  --insert into PubCache.TableName
  select 78064 as CacheId,
         try_convert(bigint, i.Value) as ObjectId,
         try_convert(smallint, i2.Value) as BrandId
    from Input i
    inner hash join Input i2    -- hash to encourage string_split() only once per column
      on i2.Row = i.Row + 1
    where i.Row % 2 = 0
    order by i.Row


But my suggestion would be to drop the self join entirely and use conditional aggregation.

In this case if you are certain that every second element in the string will be a valid smallint you can revert to convert. The CASE expression should prevent it being evaluated on the values where ordinal %2 does not evaluate to 1.

DECLARE @dataCsv NVARCHAR(max) = '29653,36,19603,36,19604,36,29654,36'; -- abbreviated, actually 13k rows

WITH Input
     AS (SELECT Value,
                Row = [key] / 2,
                Col = [key] % 2
         FROM   OPENJSON(CONCAT('[', @dataCsv, ']')))
--insert into PubCache.TableName
SELECT CacheId = 78064,
       ObjectId = MAX(CONVERT(BIGINT,  CASE WHEN Col = 0 THEN Value END)),
       BrandId = MAX(CONVERT(SMALLINT, CASE WHEN Col = 1 THEN Value END))
FROM   Input i
GROUP  BY Row
ORDER  BY Row

Code Snippets

declare
    @dataCsv nvarchar(max) = '29653,36,19603,36,19604,36,29654,36';  -- abbreviated, actually 13k rows

  with Input as
  (
    SELECT Value,
           Row = CAST([key] AS INT)
    FROM OPENJSON(CONCAT('[', @dataCsv, ']'))
  )
  --insert into PubCache.TableName
  select 78064 as CacheId,
         try_convert(bigint, i.Value) as ObjectId,
         try_convert(smallint, i2.Value) as BrandId
    from Input i
    inner hash join Input i2    -- hash to encourage string_split() only once per column
      on i2.Row = i.Row + 1
    where i.Row % 2 = 0
    order by i.Row
DECLARE @dataCsv NVARCHAR(max) = '29653,36,19603,36,19604,36,29654,36'; -- abbreviated, actually 13k rows

WITH Input
     AS (SELECT Value,
                Row = [key] / 2,
                Col = [key] % 2
         FROM   OPENJSON(CONCAT('[', @dataCsv, ']')))
--insert into PubCache.TableName
SELECT CacheId = 78064,
       ObjectId = MAX(CONVERT(BIGINT,  CASE WHEN Col = 0 THEN Value END)),
       BrandId = MAX(CONVERT(SMALLINT, CASE WHEN Col = 1 THEN Value END))
FROM   Input i
GROUP  BY Row
ORDER  BY Row

Context

StackExchange Database Administrators Q#334533, answer score: 6

Revisions (0)

No revisions yet.