HiveBrain v1.2.0
Get Started
← Back to all entries
principlesqlMinor

Performance gap between WHERE IN (1,2,3,4) vs IN (select * from STRING_SPLIT('1,2,3,4',','))

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
wherestring_splitgapbetweenperformanceselectfrom

Problem

I seem to be having a huge performance gap between using hard coded values for a SELECT IN, vs a STRING_SPLIT. The query plans are identical except for the last phase where the index seek is getting executed multiple times for the STRING_SPLIT code. The outcome is a CPU time of about 90000 vs about 15000 (according to dm_exec_query_stats), so the difference is enormous. I have posted the 2 plans here...

  • Hardcoded Plan



  • String Split Plan



The interesting thing is the query plan shows the cost as being pretty much the same, but in when I check dm_exec_query_stats the cost (last_worker_time) is massively different.

Here are 2 outputs from the query plan...

0x79DEAD79D1F149CD 16199 
select * 
from fn_get_samples(1) s 
where s.sample_id in 
    (2495,2496,2497,2498,2499,2500,2501,2502,2503,2504)

0x4A073840486B252C 86689 
select * 
from fn_get_samples(1) s 
where s.sample_id in 
    (select value as id 
     from 
     STRING_SPLIT('2495,2496,2497,2498,2499,2500,2501,2502,2503,2504',','))


The function code is...

```
CREATE FUNCTION [dbo].[fn_get_samples]
(
@user_id int
)
RETURNS TABLE
AS
RETURN (
-- get samples
select s.sample_id,language_id,native_language_id,s.source_sentence,s.markup_sentence,s.latin_sentence,
s.translation_source_sentence,s.translation_markup_sentence,s.translation_latin_sentence,
isnull(sample_vkl.knowledge_level_id,1) as vocab_knowledge_level_id,
isnull(sample_gkl.grammar_knowledge_level_id,0) as grammar_knowledge_level_id,
s.polite_level_id,
case when isnull(tr1.leitner_deck_index,0)=0 then 0 else cast((tr1.leitner_deck_index-1) as float)/cast((max_leitner_deck_index-1) as float) end as progress_percentage,
case when isnull(tr2.leitner_deck_index,0)=0 then 0 else cast((tr2.leitner_deck_index-1) as float)/cast((max_leitner_deck_index-1) as float) end as listening_progress_percentage,
case when f.object_id is null then 0 else 1 end as is_favorite,

Solution

The biggest differences between a hard-coded IN list and the values generated by STRING_SPLIT are:

  • The optimizer automatically sorts and removes duplicates from a literal IN list at compile time. This gives the optimizer accurate information about the number and distribution of values.



  • Literal values can be embedded in a data access operator (e.g. an index seek). The results of the string split can be used to drive data access but a driving nested loops operator will be needed.



  • The return type of the STRING_SPLIT function's value column is varchar or nvarchar depending on the input arguments. The length of the string value column is the same as the input string. Literal IN values are coerced to a compatible type at compile time if necessary.



  • The number of rows returned by the STRING_SPLIT function is guessed, and the distribution of values is unknown.



In short, using a literal IN list gives the optimizer much better information to work with.

Context

StackExchange Database Administrators Q#166981, answer score: 5

Revisions (0)

No revisions yet.