principlesqlMinor
Performance gap between WHERE IN (1,2,3,4) vs IN (select * from STRING_SPLIT('1,2,3,4',','))
Viewed 0 times
wherestring_splitgapbetweenperformanceselectfrom
Problem
I seem to be having a huge performance gap between using hard coded values for a
The interesting thing is the query plan shows the cost as being pretty much the same, but in when I check
Here are 2 outputs from the query plan...
The function code is...
```
CREATE FUNCTION [dbo].[fn_get_samples]
(
@user_id int
)
RETURNS TABLE
AS
RETURN (
-- get samples
select s.sample_id,language_id,native_language_id,s.source_sentence,s.markup_sentence,s.latin_sentence,
s.translation_source_sentence,s.translation_markup_sentence,s.translation_latin_sentence,
isnull(sample_vkl.knowledge_level_id,1) as vocab_knowledge_level_id,
isnull(sample_gkl.grammar_knowledge_level_id,0) as grammar_knowledge_level_id,
s.polite_level_id,
case when isnull(tr1.leitner_deck_index,0)=0 then 0 else cast((tr1.leitner_deck_index-1) as float)/cast((max_leitner_deck_index-1) as float) end as progress_percentage,
case when isnull(tr2.leitner_deck_index,0)=0 then 0 else cast((tr2.leitner_deck_index-1) as float)/cast((max_leitner_deck_index-1) as float) end as listening_progress_percentage,
case when f.object_id is null then 0 else 1 end as is_favorite,
SELECT IN, vs a STRING_SPLIT. The query plans are identical except for the last phase where the index seek is getting executed multiple times for the STRING_SPLIT code. The outcome is a CPU time of about 90000 vs about 15000 (according to dm_exec_query_stats), so the difference is enormous. I have posted the 2 plans here...- Hardcoded Plan
- String Split Plan
The interesting thing is the query plan shows the cost as being pretty much the same, but in when I check
dm_exec_query_stats the cost (last_worker_time) is massively different.Here are 2 outputs from the query plan...
0x79DEAD79D1F149CD 16199
select *
from fn_get_samples(1) s
where s.sample_id in
(2495,2496,2497,2498,2499,2500,2501,2502,2503,2504)
0x4A073840486B252C 86689
select *
from fn_get_samples(1) s
where s.sample_id in
(select value as id
from
STRING_SPLIT('2495,2496,2497,2498,2499,2500,2501,2502,2503,2504',','))The function code is...
```
CREATE FUNCTION [dbo].[fn_get_samples]
(
@user_id int
)
RETURNS TABLE
AS
RETURN (
-- get samples
select s.sample_id,language_id,native_language_id,s.source_sentence,s.markup_sentence,s.latin_sentence,
s.translation_source_sentence,s.translation_markup_sentence,s.translation_latin_sentence,
isnull(sample_vkl.knowledge_level_id,1) as vocab_knowledge_level_id,
isnull(sample_gkl.grammar_knowledge_level_id,0) as grammar_knowledge_level_id,
s.polite_level_id,
case when isnull(tr1.leitner_deck_index,0)=0 then 0 else cast((tr1.leitner_deck_index-1) as float)/cast((max_leitner_deck_index-1) as float) end as progress_percentage,
case when isnull(tr2.leitner_deck_index,0)=0 then 0 else cast((tr2.leitner_deck_index-1) as float)/cast((max_leitner_deck_index-1) as float) end as listening_progress_percentage,
case when f.object_id is null then 0 else 1 end as is_favorite,
Solution
The biggest differences between a hard-coded
In short, using a literal
IN list and the values generated by STRING_SPLIT are:- The optimizer automatically sorts and removes duplicates from a literal
INlist at compile time. This gives the optimizer accurate information about the number and distribution of values.
- Literal values can be embedded in a data access operator (e.g. an index seek). The results of the string split can be used to drive data access but a driving nested loops operator will be needed.
- The return type of the
STRING_SPLITfunction'svaluecolumn isvarcharornvarchardepending on the input arguments. The length of the stringvaluecolumn is the same as the input string. LiteralINvalues are coerced to a compatible type at compile time if necessary.
- The number of rows returned by the
STRING_SPLITfunction is guessed, and the distribution of values is unknown.
In short, using a literal
IN list gives the optimizer much better information to work with.Context
StackExchange Database Administrators Q#166981, answer score: 5
Revisions (0)
No revisions yet.