HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

function hangs with null case operation

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
casewithnullfunctionoperationhangs

Problem

I created a function that accepts a start and end date, with the end date being optional. I then wrote a CASE in the filter to use the start date if no end date is passed.

CASE WHEN @dateEnd IS NULL
    THEN @dateStart
    ELSE @dateEnd
END


When I call the function for the most recent month of the data:

SELECT * FROM theFunction ('2013-06-01', NULL)


... the query hangs. If I specify the end date:

SELECT * FROM theFunction ('2013-06-01', '2013-06-01')


... the result is returned normally. I took the code out of the function and ran it fine inside a query window. I can't duplicate the issue the fiddle either. A query like:

SELECT * FROM theFunction ('2013-04-01', '2013-06-01')


... also works fine.

Is there anything in the query (below) that could be causing the function to hang when a NULL is passed for the end date?

SQL Fiddle

  • Execution plan for SELECT * FROM theFunction ('2013-06-01', '2013-06-01')



  • Estimated plan for SELECT * FROM theFunction ('2013-06-01', NULL)

Solution

Part of your initial query is as follows.

FROM   [dbo].[calendar] a
          LEFT JOIN [dbo].[colleagueList] b
            ON b.[Date] = a.d
   WHERE  DAY(a.[d]) = 1
          AND a.[d] BETWEEN @dateStart AND COALESCE(@dateEnd,@dateStart)


That section of the plan is shown below

Your revised query BETWEEN @dateStart AND ISNULL(@dateEnd,@dateStart) has this for the same join

The difference seems to be that ISNULL simplifies further and as a result you get more accurate cardinality statistics going into the next join. This is an inline table valued function and you are calling it with literal values so it can do something like.

a.[d] BETWEEN @dateStart AND ISNULL(@dateEnd,@dateStart) 
 a.[d] BETWEEN '2013-06-01' AND ISNULL(NULL,'2013-06-01') 
 a.[d] BETWEEN '2013-06-01' AND '2013-06-01'
 a.[d] = '2013-06-01'


And as there is an equi join predicate b.[Date] = a.d the plan also shows an equality predicate b.[Date] = '2013-06-01'. As a result the cardinality estimate of 28,393 rows is likely to be pretty accurate.

For the CASE/COALESCE version when @dateStart and @dateEnd are the same value then it simplifies OK to the same equality expression and gives the same plan but when @dateStart = '2013-06-01' and @dateEnd IS NULL it only goes as far as

a.[d]>='2013-06-01' AND a.[Date]<=CASE WHEN (1) THEN '2013-06-01' ELSE NULL END


which it also applies as an implied predicate on ColleagueList. The estimated number of rows this time is 79.8 rows.

The next join along is

LEFT JOIN colleagueTime
     ON colleagueTime.TC_DATE = colleagueList.Date
        AND colleagueTime.ASSOC_ID = CAST(colleagueList.ID AS VARCHAR(10))


colleagueTime is a 3,249,590 row table which is (again) apparently a heap with no useful indexes.

This discrepancy in estimates affects the join choice used. The ISNULL plan chooses a hash join that just scans the table once. The COALESCE plan chooses a nested loops join and estimates that it will still just need to scan the table once and be able to spool the result and replay it 78 times. i.e. it estimates that the correlated parameters will not change.

From the fact that the nested loops plan was still going after two hours this assumption of a single scan against colleagueTime seems likely to be highly inaccurate.

As for why the estimated number of rows between the two joins are so much lower I'm not sure without being able to see the statistics on the tables. The only way I managed to skew the estimated row counts that much in my testing was adding a load of NULL rows (this reduced the estimated row count even though the actual number of rows returned remained the same).

The estimated row count in the COALESCE plan with my test data was in the order of

number of rows matching >= condition * 30% * (proportion of rows in the table not null)


Or in SQL

SELECT 1E0 * COUNT([Date]) / COUNT(*) * ( COUNT(CASE
                                                  WHEN [Date] >= '2013-06-01' THEN 1
                                                END) * 0.30 )
FROM   [dbo].[colleagueList]


but this does not square with your comment that the column has no NULL values.

Code Snippets

FROM   [dbo].[calendar] a
          LEFT JOIN [dbo].[colleagueList] b
            ON b.[Date] = a.d
   WHERE  DAY(a.[d]) = 1
          AND a.[d] BETWEEN @dateStart AND COALESCE(@dateEnd,@dateStart)
a.[d] BETWEEN @dateStart AND ISNULL(@dateEnd,@dateStart) 
 a.[d] BETWEEN '2013-06-01' AND ISNULL(NULL,'2013-06-01') 
 a.[d] BETWEEN '2013-06-01' AND '2013-06-01'
 a.[d] = '2013-06-01'
a.[d]>='2013-06-01' AND a.[Date]<=CASE WHEN (1) THEN '2013-06-01' ELSE NULL END
LEFT JOIN colleagueTime
     ON colleagueTime.TC_DATE = colleagueList.Date
        AND colleagueTime.ASSOC_ID = CAST(colleagueList.ID AS VARCHAR(10))
number of rows matching >= condition * 30% * (proportion of rows in the table not null)

Context

StackExchange Database Administrators Q#46486, answer score: 7

Revisions (0)

No revisions yet.