HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Re-write T-SQL where clause causing performance issue

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
sqlwhereissuewritecausingperformanceclause

Problem

I have a query with the below where clause taking lot of time to execute since the underlying table has lot of data in it:

Environment: SQL Server 2022

WHERE clause:

DATEFROMPARTS(dyear, dmonth, dday) 
    BETWEEN DATEADD(m, -2, DATEADD(mm, DATEDIFF(m, 0, GETDATE()), 0))  
        AND CAST(CAST(DATEADD(d, -2, GETDATE()) AS DATE) AS DATETIME)


It doesn't use index on dyear, dmonth, dday (all three int columns) currently due to use of function.

How can I rewrite it so that underlying index can be used to make query run faster?

Apologies, but I won't be able to share the execution plan due to security restrictions.

Data storage format is defined by a third party which isn't in our control. For example, it is not an option to add a computed column to the table which could then be indexed.

Solution

You could expand the date range out and then do correlated index seeks on each date in the range.

An equality on dyear, dmonth, dday is sargable. Each seek can read exactly the rows for that one day so in aggregate the number of rows read by the seeks can be exactly the number needed with no excessive rows read.

As you have only shown a fragment of the query I don't know what the SELECT list needs to be. Don't use * - ideally it will be columns covered by the composite index on those three columns.

Sample Data

CREATE TABLE #yourtable (dyear int not null, dmonth int not null, dday int not null, index ix (dyear, dmonth, dday))

INSERT #yourtable
SELECT TOP 1000000 CRYPT_GEN_RANDOM(1) + 1970,
                   CRYPT_GEN_RANDOM(1)%12 + 1,
                   CRYPT_GEN_RANDOM(1)%28 + 1
FROM   sys.all_objects o1,
       sys.all_objects o2


Query

SELECT yt.dyear,
       yt.dmonth,
       yt.dday
FROM   generate_series(0, DATEDIFF(DAY, DATETRUNC(MONTH, DATEADD(MONTH, -2, GETDATE())), GETDATE()) - 2)
       CROSS APPLY (VALUES(DATEADD(DAY, value, DATETRUNC(MONTH, DATEADD(MONTH, -2, GETDATE()))))) D(calc_date)
       CROSS APPLY (SELECT *
                    FROM   #yourtable yt
                    WHERE  yt.dday = DAY(calc_date)
                           AND yt.dmonth = month(calc_date)
                           AND yt.dyear = YEAR(calc_date)) yt


NB: I did try and get an index seek plan using the approach in Nenad's answer but I wasn't able to get a plan that didn't either read many additional rows or had a step to remove duplicates.

Number of rows read by the seek is exactly as needed when the predicate is just

WHERE  ( dyear > @StartYear
          OR ( dyear = @StartYear
               AND dmonth > @StartMonth )
          OR ( dyear = @StartYear
               AND dmonth = @StartMonth
               AND dday >= @StartDay ) )


But it didn't seem to combine the start and end conditions very effectively.

I also tried expanding out the conditions.

select *
from #yourtable  with (forceseek)
 WHERE 
((dyear > @StartYear) AND (dyear  @StartYear) AND (dyear = @EndYear AND dmonth  @StartYear) AND (dyear = @EndYear AND dmonth = @EndMonth AND dday  @StartMonth) AND (dyear  @StartMonth) AND (dyear = @EndYear AND dmonth  @StartMonth) AND (dyear = @EndYear AND dmonth = @EndMonth AND dday = @StartDay) AND (dyear = @StartDay) AND (dyear = @EndYear AND dmonth = @StartDay) AND (dyear = @EndYear AND dmonth = @EndMonth AND dday <= @EndDay))
OPTION (RECOMPILE);


This does seek exactly the right ranges (able to simplified by the RECOMPILE to just the following)

  • dyear = 2023 and dmonth = 7 and dday >= 1



  • dyear = 2023 and dmonth > 7 and dmonth



  • dyear = 2023 and dmonth = 9 and dday



but the plan has an annoying distinct sort despite these being disjoint

Code Snippets

CREATE TABLE #yourtable (dyear int not null, dmonth int not null, dday int not null, index ix (dyear, dmonth, dday))


INSERT #yourtable
SELECT TOP 1000000 CRYPT_GEN_RANDOM(1) + 1970,
                   CRYPT_GEN_RANDOM(1)%12 + 1,
                   CRYPT_GEN_RANDOM(1)%28 + 1
FROM   sys.all_objects o1,
       sys.all_objects o2
SELECT yt.dyear,
       yt.dmonth,
       yt.dday
FROM   generate_series(0, DATEDIFF(DAY, DATETRUNC(MONTH, DATEADD(MONTH, -2, GETDATE())), GETDATE()) - 2)
       CROSS APPLY (VALUES(DATEADD(DAY, value, DATETRUNC(MONTH, DATEADD(MONTH, -2, GETDATE()))))) D(calc_date)
       CROSS APPLY (SELECT *
                    FROM   #yourtable yt
                    WHERE  yt.dday = DAY(calc_date)
                           AND yt.dmonth = month(calc_date)
                           AND yt.dyear = YEAR(calc_date)) yt
WHERE  ( dyear > @StartYear
          OR ( dyear = @StartYear
               AND dmonth > @StartMonth )
          OR ( dyear = @StartYear
               AND dmonth = @StartMonth
               AND dday >= @StartDay ) )
select *
from #yourtable  with (forceseek)
 WHERE 
((dyear > @StartYear) AND (dyear < @EndYear)) OR
((dyear > @StartYear) AND (dyear = @EndYear AND dmonth < @EndMonth)) OR
((dyear > @StartYear) AND (dyear = @EndYear AND dmonth = @EndMonth AND dday <= @EndDay)) OR
((dyear = @StartYear AND dmonth > @StartMonth) AND (dyear < @EndYear)) OR
((dyear = @StartYear AND dmonth > @StartMonth) AND (dyear = @EndYear AND dmonth < @EndMonth)) OR
((dyear = @StartYear AND dmonth > @StartMonth) AND (dyear = @EndYear AND dmonth = @EndMonth AND dday <= @EndDay)) OR
((dyear = @StartYear AND dmonth = @StartMonth AND dday >= @StartDay) AND (dyear < @EndYear)) OR
((dyear = @StartYear AND dmonth = @StartMonth AND dday >= @StartDay) AND (dyear = @EndYear AND dmonth < @EndMonth)) OR
((dyear = @StartYear AND dmonth = @StartMonth AND dday >= @StartDay) AND (dyear = @EndYear AND dmonth = @EndMonth AND dday <= @EndDay))
OPTION (RECOMPILE);

Context

StackExchange Database Administrators Q#331125, answer score: 9

Revisions (0)

No revisions yet.