HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Way To Speed Up Query

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
querywayspeed

Problem

My query returns roughly 590 rows and 8 columns. The issue that I have is that from start to finish the query takes 2 minutes 30 seconds to complete. The great group of people on here have taught me a lot on how to write more efficient queries, so here is another!

I am using a date variable and not a datetime since my variable only contains a date - and I am also using storing my date(s) in the yyyymmdd format as suggested by Aaron Bertrand - Bad Habits To Kick.

Is there anything I can do to optimize this query and have the results return faster?

DECLARE @Startdate date = '20170101', 
        @Enddate   date = '20170131';

WITH fc As
(
    Select
    Teacher
    ,Team
    ,fc
    FROM [Helper].[dbo].[fc]
)
,ia As
(
    Select
    Teacher
    ,tia
    FROM dbo.ia
    WHERE [hiredate] >= @Startdate
    AND [hiredate] = @Startdate
    AND regdate = @Startdate
    AND returndate <  DATEADD(DAY,1,@Enddate)
    GROUP BY Teacher
) AS c
GROUP BY rost.[Teacher],fc.TEAM, fc.fc,b.[students],c.[potstudents]
ORDER BY rost.[Teacher] ASC


EDIT
(forgot the link, sorry)
At the requet of @sabin bio - here is a link to the Query Execution Plan

Also, the CTE's query views which have no indexes on them.

Solution

Start by understanding the plan that the query optimizer generated by your query. In your question you say that the query returns around 590 rows but the query plan included only returns 18 rows. Did you attach the correct plan? I will walk through it anyway. Reading from right to left:

Do a full table scan of roster and get back 18 rows. This result set is used as the outer part of all of the following nested loop joins. For each row in the outer result set:

  • Do a full scan on the ia table



  • Do a full scan on the inb table



  • Do a full scan on the ripcord table



  • Do a full scan on the YRTR table



  • Do a full scan on the students table



  • Do a full scan on the potstudents table



At some point the number of rows gets bumped up to 19. The GROUP BY is implemented as the SORT with a node id of 4. The DISTINCT is implemented as a Sort (Distinct Sort) with a node id of 0.

As the programmer you do not have direct control over the query plan but there's a lot that you can do to influence it. You said that the query execution time is too slow. Does the above plan sound efficient to you? Is it what you would do if you could choose the query optimizer's actions? You're doing around 140 table scans just to return 18 rows. For a very small number of rows this might be okay but it sounds like you have more data in production.

One way to optimize a query is to reduce the IO requirements of that query. Here you are doing lots of table scans so that should be straightforward to implement. Create indexes so that the query optimizer can get the relevant data more efficiently. Even if you are referencing views you can create indexes on the tables used by the view to improve performance. To give you an example, here is the predicate for the table access on fc:

[Test].[dbo].[roster].[Teacher] as [rost].[Teacher]=[Test].[dbo].[fc].[Teacher]


If you create an index on the Teacher column of fc that may result in a different, and possibly more efficient, table access method for that table. Optionally, you could add all of the columns used by the Teacher table as INCLUDE columns.

Looking at just your code, you should be aware of the impact of using local variables in your query. The query optimizer will not know the values of those local variables when creating a query plan. It will do default cardinality estimates based on hard-coded rules. For some queries, adding an OPTION (RECOMPILE) hint or replacing the local variables with hard-coded values can result in a much better query plan because the query optimizer knows the values of the variables before creating the plan.

Also, do not haphazardly add DISTINCT and GROUP BY to your queries. It looks like the query optimizer optimized away some of them, but most of the time if you have both GROUP BY and DISTINCT in the same query you're doing something wrong. It is not clear how much of a performance impact making that change would have for this query.

Code Snippets

[Test].[dbo].[roster].[Teacher] as [rost].[Teacher]=[Test].[dbo].[fc].[Teacher]

Context

StackExchange Database Administrators Q#164075, answer score: 5

Revisions (0)

No revisions yet.