
Improve query performance when selecting almost all rows with many "group by" columns

Submitted by: @import:stackexchange-dba

Problem

I have a table with 20 columns and about 600,000 records. The maximum row size is only around 100 bytes. The table is repopulated every few days, but the number of records remains about the same.

For now there is only a single clustered index: an int identity column for the primary key.

I have several queries and views that rely on this table, which usually take 5-10 seconds to execute. When I simply select all records (select * from myTable), it takes about 4 seconds to retrieve all the results.

I haven't been able to find relevant benchmarks for selecting 500,000 records in SQL Server. Is this time typical?

Here is a typical query I perform on the table:

select  CO.Company
    ,CO.Location
    ,CO.Account
    ,CO.SalesRoute
    ,CO.Employee
    ,CO.ProductType
    ,CO.Item
    ,CO.LoadJDate
    ,CO.CommissionRate
    ,SUM(CO.[Extended Sales Price]) AS Sales_Dollars
    ,SUM(CO.[Delivered Qty]) AS Quantity
from    dbo.Commissions_Output CO
where   CO.[Extended Sales Price] <> 0
group by    CO.Company
        ,CO.Location
        ,CO.Account
        ,CO.SalesRoute
        ,CO.Employee
        ,CO.ProductType
        ,CO.Item
        ,CO.LoadJDate
        ,CO.CommissionRate


When I have at least one non-clustered index on the table, I get the following result:


Scan count 18, logical reads 18372;
CPU time = 24818 ms, elapsed time = 8614 ms.

I've tried various indexes and combinations (an index keyed on the filter column that includes the group-by columns; an index keyed on all filter and group-by columns that includes the aggregated columns; etc.). All of them give the same performance and almost always produce the same execution plan.
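For concreteness, here is a sketch of the second combination described above (the index name is illustrative): the group-by columns as the key, with the aggregated columns carried along via INCLUDE so the query can be answered from the index alone.

CREATE NONCLUSTERED INDEX IX_CO_GroupBy_Covering
ON dbo.Commissions_Output
    (Company, Location, Account, SalesRoute, Employee,
     ProductType, Item, LoadJDate, CommissionRate)
INCLUDE ([Extended Sales Price], [Delivered Qty]);

Note that because the WHERE clause is an inequality (<> 0), SQL Server cannot seek on the filter column, which helps explain why the different index shapes all end up scanning and performing similarly.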

When I remove all but the clustered index (PK), performance often improves by 3-4 seconds. The logical reads are reduced and the scan count is halved.

Some notes about the data: the select and where clauses yield about 500,000 rows before grouping (nearly the entire table). Grouping then combines them into only about 10,000 rows.

Solution

I'd like to approach the problem from a different angle.

I agree with @ypercube that you can always add an index to ease the queries. That said:

  • you mentioned that the table holds a relatively small amount of data

  • the table is rebuilt only once every few days

  • you have shown that aggregation over the text columns is the most expensive part of your typical query, a cost you will experience even after creating a covering index

Why not go further and create the aggregations beforehand, so the queries don't need to do the work many times over? This seems like an ideal case for an indexed view, where you would materialize the output of the aggregating query up front, or for a traditional, dedicated summary table that you'd fill when loading data into Commissions_Output. Either way, you're sacrificing only a little disk space for much improved performance.
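The summary-table variant can be as simple as truncating and refilling a small table as the last step of the load that repopulates Commissions_Output. A sketch (the table name is illustrative):

TRUNCATE TABLE dbo.Commissions_Summary;

INSERT INTO dbo.Commissions_Summary
    (Company, Location, Account, SalesRoute, Employee,
     ProductType, Item, LoadJDate, CommissionRate,
     Sales_Dollars, Quantity)
SELECT  CO.Company, CO.Location, CO.Account, CO.SalesRoute, CO.Employee,
        CO.ProductType, CO.Item, CO.LoadJDate, CO.CommissionRate,
        SUM(CO.[Extended Sales Price]), SUM(CO.[Delivered Qty])
FROM    dbo.Commissions_Output CO
WHERE   CO.[Extended Sales Price] <> 0
GROUP BY CO.Company, CO.Location, CO.Account, CO.SalesRoute, CO.Employee,
        CO.ProductType, CO.Item, CO.LoadJDate, CO.CommissionRate;

Since the table is rebuilt only every few days, paying the aggregation cost once per load instead of once per query is a clear win.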

Indexed views do have a number of limitations on the environment in which you can use them, but they have the great benefit that, in some circumstances, the optimizer will use them automatically in place of the original table.
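The indexed-view route would look roughly like this (the view and index names are illustrative). SQL Server requires WITH SCHEMABINDING and a COUNT_BIG(*) column in any indexed view that uses GROUP BY, and the view is materialized by creating a unique clustered index on it:

CREATE VIEW dbo.vw_Commissions_Agg
WITH SCHEMABINDING
AS
SELECT  CO.Company, CO.Location, CO.Account, CO.SalesRoute, CO.Employee,
        CO.ProductType, CO.Item, CO.LoadJDate, CO.CommissionRate,
        SUM(CO.[Extended Sales Price]) AS Sales_Dollars,
        SUM(CO.[Delivered Qty]) AS Quantity,
        COUNT_BIG(*) AS RowCnt   -- required for indexed views with GROUP BY
FROM    dbo.Commissions_Output CO
WHERE   CO.[Extended Sales Price] <> 0
GROUP BY CO.Company, CO.Location, CO.Account, CO.SalesRoute, CO.Employee,
        CO.ProductType, CO.Item, CO.LoadJDate, CO.CommissionRate;
GO

CREATE UNIQUE CLUSTERED INDEX IX_vw_Commissions_Agg
ON dbo.vw_Commissions_Agg
    (Company, Location, Account, SalesRoute, Employee,
     ProductType, Item, LoadJDate, CommissionRate);

On Enterprise edition the optimizer can match the original GROUP BY query to this view automatically; on other editions you would query the view directly with the NOEXPAND hint.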

Context

StackExchange Database Administrators Q#68702, answer score: 5
