Improve query performance when selecting almost all rows with many "group by" columns
Problem
I have a table with 20 columns and about 600,000 records. The maximum row size is only around 100 bytes. The table is repopulated every few days, but the number of records remains about the same.
For now there is only a single clustered index: an int identity column for the primary key.
I have several queries and views that rely on this table, which usually take 5-10 seconds to execute. When I simply select all records (select * from myTable), it takes about 4 seconds to retrieve all the results.
I haven't been able to find relevant benchmarks for selecting 500,000 records in SQL Server. Is this time typical?
Here is a typical query I perform on the table:
select CO.Company
,CO.Location
,CO.Account
,CO.SalesRoute
,CO.Employee
,CO.ProductType
,CO.Item
,CO.LoadJDate
,CO.CommissionRate
,SUM(CO.[Extended Sales Price]) AS Sales_Dollars
,SUM(CO.[Delivered Qty]) AS Quantity
from dbo.Commissions_Output CO
where CO.[Extended Sales Price] <> 0
group by CO.Company
,CO.Location
,CO.Account
,CO.SalesRoute
,CO.Employee
,CO.ProductType
,CO.Item
,CO.LoadJDate
,CO.CommissionRate
When I have at least one non-clustered index on the table, I get the following result:
Scan count 18, logical reads 18372;
CPU time = 24818 ms, elapsed time = 8614 ms.
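For reference, figures like these are typically gathered by turning on statistics in the session before running the query:

```sql
-- Enable I/O and timing statistics for the current session,
-- then run the query; the numbers appear in the Messages tab.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
```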
I've tried various indices and combinations (index on the filter column, include the group-by columns; index on all filter/group-by columns and include the aggregate columns; etc.). All of them give the same performance and almost always use the same execution plan.
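As a sketch, one of the covering-index variants described above might look like the following (the index name and column order are illustrative, not taken from the question):

```sql
-- Filter column as the key; group-by and aggregated columns
-- INCLUDEd so the index covers the whole query.
CREATE NONCLUSTERED INDEX IX_Commissions_Output_Covering
ON dbo.Commissions_Output ([Extended Sales Price])
INCLUDE (Company, Location, Account, SalesRoute, Employee,
         ProductType, Item, LoadJDate, CommissionRate, [Delivered Qty]);
```

Because the query touches nearly every row, such an index mostly trades a clustered-index scan for a (slightly narrower) nonclustered-index scan; the grouping work remains.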
When I remove all but the clustered index (PK), the performance is often improved by up to 3-4 seconds. The logical reads are reduced while the scan count is halved.
Some notes about the data: the result set after the WHERE clause but before grouping is about 500,000 rows (nearly the entire table). Grouping then combines these into only about 10,000 rows.
Solution
I'd like to approach the problem from a different angle.
I agree with @ypercube that you can always add an index to ease the queries. That said:
- you mentioned that the table holds a relatively small amount of data
- the table is rebuilt only once every few days
- you showed that aggregation over the text columns is the most expensive part of your typical query, and that cost will remain even after creating a covering index
Why not go further and create the aggregations beforehand, so the queries don't need to do the work many times over? This seems like an ideal case for an indexed view, where you would materialize the aggregating query's output early on, or for a traditional, dedicated table that you'd fill when loading data into Commissions_Output. Either way, you're sacrificing only a little disk space for much improved performance.
Indexed views do have a number of limitations regarding the environment you can use them in, but they have the great benefit of being used automatically instead of the original table in some circumstances.
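A minimal sketch of such an indexed view, assuming the column names from the question (the view and index names are made up). Indexed views require WITH SCHEMABINDING, two-part table names, and a COUNT_BIG(*) column whenever GROUP BY is used; if [Extended Sales Price] or [Delivered Qty] is nullable you would also need to wrap it in ISNULL(..., 0), since SUM over a nullable expression is not allowed in an indexed view:

```sql
CREATE VIEW dbo.Commissions_Summary
WITH SCHEMABINDING
AS
SELECT CO.Company, CO.Location, CO.Account, CO.SalesRoute,
       CO.Employee, CO.ProductType, CO.Item, CO.LoadJDate,
       CO.CommissionRate,
       SUM(CO.[Extended Sales Price]) AS Sales_Dollars,
       SUM(CO.[Delivered Qty])        AS Quantity,
       COUNT_BIG(*)                   AS Row_Count  -- required with GROUP BY
FROM dbo.Commissions_Output AS CO
WHERE CO.[Extended Sales Price] <> 0
GROUP BY CO.Company, CO.Location, CO.Account, CO.SalesRoute,
         CO.Employee, CO.ProductType, CO.Item, CO.LoadJDate,
         CO.CommissionRate;
GO
-- The unique clustered index is what actually materializes the view.
CREATE UNIQUE CLUSTERED INDEX CIX_Commissions_Summary
ON dbo.Commissions_Summary (Company, Location, Account, SalesRoute,
                            Employee, ProductType, Item, LoadJDate,
                            CommissionRate);
```

Note that only Enterprise edition substitutes the view into matching queries automatically; on other editions you would query the view directly with the WITH (NOEXPAND) hint.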
Context
StackExchange Database Administrators Q#68702, answer score: 5