patternsqlMajor
SQL Server - Select most recent record from each group when performance is critical
Viewed 0 times
criticalrecenteachgroupsqlrecordperformancewhenserverselect
Problem
I run a SQL Server 2016 database where I have the following table with 100+ millions rows:
This table has a clustered index for StationId, ParameterId and DateTime, in this order, all ascending.
What I need is, for each unique pair StationId - ParameterId, return the most recent value from DateTime column:
What I'm doing now is the following query, which takes around 90 to 120 seconds to run:
I've also seen many posts suggesting the following, which takes 10+ minutes to run:
Even in the best case (using GROUP BY clause and MAX aggregate funcition), the execution plan doesn't indicate an Index Seek:
I wonder if there's a better way to perform this query (or to build
StationId | ParameterId | DateTime | Value
1 | 2 | 2020-02-04 15:00:000 | 5.20
1 | 2 | 2020-02-04 14:00:000 | 5.20
1 | 2 | 2020-02-04 13:00:000 | 5.20
1 | 3 | 2020-02-04 15:00:000 | 2.81
1 | 3 | 2020-02-04 14:00:000 | 2.81
1 | 4 | 2020-02-04 15:00:000 | 5.23
2 | 2 | 2020-02-04 15:00:000 | 3.70
2 | 4 | 2020-02-04 15:00:000 | 12.20
3 | 2 | 2020-02-04 15:00:000 | 1.10This table has a clustered index for StationId, ParameterId and DateTime, in this order, all ascending.
What I need is, for each unique pair StationId - ParameterId, return the most recent value from DateTime column:
StationId | ParameterId | LastDate
1 | 2 | 2020-02-04 15:00:000
1 | 3 | 2020-02-04 15:00:000
1 | 4 | 2020-02-04 15:00:000
2 | 2 | 2020-02-04 15:00:000
2 | 4 | 2020-02-04 15:00:000
3 | 2 | 2020-02-04 15:00:000What I'm doing now is the following query, which takes around 90 to 120 seconds to run:
SELECT StationId, ParameterId, MAX(DateTime) AS LastDate
FROM MyTable WITH (NOLOCK)
GROUP BY StationId, ParameterIdI've also seen many posts suggesting the following, which takes 10+ minutes to run:
SELECT StationId, ParameterId, DateTime AS LastDate
FROM
(
SELECT StationId, ParameterId, DateTime
,ROW_NUMBER() OVER (PARTITION BY StationId,ParameterIdORDER BY DateTime DESC) as row_num
FROM MyTable WITH (NOLOCK)
)
WHERE row_num = 1Even in the best case (using GROUP BY clause and MAX aggregate funcition), the execution plan doesn't indicate an Index Seek:
I wonder if there's a better way to perform this query (or to build
Solution
If you have a small-enough number of (StationID, ParameterID) pairs, then try a query like this:
To enable SQL Server to perform a lookup, seeking the latest
With only a Clustered Index on (StationID, ParameterID, DateTime), there's no way for SQL Server to discover the distinct (StationID, ParameterID) pairs without scanning the leaf level of the index, and it can find the largest DateTime while it's scanning.
Also at 100M+ rows, this table might be better as a Clustered Columnstore instead of a BTree Clustered Index.
select StationID, ParameterID, m.DateTime LastDate
from StationParameter sp
cross apply
(
select top 1 DateTime
from MyTable
where StationID = sp.StationID
and ParameterID = sp.ParameterID
order by DateTime desc
) mTo enable SQL Server to perform a lookup, seeking the latest
DateTime for each (StationID,ParameterID) pair.With only a Clustered Index on (StationID, ParameterID, DateTime), there's no way for SQL Server to discover the distinct (StationID, ParameterID) pairs without scanning the leaf level of the index, and it can find the largest DateTime while it's scanning.
Also at 100M+ rows, this table might be better as a Clustered Columnstore instead of a BTree Clustered Index.
Code Snippets
select StationID, ParameterID, m.DateTime LastDate
from StationParameter sp
cross apply
(
select top 1 DateTime
from MyTable
where StationID = sp.StationID
and ParameterID = sp.ParameterID
order by DateTime desc
) mContext
StackExchange Database Administrators Q#258897, answer score: 30
Revisions (0)
No revisions yet.