GROUP BY with MAX versus just MAX
Problem
I am a programmer dealing with a big table that has the following schema:

UpdateTime, PK, datetime, not null
Name, PK, char(14), not null
TheData, float

There is a clustered index on (Name, UpdateTime).

I was wondering which should be faster:

SELECT MAX(UpdateTime)
FROM [MyTable]

or

SELECT MAX([UpdateTime]) AS value
FROM
(
    SELECT [UpdateTime]
    FROM [MyTable]
    GROUP BY [UpdateTime]
) AS t

The inserts to this table arrive in chunks of 50,000 rows with the same date, so I thought grouping might ease the MAX calculation: instead of finding the max of 150,000 rows, group down to 3 rows and then compute the MAX of those. Is my assumption correct, or is GROUP BY also costly?

Solution
I created the table big_table according to your schema and filled it with 50,000 rows (the create and insert scripts are in the Code Snippets section below).

Using SSMS, I then tested both queries and realized that in the first query you were looking for the MAX of TheData and in the second for the MAX of UpdateTime, so I modified the first query to also get the MAX of UpdateTime.
SET STATISTICS TIME returns the number of milliseconds required to parse, compile, and execute each statement, and SET STATISTICS IO returns information about disk activity. Together they provide useful information, such as whether temporary tables were used (indicated by 'Worktable') and how many logical reads occurred, i.e. the number of database pages read from cache.
I then activated the actual execution plan (Ctrl+M in SSMS) and executed with F5. This provides a comparison of both queries.
Here is the output of the Messages tab:
-- Query 1
Table 'big_table'. Scan count 1, logical reads 543, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 6 ms.
-- Query 2
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'big_table'. Scan count 1, logical reads 543, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 35 ms.
Both queries result in 543 logical reads, but the second query has an elapsed time of 35 ms whereas the first takes only 6 ms. You will also notice that the second query uses a temporary table in tempdb, indicated by the word 'Worktable'. Even though all the Worktable counters are 0, work was still done in tempdb.
Then there is the output of the Execution plan tab, next to the Messages tab. According to the execution plan, the second query has a total batch cost of 64% whereas the first costs only 36% of the batch, so the first query requires less work.
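The cost gap is easy to picture if you sketch the two evaluation strategies directly. Here is a rough Python illustration (not SQL Server's executor, just the shape of the work: a single streaming MAX versus building the distinct groups first and taking the MAX of those; the dates are made-up values mirroring the question's insert pattern):

```python
import time

# 150,000 rows in three chunks of 50,000, each chunk sharing one date,
# mirroring the insert pattern described in the question
rows = (["2013-10-26"] * 50_000
        + ["2013-10-27"] * 50_000
        + ["2013-10-28"] * 50_000)

# Strategy 1 (query 1): one streaming pass keeping a running maximum
t0 = time.perf_counter()
direct_max = max(rows)
t1 = time.perf_counter()

# Strategy 2 (query 2): first build the distinct groups -- this extra
# deduplication pass is the analogue of the Worktable -- then MAX over them
t2 = time.perf_counter()
groups = set(rows)
grouped_max = max(groups)
t3 = time.perf_counter()

assert direct_max == grouped_max  # same answer either way
print(f"direct MAX: {t1 - t0:.4f}s, GROUP BY then MAX: {t3 - t2:.4f}s")
```

Both strategies return the same value; the grouped version simply does an extra deduplication pass first, which is the kind of work the Worktable line in the statistics output hints at.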
Using SSMS, you can test and compare your queries and find out exactly how SQL Server parses them and which objects (tables, indexes, and/or statistics, if any) are used to satisfy them.
One additional note: when testing, clean out the cache first if possible. This helps ensure comparisons are accurate, which matters when thinking about disk activity. I start with DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE to clear the buffer and plan caches. Be careful not to run these commands on a production server in active use, as you will effectively force the server to read everything back from disk into memory.
Here is the relevant documentation:
- Clear the plan cache with DBCC FREEPROCCACHE
- Clear out everything from the buffer pool with DBCC DROPCLEANBUFFERS

Using these commands may not be possible depending on how your environment is used.
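If you want to replay the comparison outside SSMS, here is a minimal sketch using Python's built-in sqlite3 module (SQLite, not SQL Server, so plans and timings will differ and there is no STATISTICS IO; it only demonstrates that both queries are equivalent on data shaped like the question's):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE big_table (
        updatetime TEXT NOT NULL,
        name       TEXT NOT NULL,
        TheData    REAL,
        PRIMARY KEY (name, updatetime)
    )
""")

# three chunks of 50,000 rows, each chunk sharing one date (as in the question)
rows = [
    (f"2013-10-{26 + chunk}", f"name{chunk * 50_000 + i}", float(i))
    for chunk in range(3)
    for i in range(50_000)
]
conn.executemany("INSERT INTO big_table VALUES (?, ?, ?)", rows)

# query 1: plain MAX over all rows
q1 = conn.execute("SELECT MAX(updatetime) FROM big_table").fetchone()[0]

# query 2: GROUP BY first, then MAX over the (three) groups
q2 = conn.execute("""
    SELECT MAX(updatetime) AS value
    FROM (SELECT updatetime FROM big_table GROUP BY updatetime) AS t
""").fetchone()[0]

assert q1 == q2  # both forms return the latest date
```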
Updated 10/28 12:46pm
Made corrections to the execution plan image and statistics output.
Code Snippets
create table big_table
(
    updatetime datetime not null,
    name char(14) not null,
    TheData float,
    primary key (name, updatetime)
)

DECLARE @ROWNUM as bigint = 1
WHILE (1 = 1)
BEGIN
    SET @ROWNUM = @ROWNUM + 1
    -- cast to varchar(10): CAST(... AS CHAR) defaults to char(30),
    -- and 'name' + 30 characters would overflow the char(14) name column
    INSERT INTO big_table VALUES (getdate(), 'name' + CAST(@ROWNUM AS varchar(10)), CAST(@ROWNUM AS float))
    IF @ROWNUM > 50000
        BREAK;
END

set statistics time on -- execution time
set statistics io on   -- io stats (how many pages read, temp tables)

-- query 1
SELECT MAX([UpdateTime])
FROM big_table

-- query 2
SELECT MAX([UpdateTime]) AS value
FROM
(
    SELECT [UpdateTime]
    FROM big_table
    GROUP BY [UpdateTime]
) AS t

set statistics time off
set statistics io off

Context
StackExchange Database Administrators Q#52314, answer score: 12