How can I get running totals of recent rows faster?
Problem
I'm currently designing a transaction table. I realized that calculating running totals for each row will be needed, and I was concerned this might perform poorly, so I created a table with 1 million rows for testing purposes.
```
CREATE TABLE [dbo].[Table_1](
	[seq] [int] IDENTITY(1,1) NOT NULL,
	[value] [bigint] NOT NULL,
 CONSTRAINT [PK_Table_1] PRIMARY KEY CLUSTERED
(
	[seq] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
```

I tried to get the 10 most recent rows and their running totals, but it took about 10 seconds:
```
--1st attempt
SELECT TOP 10 seq
	,value
	,SUM(value) OVER (ORDER BY seq) total
FROM Table_1
ORDER BY seq DESC

--(10 rows affected)
--Table 'Worktable'. Scan count 1000001, logical reads 8461526, physical reads 2, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
--Table 'Table_1'. Scan count 1, logical reads 2608, physical reads 516, read-ahead reads 2617, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
--Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
--
--(1 row affected)
--
-- SQL Server Execution Times:
--   CPU time = 8483 ms, elapsed time = 9786 ms.
```

From the plan, I suspected TOP was the reason for the slow performance, so I changed the query as follows, and it took about 1-2 seconds. But I think this is still too slow for production, and I'm wondering whether it can be improved further.
```
--2nd attempt
SELECT *
	,(
		SELECT SUM(value)
		FROM Table_1
		WHERE seq <= t.seq
		) total
FROM (
	SELECT TOP 10 seq
		,value
	FROM Table_1
	ORDER BY seq DESC
	) t
ORDER BY seq DESC

--(10 rows affected)
--Table 'Table_1'. Scan count 11, logical reads 26083, physical reads 1, read-ahead reads 443, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
```
Solution
Difference in the first two approaches
The first plan spends about 7 of the 10 seconds in the Window Spool operator, so this is the main reason it's so slow: it performs a lot of I/O in tempdb to build the spool. My STATISTICS IO and TIME output looks like this:

```
Table 'Worktable'. Scan count 1000001, logical reads 8461526
Table 'Table_1'. Scan count 1, logical reads 2609
Table 'Worktable'. Scan count 0, logical reads 0

 SQL Server Execution Times:
   CPU time = 8641 ms, elapsed time = 8537 ms.
```

The second plan is able to avoid the spool, and thus the worktable, entirely. It simply grabs the top 10 rows from the clustered index, and then does a nested loops join to the aggregation (SUM) coming out of a separate clustered index scan. The inner side still ends up reading the whole table, but the table is very dense, so this is reasonably efficient with a million rows.

```
Table 'Table_1'. Scan count 11, logical reads 26093

 SQL Server Execution Times:
   CPU time = 1563 ms, elapsed time = 1671 ms.
```
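As a side check on the logic (not the performance), the two query shapes can be compared on any engine with window functions. Below is a minimal sketch using Python's sqlite3 module, which assumes a bundled SQLite 3.25+ for window-function support; SQLite spells SQL Server's TOP as LIMIT. It only verifies that both shapes return identical rows.

```python
# Cross-check of the two query shapes from the question against an in-memory
# SQLite database (window functions need SQLite 3.25+). This demonstrates the
# logic is equivalent; it says nothing about relative speed.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table_1 (seq INTEGER PRIMARY KEY, value INTEGER NOT NULL)")
conn.executemany("INSERT INTO Table_1 (value) VALUES (?)",
                 [(i % 7,) for i in range(1000)])

# Shape 1: window aggregate over the whole table, then take the 10 newest rows.
windowed = conn.execute("""
    SELECT seq, value,
           SUM(value) OVER (ORDER BY seq ROWS UNBOUNDED PRECEDING) AS total
    FROM Table_1
    ORDER BY seq DESC
    LIMIT 10
""").fetchall()

# Shape 2: take the 10 newest rows first, then a correlated scalar subquery
# per row (the question's second attempt).
correlated = conn.execute("""
    SELECT t.seq, t.value,
           (SELECT SUM(value) FROM Table_1 WHERE seq <= t.seq) AS total
    FROM (SELECT seq, value FROM Table_1 ORDER BY seq DESC LIMIT 10) t
    ORDER BY t.seq DESC
""").fetchall()

assert windowed == correlated
print(windowed[0])  # -> (1000, 5, 2997): newest row plus its running total
```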
Improving performance

Columnstore
If you really want the "online reporting" approach, columnstore is likely your best option.
```
ALTER TABLE [dbo].[Table_1] DROP CONSTRAINT [PK_Table_1];
CREATE CLUSTERED COLUMNSTORE INDEX [PK_Table_1] ON dbo.Table_1;
```

Then this query is ridiculously fast:
```
SELECT TOP 10
    seq,
    value,
    SUM(value) OVER (ORDER BY seq ROWS UNBOUNDED PRECEDING)
FROM dbo.Table_1
ORDER BY seq DESC;
```

Here are the stats from my machine:
```
Table 'Table_1'. Scan count 4, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 3319
Table 'Table_1'. Segment reads 1, segment skipped 0.
Table 'Worktable'. Scan count 0, logical reads 0

 SQL Server Execution Times:
   CPU time = 375 ms, elapsed time = 205 ms.
```

You're likely not going to beat that (unless you're really smart - nice one, Joe). Columnstore is crazy good at scanning and aggregating large amounts of data.
Using ROWS rather than RANGE in the window function

You can get very similar performance to your second query with this approach, which was mentioned in another answer, and which I used in the columnstore example above (execution plan):
```
SELECT TOP 10
    seq,
    value,
    SUM(value) OVER (ORDER BY seq ROWS UNBOUNDED PRECEDING)
FROM dbo.Table_1
ORDER BY seq DESC;
```

It results in fewer reads than your second approach, and no tempdb activity compared with your first approach, because the window spool occurs in memory:
...RANGE uses an on-disk spool, while ROWS uses an in-memory spool
Unfortunately, runtime is about the same as your second approach.
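Beyond where the spool lives, ROWS and RANGE also differ semantically when the ORDER BY key has duplicates: RANGE includes all of the current row's peers in the frame, while ROWS cuts off at the current row (and note that OVER (ORDER BY seq) with no frame clause defaults to RANGE, which is what the first attempt got). Since seq is unique here, both frames produce identical totals, which is why the swap is safe. A small sketch in Python's sqlite3 (SQLite 3.25+ assumed; table and column names are made up for illustration) shows the difference on a non-unique key:

```python
# ROWS vs RANGE on a non-unique ordering key: RANGE includes peer rows (same
# grp value) in the frame, ROWS does not. Needs SQLite 3.25+ for windows.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (grp INTEGER, value INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (1, 20), (2, 30)])

rows = conn.execute("""
    SELECT grp, value,
           SUM(value) OVER (ORDER BY grp ROWS  UNBOUNDED PRECEDING) AS rows_total,
           SUM(value) OVER (ORDER BY grp RANGE UNBOUNDED PRECEDING) AS range_total
    FROM t
    ORDER BY grp, value
""").fetchall()

# Both grp=1 rows are peers, so RANGE gives each the full peer-group sum (30),
# while ROWS assigns 10 and 30 depending on scan order within the tie.
for grp, value, rows_total, range_total in rows:
    print(grp, value, rows_total, range_total)
```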
```
Table 'Worktable'. Scan count 0, logical reads 0
Table 'Table_1'. Scan count 1, logical reads 2609
Table 'Worktable'. Scan count 0, logical reads 0

 SQL Server Execution Times:
   CPU time = 1984 ms, elapsed time = 1474 ms.
```

Schema-based solution: async running totals
Since you're open to other ideas, you could consider updating the "running total" asynchronously. You could periodically take the results of one of these queries, and load it into a "totals" table. So you'd do something like this:
```
CREATE TABLE [dbo].[Table_1_Totals]
(
    [seq] [int] NOT NULL,
    [running_total] [bigint] NOT NULL,
    CONSTRAINT [PK_Table_1_Totals] PRIMARY KEY CLUSTERED ([seq])
);
```

Load it every day / hour / whatever (this took about 2 seconds on my machine with 1 million rows, and could be optimized):
```
INSERT INTO dbo.Table_1_Totals
SELECT
    seq,
    SUM(value) OVER (ORDER BY seq ROWS UNBOUNDED PRECEDING) AS total
FROM dbo.Table_1 t
WHERE NOT EXISTS (
    SELECT NULL
    FROM dbo.Table_1_Totals t2
    WHERE t.seq = t2.seq)
ORDER BY seq DESC;
```

Then your reporting query is very efficient:
```
SELECT TOP 10
    t.seq,
    t.value,
    t2.running_total
FROM dbo.Table_1 t
INNER JOIN dbo.Table_1_Totals t2
    ON t.seq = t2.seq
ORDER BY t.seq DESC;
```

Here are the read stats:

```
Table 'Table_1'. Scan count 0, logical reads 35
Table 'Table_1_Totals'. Scan count 1, logical reads 3
```
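The async pattern can be sketched end-to-end on any engine. Here is a minimal Python sqlite3 version (SQLite 3.25+ assumed; table and column names follow the answer). One caveat worth noting: on an incremental reload, the window sum only sees the not-yet-loaded rows, so this sketch carries the last stored total forward rather than assuming the load always starts from an empty totals table.

```python
# Sketch of the async running-totals pattern: a periodically refreshed totals
# table plus a cheap reporting join. SQLite 3.25+ assumed for window support.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_1 (seq INTEGER PRIMARY KEY, value INTEGER NOT NULL);
    CREATE TABLE Table_1_Totals (seq INTEGER PRIMARY KEY, running_total INTEGER NOT NULL);
""")

def refresh_totals(conn):
    # Append totals for rows not yet loaded, continuing from the last stored
    # total (the window sum only covers the new rows, so we must carry it).
    conn.execute("""
        INSERT INTO Table_1_Totals (seq, running_total)
        SELECT seq,
               COALESCE((SELECT running_total FROM Table_1_Totals
                         ORDER BY seq DESC LIMIT 1), 0)
               + SUM(value) OVER (ORDER BY seq ROWS UNBOUNDED PRECEDING)
        FROM Table_1 t
        WHERE NOT EXISTS (SELECT 1 FROM Table_1_Totals t2 WHERE t2.seq = t.seq)
    """)
    conn.commit()

conn.executemany("INSERT INTO Table_1 (value) VALUES (?)", [(i,) for i in range(1, 101)])
refresh_totals(conn)

# New rows arrive later; an incremental refresh extends the totals correctly.
conn.executemany("INSERT INTO Table_1 (value) VALUES (?)", [(i,) for i in range(101, 151)])
refresh_totals(conn)

# The reporting query is a cheap 10-row join, as in the answer.
report = conn.execute("""
    SELECT t.seq, t.value, t2.running_total
    FROM Table_1 t
    JOIN Table_1_Totals t2 ON t.seq = t2.seq
    ORDER BY t.seq DESC
    LIMIT 10
""").fetchall()
print(report[0])  # -> (150, 150, 11325), i.e. sum(1..150)
```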
Schema-based solution: in-row totals with constraints

A really interesting solution to this is covered in detail in this answer to the question "Writing a simple bank schema: How should I keep my balances in sync with their transaction history?"

The basic approach is to track the current running total in-row, along with the previous running total and sequence number. Then you can use constraints to validate that the running totals are always correct and up-to-date.

Credit to Paul White for providing a sample implementation for the schema in this Q&A:
```
CREATE TABLE dbo.Table_1
(
    seq integer IDENTITY(1,1) NOT NULL,
    val bigint NOT NULL,
    total bigint NOT NULL,
    prev_seq integer NULL,
    prev_total bigint NULL,

    CONSTRAINT [PK_Table_1]
        PRIMARY KEY CLUSTERED (seq ASC),

    CONSTRAINT [UQ dbo.Table_1 seq, total]
        UNIQUE (seq, total),

    CONSTRAINT [UQ dbo.Table_1 prev_seq]
```
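To make the idea concrete, here is a hypothetical sketch of the same pattern in Python's sqlite3 (it is an illustration of the approach, not Paul White's full implementation): the total is stored in-row, a CHECK constraint forces each row's total to extend the previous one, and a composite foreign key chains each row to its predecessor's (seq, total) pair.

```python
# Hypothetical sketch of in-row running totals kept honest by constraints:
# each row stores its total plus the previous row's (seq, total); a CHECK
# enforces total = prev_total + val, and a composite FK links the chain.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE Table_1 (
        seq        INTEGER PRIMARY KEY,
        val        INTEGER NOT NULL,
        total      INTEGER NOT NULL,
        prev_seq   INTEGER,
        prev_total INTEGER,
        UNIQUE (seq, total),
        CHECK (total = COALESCE(prev_total, 0) + val),
        FOREIGN KEY (prev_seq, prev_total) REFERENCES Table_1 (seq, total)
    )
""")

def insert_value(conn, val):
    # Read the current head of the chain, then extend it by one row.
    row = conn.execute("SELECT seq, total FROM Table_1 ORDER BY seq DESC LIMIT 1").fetchone()
    prev_seq, prev_total = row if row else (None, None)
    conn.execute(
        "INSERT INTO Table_1 (val, total, prev_seq, prev_total) VALUES (?, ?, ?, ?)",
        (val, (prev_total or 0) + val, prev_seq, prev_total),
    )

for v in (10, 20, 30):
    insert_value(conn, v)

# The current balance is now a single-row lookup instead of an aggregation.
latest = conn.execute("SELECT total FROM Table_1 ORDER BY seq DESC LIMIT 1").fetchone()[0]

# A row whose total does not extend the chain is rejected by the CHECK.
rejected = False
try:
    conn.execute("INSERT INTO Table_1 (val, total, prev_seq, prev_total) "
                 "VALUES (5, 999, 3, 60)")
except sqlite3.IntegrityError:
    rejected = True
```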
Context
StackExchange Database Administrators Q#233716, answer score: 5