HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMajor

Why are multiple COUNT faster than one SUM with CASE?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
casewhyarewiththanfasteronemultiplesumcount

Problem

I wanted to know which of the following two approaches is faster:

1) Three COUNT:

SELECT Approved = (SELECT COUNT(*) FROM dbo.Claims d
                  WHERE d.Status = 'Approved'),
        Valid    = (SELECT COUNT(*) FROM dbo.Claims d
                    WHERE d.Status = 'Valid'),
        Reject   = (SELECT COUNT(*) FROM dbo.Claims d
                    WHERE d.Status = 'Reject')


2) SUM with FROM-clause:

SELECT  Approved = SUM(CASE WHEN Status = 'Approved' THEN 1 ELSE 0 END),
        Valid    = SUM(CASE WHEN Status = 'Valid'    THEN 1 ELSE 0 END),
        Reject   = SUM(CASE WHEN Status = 'Reject'   THEN 1 ELSE 0 END)
FROM dbo.Claims c;


I was suprised that the difference is so large. The first query with three subqueries returns the result immediately whereas the second SUM approach needs 18 seconds.

Claims is a view which selects from a table containing ~18 million rows. There is an index on the FK-Column to the ClaimStatus table which contains the status-name.

Why does it make such a great difference whether i use COUNT or SUM?

Execution-plans:

  • COUNT



  • SUM



There are 12 statuses in total. Those three statuses belong to 7% of all rows.

This is the actual view, i'm not sure if it's relevant:

```
CREATE VIEW [dbo].[Claims]
AS
SELECT
mu.Marketunitname AS MarketUnit,
c.Countryname AS Country,
gsp.Gspname AS GSP,
gsp.Wcmskeynumber AS GspNumber,
sl.Slname AS SL,
sl.Wcmskeynumber AS SlNumber,
m.Modelname AS Model,
m.Salesname AS [Model-Salesname],
s.Claimstatusname AS [Status],
d.Work_order AS [Work Order],
d.Ssn_number AS IMEI,
d.Ssn_out,
Remarks,
d.Claimnumber AS [Claim-Number],
d.Rma_number AS [RMA-Number],
dbo.ToShortDateString(d.Received_Date, 1) AS [Received Date],
Iddata,
Fisl,
Fimodel,
Ficlaimstatus
FROM Tabdata AS d
INNER JOIN Locsl AS sl
ON d.Fisl = sl.Idsl
INNE

Solution

The COUNT(*) version is able to simply seek into the index you have on the status column once for each status you are selecting, whereas the SUM(...) version needs to seek the index twelve times (the total number of unique status types).

Clearly seeking an index three times is going to be faster than seeking it 12 times.

The first plan requires a memory grant of 238MB, whereas the second plan requires a memory grant of 650MB. It may be that the larger memory grant could not be immediately filled, making the query that much slower.

Alter the second query to be:

SELECT  Approved = SUM(CASE WHEN Status = 'Approved' THEN 1 ELSE 0 END),
        Valid    = SUM(CASE WHEN Status = 'Valid'    THEN 1 ELSE 0 END),
        Reject   = SUM(CASE WHEN Status = 'Reject'   THEN 1 ELSE 0 END)
FROM dbo.Claims c
WHERE c.Status = 'Approved'
    OR c.Status = 'Valid'
    OR c.Status = 'Reject';


This will allow the query optimizer to eliminate 75% of the index seeks, and should result in both a lower required memory grant, lower I/O requirements, and faster time-to-result.

The SUM(CASE WHEN ...) construct essentially prevents the query optimizer from pushing the Status predicates down into the index seek portion of the plan.

Code Snippets

SELECT  Approved = SUM(CASE WHEN Status = 'Approved' THEN 1 ELSE 0 END),
        Valid    = SUM(CASE WHEN Status = 'Valid'    THEN 1 ELSE 0 END),
        Reject   = SUM(CASE WHEN Status = 'Reject'   THEN 1 ELSE 0 END)
FROM dbo.Claims c
WHERE c.Status = 'Approved'
    OR c.Status = 'Valid'
    OR c.Status = 'Reject';

Context

StackExchange Database Administrators Q#119945, answer score: 20

Revisions (0)

No revisions yet.