HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlsqlalchemyModerate

Performance of count(*) in subquery

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
countsubqueryperformance

Problem

Suppose we have the following queries:

1.

SELECT COUNT(*) FROM some_big_table WHERE some_col = 'some_val'


2.

SELECT COUNT(*) FROM ( SELECT * FROM some_big_table WHERE some_col = 'some_val' )


Does any of the previous queries perform better? Or are they the same?

I'm using Postgresql 9.4 but, are there a big difference to another DBMS?

PS: I'm asking this question because SQLAlchemy, a Python based ORM, executes the COUNT operation by default on a subquery, but there is an option to force it to execute the COUNT directly on the query.

Solution

Do you have a reason to believe that they will perform differently? If so, why not test them on your RDBMS and with your data? In general I would say to start with the simplest query, test performance, and only attempt something more complicated (such as option 2) if necessary.

You didn't list an RDBMS so I'll give a SQL Server example. The SQL Server query optimizer doesn't directly work with the SQL that you write. It transforms it into an internal format for optimization. The following image describes the various phases of optimization and is borrowed from Query Optimizer Deep Dive - Part 1 by Paul White:

Here's my test query against the publicly available Adventure Works 2014 database:

SELECT COUNT(*) 
FROM Sales.SalesOrderDetail 
WHERE CarrierTrackingNUmber = '2E53-4802-85'


I can get a representation of the query tree after simplification with the undocumented trace flag 8606:

*** Simplified Tree: ***
        LogOp_GbAgg OUT(COL: Expr1002 ,)

            LogOp_Select

                LogOp_Get TBL: Sales.SalesOrderDetail Sales.SalesOrderDetail TableID=1154103152 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

                ScaOp_Comp x_cmpEq

                    ScaOp_Identifier QCOL: [AdventureWorks2014].[Sales].[SalesOrderDetail].CarrierTrackingNumber

                    ScaOp_Const TI(nvarchar collate 872468488,Var,Trim,ML=24) XVAR(nvarchar,Owned,Value=Len,Data = (24,506953514552564850455653))

            AncOp_PrjList 

                AncOp_PrjEl COL: Expr1002 

                    ScaOp_AggFunc stopCount

                        ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=0)


The above is an internal representation of your query plan. The actual steps in that query plan aren't important here. What is important is that I get the exact same simplified tree for this query:

SELECT COUNT(*) 
FROM 
(
    SELECT * 
    FROM Sales.SalesOrderDetail 
    WHERE CarrierTrackingNUmber = '2E53-4802-85'
) t;


Effectively, that means that SQL Server rewrites those two queries to the exact same thing before continuing with optimization. This means that there will not be a performance difference between the two of them. As expected, they have the exact same query plan:

Code Snippets

SELECT COUNT(*) 
FROM Sales.SalesOrderDetail 
WHERE CarrierTrackingNUmber = '2E53-4802-85'
*** Simplified Tree: ***
        LogOp_GbAgg OUT(COL: Expr1002 ,)

            LogOp_Select

                LogOp_Get TBL: Sales.SalesOrderDetail Sales.SalesOrderDetail TableID=1154103152 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

                ScaOp_Comp x_cmpEq

                    ScaOp_Identifier QCOL: [AdventureWorks2014].[Sales].[SalesOrderDetail].CarrierTrackingNumber

                    ScaOp_Const TI(nvarchar collate 872468488,Var,Trim,ML=24) XVAR(nvarchar,Owned,Value=Len,Data = (24,506953514552564850455653))

            AncOp_PrjList 

                AncOp_PrjEl COL: Expr1002 

                    ScaOp_AggFunc stopCount

                        ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=0)
SELECT COUNT(*) 
FROM 
(
    SELECT * 
    FROM Sales.SalesOrderDetail 
    WHERE CarrierTrackingNUmber = '2E53-4802-85'
) t;

Context

StackExchange Database Administrators Q#168022, answer score: 11

Revisions (0)

No revisions yet.