patternsqlsqlalchemyModerate
Performance of count(*) in subquery
Viewed 0 times
countsubqueryperformance
Problem
Suppose we have the following queries:
1.
2.
Does any of the previous queries perform better? Or are they the same?
I'm using Postgresql 9.4 but, are there a big difference to another DBMS?
PS: I'm asking this question because SQLAlchemy, a Python based ORM, executes the
1.
SELECT COUNT(*) FROM some_big_table WHERE some_col = 'some_val'2.
SELECT COUNT(*) FROM ( SELECT * FROM some_big_table WHERE some_col = 'some_val' )Does any of the previous queries perform better? Or are they the same?
I'm using Postgresql 9.4 but, are there a big difference to another DBMS?
PS: I'm asking this question because SQLAlchemy, a Python based ORM, executes the
COUNT operation by default on a subquery, but there is an option to force it to execute the COUNT directly on the query.Solution
Do you have a reason to believe that they will perform differently? If so, why not test them on your RDBMS and with your data? In general I would say to start with the simplest query, test performance, and only attempt something more complicated (such as option 2) if necessary.
You didn't list an RDBMS so I'll give a SQL Server example. The SQL Server query optimizer doesn't directly work with the SQL that you write. It transforms it into an internal format for optimization. The following image describes the various phases of optimization and is borrowed from Query Optimizer Deep Dive - Part 1 by Paul White:
Here's my test query against the publicly available Adventure Works 2014 database:
I can get a representation of the query tree after simplification with the undocumented trace flag 8606:
The above is an internal representation of your query plan. The actual steps in that query plan aren't important here. What is important is that I get the exact same simplified tree for this query:
Effectively, that means that SQL Server rewrites those two queries to the exact same thing before continuing with optimization. This means that there will not be a performance difference between the two of them. As expected, they have the exact same query plan:
You didn't list an RDBMS so I'll give a SQL Server example. The SQL Server query optimizer doesn't directly work with the SQL that you write. It transforms it into an internal format for optimization. The following image describes the various phases of optimization and is borrowed from Query Optimizer Deep Dive - Part 1 by Paul White:
Here's my test query against the publicly available Adventure Works 2014 database:
SELECT COUNT(*)
FROM Sales.SalesOrderDetail
WHERE CarrierTrackingNUmber = '2E53-4802-85'I can get a representation of the query tree after simplification with the undocumented trace flag 8606:
*** Simplified Tree: ***
LogOp_GbAgg OUT(COL: Expr1002 ,)
LogOp_Select
LogOp_Get TBL: Sales.SalesOrderDetail Sales.SalesOrderDetail TableID=1154103152 TableReferenceID=0 IsRow: COL: IsBaseRow1000
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [AdventureWorks2014].[Sales].[SalesOrderDetail].CarrierTrackingNumber
ScaOp_Const TI(nvarchar collate 872468488,Var,Trim,ML=24) XVAR(nvarchar,Owned,Value=Len,Data = (24,506953514552564850455653))
AncOp_PrjList
AncOp_PrjEl COL: Expr1002
ScaOp_AggFunc stopCount
ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=0)The above is an internal representation of your query plan. The actual steps in that query plan aren't important here. What is important is that I get the exact same simplified tree for this query:
SELECT COUNT(*)
FROM
(
SELECT *
FROM Sales.SalesOrderDetail
WHERE CarrierTrackingNUmber = '2E53-4802-85'
) t;Effectively, that means that SQL Server rewrites those two queries to the exact same thing before continuing with optimization. This means that there will not be a performance difference between the two of them. As expected, they have the exact same query plan:
Code Snippets
SELECT COUNT(*)
FROM Sales.SalesOrderDetail
WHERE CarrierTrackingNUmber = '2E53-4802-85'*** Simplified Tree: ***
LogOp_GbAgg OUT(COL: Expr1002 ,)
LogOp_Select
LogOp_Get TBL: Sales.SalesOrderDetail Sales.SalesOrderDetail TableID=1154103152 TableReferenceID=0 IsRow: COL: IsBaseRow1000
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [AdventureWorks2014].[Sales].[SalesOrderDetail].CarrierTrackingNumber
ScaOp_Const TI(nvarchar collate 872468488,Var,Trim,ML=24) XVAR(nvarchar,Owned,Value=Len,Data = (24,506953514552564850455653))
AncOp_PrjList
AncOp_PrjEl COL: Expr1002
ScaOp_AggFunc stopCount
ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=0)SELECT COUNT(*)
FROM
(
SELECT *
FROM Sales.SalesOrderDetail
WHERE CarrierTrackingNUmber = '2E53-4802-85'
) t;Context
StackExchange Database Administrators Q#168022, answer score: 11
Revisions (0)
No revisions yet.