patternsqlMinor
Returning total number of rows in query
Viewed 0 times
totalrowsnumberqueryreturning
Problem
Oftentimes I find myself wanting the total number of rows returned by a query even though I only may display 50 or so per page. Instead of doing this in multiple queries like so:
It has been recommended to me to do this:
I am just looking for what is better as far as performance and if performance is a wash. Does this really improve readability of SQL? It is certainly cleaner/easier to write.
SELECT first_name,
last_name,
(SELECT count(1) FROM sandbox.PEOPLE WHERE trunc(birthday) = trunc(sysdate) ) as totalRows
FROM sandbox.PEOPLE
WHERE trunc(birthday) = trunc(sysdate);It has been recommended to me to do this:
SELECT first_name,
last_name,
count(*) over () totalRows
FROM sandbox.PEOPLE
WHERE trunc(birthday) = trunc(sysdate);I am just looking for what is better as far as performance and if performance is a wash. Does this really improve readability of SQL? It is certainly cleaner/easier to write.
Solution
The latter query will be much more efficient-- it only requires hitting the table once. You can do a quick test yourself to confirm this.
I'll create a simple two-column table with 1 million rows where the second column is one of 10 distinct values
Now, I'll run two different queries that retrieve 10% of the data. I've set SQL*Plus to not bother displaying the data but to display the query plan and the basic execution statistics. With the first query, note that the query plan shows that Oracle has to access the table twice and then do a sort and aggregate. The query does ~10,000 consistent gets which is a measure of the amount of logical I/O being done (note that this is independent of whether data is cached so it is a much more stable measure-- if you run the same query many times, the consistent gets figure will fluctuate very little)
On the other hand, with the analytic function approach, the query plan shows that we only have to hit the table once. And we only have to do ~1,900 consistent gets-- less than 20% of the logical I/O that the first query had to do.
Now, to be fair, you probably won't cut out 80% of your consistent gets moving to the analytic function approach using this particular query because it is likely that far fewer than 10% of the rows in your
Since this is Code Review, the analytic function approach is much easier to maintain over time because you don't violate the DRY principle and you have to remember to make all the same changes to your inline query that you make in the main query.
I'll create a simple two-column table with 1 million rows where the second column is one of 10 distinct values
SQL> create table t (
2 col1 number,
3 col2 number
4 );
Table created.
SQL> insert into t
2 select level, mod(level,10)
3 from dual
4 connect by level <= 1000000;
1000000 rows created.Now, I'll run two different queries that retrieve 10% of the data. I've set SQL*Plus to not bother displaying the data but to display the query plan and the basic execution statistics. With the first query, note that the query plan shows that Oracle has to access the table twice and then do a sort and aggregate. The query does ~10,000 consistent gets which is a measure of the amount of logical I/O being done (note that this is independent of whether data is cached so it is a much more stable measure-- if you run the same query many times, the consistent gets figure will fluctuate very little)
SQL> set autotrace traceonly;
SQL> select col1
2 ,(select count(*) from t where col2=3)
3 from t
4 where col2 = 3;
100000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3335345748
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 85706 | 2176K| 525 (3)| 00:00:07 |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
|* 2 | TABLE ACCESS FULL| T | 85706 | 1088K| 525 (3)| 00:00:07 |
|* 3 | TABLE ACCESS FULL | T | 85706 | 2176K| 525 (3)| 00:00:07 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("COL2"=3)
3 - filter("COL2"=3)
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
32 recursive calls
1 db block gets
10465 consistent gets
0 physical reads
176 redo size
2219528 bytes sent via SQL*Net to client
73850 bytes received via SQL*Net from client
6668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
100000 rows processedOn the other hand, with the analytic function approach, the query plan shows that we only have to hit the table once. And we only have to do ~1,900 consistent gets-- less than 20% of the logical I/O that the first query had to do.
SQL> select col1,
2 count(*) over ()
3 from t
4 where col2 = 3;
100000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2291049666
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 85706 | 2176K| 525 (3)| 00:00:07 |
| 1 | WINDOW BUFFER | | 85706 | 2176K| 525 (3)| 00:00:07 |
|* 2 | TABLE ACCESS FULL| T | 85706 | 2176K| 525 (3)| 00:00:07 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("COL2"=3)
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
4 recursive calls
0 db block gets
1892 consistent gets
0 physical reads
0 redo size
2219510 bytes sent via SQL*Net to client
73850 bytes received via SQL*Net from client
6668 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
100000 rows processedNow, to be fair, you probably won't cut out 80% of your consistent gets moving to the analytic function approach using this particular query because it is likely that far fewer than 10% of the rows in your
PEOPLE table have a birth date of today. The fewer rows you return, the less the performance difference will be.Since this is Code Review, the analytic function approach is much easier to maintain over time because you don't violate the DRY principle and you have to remember to make all the same changes to your inline query that you make in the main query.
Code Snippets
SQL> create table t (
2 col1 number,
3 col2 number
4 );
Table created.
SQL> insert into t
2 select level, mod(level,10)
3 from dual
4 connect by level <= 1000000;
1000000 rows created.SQL> set autotrace traceonly;
SQL> select col1
2 ,(select count(*) from t where col2=3)
3 from t
4 where col2 = 3;
100000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3335345748
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 85706 | 2176K| 525 (3)| 00:00:07 |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
|* 2 | TABLE ACCESS FULL| T | 85706 | 1088K| 525 (3)| 00:00:07 |
|* 3 | TABLE ACCESS FULL | T | 85706 | 2176K| 525 (3)| 00:00:07 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("COL2"=3)
3 - filter("COL2"=3)
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
32 recursive calls
1 db block gets
10465 consistent gets
0 physical reads
176 redo size
2219528 bytes sent via SQL*Net to client
73850 bytes received via SQL*Net from client
6668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
100000 rows processedSQL> select col1,
2 count(*) over ()
3 from t
4 where col2 = 3;
100000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2291049666
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 85706 | 2176K| 525 (3)| 00:00:07 |
| 1 | WINDOW BUFFER | | 85706 | 2176K| 525 (3)| 00:00:07 |
|* 2 | TABLE ACCESS FULL| T | 85706 | 2176K| 525 (3)| 00:00:07 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("COL2"=3)
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
4 recursive calls
0 db block gets
1892 consistent gets
0 physical reads
0 redo size
2219510 bytes sent via SQL*Net to client
73850 bytes received via SQL*Net from client
6668 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
100000 rows processedContext
StackExchange Code Review Q#326, answer score: 8
Revisions (0)
No revisions yet.