gotchaMinor
Why does the CBO eliminate `distinct` and not `group by` in this query?
Viewed 0 times
thisdistinctwhythecbogroupqueryeliminatedoesand
Problem
The CBO chooses to eliminate the
My questions are:
1) Why does it choose to do so in this case - I can't see a reason from the costings and predicted cardinalities in the plans
2) If it chooses to eliminate the
testbed:
fast (55ms):
slow (5240ms):
SQLFiddle here.
distinct in the 'slow' query - presumably it can tell that the operation isn't needed because of the outer group by.My questions are:
1) Why does it choose to do so in this case - I can't see a reason from the costings and predicted cardinalities in the plans
2) If it chooses to eliminate the
distinct, why not apply the same logic and eliminate the group by?testbed:
create table t1 as
select rownum product_id, mod(rownum,3)+1 company_id
from dual
connect by rownum<=500;
create table t2 as
select t1.product_id from t1 t1 cross join t1 t12;
create table t3 as
select distinct company_id from t1;
analyze table t1 compute statistics;
analyze table t2 compute statistics;
analyze table t3 compute statistics;fast (55ms):
select company_id
from t1
join t2 using(product_id)
join ( select company_id
from (select company_id from t1 group by company_id)
join t3 using(company_id) ) using(company_id)
group by company_id;slow (5240ms):
select company_id
from t1
join t2 using(product_id)
join ( select company_id
from (select distinct company_id from t1)
join t3 using(company_id) ) using(company_id)
group by company_id;SQLFiddle here.
Solution
Output is from a 11.2.0.4.6 Enterprise Edition database on Oracle Linux 7.1 x86-64 platform.
Lets start with question 2 and an easy example.
DISTINCT and GROUP BY are handled differently: the optimizer is able to completely eliminate a DISTINCT under certain circumstances, but it can not do the same with GROUP BY. Here is an example:
Note that setting statistics_level to ALL significantly increases the execution time for the original queries in the question.
The 2 queries will be:
This is what we expect. Full table scan and HASH UNIQUE for DISTINCT, HASH GROUP BY for GROUP BY. Now add a NOT NULL constraint and an index.
The optimizer noticed the index on the relevant column, and because of the NOT NULL constraint it is able to use it avoid sorting the data for producing the unique values, because the data is already sorted in the index. Now add a UNIQUE constraint to this column:
```
alter table t4 add constraint t4_u1 unique (product_id) using index t4_i1;
select distinct product_id from t4
Plan hash value: 3974767428
---------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 (100)| 5 |00:00:00.01 |
| 1 | INDEX FULL SCAN | T4_I1 | 1 | 5 | 1 (0)| 5 |00:00:00.01 |
----------------------------------------------------------
Lets start with question 2 and an easy example.
DISTINCT and GROUP BY are handled differently: the optimizer is able to completely eliminate a DISTINCT under certain circumstances, but it can not do the same with GROUP BY. Here is an example:
create table t4 as
select rownum product_id
from dual
connect by rownum<=5;
exec dbms_stats.gather_table_stats(user, 'T4');
alter session set statistics_level=all;Note that setting statistics_level to ALL significantly increases the execution time for the original queries in the question.
The 2 queries will be:
select distinct product_id from t4
Plan hash value: 641655586
-------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | OMem | 1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 4 (100)| 5 |00:00:00.01 | | | |
| 1 | HASH UNIQUE | | 1 | 5 | 4 (25)| 5 |00:00:00.01 | 2441K| 2441K| 1503K (0)|
| 2 | TABLE ACCESS FULL| T4 | 1 | 5 | 3 (0)| 5 |00:00:00.01 | | | |
-------------------------------------------------------------------------------------------------------------------
select product_id from t4 group by product_id
Plan hash value: 581042373
-------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | OMem | 1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 4 (100)| 5 |00:00:00.01 | | | |
| 1 | HASH GROUP BY | | 1 | 5 | 4 (25)| 5 |00:00:00.01 | 2441K| 2441K| 863K (0)|
| 2 | TABLE ACCESS FULL| T4 | 1 | 5 | 3 (0)| 5 |00:00:00.01 | | | |
-------------------------------------------------------------------------------------------------------------------This is what we expect. Full table scan and HASH UNIQUE for DISTINCT, HASH GROUP BY for GROUP BY. Now add a NOT NULL constraint and an index.
create index t4_i1 on t4(product_id);
alter table t4 modify (product_id not null);
select distinct product_id from t4
Plan hash value: 4231414870
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2 (100)| 5 |00:00:00.01 |
| 1 | SORT UNIQUE NOSORT| | 1 | 5 | 2 (50)| 5 |00:00:00.01 |
| 2 | INDEX FULL SCAN | T4_I1 | 1 | 5 | 1 (0)| 5 |00:00:00.01 |
-----------------------------------------------------------------------------------------
select product_id from t4 group by product_id
Plan hash value: 1989519822
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 (100)| 5 |00:00:00.01 |
| 1 | SORT GROUP BY NOSORT| | 1 | 5 | 1 (0)| 5 |00:00:00.01 |
| 2 | INDEX FULL SCAN | T4_I1 | 1 | 5 | 1 (0)| 5 |00:00:00.01 |
-------------------------------------------------------------------------------------------The optimizer noticed the index on the relevant column, and because of the NOT NULL constraint it is able to use it avoid sorting the data for producing the unique values, because the data is already sorted in the index. Now add a UNIQUE constraint to this column:
```
alter table t4 add constraint t4_u1 unique (product_id) using index t4_i1;
select distinct product_id from t4
Plan hash value: 3974767428
---------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 (100)| 5 |00:00:00.01 |
| 1 | INDEX FULL SCAN | T4_I1 | 1 | 5 | 1 (0)| 5 |00:00:00.01 |
----------------------------------------------------------
Code Snippets
create table t4 as
select rownum product_id
from dual
connect by rownum<=5;
exec dbms_stats.gather_table_stats(user, 'T4');
alter session set statistics_level=all;select distinct product_id from t4
Plan hash value: 641655586
-------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | OMem | 1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 4 (100)| 5 |00:00:00.01 | | | |
| 1 | HASH UNIQUE | | 1 | 5 | 4 (25)| 5 |00:00:00.01 | 2441K| 2441K| 1503K (0)|
| 2 | TABLE ACCESS FULL| T4 | 1 | 5 | 3 (0)| 5 |00:00:00.01 | | | |
-------------------------------------------------------------------------------------------------------------------
select product_id from t4 group by product_id
Plan hash value: 581042373
-------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | OMem | 1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 4 (100)| 5 |00:00:00.01 | | | |
| 1 | HASH GROUP BY | | 1 | 5 | 4 (25)| 5 |00:00:00.01 | 2441K| 2441K| 863K (0)|
| 2 | TABLE ACCESS FULL| T4 | 1 | 5 | 3 (0)| 5 |00:00:00.01 | | | |
-------------------------------------------------------------------------------------------------------------------create index t4_i1 on t4(product_id);
alter table t4 modify (product_id not null);
select distinct product_id from t4
Plan hash value: 4231414870
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2 (100)| 5 |00:00:00.01 |
| 1 | SORT UNIQUE NOSORT| | 1 | 5 | 2 (50)| 5 |00:00:00.01 |
| 2 | INDEX FULL SCAN | T4_I1 | 1 | 5 | 1 (0)| 5 |00:00:00.01 |
-----------------------------------------------------------------------------------------
select product_id from t4 group by product_id
Plan hash value: 1989519822
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 (100)| 5 |00:00:00.01 |
| 1 | SORT GROUP BY NOSORT| | 1 | 5 | 1 (0)| 5 |00:00:00.01 |
| 2 | INDEX FULL SCAN | T4_I1 | 1 | 5 | 1 (0)| 5 |00:00:00.01 |
-------------------------------------------------------------------------------------------alter table t4 add constraint t4_u1 unique (product_id) using index t4_i1;
select distinct product_id from t4
Plan hash value: 3974767428
---------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 (100)| 5 |00:00:00.01 |
| 1 | INDEX FULL SCAN | T4_I1 | 1 | 5 | 1 (0)| 5 |00:00:00.01 |
---------------------------------------------------------------------------------------
select product_id from t4 group by product_id
Plan hash value: 1989519822
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 (100)| 5 |00:00:00.01 |
| 1 | SORT GROUP BY NOSORT| | 1 | 5 | 1 (0)| 5 |00:00:00.01 |
| 2 | INDEX FULL SCAN | T4_I1 | 1 | 5 | 1 (0)| 5 |00:00:00.01 |
-------------------------------------------------------------------------------------------alter system flush shared_pool;
alter session set events '10053 trace name context forever, level 1';
select distinct product_id from t4;Context
StackExchange Database Administrators Q#101364, answer score: 3
Revisions (0)
No revisions yet.