snippetMinor
Bloom Filter about filtering in oracle
Viewed 0 times
bloomfilteraboutoraclefiltering
Problem
I read a great blog about bloom filter in Oracle, and I have a few questions about the Bloom filter use case - Filtering in this blog.
The execution plan is as follows:
```
SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS ALL'));
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------
SQL_ID 587ty27pjwphu, child number 0
-------------------------------------
SELECT /+ parallel(8) / count(*) FROM FACT, DIMENSION WHERE DIMENSION.col1 = 1 AND DIMENSION.col2 = FACT.col2
Plan hash value: 4106007966
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | TQ |IN-OUT| PQ Distrib | A-Rows |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | | | 1 |
| 1 | SORT AGGREGATE | | 1 | 1 | | | | 1 |
| 2 | PX COORDINATOR | | 1 | | | |
SQL> CREATE TABLE DIMENSION
(
col1,
col2
) AS
SELECT MOD( ROWNUM, 10 ),
ROWNUM
FROM DUAL
CONNECT BY ROWNUM CREATE TABLE FACT
(
col1,
col2
) AS
SELECT MOD( ROWNUM, 25 ),
ROWNUM
FROM DUAL
CONNECT BY ROWNUM exec dbms_stats.gather_table_stats(USER, 'DIMENSION');
PL/SQL procedure successfully completed.
SQL> exec dbms_stats.gather_table_stats(USER, 'FACT');
PL/SQL procedure successfully completed.
SQL> SELECT /*+ parallel(8) */ count( * )
FROM FACT,
DIMENSION
WHERE DIMENSION.col1 = 1 AND
DIMENSION.col2 = FACT.col2;
COUNT(*)
----------
100The execution plan is as follows:
```
SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS ALL'));
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------
SQL_ID 587ty27pjwphu, child number 0
-------------------------------------
SELECT /+ parallel(8) / count(*) FROM FACT, DIMENSION WHERE DIMENSION.col1 = 1 AND DIMENSION.col2 = FACT.col2
Plan hash value: 4106007966
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | TQ |IN-OUT| PQ Distrib | A-Rows |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | | | 1 |
| 1 | SORT AGGREGATE | | 1 | 1 | | | | 1 |
| 2 | PX COORDINATOR | | 1 | | | |
Solution
A good way to observe what Oracle is actually doing with parallel processes is to use the Real-Time SQL Monitoring feature in SQL Developer. Note that this requires the target database to be licensed for the Oracle Tuning pack (the software also warns you of this).
Let's start by looking at a SQL Monitor report for your query with a NO_PARALLEL hint:
As expected Oracle gets back all 1 million rows from the FACT table. You can tell by looking at the column for the actual number of rows.
Now let's run it with a PARALLEL(4) hint:
The actual number of rows fetched from the FACT table is just 10 due to the bloom filter. This tool can show also show you how many different data flow operations there are as well as information about parallel slave sets. For this query there's just one data flow operation. You can map back parts of the query to the data flow operations by paying attention to the color of the person icon on the left of the operation column. There's also a tab with more information about the data flow operations:
In this example, lines 3-10 were run in parallel by a single data flow operation.
For the parallel slave sets, you can observe this by looking at the name column. Here there is just
Let's try another test but with 100X the data as before. I used the same table structures as in the previous example. Here is a screenshot of the finished SQL Monitor report:
By using the timeline you can see which operations were executed first and which were executed concurrently. You can also see a difference in the actual rows on line 10 (1135) with the actual rows on line 5 (1000). This means that the bloom filter allowed 135 false positive rows to make it to the hash join.
Going back to your original questions to answer them directly:
Does the bloom filter prevents all rows or some rows form table FACT that do not join table DIMENSION from being needlessly distributed?
It depends on the queries and the data in the tables referenced by the queries. Sometimes the bloom filter will filter out all unnecessary rows and sometimes it will only filter out some unnecessary rows.
Is the bloom filter created in parallel by the set of parallel execution processes that scanned table DIMENSION or the set of parallel execution processes that later perform hash join?
In this example query the same set of parallel slave processes does all of the work. For expensive enough queries you can observe the order of operations using the timeline.
Let's start by looking at a SQL Monitor report for your query with a NO_PARALLEL hint:
As expected Oracle gets back all 1 million rows from the FACT table. You can tell by looking at the column for the actual number of rows.
Now let's run it with a PARALLEL(4) hint:
The actual number of rows fetched from the FACT table is just 10 due to the bloom filter. This tool can show also show you how many different data flow operations there are as well as information about parallel slave sets. For this query there's just one data flow operation. You can map back parts of the query to the data flow operations by paying attention to the color of the person icon on the left of the operation column. There's also a tab with more information about the data flow operations:
In this example, lines 3-10 were run in parallel by a single data flow operation.
For the parallel slave sets, you can observe this by looking at the name column. Here there is just
:TQ10000 which corresponds to Q01,00. You can check the timeline of the query to get an idea of when Oracle performed certain operations, but Oracle completes the query too quickly for that to happen here.Let's try another test but with 100X the data as before. I used the same table structures as in the previous example. Here is a screenshot of the finished SQL Monitor report:
By using the timeline you can see which operations were executed first and which were executed concurrently. You can also see a difference in the actual rows on line 10 (1135) with the actual rows on line 5 (1000). This means that the bloom filter allowed 135 false positive rows to make it to the hash join.
Going back to your original questions to answer them directly:
Does the bloom filter prevents all rows or some rows form table FACT that do not join table DIMENSION from being needlessly distributed?
It depends on the queries and the data in the tables referenced by the queries. Sometimes the bloom filter will filter out all unnecessary rows and sometimes it will only filter out some unnecessary rows.
Is the bloom filter created in parallel by the set of parallel execution processes that scanned table DIMENSION or the set of parallel execution processes that later perform hash join?
In this example query the same set of parallel slave processes does all of the work. For expensive enough queries you can observe the order of operations using the timeline.
Context
StackExchange Database Administrators Q#159650, answer score: 3
Revisions (0)
No revisions yet.