gotchaMinor
Why does SAMPLE BLOCK often return zero?
Viewed 0 times
whysamplereturnblockoftendoeszero
Problem
In a comment on another question, someone asked:
I used SAMPLE BLOCK in many tests but found that it would inexplicably return '0' much of the time. Any idea why?
I speculated that this might be to do with the sample hitting empty blocks but even if you tune the table to make sure all the extents allocated are completely full, the behaviour doesn't change, so it can't be that.
test data:
| BLOCKS |
| -----: |
| 256 |
test SAMPLE BLOCK:
ESTIMATED_ROWS | DISTINCT_BLOCKS
-------------: | --------------:
0 | 0
0 | 0
2000 | 143
2000 | 143
2000 | 143
dbfiddle here
I found a Jonathan Lewis blog on a related issue in 10.1, but this didn't explain why it is happening or how to work around the issue.
I used SAMPLE BLOCK in many tests but found that it would inexplicably return '0' much of the time. Any idea why?
I speculated that this might be to do with the sample hitting empty blocks but even if you tune the table to make sure all the extents allocated are completely full, the behaviour doesn't change, so it can't be that.
test data:
create table foo(bar integer primary key, baz char(1000));insert into foo(bar,baz) select level, 'A' from dual connect by level<=1000;begin
dbms_stats.gather_schema_stats(null);
end;select sum(blocks) blocks from user_extents where segment_name = 'FOO';| BLOCKS |
| -----: |
| 256 |
test SAMPLE BLOCK:
with w(estimated_rows,distinct_blocks) as(
select count(*)*2, count(distinct substr(rowid,1,15)) from foo sample block (50)
union all
select count(*)*2, count(distinct substr(rowid,1,15)) from foo sample block (50)
union all
select count(*)*2, count(distinct substr(rowid,1,15)) from foo sample block (50)
union all
select count(*)*2, count(distinct substr(rowid,1,15)) from foo sample block (50)
union all
select count(*)*2, count(distinct substr(rowid,1,15)) from foo sample block (50))
select * from w;ESTIMATED_ROWS | DISTINCT_BLOCKS
-------------: | --------------:
0 | 0
0 | 0
2000 | 143
2000 | 143
2000 | 143
dbfiddle here
I found a Jonathan Lewis blog on a related issue in 10.1, but this didn't explain why it is happening or how to work around the issue.
Solution
There doesn't seem to be anything specific mentioned in the documentation, but this Oracle Data Mining blog is interesting:
SAMPLE relies on the existence of primary keys
I'm not sure this is true, but the existence of a primary key does play a role. The docs do say:
Block sampling is possible only during full table scans or index fast full scans. If a more efficient execution path exists, then Oracle Database does not perform block sampling. If you want to guarantee block sampling for a particular table or index, then use the FULL or INDEX_FFS hint.
If you look at the plan for the query you are using, you can see it never touches the table, instead it is doing a fast full scan of the primary key:
| PLAN_TABLE_OUTPUT |
| :--------------------------------------------------------------------------------------------- |
| Plan hash value: 130955250 |
| |
| ---------------------------------------------------------------------------------------------- |
| | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | |
| ---------------------------------------------------------------------------------------------- |
| | 0 | SELECT STATEMENT | | 1 | 45 | 3 (34)| 00:00:01 | |
| | 1 | SORT AGGREGATE | | 1 | 45 | | | |
| | 2 | VIEW | VW_DAG_0 | 500 | 22500 | 3 (34)| 00:00:01 | |
| | 3 | HASH GROUP BY | | 500 | 6000 | 3 (34)| 00:00:01 | |
| | 4 | INDEX SAMPLE FAST FULL SCAN| SYS_C007394 | 500 | 6000 | 2 (0)| 00:00:01 | |
| ---------------------------------------------------------------------------------------------- |
The SAMPLE BLOCK is operating on the index — which has far fewer blocks and is distorting the sample:
| BLOCKS |
| -----: |
| 8 |
With a primary key with more blocks the variability is much less, and you can alleviate the issue even more by using the second parameter in SAMPLE BLOCK to restrict contiguous blocks sampled to 1:
SEGMENT_NAME | BLOCKS
:----------- | -----:
FOO | 256
PK_FOO | 256
ESTIMATED_ROWS | DISTINCT_BLOCKS
-------------: | --------------:
1076 | 77
896 | 64
924 | 66
1006 | 72
810 | 58
dbfiddle here
This isn't a full explanation, and it might not be possible to have one without knowing how Oracle does the sample internally. What is clear is that whatever Oracle is sampling from, either table or index, needs to be big to iron out the sampling wrinkles.
SAMPLE relies on the existence of primary keys
I'm not sure this is true, but the existence of a primary key does play a role. The docs do say:
Block sampling is possible only during full table scans or index fast full scans. If a more efficient execution path exists, then Oracle Database does not perform block sampling. If you want to guarantee block sampling for a particular table or index, then use the FULL or INDEX_FFS hint.
If you look at the plan for the query you are using, you can see it never touches the table, instead it is doing a fast full scan of the primary key:
select * from table(dbms_xplan.display);| PLAN_TABLE_OUTPUT |
| :--------------------------------------------------------------------------------------------- |
| Plan hash value: 130955250 |
| |
| ---------------------------------------------------------------------------------------------- |
| | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | |
| ---------------------------------------------------------------------------------------------- |
| | 0 | SELECT STATEMENT | | 1 | 45 | 3 (34)| 00:00:01 | |
| | 1 | SORT AGGREGATE | | 1 | 45 | | | |
| | 2 | VIEW | VW_DAG_0 | 500 | 22500 | 3 (34)| 00:00:01 | |
| | 3 | HASH GROUP BY | | 500 | 6000 | 3 (34)| 00:00:01 | |
| | 4 | INDEX SAMPLE FAST FULL SCAN| SYS_C007394 | 500 | 6000 | 2 (0)| 00:00:01 | |
| ---------------------------------------------------------------------------------------------- |
The SAMPLE BLOCK is operating on the index — which has far fewer blocks and is distorting the sample:
select sum(blocks) blocks from user_extents where segment_name = 'PK_FOO';| BLOCKS |
| -----: |
| 8 |
With a primary key with more blocks the variability is much less, and you can alleviate the issue even more by using the second parameter in SAMPLE BLOCK to restrict contiguous blocks sampled to 1:
create table foo(bar integer, baz char(1000), constraint pk_foo primary key (bar,baz));…select segment_name, sum(blocks) blocks
from user_extents
where segment_name like '%FOO'
group by segment_name;SEGMENT_NAME | BLOCKS
:----------- | -----:
FOO | 256
PK_FOO | 256
with w(estimated_rows,distinct_blocks) as(
select count(*)*2, count(distinct substr(rowid,1,15)) from foo sample block (50,1)
union all
select count(*)*2, count(distinct substr(rowid,1,15)) from foo sample block (50,1)
union all
select count(*)*2, count(distinct substr(rowid,1,15)) from foo sample block (50,1)
union all
select count(*)*2, count(distinct substr(rowid,1,15)) from foo sample block (50,1)
union all
select count(*)*2, count(distinct substr(rowid,1,15)) from foo sample block (50,1))
select * from w;ESTIMATED_ROWS | DISTINCT_BLOCKS
-------------: | --------------:
1076 | 77
896 | 64
924 | 66
1006 | 72
810 | 58
dbfiddle here
This isn't a full explanation, and it might not be possible to have one without knowing how Oracle does the sample internally. What is clear is that whatever Oracle is sampling from, either table or index, needs to be big to iron out the sampling wrinkles.
Code Snippets
select * from table(dbms_xplan.display);select sum(blocks) blocks from user_extents where segment_name = 'PK_FOO';create table foo(bar integer, baz char(1000), constraint pk_foo primary key (bar,baz));select segment_name, sum(blocks) blocks
from user_extents
where segment_name like '%FOO'
group by segment_name;with w(estimated_rows,distinct_blocks) as(
select count(*)*2, count(distinct substr(rowid,1,15)) from foo sample block (50,1)
union all
select count(*)*2, count(distinct substr(rowid,1,15)) from foo sample block (50,1)
union all
select count(*)*2, count(distinct substr(rowid,1,15)) from foo sample block (50,1)
union all
select count(*)*2, count(distinct substr(rowid,1,15)) from foo sample block (50,1)
union all
select count(*)*2, count(distinct substr(rowid,1,15)) from foo sample block (50,1))
select * from w;Context
StackExchange Database Administrators Q#166495, answer score: 3
Revisions (0)
No revisions yet.