HiveBrain v1.2.0
Get Started
← Back to all entries
gotchaMinor

Why does SAMPLE BLOCK often return zero?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
whysamplereturnblockoftendoeszero

Problem

In a comment on another question, someone asked:


I used SAMPLE BLOCK in many tests but found that it would inexplicably return '0' much of the time. Any idea why?

I speculated that this might be to do with the sample hitting empty blocks but even if you tune the table to make sure all the extents allocated are completely full, the behaviour doesn't change, so it can't be that.

test data:

create table foo(bar integer primary key, baz char(1000));





insert into foo(bar,baz) select level, 'A' from dual connect by level<=1000;





begin
  dbms_stats.gather_schema_stats(null);
end;





select sum(blocks) blocks from user_extents where segment_name = 'FOO';


| BLOCKS |
| -----: |
| 256 |

test SAMPLE BLOCK:

with w(estimated_rows,distinct_blocks) as(
 select count(*)*2, count(distinct substr(rowid,1,15)) from foo sample block (50)
 union all
 select count(*)*2, count(distinct substr(rowid,1,15)) from foo sample block (50)
 union all
 select count(*)*2, count(distinct substr(rowid,1,15)) from foo sample block (50)
 union all
 select count(*)*2, count(distinct substr(rowid,1,15)) from foo sample block (50)
 union all
 select count(*)*2, count(distinct substr(rowid,1,15)) from foo sample block (50))
select * from w;


ESTIMATED_ROWS | DISTINCT_BLOCKS
-------------: | --------------:
0 | 0
0 | 0
2000 | 143
2000 | 143
2000 | 143

dbfiddle here

I found a Jonathan Lewis blog on a related issue in 10.1, but this didn't explain why it is happening or how to work around the issue.

Solution

There doesn't seem to be anything specific mentioned in the documentation, but this Oracle Data Mining blog is interesting:


SAMPLE relies on the existence of primary keys

I'm not sure this is true, but the existence of a primary key does play a role. The docs do say:


Block sampling is possible only during full table scans or index fast full scans. If a more efficient execution path exists, then Oracle Database does not perform block sampling. If you want to guarantee block sampling for a particular table or index, then use the FULL or INDEX_FFS hint.

If you look at the plan for the query you are using, you can see it never touches the table, instead it is doing a fast full scan of the primary key:

select * from table(dbms_xplan.display);


| PLAN_TABLE_OUTPUT |
| :--------------------------------------------------------------------------------------------- |
| Plan hash value: 130955250 |
| |
| ---------------------------------------------------------------------------------------------- |
| | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | |
| ---------------------------------------------------------------------------------------------- |
| | 0 | SELECT STATEMENT | | 1 | 45 | 3 (34)| 00:00:01 | |
| | 1 | SORT AGGREGATE | | 1 | 45 | | | |
| | 2 | VIEW | VW_DAG_0 | 500 | 22500 | 3 (34)| 00:00:01 | |
| | 3 | HASH GROUP BY | | 500 | 6000 | 3 (34)| 00:00:01 | |
| | 4 | INDEX SAMPLE FAST FULL SCAN| SYS_C007394 | 500 | 6000 | 2 (0)| 00:00:01 | |
| ---------------------------------------------------------------------------------------------- |

The SAMPLE BLOCK is operating on the index — which has far fewer blocks and is distorting the sample:

select sum(blocks) blocks from user_extents where segment_name = 'PK_FOO';


| BLOCKS |
| -----: |
| 8 |

With a primary key with more blocks the variability is much less, and you can alleviate the issue even more by using the second parameter in SAMPLE BLOCK to restrict contiguous blocks sampled to 1:

create table foo(bar integer, baz char(1000), constraint pk_foo primary key (bar,baz));







select segment_name, sum(blocks) blocks
from user_extents
where segment_name like '%FOO'
group by segment_name;


SEGMENT_NAME | BLOCKS
:----------- | -----:
FOO | 256
PK_FOO | 256

with w(estimated_rows,distinct_blocks) as(
 select count(*)*2, count(distinct substr(rowid,1,15)) from foo sample block (50,1)
 union all
 select count(*)*2, count(distinct substr(rowid,1,15)) from foo sample block (50,1)
 union all
 select count(*)*2, count(distinct substr(rowid,1,15)) from foo sample block (50,1)
 union all
 select count(*)*2, count(distinct substr(rowid,1,15)) from foo sample block (50,1)
 union all
 select count(*)*2, count(distinct substr(rowid,1,15)) from foo sample block (50,1))
select * from w;


ESTIMATED_ROWS | DISTINCT_BLOCKS
-------------: | --------------:
1076 | 77
896 | 64
924 | 66
1006 | 72
810 | 58

dbfiddle here

This isn't a full explanation, and it might not be possible to have one without knowing how Oracle does the sample internally. What is clear is that whatever Oracle is sampling from, either table or index, needs to be big to iron out the sampling wrinkles.

Code Snippets

select * from table(dbms_xplan.display);
select sum(blocks) blocks from user_extents where segment_name = 'PK_FOO';
create table foo(bar integer, baz char(1000), constraint pk_foo primary key (bar,baz));
select segment_name, sum(blocks) blocks
from user_extents
where segment_name like '%FOO'
group by segment_name;
with w(estimated_rows,distinct_blocks) as(
 select count(*)*2, count(distinct substr(rowid,1,15)) from foo sample block (50,1)
 union all
 select count(*)*2, count(distinct substr(rowid,1,15)) from foo sample block (50,1)
 union all
 select count(*)*2, count(distinct substr(rowid,1,15)) from foo sample block (50,1)
 union all
 select count(*)*2, count(distinct substr(rowid,1,15)) from foo sample block (50,1)
 union all
 select count(*)*2, count(distinct substr(rowid,1,15)) from foo sample block (50,1))
select * from w;

Context

StackExchange Database Administrators Q#166495, answer score: 3

Revisions (0)

No revisions yet.