HiveBrain v1.2.0

Finding number of Innodb pages to read

Submitted by: @import:stackexchange-dba

Problem

Is there any way to find the number of pages that need to be read when querying for a set of PKs?
E.g., SELECT * FROM Table WHERE id IN (1,2,3,4,5,6);
Is there a way to find the InnoDB pages that will be read for that query?

Solution

There is no reliable way to accurately predict the number of pages that will be read.

One could estimate the number of blocks by assuming an average number of consecutive rows per block. "100" is a simple estimate that often works. So, unless the rows are big, id in (1,2,3,4,5,6) would probably involve one page.

If the ids are scattered all over the place (as is 'always' the case with UUIDs, MD5s, etc), then you could estimate one page per id.
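The two rules of thumb above can be sketched as a tiny estimator. This is a sketch under the stated assumptions (roughly 100 consecutive rows per 16KB block for clustered ids, one block per id for scattered ids); the function name and default are this sketch's, not anything InnoDB exposes.

```python
def estimate_blocks(num_ids: int, clustered: bool, rows_per_block: int = 100) -> int:
    """Rough estimate of 16KB blocks touched when fetching num_ids rows by PK.

    clustered=True  : ids are near each other (e.g. an AUTO_INCREMENT range)
    clustered=False : ids are scattered (UUIDs, MD5s) -> about one block per id
    """
    if clustered:
        # consecutive ids share blocks; ceiling division, at least one block
        return max(1, -(-num_ids // rows_per_block))
    return num_ids

print(estimate_blocks(6, clustered=True))   # -> 1  (the IN (1,2,3,4,5,6) case)
print(estimate_blocks(6, clustered=False))  # -> 6
```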

If you mean physical reads from disk, the number is often zero -- because things tend to become cached in the "buffer_pool".

There is a way to find the exact number of reads after running the query.

FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS;


Then look at the Handler% and Innodb%read% values. There will be a lot of values (and they change from version to version), so I cannot provide much detail.

If you would like further discussion, provide the version number of MySQL, the output of that SHOW, and SHOW CREATE TABLE.
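To make the before/after comparison easier, one could diff two SHOW SESSION STATUS snapshots and keep only the read-related counters. A minimal sketch, assuming the snapshots have been fetched into dicts; the counter names below are illustrative only, since the actual Handler% and Innodb%read% names vary by MySQL version.

```python
def read_counter_delta(before: dict, after: dict) -> dict:
    """Return the change in any counter whose name starts with 'Handler'
    or contains both 'Innodb' and 'read'."""
    def relevant(name: str) -> bool:
        return name.startswith("Handler") or ("Innodb" in name and "read" in name)
    return {k: after[k] - before.get(k, 0)
            for k in after if relevant(k) and after[k] != before.get(k, 0)}

# Hypothetical snapshots taken around a 6-row PK lookup:
before = {"Handler_read_key": 0, "Innodb_buffer_pool_reads": 10, "Threads_running": 1}
after  = {"Handler_read_key": 6, "Innodb_buffer_pool_reads": 11, "Threads_running": 1}
print(read_counter_delta(before, after))
# {'Handler_read_key': 6, 'Innodb_buffer_pool_reads': 1}
```

With FLUSH STATUS run first, the "before" values are effectively zero, so the "after" snapshot alone is the delta.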

A table T (a,b,c,d,...,z) has a clustered index, PRIMARY KEY (a,b). On disk, table T occupies n pages; that can be found via mysql.innodb_table_stats.clustered_index_size for that table. How can one estimate the number of pages for a certain value of a?

Since a is potentially not-unique, we must first find out how many rows have a=123:

SELECT COUNT(*)+1 FROM t WHERE a=123;  -- let's call that a_ct


The "+1" is because:

  • If there are no rows with a=123, it will still have to probe the table to discover that.

  • After scanning all the rows with a=123, it must read one more row to realize that it has run out of such rows.
As you noted:

SELECT n_rows / clustered_index_size AS rows_per_block
    FROM mysql.innodb_table_stats
    WHERE database_name = '...'
      AND table_name = '...';


rows_per_block is something of an average, but the value can be much larger or much smaller. In particular, if there are 'big' TEXT, BLOB or JSON columns, rows_per_block will be low. It can even be less than 1. This is due to the big columns being put "off-record".

If you don't fetch the big columns, they don't count toward the page-read count. This is an argument against blindly doing SELECT * and in favor of listing the needed columns explicitly.

(I don't know of a way to fix the computation due to off-record pages.)

So the estimate of pages that will need to be read:

(a_ct / rows_per_block) + 0.5


The +0.5 is because sometimes multiple rows will all be in one block, and sometimes they will be split across two 'consecutive' blocks.
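Putting the pieces together, the estimate is simple arithmetic. A minimal sketch; a_ct and rows_per_block come from the two queries shown earlier, and the function name is this sketch's own.

```python
def estimated_pages(a_ct: int, rows_per_block: float) -> float:
    """Estimated blocks to read for WHERE a = 123, per the formula above."""
    # +0.5 accounts for the run of rows possibly straddling two blocks
    return a_ct / rows_per_block + 0.5

# e.g. 40 matching rows (+1 for the extra probe) at ~100 rows per block:
print(estimated_pages(41, 100.0))  # -> 0.91
```

Note that with big off-record columns, rows_per_block can be below 1, which pushes this estimate well above one page per row.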

"Read" means (1) look in cache (the buffer_pool); then, if not present, (2) fetch from disk.

Terminology: You have used the term "page"; I am using the term "block", referring to InnoDB's 16KB blocks. That is the unit in which all data and index operations are performed.

"T occupies n pages" -- This "n" could include free blocks or non-leaf-node blocks of the Data's BTree. And, depending on where n comes form, it may include blocks in secondary indexes. Note: The PRIMARY KEY is "clustered" with the data and occupies 1% (another "Rule of Thumb") extra space in addition to the space taken by the data.

For huge tables, I make this simplifying assumption: All non-leaf nodes of any index (PK or secondary) are fully cached in the buffer_pool, but leaf nodes of the data are never cached. This makes it easy to simply focus on the number of blocks needed to read the data (just as your question does).

In reality, some non-leaf nodes will need to be fetched from disk and/or some leaf nodes will be cached. Those two estimation errors sort of cancel each other out. (Another Rule of Thumb.)

Non-"huge" tables in today's production machines tend to be fully cached. Hence no read I/O is needed for SELECTs. Every 'write' operation needs at least one disk write for ACID. That is another, longer, discussion. For now, I will point out that any write operation needs to do roughly the same "read" effort as its first step. And some block writes are delayed/cached in the buffer_pool, thereby not delaying the competition of the write SQL.


Context

StackExchange Database Administrators Q#303057, answer score: 2
