patternsqlCritical
Understanding "bitmap heap scan" and "bitmap index scan"
Viewed 0 times
understandingscanheapbitmapandindex
Problem
I'll try to explain my misunderstandings by the following example.
I didn't understand fundamentals of the
I didn't understand fundamentals of the
Bitmap Heap Scan Node. Consider the query SELECT customerid, username FROM customers WHERE customerid bitmap index scan is like a sequential scan but only of the appropriate part of the table?Solution
How does PostgreSQL knows by just a bitmap anything about rows' physical order?
The bitmap is one bit per heap page. The bitmap index scan sets the bits based on the heap page address that the index entry points to.
So when it goes to do the bitmap heap scan, it just does a linear table scan, reading the bitmap to see whether it should bother with a particular page or seek over it.
Or generates the bitmap so that any element of it can be mapped to the pointer to a page easily?
No, the bitmap corresponds 1:1 to heap pages.
I wrote some more on this here.
OK, it looks like you might be misunderstanding what "bitmap" means in this context.
It's not a bit string like "101011" that's created for each heap page, or each index read, or whatever.
The whole bitmap is a single bit array, with as many bits as there are heap pages in the relation being scanned.
One bitmap is created by the first index scan, starting off with all entries 0 (false). Whenever an index entry that matches the search condition is found, the heap address pointed to by that index entry is looked up as an offset into the bitmap, and that bit is set to 1 (true). So rather than looking up the heap page directly, the bitmap index scan looks up the corresponding bit position in the bitmap.
The second and further bitmap index scans do the same thing with the other indexes and the search conditions on them.
Then each bitmap is ANDed together. The resulting bitmap has one bit for each heap page, where the bits are true only if they were true in all the individual bitmap index scans, i.e. the search condition matched for every index scan. These are the only heap pages we need to bother to load and examine. Since each heap page might contain multiple rows, we then have to examine each row to see if it matches all the conditions - that's what the "recheck cond" part is about.
One crucial thing to understand with all this is that the tuple address in an index entry points to the row's
Graphical example
Once the bitmaps are created a bitwise AND is performed on them:
The bitmap heap scan then seeks to the start of each page and reads the page:
and each read page is then re-checked against the condition since
there can be >1 row per page and not all necessarily match
the condition.
The bitmap is one bit per heap page. The bitmap index scan sets the bits based on the heap page address that the index entry points to.
So when it goes to do the bitmap heap scan, it just does a linear table scan, reading the bitmap to see whether it should bother with a particular page or seek over it.
Or generates the bitmap so that any element of it can be mapped to the pointer to a page easily?
No, the bitmap corresponds 1:1 to heap pages.
I wrote some more on this here.
OK, it looks like you might be misunderstanding what "bitmap" means in this context.
It's not a bit string like "101011" that's created for each heap page, or each index read, or whatever.
The whole bitmap is a single bit array, with as many bits as there are heap pages in the relation being scanned.
One bitmap is created by the first index scan, starting off with all entries 0 (false). Whenever an index entry that matches the search condition is found, the heap address pointed to by that index entry is looked up as an offset into the bitmap, and that bit is set to 1 (true). So rather than looking up the heap page directly, the bitmap index scan looks up the corresponding bit position in the bitmap.
The second and further bitmap index scans do the same thing with the other indexes and the search conditions on them.
Then each bitmap is ANDed together. The resulting bitmap has one bit for each heap page, where the bits are true only if they were true in all the individual bitmap index scans, i.e. the search condition matched for every index scan. These are the only heap pages we need to bother to load and examine. Since each heap page might contain multiple rows, we then have to examine each row to see if it matches all the conditions - that's what the "recheck cond" part is about.
One crucial thing to understand with all this is that the tuple address in an index entry points to the row's
ctid, which is a combination of the heap page number and the offset within the heap page. A bitmap index scan ignores the offsets, since it'll check the whole page anyway, and sets the bit if any row on that page matches the condition.Graphical example
Heap, one square = one page:
+---------------------------------------------+
|c____u_____X___u___X_________u___cXcc______u_|
+---------------------------------------------+
Rows marked c match customers pkey condition.
Rows marked u match username condition.
Rows marked X match both conditions.
Bitmap scan from customers_pkey:
+---------------------------------------------+
|100000000001000000010000000000000111100000000| bitmap 1
+---------------------------------------------+
One bit per heap page, in the same order as the heap
Bits 1 when condition matches, 0 if not
Bitmap scan from ix_cust_username:
+---------------------------------------------+
|000001000001000100010000000001000010000000010| bitmap 2
+---------------------------------------------+
Once the bitmaps are created a bitwise AND is performed on them:
+---------------------------------------------+
|100000000001000000010000000000000111100000000| bitmap 1
|000001000001000100010000000001000010000000010| bitmap 2
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
|000000000001000000010000000000000010000000000| Combined bitmap
+-----------+-------+--------------+----------+
| | |
v v v
Used to scan the heap only for matching pages:
+---------------------------------------------+
|___________X_______X______________X__________|
+---------------------------------------------+The bitmap heap scan then seeks to the start of each page and reads the page:
+---------------------------------------------+
|___________X_______X______________X__________|
+---------------------------------------------+
seek------->^seek-->^seek--------->^
| | |
------------------------
only these pages readand each read page is then re-checked against the condition since
there can be >1 row per page and not all necessarily match
the condition.
Code Snippets
+---------------------------------------------+
|100000000001000000010000000000000111100000000| bitmap 1
|000001000001000100010000000001000010000000010| bitmap 2
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
|000000000001000000010000000000000010000000000| Combined bitmap
+-----------+-------+--------------+----------+
| | |
v v v
Used to scan the heap only for matching pages:
+---------------------------------------------+
|___________X_______X______________X__________|
+---------------------------------------------++---------------------------------------------+
|___________X_______X______________X__________|
+---------------------------------------------+
seek------->^seek-->^seek--------->^
| | |
------------------------
only these pages readContext
StackExchange Database Administrators Q#119386, answer score: 106
Revisions (0)
No revisions yet.