patternsqlMinor
Multi-Column index not used for index-only scan, but partial index is
Viewed 0 times
multiscancolumnusedbutforpartialindexnotonly
Problem
My question is why a multi-column index is not used for a index only scan, when a partial index with equivalent information (i think) is.
The table:
A content sample:
The query I need to run is:
Now if I use a partial index. The query (eventually) gets executed as a index only scan:
What I do not understand is, why a multi column index of the following form is never used for a index-only scan?
Can't the query visit all consecutive leaves of the btree where flag is
The table:
CREATE TABLE test
(
id INT,
descr TEXT,
flag BOOLEAN
);
INSERT INTO test
SELECT GENERATE_SERIES(1,100000) AS id,
MD5(RANDOM()::TEXT) AS descr,
(RANDOM() < 0.1) AS flag;
SELECT *
FROM test LIMIT 10;A content sample:
id descr flag
1 81978ceb5514461fbad9af1152ad78f6 true
2 cc0aee68ba3e0095cc74d53e8da55fef false
3 689a76e5897d565638f8ddd2d2019b7a true
4 9df03bc2969a6af88cd1d6e0423d0f4c true
5 318983766d11f831e9f0df34606dc908 false
6 198102bb71640a16f28263b7fb56ba2e false
7 9bef7320389db46a8ad88ffa611e81b5 false
8 c1f0d637ee0a985aa7d768a78d2d97b1 false
9 781b4064f721ae3879d95579264b0aba false
10 c4582890bb1e9af430e0f36b50f5e88c falseThe query I need to run is:
SELECT id
FROM test
WHERE flag;Now if I use a partial index. The query (eventually) gets executed as a index only scan:
CREATE INDEX i1
ON test (id) WHERE flag;
QUERY PLAN
Index Only Scan using i1 on test (cost=0.29..354.95 rows=9911 width=4) (actual time=0.120..6.268 rows=9911 loops=1)
Heap Fetches: 9911
Buffers: shared hit=834 read=29
Planning time: 0.806 ms
Execution time: 6.922 msWhat I do not understand is, why a multi column index of the following form is never used for a index-only scan?
CREATE INDEX i2
ON test (flag, id);
QUERY PLAN
Bitmap Heap Scan on test (cost=189.10..1122.21 rows=9911 width=4) (actual time=0.767..5.986 rows=9911 loops=1)
Filter: flag
Heap Blocks: exact=834
Buffers: shared hit=863
-> Bitmap Index Scan on i2 (cost=0.00..186.62 rows=9911 width=0) (actual time=0.669..0.669 rows=9911 loops=1)
Index Cond: (flag = true)
Buffers: shared hit=29
Planning time: 0.090 ms
Execution time: 6.677 msCan't the query visit all consecutive leaves of the btree where flag is
True toSolution
Tuple visibility is the problem. It uses the bitmap index scan in the unvacuumed case because it thinks it will be faster, not because it is incapable of using the IOS. Once I do a
As for why it doesn't use the bitmap scan for the partial index when the table is not vacuumed, that is a bit more complicated and is not about visibility. One of the benefits of bitmap scans (and in your case, the only benefit of them) is that it reads the table in physical order and visits each table block only once, regardless of the order of the index. PostgreSQL sees a perfect correlation between
Now in the case of the index on
VACUUM test on your exact example, it starts using the index only scan on index i2, because now it thinks that the IOS will be faster.As for why it doesn't use the bitmap scan for the partial index when the table is not vacuumed, that is a bit more complicated and is not about visibility. One of the benefits of bitmap scans (and in your case, the only benefit of them) is that it reads the table in physical order and visits each table block only once, regardless of the order of the index. PostgreSQL sees a perfect correlation between
id and the order of the physical table rows, so for the index with id as a leading column (the partial index) it think this benefit does not really matter, as it will read the table in physical order anyway even with a regular index scan. Since bitmap scans have slightly higher CPU cost, it won't do one when it offers no predicted IO advantage. The correlation between the flag column and the physical table order is lower, and so it thinks bitmap index scans will get a benefit when using the index which has flag as its leading column, which happens to be the full index. It thinks this benefit outweighs the CPU cost. If the table was marked all-visible, then the benefits of an index-only-scan would tip the scales back again, but without vacuuming, that doesn't happen.Now in the case of the index on
(flag, id), the leading column is being tested for equality and the only rows selected will have the same value for flag. So what really matters is the correlation with the next column in the index, id, not with the flag column. But PostgreSQL isn't (apparently) clever enough to understand this. Now I am kind of curious how hard it would be to make it understand that.Context
StackExchange Database Administrators Q#212637, answer score: 3
Revisions (0)
No revisions yet.