patternsqlMinor
Why doesn't an OFFSET query benefit more from the index?
Viewed 0 times
whytheoffsetbenefitquerymoredoesnindexfrom
Problem
I've got a simple example
I see a query plan like:
Do I understand correctly that PostgreSQL is loading
I understand that the typical solution to this is to use keyset pagination - to say
books table with an integer id primary key ("books_pkey" PRIMARY KEY, btree (id)) and 100,000,000 random rows. If I run:EXPLAIN
SELECT *
FROM books
ORDER BY id
OFFSET 99999999
LIMIT 1;I see a query plan like:
Limit (cost=3137296.54..3137296.57 rows=1 width=14)
-> Index Scan using books_pkey on books (cost=0.57..3137296.57 rows=100000000 width=14)Do I understand correctly that PostgreSQL is loading
100000000 rows into memory, only for the OFFSET to discard all but 1? If so, why can't it do the "load and discard" step using the index and only load one row into memory?I understand that the typical solution to this is to use keyset pagination - to say
WHERE id > x. I'm just trying to understand why an index alone doesn't solve this. Adding another index which is explicitly sorted the same way as this query (CREATE INDEX books_id_ordered ON books (id ASC)) makes no difference to EXPLAIN.Solution
A typical index stores the data in a B-Tree data structure (as even evident from your index definition). A B-Tree data structure, while sorted, is not enumerated (as mentioned by Akina) and therefore the data is not stored in a linear fashion, by design. A linear data structure (such as an enumerated list) would actually be a less efficient data structure for indexing data in most cases, since it has a Big O search (seek) notation of O(n) whereas a B-Tree is O(log(n)).
So simply put, because of the traditional algorithm of a B-Tree data structure, there is no way for the database system to process a
Could modern database systems be improved to be more robust and maintain an enumerated linear data structure, whenever an index is created, in addition to the B-Tree to improve this specific use case of
So simply put, because of the traditional algorithm of a B-Tree data structure, there is no way for the database system to process a
LIMIT and OFFSET clause without scanning the entire data first (within the filters of your predicates of course), such that it can correctly apply those clauses.Could modern database systems be improved to be more robust and maintain an enumerated linear data structure, whenever an index is created, in addition to the B-Tree to improve this specific use case of
LIMIT and OFFSET clauses?...Yes, but at the tradeoff of consuming extra storage space for a copy of the data, and even worse off, having to deal with the extra work required for applying changes to the linear data structure (in addition to the B-Tree) as the Table's data changes.Context
StackExchange Database Administrators Q#298031, answer score: 3
Revisions (0)
No revisions yet.