snippetsqlModerate
How to speed up ORDER BY sorting when using GIN index in PostgreSQL?
Viewed 0 times
postgresqlsortingorderhowginusingwhenindexspeed
Problem
I have a table like this:
A product can belong to multiple categories.
Typical query looks like this (always searching for single category):
To speed it up I use the following index:
This one helps a lot unless there are too many products in one category. It quickly filters out products that belong to that category but then there is a sort operation that has to be done the hard way (without index).
A have installed
But Postgres does not want to use that for sorting. Even when I remove
Any alternative approaches to optimize the task are very welcome.
Additional information:
The following query plan was obtained from smaller test system (4680 products in selected category, 200k products total in the table):
```
Limit (cost=948.99..948.99 rows=1 width=72) (actual time=82.330..82.341 rows=20 loops=1)
-> Sort (cost=948.37..948.99 rows=245 width=72) (actual time=80.231..81.337 rows=4020 loops=1)
Sort Key: score, title
Sort M
CREATE TABLE products (
id serial PRIMARY KEY,
category_ids integer[],
published boolean NOT NULL,
score integer NOT NULL,
title varchar NOT NULL);A product can belong to multiple categories.
category_ids column holds a list of ids of all product's categories.Typical query looks like this (always searching for single category):
SELECT * FROM products WHERE published
AND category_ids @> ARRAY[23465]
ORDER BY score DESC, title
LIMIT 20 OFFSET 8000;To speed it up I use the following index:
CREATE INDEX idx_test1 ON products
USING GIN (category_ids gin__int_ops) WHERE published;This one helps a lot unless there are too many products in one category. It quickly filters out products that belong to that category but then there is a sort operation that has to be done the hard way (without index).
A have installed
btree_gin extension allowing me to build multi-column GIN index like this:CREATE INDEX idx_test2 ON products USING GIN (
category_ids gin__int_ops, score, title) WHERE published;But Postgres does not want to use that for sorting. Even when I remove
DESC specifier in the query.Any alternative approaches to optimize the task are very welcome.
Additional information:
- PostgreSQL 9.4, with intarray extension
- total number of products currently is 260k but expected to grow significantly (upto 10M, this is multi-tenant e-commerce platform)
- products per category 1..10000 (can grow up to 100k), average is below 100 but those categories with large number of products tend to attract many more requests
The following query plan was obtained from smaller test system (4680 products in selected category, 200k products total in the table):
```
Limit (cost=948.99..948.99 rows=1 width=72) (actual time=82.330..82.341 rows=20 loops=1)
-> Sort (cost=948.37..948.99 rows=245 width=72) (actual time=80.231..81.337 rows=4020 loops=1)
Sort Key: score, title
Sort M
Solution
I've done a lot of experimenting and here are my findings.
GIN and sorting
GIN index currently (as of version 9.4) can not assist ordering.
Of the index types currently supported by PostgreSQL, only B-tree can produce sorted output — the other index types return matching rows in an unspecified, implementation-dependent order.
work_mem
Thanks Chris for pointing out to this configuration parameter.
It defaults to 4MB, and in case your recordset is larger, increasing
Restart the server for change to take effect, then double check:
Original query
I've populated my database with 650k products with some categories holding up to 40k products. I've simplified query a bit by removing
As we can see
Reduce memory footprint
Don't select full records for sorting, use ids, apply sort, offset and limit, then load just 10 records we need:
Note
JOIN and additional B-tree index
As Chris advised I've created additional index:
First I tried joining like this:
Query plan differs slightly but result is the same:
```
Sort (cost=2444.10..2444.11 rows=1 width=1390) (actual time=700.747..700.747 rows=10 loops=1)
Sort Key: products.score, products.title
Sort Method: quicksort Memory: 35kB
-> Nested Loop (cost=2436.05..2444.09 rows=1 width=1390) (actual time=700.651..700.690 rows=10 loops=1)
-> HashAggregate (cost=2435.63..2435.64 rows=1 width=4) (actual time=700.630..700.630 rows=10 loops=1)
Group Key: products_1.id
-> Limit (cost=2435.62..2435.62 rows=1 width=72) (actual time=700.619..700.619 rows=10 loops=1)
GIN and sorting
GIN index currently (as of version 9.4) can not assist ordering.
Of the index types currently supported by PostgreSQL, only B-tree can produce sorted output — the other index types return matching rows in an unspecified, implementation-dependent order.
work_mem
Thanks Chris for pointing out to this configuration parameter.
It defaults to 4MB, and in case your recordset is larger, increasing
work_mem to proper value (can be found from EXPLAIN ANALYSE) can significantly speed up sort operations.ALTER SYSTEM SET work_mem TO '32MB';Restart the server for change to take effect, then double check:
SHOW work_mem;Original query
I've populated my database with 650k products with some categories holding up to 40k products. I've simplified query a bit by removing
published clause:SELECT * FROM products WHERE category_ids @> ARRAY [248688]
ORDER BY score DESC, title LIMIT 10 OFFSET 30000;
Limit (cost=2435.62..2435.62 rows=1 width=1390) (actual time=1141.254..1141.256 rows=10 loops=1)
-> Sort (cost=2434.00..2435.62 rows=646 width=1390) (actual time=1115.706..1140.513 rows=30010 loops=1)
Sort Key: score, title
Sort Method: external merge Disk: 29656kB
-> Bitmap Heap Scan on products (cost=17.01..2403.85 rows=646 width=1390) (actual time=11.831..25.646 rows=41666 loops=1)
Recheck Cond: (category_ids @> '{248688}'::integer[])
Heap Blocks: exact=6471
-> Bitmap Index Scan on idx_products_category_ids_gin (cost=0.00..16.85 rows=646 width=0) (actual time=10.140..10.140 rows=41666 loops=1)
Index Cond: (category_ids @> '{248688}'::integer[])
Planning time: 0.288 ms
Execution time: 1146.322 msAs we can see
work_mem was not enough so we had Sort Method: external merge Disk: 29656kB (the number here is approximate, it needs slightly more than 32MB for in-memory quicksort).Reduce memory footprint
Don't select full records for sorting, use ids, apply sort, offset and limit, then load just 10 records we need:
SELECT * FROM products WHERE id in (
SELECT id FROM products WHERE category_ids @> ARRAY[248688]
ORDER BY score DESC, title LIMIT 10 OFFSET 30000
) ORDER BY score DESC, title;
Sort (cost=2444.10..2444.11 rows=1 width=1390) (actual time=707.861..707.862 rows=10 loops=1)
Sort Key: products.score, products.title
Sort Method: quicksort Memory: 35kB
-> Nested Loop (cost=2436.05..2444.09 rows=1 width=1390) (actual time=707.764..707.803 rows=10 loops=1)
-> HashAggregate (cost=2435.63..2435.64 rows=1 width=4) (actual time=707.744..707.746 rows=10 loops=1)
Group Key: products_1.id
-> Limit (cost=2435.62..2435.62 rows=1 width=72) (actual time=707.732..707.734 rows=10 loops=1)
-> Sort (cost=2434.00..2435.62 rows=646 width=72) (actual time=704.163..706.955 rows=30010 loops=1)
Sort Key: products_1.score, products_1.title
Sort Method: quicksort Memory: 7396kB
-> Bitmap Heap Scan on products products_1 (cost=17.01..2403.85 rows=646 width=72) (actual time=11.587..35.076 rows=41666 loops=1)
Recheck Cond: (category_ids @> '{248688}'::integer[])
Heap Blocks: exact=6471
-> Bitmap Index Scan on idx_products_category_ids_gin (cost=0.00..16.85 rows=646 width=0) (actual time=9.883..9.883 rows=41666 loops=1)
Index Cond: (category_ids @> '{248688}'::integer[])
-> Index Scan using products_pkey on products (cost=0.42..8.45 rows=1 width=1390) (actual time=0.004..0.004 rows=1 loops=10)
Index Cond: (id = products_1.id)
Planning time: 0.682 ms
Execution time: 707.973 msNote
Sort Method: quicksort Memory: 7396kB. Result is much better.JOIN and additional B-tree index
As Chris advised I've created additional index:
CREATE INDEX idx_test7 ON products (score DESC, title);First I tried joining like this:
SELECT * FROM products NATURAL JOIN
(SELECT id FROM products WHERE category_ids @> ARRAY[248688]
ORDER BY score DESC, title LIMIT 10 OFFSET 30000) c
ORDER BY score DESC, title;Query plan differs slightly but result is the same:
```
Sort (cost=2444.10..2444.11 rows=1 width=1390) (actual time=700.747..700.747 rows=10 loops=1)
Sort Key: products.score, products.title
Sort Method: quicksort Memory: 35kB
-> Nested Loop (cost=2436.05..2444.09 rows=1 width=1390) (actual time=700.651..700.690 rows=10 loops=1)
-> HashAggregate (cost=2435.63..2435.64 rows=1 width=4) (actual time=700.630..700.630 rows=10 loops=1)
Group Key: products_1.id
-> Limit (cost=2435.62..2435.62 rows=1 width=72) (actual time=700.619..700.619 rows=10 loops=1)
Code Snippets
ALTER SYSTEM SET work_mem TO '32MB';SHOW work_mem;SELECT * FROM products WHERE category_ids @> ARRAY [248688]
ORDER BY score DESC, title LIMIT 10 OFFSET 30000;
Limit (cost=2435.62..2435.62 rows=1 width=1390) (actual time=1141.254..1141.256 rows=10 loops=1)
-> Sort (cost=2434.00..2435.62 rows=646 width=1390) (actual time=1115.706..1140.513 rows=30010 loops=1)
Sort Key: score, title
Sort Method: external merge Disk: 29656kB
-> Bitmap Heap Scan on products (cost=17.01..2403.85 rows=646 width=1390) (actual time=11.831..25.646 rows=41666 loops=1)
Recheck Cond: (category_ids @> '{248688}'::integer[])
Heap Blocks: exact=6471
-> Bitmap Index Scan on idx_products_category_ids_gin (cost=0.00..16.85 rows=646 width=0) (actual time=10.140..10.140 rows=41666 loops=1)
Index Cond: (category_ids @> '{248688}'::integer[])
Planning time: 0.288 ms
Execution time: 1146.322 msSELECT * FROM products WHERE id in (
SELECT id FROM products WHERE category_ids @> ARRAY[248688]
ORDER BY score DESC, title LIMIT 10 OFFSET 30000
) ORDER BY score DESC, title;
Sort (cost=2444.10..2444.11 rows=1 width=1390) (actual time=707.861..707.862 rows=10 loops=1)
Sort Key: products.score, products.title
Sort Method: quicksort Memory: 35kB
-> Nested Loop (cost=2436.05..2444.09 rows=1 width=1390) (actual time=707.764..707.803 rows=10 loops=1)
-> HashAggregate (cost=2435.63..2435.64 rows=1 width=4) (actual time=707.744..707.746 rows=10 loops=1)
Group Key: products_1.id
-> Limit (cost=2435.62..2435.62 rows=1 width=72) (actual time=707.732..707.734 rows=10 loops=1)
-> Sort (cost=2434.00..2435.62 rows=646 width=72) (actual time=704.163..706.955 rows=30010 loops=1)
Sort Key: products_1.score, products_1.title
Sort Method: quicksort Memory: 7396kB
-> Bitmap Heap Scan on products products_1 (cost=17.01..2403.85 rows=646 width=72) (actual time=11.587..35.076 rows=41666 loops=1)
Recheck Cond: (category_ids @> '{248688}'::integer[])
Heap Blocks: exact=6471
-> Bitmap Index Scan on idx_products_category_ids_gin (cost=0.00..16.85 rows=646 width=0) (actual time=9.883..9.883 rows=41666 loops=1)
Index Cond: (category_ids @> '{248688}'::integer[])
-> Index Scan using products_pkey on products (cost=0.42..8.45 rows=1 width=1390) (actual time=0.004..0.004 rows=1 loops=10)
Index Cond: (id = products_1.id)
Planning time: 0.682 ms
Execution time: 707.973 msCREATE INDEX idx_test7 ON products (score DESC, title);Context
StackExchange Database Administrators Q#105316, answer score: 16
Revisions (0)
No revisions yet.