HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlModerate

How to speed up ORDER BY sorting when using GIN index in PostgreSQL?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlsortingorderhowginusingwhenindexspeed

Problem

I have a table like this:

CREATE TABLE products (
  id serial PRIMARY KEY, 
  category_ids integer[],
  published boolean NOT NULL,
  score integer NOT NULL,
  title varchar NOT NULL);


A product can belong to multiple categories. category_ids column holds a list of ids of all product's categories.

Typical query looks like this (always searching for single category):

SELECT * FROM products WHERE published
  AND category_ids @> ARRAY[23465]
ORDER BY score DESC, title
LIMIT 20 OFFSET 8000;


To speed it up I use the following index:

CREATE INDEX idx_test1 ON products
  USING GIN (category_ids gin__int_ops) WHERE published;


This one helps a lot unless there are too many products in one category. It quickly filters out products that belong to that category but then there is a sort operation that has to be done the hard way (without index).

A have installed btree_gin extension allowing me to build multi-column GIN index like this:

CREATE INDEX idx_test2 ON products USING GIN (
  category_ids gin__int_ops, score, title) WHERE published;


But Postgres does not want to use that for sorting. Even when I remove DESC specifier in the query.

Any alternative approaches to optimize the task are very welcome.

Additional information:

  • PostgreSQL 9.4, with intarray extension



  • total number of products currently is 260k but expected to grow significantly (upto 10M, this is multi-tenant e-commerce platform)



  • products per category 1..10000 (can grow up to 100k), average is below 100 but those categories with large number of products tend to attract many more requests



The following query plan was obtained from smaller test system (4680 products in selected category, 200k products total in the table):

```
Limit (cost=948.99..948.99 rows=1 width=72) (actual time=82.330..82.341 rows=20 loops=1)
-> Sort (cost=948.37..948.99 rows=245 width=72) (actual time=80.231..81.337 rows=4020 loops=1)
Sort Key: score, title
Sort M

Solution

I've done a lot of experimenting and here are my findings.

GIN and sorting

GIN index currently (as of version 9.4) can not assist ordering.


Of the index types currently supported by PostgreSQL, only B-tree can produce sorted output — the other index types return matching rows in an unspecified, implementation-dependent order.

work_mem

Thanks Chris for pointing out to this configuration parameter.
It defaults to 4MB, and in case your recordset is larger, increasing work_mem to proper value (can be found from EXPLAIN ANALYSE) can significantly speed up sort operations.

ALTER SYSTEM SET work_mem TO '32MB';


Restart the server for change to take effect, then double check:

SHOW work_mem;


Original query

I've populated my database with 650k products with some categories holding up to 40k products. I've simplified query a bit by removing published clause:

SELECT * FROM products WHERE category_ids @> ARRAY [248688]
ORDER BY score DESC, title LIMIT 10 OFFSET 30000;

Limit  (cost=2435.62..2435.62 rows=1 width=1390) (actual time=1141.254..1141.256 rows=10 loops=1)
  ->  Sort  (cost=2434.00..2435.62 rows=646 width=1390) (actual time=1115.706..1140.513 rows=30010 loops=1)
        Sort Key: score, title
        Sort Method: external merge  Disk: 29656kB
        ->  Bitmap Heap Scan on products  (cost=17.01..2403.85 rows=646 width=1390) (actual time=11.831..25.646 rows=41666 loops=1)
              Recheck Cond: (category_ids @> '{248688}'::integer[])
              Heap Blocks: exact=6471
              ->  Bitmap Index Scan on idx_products_category_ids_gin  (cost=0.00..16.85 rows=646 width=0) (actual time=10.140..10.140 rows=41666 loops=1)
                    Index Cond: (category_ids @> '{248688}'::integer[])
Planning time: 0.288 ms
Execution time: 1146.322 ms


As we can see work_mem was not enough so we had Sort Method: external merge Disk: 29656kB (the number here is approximate, it needs slightly more than 32MB for in-memory quicksort).

Reduce memory footprint

Don't select full records for sorting, use ids, apply sort, offset and limit, then load just 10 records we need:

SELECT * FROM products WHERE id in (
  SELECT id FROM products WHERE category_ids @> ARRAY[248688]
  ORDER BY score DESC, title LIMIT 10 OFFSET 30000
) ORDER BY score DESC, title;

Sort  (cost=2444.10..2444.11 rows=1 width=1390) (actual time=707.861..707.862 rows=10 loops=1)
  Sort Key: products.score, products.title
  Sort Method: quicksort  Memory: 35kB
  ->  Nested Loop  (cost=2436.05..2444.09 rows=1 width=1390) (actual time=707.764..707.803 rows=10 loops=1)
        ->  HashAggregate  (cost=2435.63..2435.64 rows=1 width=4) (actual time=707.744..707.746 rows=10 loops=1)
              Group Key: products_1.id
              ->  Limit  (cost=2435.62..2435.62 rows=1 width=72) (actual time=707.732..707.734 rows=10 loops=1)
                    ->  Sort  (cost=2434.00..2435.62 rows=646 width=72) (actual time=704.163..706.955 rows=30010 loops=1)
                          Sort Key: products_1.score, products_1.title
                          Sort Method: quicksort  Memory: 7396kB
                          ->  Bitmap Heap Scan on products products_1  (cost=17.01..2403.85 rows=646 width=72) (actual time=11.587..35.076 rows=41666 loops=1)
                                Recheck Cond: (category_ids @> '{248688}'::integer[])
                                Heap Blocks: exact=6471
                                ->  Bitmap Index Scan on idx_products_category_ids_gin  (cost=0.00..16.85 rows=646 width=0) (actual time=9.883..9.883 rows=41666 loops=1)
                                      Index Cond: (category_ids @> '{248688}'::integer[])
        ->  Index Scan using products_pkey on products  (cost=0.42..8.45 rows=1 width=1390) (actual time=0.004..0.004 rows=1 loops=10)
              Index Cond: (id = products_1.id)
Planning time: 0.682 ms
Execution time: 707.973 ms


Note Sort Method: quicksort Memory: 7396kB. Result is much better.

JOIN and additional B-tree index

As Chris advised I've created additional index:

CREATE INDEX idx_test7 ON products (score DESC, title);


First I tried joining like this:

SELECT * FROM products NATURAL JOIN
  (SELECT id FROM products WHERE category_ids @> ARRAY[248688]
  ORDER BY score DESC, title LIMIT 10 OFFSET 30000) c
ORDER BY score DESC, title;


Query plan differs slightly but result is the same:

```
Sort (cost=2444.10..2444.11 rows=1 width=1390) (actual time=700.747..700.747 rows=10 loops=1)
Sort Key: products.score, products.title
Sort Method: quicksort Memory: 35kB
-> Nested Loop (cost=2436.05..2444.09 rows=1 width=1390) (actual time=700.651..700.690 rows=10 loops=1)
-> HashAggregate (cost=2435.63..2435.64 rows=1 width=4) (actual time=700.630..700.630 rows=10 loops=1)
Group Key: products_1.id
-> Limit (cost=2435.62..2435.62 rows=1 width=72) (actual time=700.619..700.619 rows=10 loops=1)

Code Snippets

ALTER SYSTEM SET work_mem TO '32MB';
SHOW work_mem;
SELECT * FROM products WHERE category_ids @> ARRAY [248688]
ORDER BY score DESC, title LIMIT 10 OFFSET 30000;

Limit  (cost=2435.62..2435.62 rows=1 width=1390) (actual time=1141.254..1141.256 rows=10 loops=1)
  ->  Sort  (cost=2434.00..2435.62 rows=646 width=1390) (actual time=1115.706..1140.513 rows=30010 loops=1)
        Sort Key: score, title
        Sort Method: external merge  Disk: 29656kB
        ->  Bitmap Heap Scan on products  (cost=17.01..2403.85 rows=646 width=1390) (actual time=11.831..25.646 rows=41666 loops=1)
              Recheck Cond: (category_ids @> '{248688}'::integer[])
              Heap Blocks: exact=6471
              ->  Bitmap Index Scan on idx_products_category_ids_gin  (cost=0.00..16.85 rows=646 width=0) (actual time=10.140..10.140 rows=41666 loops=1)
                    Index Cond: (category_ids @> '{248688}'::integer[])
Planning time: 0.288 ms
Execution time: 1146.322 ms
SELECT * FROM products WHERE id in (
  SELECT id FROM products WHERE category_ids @> ARRAY[248688]
  ORDER BY score DESC, title LIMIT 10 OFFSET 30000
) ORDER BY score DESC, title;

Sort  (cost=2444.10..2444.11 rows=1 width=1390) (actual time=707.861..707.862 rows=10 loops=1)
  Sort Key: products.score, products.title
  Sort Method: quicksort  Memory: 35kB
  ->  Nested Loop  (cost=2436.05..2444.09 rows=1 width=1390) (actual time=707.764..707.803 rows=10 loops=1)
        ->  HashAggregate  (cost=2435.63..2435.64 rows=1 width=4) (actual time=707.744..707.746 rows=10 loops=1)
              Group Key: products_1.id
              ->  Limit  (cost=2435.62..2435.62 rows=1 width=72) (actual time=707.732..707.734 rows=10 loops=1)
                    ->  Sort  (cost=2434.00..2435.62 rows=646 width=72) (actual time=704.163..706.955 rows=30010 loops=1)
                          Sort Key: products_1.score, products_1.title
                          Sort Method: quicksort  Memory: 7396kB
                          ->  Bitmap Heap Scan on products products_1  (cost=17.01..2403.85 rows=646 width=72) (actual time=11.587..35.076 rows=41666 loops=1)
                                Recheck Cond: (category_ids @> '{248688}'::integer[])
                                Heap Blocks: exact=6471
                                ->  Bitmap Index Scan on idx_products_category_ids_gin  (cost=0.00..16.85 rows=646 width=0) (actual time=9.883..9.883 rows=41666 loops=1)
                                      Index Cond: (category_ids @> '{248688}'::integer[])
        ->  Index Scan using products_pkey on products  (cost=0.42..8.45 rows=1 width=1390) (actual time=0.004..0.004 rows=1 loops=10)
              Index Cond: (id = products_1.id)
Planning time: 0.682 ms
Execution time: 707.973 ms
CREATE INDEX idx_test7 ON products (score DESC, title);

Context

StackExchange Database Administrators Q#105316, answer score: 16

Revisions (0)

No revisions yet.