Why is this query with WHERE, ORDER BY and LIMIT so slow?

Problem

Given this table posts_lists:

                     Table "public.posts_lists"
   Column   |         Type          | Collation | Nullable | Default
------------+-----------------------+-----------+----------+---------
 id         | character varying(20) |           | not null |
 user_id    | character varying(20) |           |          |
 tags       | jsonb                 |           |          |
 score      | integer               |           |          |
 created_at | integer               |           |          |
Indexes:
    "tmp_posts_lists_pkey1" PRIMARY KEY, btree (id)
    "tmp_posts_lists_idx_create_at1532588309" btree (created_at)
    "tmp_posts_lists_idx_score_desc1532588309" btree (score_rank(score, id::text) DESC)
    "tmp_posts_lists_idx_tags1532588309" gin (jsonb_array_lower(tags))
    "tmp_posts_lists_idx_user_id1532588309" btree (user_id)

Getting a list by tag is fast:

EXPLAIN ANALYSE
SELECT * FROM posts_lists
WHERE jsonb_array_lower(tags) ? lower('Qui');


Bitmap Heap Scan on posts_lists  (cost=1397.50..33991.24 rows=10000 width=56) (actual time=0.110..0.132 rows=2 loops=1)
  Recheck Cond: (jsonb_array_lower(tags) ? 'qui'::text)
  Heap Blocks: exact=2
  ->  Bitmap Index Scan on tmp_posts_lists_idx_tags1532588309  (cost=0.00..1395.00 rows=10000 width=0) (actual time=0.010..0.010 rows=2 loops=1)
        Index Cond: (jsonb_array_lower(tags) ? 'qui'::text)
Planning time: 0.297 ms
Execution time: 0.157 ms

Getting a list ordered by score, limit 100 - also fast:

EXPLAIN ANALYSE
SELECT *
FROM posts_lists
ORDER BY score_rank(score, id) DESC
LIMIT 100;


Limit  (cost=0.56..12.03 rows=100 width=88) (actual time=0.074..0.559 rows=100 loops=1)
  ->  Index Scan using tmp_posts_lists_idx_score_desc1532588309 on posts_lists  (cost=0.56..1146999.15 rows=10000473 width=88) (actual time=0.072..0.535 rows=100 loops=1)
Planning time: 0.586 ms
Execution time: 0.714 ms

But combining the above two queries is very slow:

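```
EXPLAIN ANALYSE
SELECT * FROM posts_lists
WHERE jsonb_array_lower(tags) ? lower('Qui')
ORDER BY score_rank(score, id) DESC
LIMIT 100;
```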

Solution

The problem with this statement is that the query planner has no usable statistics for the condition on jsonb_array_lower(tags).

As seen in the first explain:

(cost=0.00..1395.00 rows=10000 width=0) (actual time=0.010..0.010 rows=2 loops=1)


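The planner expects 10,000 rows to match the filter jsonb_array_lower(tags) ? lower('Qui'), but only two rows are actually returned.

The same happens in the last statement. Because of this misestimate, the planner assumes that a scan on the index tmp_posts_lists_idx_score_desc1532588309 would be more efficient: it expects to collect 100 matching rows after reading only a small part of the index in score order. Since only two rows match, the scan never reaches 100 rows and has to walk the whole index.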
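
To put rough numbers on that reasoning, the following sketch redoes the planner's arithmetic using only the row counts quoted in the plans above (10,000 estimated matches, 10,000,473 rows in the table). It assumes the usual simplification that matching rows are spread evenly through the index; it is an illustration, not the planner's actual cost code.

```sql
-- Figures are taken from the EXPLAIN output in the question.
SELECT
    -- Estimated fraction of rows matching the tag filter (about 0.1 %).
    round(10000 / 10000473.0, 4)         AS estimated_selectivity,
    -- Index entries the planner expects to read before it has 100 matches.
    round(100 / (10000 / 10000473.0))    AS rows_expected_to_scan,
    -- Index entries it really has to read when only 2 rows match: all of them.
    10000473                             AS rows_actually_scanned;
```

Reading roughly a hundred thousand index entries looks cheap next to sorting 10,000 rows, which is why the ordered index scan wins on paper. An estimate of exactly 0.1 % of the table also suggests a fixed default selectivity rather than a value derived from the data, although that is an inference from the numbers and not something the plan states.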

You could try to avoid the need for lower() by normalizing the tags during INSERT and UPDATE.
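
A minimal sketch of that normalization, assuming tags is always a JSON array of strings so that lowercasing its text form is safe. The function, trigger, and index names are hypothetical, and whether the row estimate actually improves should be checked with EXPLAIN ANALYSE on your own data.

```sql
-- Hypothetical trigger function: store tags in lower case on every write.
CREATE FUNCTION posts_lists_normalize_tags() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- Assumes tags is an array of strings; lower(NULL) stays NULL.
    NEW.tags := lower(NEW.tags::text)::jsonb;
    RETURN NEW;
END;
$$;

CREATE TRIGGER posts_lists_normalize_tags
BEFORE INSERT OR UPDATE OF tags ON posts_lists
FOR EACH ROW EXECUTE PROCEDURE posts_lists_normalize_tags();

-- One-off clean-up of rows written before the trigger existed.
UPDATE posts_lists SET tags = lower(tags::text)::jsonb;

-- Hypothetical index on the plain column instead of the expression.
CREATE INDEX posts_lists_idx_tags_plain ON posts_lists USING gin (tags);

-- The query then filters on the column directly.
SELECT *
FROM posts_lists
WHERE tags ? lower('Qui')
ORDER BY score_rank(score, id) DESC
LIMIT 100;
```

Only the search term still needs lower() at query time; the stored data is already normalized.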

Another method would be:

WITH c AS (
    SELECT id FROM posts_lists
    WHERE jsonb_array_lower(tags) ? lower('Qui')
)
-- Qualify the columns: both l and c expose id, so a bare id would be ambiguous.
SELECT l.*
FROM posts_lists l
JOIN c ON l.id = c.id
ORDER BY score_rank(l.score, l.id) DESC
LIMIT 100;


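This statement uses a CTE as an optimization fence. Be careful, though: performance may get worse for other WHERE conditions or table contents. On PostgreSQL 12 and later, a CTE like this one is inlined by default, so it only acts as a fence when declared AS MATERIALIZED.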
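
A sketch of the same query with the fence requested explicitly through AS MATERIALIZED, a keyword available from PostgreSQL 12 onward. It assumes the same table and functions as in the question; older versions reject the keyword and materialize the CTE anyway.

```sql
-- PostgreSQL 12+: force the CTE to be evaluated on its own,
-- so the tag filter runs first through the GIN index.
WITH c AS MATERIALIZED (
    SELECT id FROM posts_lists
    WHERE jsonb_array_lower(tags) ? lower('Qui')
)
SELECT l.*
FROM posts_lists l
JOIN c ON l.id = c.id
-- Only the few matching rows are sorted here.
ORDER BY score_rank(l.score, l.id) DESC
LIMIT 100;
```

The same caveat applies as for the plain CTE: for a tag that matches a large share of the table, materializing every matching id first may be slower than the ordered index scan.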

Context

StackExchange Database Administrators Q#213262, answer score: 3