HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

ts_rank massively slows my query, how can I improve performance?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
canqueryimprovemassivelyslowsts_rankperformancehow

Problem

Here's the table definition

Table "public.kb_article_contents"
   Column   |   Type   | Modifiers 
------------+----------+-----------
article_id | smallint | not null
contents   | text     | not null
keywords   | text     | not null
Indexes:
   "contents_idx" gin (to_tsvector('english'::regconfig, contents))
Foreign-key constraints:
   "kb_article_contents_article_id_fkey" FOREIGN KEY (article_id) REFERENCES kb_articles(id)


and the query

support=> EXPLAIN ANALYSE
SELECT
    id,
-- if we remove the next line runtimes speeds up (and of course the order by)
    ts_rank(to_tsvector( 'english', contents ), plainto_tsquery('string')) AS rank
FROM
    kb_article_contents
    INNER JOIN kb_articles
            ON ( kb_article_contents.article_id = kb_articles.id )

WHERE
    published = 'true'
    AND
    to_tsvector( 'english', contents ) @@ plainto_tsquery('string')
ORDER BY rank DESC
LIMIT 25
;


Here's the query plan for the slow query

```
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=57.90..57.92 rows=5 width=830) (actual time=452.003..452.028 rows=25 loops=1)
-> Sort (cost=57.90..57.92 rows=5 width=830) (actual time=452.001..452.010 rows=25 loops=1)
Sort Key: (ts_rank(to_tsvector('english'::regconfig, kb_article_contents.contents), plainto_tsquery('string'::text)))
Sort Method: top-N heapsort Memory: 17kB
-> Hash Join (cost=21.36..57.85 rows=5 width=830) (actual time=17.688..451.334 rows=299 loops=1)
Hash Cond: (kb_articles.id = kb_article_contents.article_id)
-> Seq Scan on kb_articles (cost=0.00..32.06 rows=1156 width=4) (actual time=0.008..1.059 rows=1156 loops=1)
Filter: published
-> Hash (cost=21.30..21.30 rows=5 width=828) (actual time=1.175..1.175 row

Solution

ts_rank will rank all rows, most of which will probably have a ranking of 0. So it's faster to do a search without ts_rank and then ts_rank the results of that search using the same query. So you'd probably want to do something like

SELECT score.id, ts_rank(score.contents::tsvector, plainto_tsquery('string')) as rank from
(SELECT
    id, contents
FROM
    kb_article_contents
    INNER JOIN kb_articles
            ON ( kb_article_contents.article_id = kb_articles.id )

WHERE
    published = 'true'
    AND
    contents::tsvector @@ plainto_tsquery('string')) 
    as score
order by rank desc;

Code Snippets

SELECT score.id, ts_rank(score.contents::tsvector, plainto_tsquery('string')) as rank from
(SELECT
    id, contents
FROM
    kb_article_contents
    INNER JOIN kb_articles
            ON ( kb_article_contents.article_id = kb_articles.id )

WHERE
    published = 'true'
    AND
    contents::tsvector @@ plainto_tsquery('string')) 
    as score
order by rank desc;

Context

StackExchange Database Administrators Q#4054, answer score: 3

Revisions (0)

No revisions yet.