snippetsqlMinor
Complicated article sort
Viewed 0 times
articlesortcomplicated
Problem
In short I have a bunch of articles (~100k) which shall be ranked ("mixed", hence "newsmix") based on their newsValue (how important they are, between 1 and 10) and newsLifetime (how long they stay important, between 1 and 4), how long ago since published, and the users' preference for each category. It also groups them according to which story they belong to.
My main concern is that the query way too slow (~200ms), and thus I'm looking for ways to optimize the following query.
```
select array_to_json(array_agg(stories.articles)) articles from (
select array_to_json((array_agg(articles order by articles.newsmixValue desc))[1:3]) articles from (
select * from (
select a.*,
(10 a.breaking::integer) + ((a.data->'newsValue')::text::float / 10 (n.weight + u.weight) - ((floor(extract('epoch' from $1::timestamptz - a.published::timestamptz) / 60) / 1080) ((4 - (a.data->'newsLifetime')::text::float - 2.0) 0.3 + 2.0))) newsmixValue
from (
select
articles.*,
'Just nu' in (select value->>'title' from json_array_elements((data->'topics')::json)) as breaking,
s.id as "storyId",
s.title as "storyTitle"
from articles
left join stories s
on articles.data->>'storyId' = s.id
where published is not null
and is_deleted=false
and published::timestamptz $1::timestamptz::date - $7::integer
-- setting 0 means no date-limit
or $7 = 0
)
and category_id = any ($6)
order by published desc
) a
left join (
select usermix.key category_id, usermix.value::text::float weight from json_each($2) usermix
) u
on a.category_id=u.category_id
left join (
select newsmix.key category_id, newsmix.value::text::float weight from json_each($3) as newsmix
) n
on a.category_id=n.category_id
left join (
select threshold.key category_id, threshold.value::inte
My main concern is that the query way too slow (~200ms), and thus I'm looking for ways to optimize the following query.
```
select array_to_json(array_agg(stories.articles)) articles from (
select array_to_json((array_agg(articles order by articles.newsmixValue desc))[1:3]) articles from (
select * from (
select a.*,
(10 a.breaking::integer) + ((a.data->'newsValue')::text::float / 10 (n.weight + u.weight) - ((floor(extract('epoch' from $1::timestamptz - a.published::timestamptz) / 60) / 1080) ((4 - (a.data->'newsLifetime')::text::float - 2.0) 0.3 + 2.0))) newsmixValue
from (
select
articles.*,
'Just nu' in (select value->>'title' from json_array_elements((data->'topics')::json)) as breaking,
s.id as "storyId",
s.title as "storyTitle"
from articles
left join stories s
on articles.data->>'storyId' = s.id
where published is not null
and is_deleted=false
and published::timestamptz $1::timestamptz::date - $7::integer
-- setting 0 means no date-limit
or $7 = 0
)
and category_id = any ($6)
order by published desc
) a
left join (
select usermix.key category_id, usermix.value::text::float weight from json_each($2) usermix
) u
on a.category_id=u.category_id
left join (
select newsmix.key category_id, newsmix.value::text::float weight from json_each($3) as newsmix
) n
on a.category_id=n.category_id
left join (
select threshold.key category_id, threshold.value::inte
Solution
Indexes:
"temp_articles_pkey1" PRIMARY KEY, btree (id)
"articles_id_idx" UNIQUE, btree (id)
"articles_category_id_idx" btree (category_id)
"articles_expr_idx3" btree ((data ->> 'storyId'::text))
"articles_is_deleted_idx" btree (is_deleted)
"articles_published_idx" btree (published)This isn't how indexes normally work. While PostgreSQL apparently supports combining indexes, its method to do so is generally slower than accessing a single index. You don't put one index on each column that you are using. Instead, you build your indexes as subsets of the queried columns. Perhaps something like
"articles_expr_idx4" btree (category_id, published, (data ->> 'storyId'::text), is_deleted)Correct order may be different, although your known values are
category_id and a range for published. So category_id would normally be first in the index followed by the range. Note that
is_deleted may be better not in the index, since it only has two possible values. You usually want index values to exclude more possibilities. I would remove articles_is_deleted_idx entirely for that reason. You also don't need articles_id_idx as you already have a primary key index on that column. Also, I'm not a PostgreSQL guy, so apologies if I mangled the syntax.
left join (
select threshold.key category_id, threshold.value::integer weight from json_each_text($8) as threshold
) t
on a.category_id=t.category_id
where (a.data->>'newsValue')::float >= t.weightThe
WHERE clause only works if t.weight is not null. But if t.weight is not null, then you don't a LEFT JOIN. You could just use a standard INNER JOIN instead. As a general rule, inner joins are faster than outer joins (left or right). So if you are experiencing slow results, try replacing the outer joins with inner joins to see if you get the data that you need. Perhaps you need to make a requirement that mix values always exist for every category ID passed in
$6. That would allow you to replace the slow left outer joins with fast inner joins.Code Snippets
Indexes:
"temp_articles_pkey1" PRIMARY KEY, btree (id)
"articles_id_idx" UNIQUE, btree (id)
"articles_category_id_idx" btree (category_id)
"articles_expr_idx3" btree ((data ->> 'storyId'::text))
"articles_is_deleted_idx" btree (is_deleted)
"articles_published_idx" btree (published)"articles_expr_idx4" btree (category_id, published, (data ->> 'storyId'::text), is_deleted)left join (
select threshold.key category_id, threshold.value::integer weight from json_each_text($8) as threshold
) t
on a.category_id=t.category_id
where (a.data->>'newsValue')::float >= t.weightContext
StackExchange Code Review Q#80088, answer score: 2
Revisions (0)
No revisions yet.