patternsqlMinor
Query optimization or missing indexes?
Viewed 0 times
optimizationqueryindexesmissing
Problem
I have this query to aggregate some data from
The big questions is, what indexes am I likely missing here and can the query be optimized further?
EDIT Adding requested information
These are my current indexes:
explain analyze gives me the following result:
```
Update on t1 (cost=2725.40..2737.42 rows=1 width=138) (actual time=1.428..1.428 rows=0 loops=1)
-> Nested Loop (cost=2725.40..2737.42 rows=1 width=138) (actual time=0.646..1.148 rows=166 loops=1)
-> Subquery Scan on t2 (cost=2725.40..2725.42 rows=1 width=84) (actual time=0.642..0.729 rows=166 loops=1)
-> HashAggregate (cost=2725.40..2725.41 rows=1 width=17) (actual time=0.639..0.685 rows=166 loops=1)
-> Nested Loop (cost=5.81..2725.39 rows=1 width=17) (actual time=0.034..0.536 rows=197 loops=1)
-> Bitmap Heap Scan on t2 b (cost=5.81..414.29 rows=193 width=13) (actual time=0.024..0.050 rows=197 loops=1)
Recheck Cond: (user_id = 1)
-> Bitmap Index Scan on ix_t2_user_id (cost=0.00..5.76 rows=193 width=0) (actual time=0.017..0.017 ro
t2 into t1. This is done to optimize the application I am working on so that there are less queries to the database. I chose the below approach to make sure that I don't have to update t1 twice.The big questions is, what indexes am I likely missing here and can the query be optimized further?
update t1
set
col1 = t2.col1_count,
col2 = t2.col2_sum,
col3 = t2.col3_sum
from (
select
b.user_id, b.t1_id,
coalesce(count(b.id), 0) as col1_count,
sum(case when b.col5 = true then b.col2 else 0 end) as col2_sum,
sum(case when b.col5 = false then b.col3 else 0 end) as col3_sum
from t1 a
left join t2 b on b.t1_id = a.id
where
b.user_id = 1
group by b.user_id, b.t1_id
) as t2
where
t2.t1_id = t1.id;EDIT Adding requested information
These are my current indexes:
create index ix_t1_user_id on t1(user_id);
create unique index ux_t2_t1_id_t3_id on t2(t1_id, t3_id);
create index ix_t2_user_id on t2(user_id);
create index ix_t2_t1_id on t2(t1_id);explain analyze gives me the following result:
```
Update on t1 (cost=2725.40..2737.42 rows=1 width=138) (actual time=1.428..1.428 rows=0 loops=1)
-> Nested Loop (cost=2725.40..2737.42 rows=1 width=138) (actual time=0.646..1.148 rows=166 loops=1)
-> Subquery Scan on t2 (cost=2725.40..2725.42 rows=1 width=84) (actual time=0.642..0.729 rows=166 loops=1)
-> HashAggregate (cost=2725.40..2725.41 rows=1 width=17) (actual time=0.639..0.685 rows=166 loops=1)
-> Nested Loop (cost=5.81..2725.39 rows=1 width=17) (actual time=0.034..0.536 rows=197 loops=1)
-> Bitmap Heap Scan on t2 b (cost=5.81..414.29 rows=193 width=13) (actual time=0.024..0.050 rows=197 loops=1)
Recheck Cond: (user_id = 1)
-> Bitmap Index Scan on ix_t2_user_id (cost=0.00..5.76 rows=193 width=0) (actual time=0.017..0.017 ro
Solution
Simplify query
-
Remove
It should be noted that except for count, these functions return a
null value when no rows are selected.
Meaning,
To reset rows in
To do both at once, the version with
Avoid empty updates
If there is a chance that values in
For columns defined
This can make a big difference, updates are expensive.
Index
The only index you need for this (in addition to the primary key on
- Remove useless
user_idfrom subquery.
-
Remove
coalesce around count(). I quote the manual on aggregate functions:It should be noted that except for count, these functions return a
null value when no rows are selected.
Meaning,
count() never returns NULL.- Remove the redundant
LEFT JOINfrom the subquery (Update: not if you want to set columns to 0 where no rows are found int2).
UPDATE t1
SET col1 = t2.col1_count
,col2 = t2.col2_sum
,col3 = t2.col3_sum
FROM (
SELECT t1_id
,count(*) AS col1_count -- if id is NOT NULL, count(*) is a bit faster
,sum(CASE WHEN col5 = true THEN col2 ELSE 0 END) AS col2_sum -- might be simpler
,sum(CASE WHEN col5 = false THEN col3 ELSE 0 END) AS col3_sum -- missing info
FROM t2
WHERE user_id = 1
GROUP BY t1_id
) t2
WHERE t2.t1_id = t1.id;To reset rows in
t1 without any matches in t2:UPDATE t1
SET col1 = 0, col2 = 0, col3 = 0
WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.t1_id = t1.id);To do both at once, the version with
LEFT JOIN the subquery like you had is probably faster, depends on your data distribution.Avoid empty updates
If there is a chance that values in
t1 are already up to date, add conditions to the WHERE clause that prevent empty updates (applies to both queries):...
AND (col1 IS DISTINCT FROM t2.col1_count OR -- again: might be simpler
col2 IS DISTINCT FROM t2.col2_sum OR -- missing info
col3 IS DISTINCT FROM t2.col3_sum)For columns defined
NOT NULL you can use <> instead of IS DISTINCT FROM.This can make a big difference, updates are expensive.
Index
The only index you need for this (in addition to the primary key on
t1.id) is:CREATE INDEX ix_t2_user_id ON t2(user_id);Code Snippets
UPDATE t1
SET col1 = t2.col1_count
,col2 = t2.col2_sum
,col3 = t2.col3_sum
FROM (
SELECT t1_id
,count(*) AS col1_count -- if id is NOT NULL, count(*) is a bit faster
,sum(CASE WHEN col5 = true THEN col2 ELSE 0 END) AS col2_sum -- might be simpler
,sum(CASE WHEN col5 = false THEN col3 ELSE 0 END) AS col3_sum -- missing info
FROM t2
WHERE user_id = 1
GROUP BY t1_id
) t2
WHERE t2.t1_id = t1.id;UPDATE t1
SET col1 = 0, col2 = 0, col3 = 0
WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.t1_id = t1.id);...
AND (col1 IS DISTINCT FROM t2.col1_count OR -- again: might be simpler
col2 IS DISTINCT FROM t2.col2_sum OR -- missing info
col3 IS DISTINCT FROM t2.col3_sum)CREATE INDEX ix_t2_user_id ON t2(user_id);Context
StackExchange Database Administrators Q#33612, answer score: 5
Revisions (0)
No revisions yet.