patternsqlMinor
PostgreSQL: reuse of complex intermediate result in same query
Viewed 0 times
postgresqlresultsamequeryreusecomplexintermediate
Problem
Using PostgreSQL (8.4), I'm creating a view that summarizes various results from a few tables (e.g creating columns
Is there a way to optimize this so as to avoid those same results to be computed every time?
Here is a simplified example that reproduces the problem.
Here is a view that performs the most time-consuming operations:
```
CREATE VIEW testview1 AS
SELECT
t1.id,
t1.log_timestamp,
(SELECT SUM(t2.col1) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='A') AS a,
(SELECT SUM(t2.col2) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='B') AS b,
(SELECT SUM(t2.col1 - t2.col2) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='C')
a, b, c in the view), and then I need to combine some of these results together in the same query (e.g. a+b, a-b, (a+b)/c, ...), so as to produce the final results. What I'm noticing is that the intermediate results are fully computed each time they are used, even if it's done within the same query.Is there a way to optimize this so as to avoid those same results to be computed every time?
Here is a simplified example that reproduces the problem.
CREATE TABLE test1 (
id SERIAL PRIMARY KEY,
log_timestamp TIMESTAMP NOT NULL
);
CREATE TABLE test2 (
test1_id INTEGER NOT NULL REFERENCES test1(id),
category VARCHAR(10) NOT NULL,
col1 INTEGER,
col2 INTEGER
);
CREATE INDEX test_category_idx ON test2(category);
-- Added after edit to this question
CREATE INDEX test_id_idx ON test2(test1_id);
-- Populating with test data.
INSERT INTO test1(log_timestamp)
SELECT * FROM generate_series('2011-01-01'::timestamp, '2012-01-01'::timestamp, '1 hour');
INSERT INTO test2
SELECT id, substr(upper(md5(random()::TEXT)), 1, 1),
(20000*random()-10000)::int, (3000*random()-200)::int FROM test1;
INSERT INTO test2
SELECT id, substr(upper(md5(random()::TEXT)), 1, 1),
(2000*random()-1000)::int, (3000*random()-200)::int FROM test1;
INSERT INTO test2
SELECT id, substr(upper(md5(random()::TEXT)), 1, 1),
(2000*random()-40)::int, (3000*random()-200)::int FROM test1;Here is a view that performs the most time-consuming operations:
```
CREATE VIEW testview1 AS
SELECT
t1.id,
t1.log_timestamp,
(SELECT SUM(t2.col1) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='A') AS a,
(SELECT SUM(t2.col2) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='B') AS b,
(SELECT SUM(t2.col1 - t2.col2) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='C')
Solution
(Apologies for answering my own question, but after reading this unrelated question and answer, it occurred to me I should try using a CTE instead. It works.)
Here is another view, similar to
(This is just an example, I'm not suggesting that combining a view and a CTE are a necessarily a good idea: a CTE might be enough.)
Unlike
However, it doesn't re-compute the results that are used multiple times within the same query (which was the objective).
Unlike
```
Subquery Scan testview2 (cost=395272.42..395600.96 rows=8761 width=24) (actual time=0.147..562.790 rows=8761 loops=1)
-> CTE Scan on testcte (cost=395272.42..395447.64 rows=8761 width=36) (actual time=0.144..554.194 rows=8761 loops=1)
CTE testcte
-> Seq Scan on test1 t1 (cost=0.00..395272.42 rows=8761 width=12) (actual time=0.140..542.657 rows=8761 loops=1)
SubPlan 1
-> Aggregate (cost=15.02..15.03 rows=1 width=4) (actual time=0.019..0.019 rows=1 loops=8761)
-> Bitmap Heap Scan on test2 t2 (cost=4.28..15.02 rows=1 width=4) (actual time=0.012..0.013 rows=0 loops=8761)
Recheck Cond: (test1_id = $0)
Filter: ((category)::text = 'A'::text)
-> Bitmap Index Scan on test_if_idx (cost=0.00..4.28 rows=3 width=0) (actual time=0.007..0.007 rows=3 loops=8761)
Index Cond: (test1_id = $0)
SubPlan 2
-> Aggregate (cost=15.02..15.03 rows=1 width=4) (actual time=0.019..0.019 rows=1 loops=8761)
-> Bitmap Heap Scan on test2 t2 (cost=4.28..15.02 rows=1 width=4) (actual time=0.012..0.012 rows=0 loops=8761)
Recheck Cond: (test1_id = $0)
Here is another view, similar to
testview1 in the question, but that uses a Common Table Expression:CREATE VIEW testview2 AS
WITH testcte AS (SELECT
t1.id,
t1.log_timestamp,
(SELECT SUM(t2.col1) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='A') AS a,
(SELECT SUM(t2.col2) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='B') AS b,
(SELECT SUM(t2.col1 - t2.col2) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='C') AS c
FROM test1 t1)
SELECT * FROM testcte;(This is just an example, I'm not suggesting that combining a view and a CTE are a necessarily a good idea: a CTE might be enough.)
Unlike
testview1, the query plan for SELECT a FROM testview2 now also computes b and c, which were ignored since unused in testview1:Subquery Scan testview2 (cost=395272.42..395535.25 rows=8761 width=8) (actual time=0.256..607.941 rows=8761 loops=1)
-> CTE Scan on testcte (cost=395272.42..395447.64 rows=8761 width=36) (actual time=0.255..604.106 rows=8761 loops=1)
CTE testcte
-> Seq Scan on test1 t1 (cost=0.00..395272.42 rows=8761 width=12) (actual time=0.252..589.358 rows=8761 loops=1)
SubPlan 1
-> Aggregate (cost=15.02..15.03 rows=1 width=4) (actual time=0.021..0.021 rows=1 loops=8761)
-> Bitmap Heap Scan on test2 t2 (cost=4.28..15.02 rows=1 width=4) (actual time=0.015..0.015 rows=0 loops=8761)
Recheck Cond: (test1_id = $0)
Filter: ((category)::text = 'A'::text)
-> Bitmap Index Scan on test_if_idx (cost=0.00..4.28 rows=3 width=0) (actual time=0.009..0.009 rows=3 loops=8761)
Index Cond: (test1_id = $0)
SubPlan 2
-> Aggregate (cost=15.02..15.03 rows=1 width=4) (actual time=0.019..0.019 rows=1 loops=8761)
-> Bitmap Heap Scan on test2 t2 (cost=4.28..15.02 rows=1 width=4) (actual time=0.012..0.012 rows=0 loops=8761)
Recheck Cond: (test1_id = $0)
Filter: ((category)::text = 'B'::text)
-> Bitmap Index Scan on test_if_idx (cost=0.00..4.28 rows=3 width=0) (actual time=0.007..0.007 rows=3 loops=8761)
Index Cond: (test1_id = $0)
SubPlan 3
-> Aggregate (cost=15.02..15.04 rows=1 width=8) (actual time=0.020..0.020 rows=1 loops=8761)
-> Bitmap Heap Scan on test2 t2 (cost=4.28..15.02 rows=1 width=8) (actual time=0.013..0.014 rows=0 loops=8761)
Recheck Cond: (test1_id = $0)
Filter: ((category)::text = 'C'::text)
-> Bitmap Index Scan on test_if_idx (cost=0.00..4.28 rows=3 width=0) (actual time=0.007..0.007 rows=3 loops=8761)
Index Cond: (test1_id = $0)However, it doesn't re-compute the results that are used multiple times within the same query (which was the objective).
Unlike
testview1 with which SELECT a, a, a, a, a took 5 times longer than SELECT a, here SELECT a, a, a, a, a, b, c, a+b, a+c, b+c FROM testview2 takes just as long as SELECT a FROM testview2 or SELECT a, b, c FROM testview2. It only goes through a, b and c once:```
Subquery Scan testview2 (cost=395272.42..395600.96 rows=8761 width=24) (actual time=0.147..562.790 rows=8761 loops=1)
-> CTE Scan on testcte (cost=395272.42..395447.64 rows=8761 width=36) (actual time=0.144..554.194 rows=8761 loops=1)
CTE testcte
-> Seq Scan on test1 t1 (cost=0.00..395272.42 rows=8761 width=12) (actual time=0.140..542.657 rows=8761 loops=1)
SubPlan 1
-> Aggregate (cost=15.02..15.03 rows=1 width=4) (actual time=0.019..0.019 rows=1 loops=8761)
-> Bitmap Heap Scan on test2 t2 (cost=4.28..15.02 rows=1 width=4) (actual time=0.012..0.013 rows=0 loops=8761)
Recheck Cond: (test1_id = $0)
Filter: ((category)::text = 'A'::text)
-> Bitmap Index Scan on test_if_idx (cost=0.00..4.28 rows=3 width=0) (actual time=0.007..0.007 rows=3 loops=8761)
Index Cond: (test1_id = $0)
SubPlan 2
-> Aggregate (cost=15.02..15.03 rows=1 width=4) (actual time=0.019..0.019 rows=1 loops=8761)
-> Bitmap Heap Scan on test2 t2 (cost=4.28..15.02 rows=1 width=4) (actual time=0.012..0.012 rows=0 loops=8761)
Recheck Cond: (test1_id = $0)
Code Snippets
CREATE VIEW testview2 AS
WITH testcte AS (SELECT
t1.id,
t1.log_timestamp,
(SELECT SUM(t2.col1) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='A') AS a,
(SELECT SUM(t2.col2) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='B') AS b,
(SELECT SUM(t2.col1 - t2.col2) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='C') AS c
FROM test1 t1)
SELECT * FROM testcte;Subquery Scan testview2 (cost=395272.42..395535.25 rows=8761 width=8) (actual time=0.256..607.941 rows=8761 loops=1)
-> CTE Scan on testcte (cost=395272.42..395447.64 rows=8761 width=36) (actual time=0.255..604.106 rows=8761 loops=1)
CTE testcte
-> Seq Scan on test1 t1 (cost=0.00..395272.42 rows=8761 width=12) (actual time=0.252..589.358 rows=8761 loops=1)
SubPlan 1
-> Aggregate (cost=15.02..15.03 rows=1 width=4) (actual time=0.021..0.021 rows=1 loops=8761)
-> Bitmap Heap Scan on test2 t2 (cost=4.28..15.02 rows=1 width=4) (actual time=0.015..0.015 rows=0 loops=8761)
Recheck Cond: (test1_id = $0)
Filter: ((category)::text = 'A'::text)
-> Bitmap Index Scan on test_if_idx (cost=0.00..4.28 rows=3 width=0) (actual time=0.009..0.009 rows=3 loops=8761)
Index Cond: (test1_id = $0)
SubPlan 2
-> Aggregate (cost=15.02..15.03 rows=1 width=4) (actual time=0.019..0.019 rows=1 loops=8761)
-> Bitmap Heap Scan on test2 t2 (cost=4.28..15.02 rows=1 width=4) (actual time=0.012..0.012 rows=0 loops=8761)
Recheck Cond: (test1_id = $0)
Filter: ((category)::text = 'B'::text)
-> Bitmap Index Scan on test_if_idx (cost=0.00..4.28 rows=3 width=0) (actual time=0.007..0.007 rows=3 loops=8761)
Index Cond: (test1_id = $0)
SubPlan 3
-> Aggregate (cost=15.02..15.04 rows=1 width=8) (actual time=0.020..0.020 rows=1 loops=8761)
-> Bitmap Heap Scan on test2 t2 (cost=4.28..15.02 rows=1 width=8) (actual time=0.013..0.014 rows=0 loops=8761)
Recheck Cond: (test1_id = $0)
Filter: ((category)::text = 'C'::text)
-> Bitmap Index Scan on test_if_idx (cost=0.00..4.28 rows=3 width=0) (actual time=0.007..0.007 rows=3 loops=8761)
Index Cond: (test1_id = $0)Subquery Scan testview2 (cost=395272.42..395600.96 rows=8761 width=24) (actual time=0.147..562.790 rows=8761 loops=1)
-> CTE Scan on testcte (cost=395272.42..395447.64 rows=8761 width=36) (actual time=0.144..554.194 rows=8761 loops=1)
CTE testcte
-> Seq Scan on test1 t1 (cost=0.00..395272.42 rows=8761 width=12) (actual time=0.140..542.657 rows=8761 loops=1)
SubPlan 1
-> Aggregate (cost=15.02..15.03 rows=1 width=4) (actual time=0.019..0.019 rows=1 loops=8761)
-> Bitmap Heap Scan on test2 t2 (cost=4.28..15.02 rows=1 width=4) (actual time=0.012..0.013 rows=0 loops=8761)
Recheck Cond: (test1_id = $0)
Filter: ((category)::text = 'A'::text)
-> Bitmap Index Scan on test_if_idx (cost=0.00..4.28 rows=3 width=0) (actual time=0.007..0.007 rows=3 loops=8761)
Index Cond: (test1_id = $0)
SubPlan 2
-> Aggregate (cost=15.02..15.03 rows=1 width=4) (actual time=0.019..0.019 rows=1 loops=8761)
-> Bitmap Heap Scan on test2 t2 (cost=4.28..15.02 rows=1 width=4) (actual time=0.012..0.012 rows=0 loops=8761)
Recheck Cond: (test1_id = $0)
Filter: ((category)::text = 'B'::text)
-> Bitmap Index Scan on test_if_idx (cost=0.00..4.28 rows=3 width=0) (actual time=0.006..0.006 rows=3 loops=8761)
Index Cond: (test1_id = $0)
SubPlan 3
-> Aggregate (cost=15.02..15.04 rows=1 width=8) (actual time=0.018..0.019 rows=1 loops=8761)
-> Bitmap Heap Scan on test2 t2 (cost=4.28..15.02 rows=1 width=8) (actual time=0.012..0.012 rows=0 loops=8761)
Recheck Cond: (test1_id = $0)
Filter: ((category)::text = 'C'::text)
-> Bitmap Index Scan on test_if_idx (cost=0.00..4.28 rows=3 width=0) (actual time=0.007..0.007 rows=3 loops=8761)
Index Cond: (test1_id = $0)Context
StackExchange Database Administrators Q#9043, answer score: 6
Revisions (0)
No revisions yet.