patternsqlMinor
Improving performance of nasty nested-view joins
Viewed 0 times
viewnestednastyperformanceimprovingjoins
Problem
I have a moderate-sized database spread out over a few tables, the rough architecture is:
Each table has ~200,000 rows.
I also have a view that basically glues all of these together so that I can
The view works, and the query plan's index utilization seems sane, but the results are less-than-fast:
```
> EXPLAIN ANALYZE SELECT(*) FROM overlay WHERE test_session=12345;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Right Join (cost=7.19..74179.49 rows=10 width=305) (actual time=10680.129..10680.494 rows=4 loops=1)
Merge Cond: (p.data_id = d.id)
-> Merge Join (cost=7.19..75077.04 rows=183718 width=234) (actual time=0.192..10434.995 rows=173986 loops=1)
Merge Cond: (p.data_id = input_file.data_id)
-> Merge Join (cost=7.19..69917.74 rows=183718 width=222) (actual time=0.173..9255.653 rows=173986 loops=1)
Merge Cond: (p.data_id = stage1_output_file.data_id)
-> Merge Join (cost=5.50..62948.54 rows=183718 width=186) (actual time=0.153..8081.949 rows=173986 loops=1)
Merge Cond: (p.data_id = stage2_output_file.data_id)
-> Merge Join (cost=3.90..55217.36 rows=183723 width=150) (actual time=0.132..6918.814 rows=173986 loops=1)
Merge Cond: (p.data_id = stage3_output_file.data_id)
- Input Data (Data ID, Session ID and a few fields of statistical importance)
- Input File (Data ID and a blob)
- Stage 1 Output File (Data ID and a blob)
- Stage 2 output File (Data ID and a blob)
- Category 1 results (Data ID and some booleans)
- Category 2 results (Data ID and some integers)
- Category 3 results (Data ID and some integers)
Each table has ~200,000 rows.
I also have a view that basically glues all of these together so that I can
SELECT a bunch of IDs (usually selecting them based on the Session ID) and view all the relevant data on one page.The view works, and the query plan's index utilization seems sane, but the results are less-than-fast:
```
> EXPLAIN ANALYZE SELECT(*) FROM overlay WHERE test_session=12345;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Right Join (cost=7.19..74179.49 rows=10 width=305) (actual time=10680.129..10680.494 rows=4 loops=1)
Merge Cond: (p.data_id = d.id)
-> Merge Join (cost=7.19..75077.04 rows=183718 width=234) (actual time=0.192..10434.995 rows=173986 loops=1)
Merge Cond: (p.data_id = input_file.data_id)
-> Merge Join (cost=7.19..69917.74 rows=183718 width=222) (actual time=0.173..9255.653 rows=173986 loops=1)
Merge Cond: (p.data_id = stage1_output_file.data_id)
-> Merge Join (cost=5.50..62948.54 rows=183718 width=186) (actual time=0.153..8081.949 rows=173986 loops=1)
Merge Cond: (p.data_id = stage2_output_file.data_id)
-> Merge Join (cost=3.90..55217.36 rows=183723 width=150) (actual time=0.132..6918.814 rows=173986 loops=1)
Merge Cond: (p.data_id = stage3_output_file.data_id)
Solution
I am speculating here, but I guess the fact that you
Try inlining the query from the view and make it a
I cleaned up your syntax to make it more manageable. Added an alias for the second
If that should lead to a considerably faster execution time, you can try and add missing rows due to the
LEFT JOIN to the view makes the planner compute the results from the view as a whole before joining to the first part of the query.Try inlining the query from the view and make it a
JOIN instead of a LEFT JOIN, just to see if the planner finds a faster way now:SELECT d.data_id, d.test_session, d.a, d.b, t.c, d.d, d.e, d.f
, p.data_id AS p_data_id, p.x2, c.str AS impression, i.h
, p.x3, p.x3, p.x4
, s.x5, s.x6, s.x7, s.x8, s.x9, s.x10, s.x11, s.x12, s.x13, s.x14
, t.x15, t.x16, t.x17, t.x18, t.x19, t.x20, t.x21, t.x22, t.x23
, i.data AS input
, s1.data AS stage1, s2.data AS stage2, s3.data AS stage3
FROM input_data d
JOIN category1_results p ON p.data_id = d.data_id
JOIN input_file i USING (data_id)
JOIN stage1_output_file s1 USING (data_id)
JOIN stage2_output_file s2 USING (data_id)
JOIN stage3_output_file s3 USING (data_id)
JOIN category2_results s USING (data_id)
JOIN category3_results t USING (data_id)
JOIN category1_impression_str c ON p.impression = c.id
LEFT JOIN quality_codes t ON t.id = d.input_quality
WHERE NOT d.deleted;I cleaned up your syntax to make it more manageable. Added an alias for the second
data_id column, so it can execute.If that should lead to a considerably faster execution time, you can try and add missing rows due to the
INNER JOIN like this:SELECT DISTINCT ON (1,2,3,4,5,6,7,8) *
FROM (
>
) x
UNION ALL
SELECT d.data_id, d.test_session, d.a, d.b, t.c, d.d, d.e, d.f
,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL
,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL
,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL
FROM input_data d
LEFT JOIN quality_codes t ON t.id = d.input_quality
WHERE NOT d.deleted;
ORDER BY 1,2,3,4,5,6,7,8, 9 NULLS LAST; -- p.data_id is otherwise not nullCode Snippets
SELECT d.data_id, d.test_session, d.a, d.b, t.c, d.d, d.e, d.f
, p.data_id AS p_data_id, p.x2, c.str AS impression, i.h
, p.x3, p.x3, p.x4
, s.x5, s.x6, s.x7, s.x8, s.x9, s.x10, s.x11, s.x12, s.x13, s.x14
, t.x15, t.x16, t.x17, t.x18, t.x19, t.x20, t.x21, t.x22, t.x23
, i.data AS input
, s1.data AS stage1, s2.data AS stage2, s3.data AS stage3
FROM input_data d
JOIN category1_results p ON p.data_id = d.data_id
JOIN input_file i USING (data_id)
JOIN stage1_output_file s1 USING (data_id)
JOIN stage2_output_file s2 USING (data_id)
JOIN stage3_output_file s3 USING (data_id)
JOIN category2_results s USING (data_id)
JOIN category3_results t USING (data_id)
JOIN category1_impression_str c ON p.impression = c.id
LEFT JOIN quality_codes t ON t.id = d.input_quality
WHERE NOT d.deleted;SELECT DISTINCT ON (1,2,3,4,5,6,7,8) *
FROM (
<<query>>
) x
UNION ALL
SELECT d.data_id, d.test_session, d.a, d.b, t.c, d.d, d.e, d.f
,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL
,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL
,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL
FROM input_data d
LEFT JOIN quality_codes t ON t.id = d.input_quality
WHERE NOT d.deleted;
ORDER BY 1,2,3,4,5,6,7,8, 9 NULLS LAST; -- p.data_id is otherwise not nullContext
StackExchange Database Administrators Q#17126, answer score: 5
Revisions (0)
No revisions yet.