patternsqlMinor
Slow left join lateral in subquery
Viewed 0 times
leftjoinslowlateralsubquery
Problem
I have three tables:
```
Table public.units
Column | Type | Collation | Nullable | Default
----------------+-----------------------------+-----------+----------+-----------------------------------
id | integer | | not null | nextval('units_id_seq'::regclass)
number | character varying | | |
bedrooms | integer | | |
bathrooms | integer | | |
square_footage | integer | | |
community_id | integer | | |
building_id | integer | | |
created_at | timestamp without time zone | | not null |
updated_at | timestamp without time zone | | not null |
slug | character varying | | |
status | integer | | not null | 0
note | text | | |
half_bathrooms | integer | | |
display_number | character varying | | |
Indexes:
units_pkey PRIMARY KEY, btree (id)
index_units_on_building_id btree (building_id)
index_units_on_community_id btree (community_id)
index_units_on_slug btree (slug)
Foreign-key constraints:
fk_rails_5850136a38 FOREIGN KEY (building_id) REFERENCES buildings(id)
fk_rails_b860cf198b FOREIGN KEY (community_id) REFERENCES communities(id)
Referenced by:
TABLE leases CONSTRAINT fk_rails_29210439a5 FOREIGN KEY (unit
units (apartment units), leases, and line_items (a better name would have been recurring rents). A unit has many leases and lease has many line items (associated with each rent change). My goal is, given a set of a months, to list each unit with its most recent rent to it.```
Table public.units
Column | Type | Collation | Nullable | Default
----------------+-----------------------------+-----------+----------+-----------------------------------
id | integer | | not null | nextval('units_id_seq'::regclass)
number | character varying | | |
bedrooms | integer | | |
bathrooms | integer | | |
square_footage | integer | | |
community_id | integer | | |
building_id | integer | | |
created_at | timestamp without time zone | | not null |
updated_at | timestamp without time zone | | not null |
slug | character varying | | |
status | integer | | not null | 0
note | text | | |
half_bathrooms | integer | | |
display_number | character varying | | |
Indexes:
units_pkey PRIMARY KEY, btree (id)
index_units_on_building_id btree (building_id)
index_units_on_community_id btree (community_id)
index_units_on_slug btree (slug)
Foreign-key constraints:
fk_rails_5850136a38 FOREIGN KEY (building_id) REFERENCES buildings(id)
fk_rails_b860cf198b FOREIGN KEY (community_id) REFERENCES communities(id)
Referenced by:
TABLE leases CONSTRAINT fk_rails_29210439a5 FOREIGN KEY (unit
Solution
You have a lot of redundant work in your query. The first CTE
You can basically remove the first CTE
Your example produces only a single month. The intention obviously is to optionally produce several consecutive months at once. I changed the
I also switched to providing lower bounds like you did in your updated query. That's slightly more convenient.
The way you repeated the condition
This predicate makes a logical difference:
You dropped it in your updated query, but maybe you should keep it to always get the latest
If you drop it, you get the latest
Related:
Indexes
Depending on undisclosed details, you might get faster results, yet, with a different query style, or by adding one or more multicolumn or partial indexes. Shots in the dark, this might get you very fast index-only scans:
Or, slightly better yet, in Postgres 11:
Related (consider the preconditions mentioned there):
last_rent retrieves the "first" row per month for every unit in the whole table and materializes the result, while only a fraction of that is reused in the later CTE unit_rent_month. Plus, you repeat the work there.You can basically remove the first CTE
last_rent completely. And while being at it, remove all CTEs. All you actually need is a single LATERAL subquery. This radically rewritten query should do the same, much faster (untested):SELECT u.id AS unit_id, u.community_id, u.bedrooms, u.bathrooms
, last_rent.amount_cents
, date_trunc('month', mon.dt)::date AS period
FROM units u
CROSS JOIN generate_series(timestamp '2019-02-01' -- switched to lower bound
, timestamp '2019-02-01' -- and see below
, interval '1 month') mon(dt)
LEFT JOIN LATERAL (
SELECT li.amount_cents
FROM leases l
JOIN line_items li ON l.id = li.lease_id
WHERE l.unit_id = u.id
AND l.community_id = u.community_id -- redundant?
AND li.name = 'RNT'
AND li.start = mon.dt -- also required, see below
ORDER BY li.start DESC -- NULLS LAST ?
LIMIT 1
) last_rent ON true
WHERE u.community_id IN (X) -- ?
ORDER BY mon.dt, u.id;Your example produces only a single month. The intention obviously is to optionally produce several consecutive months at once. I changed the
generate_series() expression subtly - here is why:- Generating time series between two dates in PostgreSQL
I also switched to providing lower bounds like you did in your updated query. That's slightly more convenient.
The way you repeated the condition
community_id IN (X) could result in different community_id matching between units and leases if the IN clause has more than one expression. I tightened that to match exactly with AND l.community_id = u.community_id. Adapt if that's not as desired. Seems like the column leases.community_id is redundant to begin with, then you can drop this predicate completely.This predicate makes a logical difference:
AND li.start >= mon.dtYou dropped it in your updated query, but maybe you should keep it to always get the latest
amount_cents for the month of the result row - NULL if there is none. It's what your original query did.If you drop it, you get the latest
amount_cents in the given date range, possibly from a previous month. I.e. you can get a different result for a month if you change the lower bound of the date range.Related:
- PostgreSQL: Generate a series of dates for each group in a table
- Optimize GROUP BY query to retrieve latest record per user
Indexes
Depending on undisclosed details, you might get faster results, yet, with a different query style, or by adding one or more multicolumn or partial indexes. Shots in the dark, this might get you very fast index-only scans:
CREATE INDEX foo ON leases (unit_id, id);
CREATE INDEX bar ON line_items (lease_id, start DESC, amount_cents)
WHERE name = 'RNT';Or, slightly better yet, in Postgres 11:
CREATE INDEX bar ON line_items (lease_id, start DESC) INCLUDING (amount_cents)
WHERE name = 'RNT';WHERE name = 'RNT' typically only pays if 'RNT' is not too common.Related (consider the preconditions mentioned there):
- Can Postgres use an index-only scan for this query with joined tables?
Code Snippets
SELECT u.id AS unit_id, u.community_id, u.bedrooms, u.bathrooms
, last_rent.amount_cents
, date_trunc('month', mon.dt)::date AS period
FROM units u
CROSS JOIN generate_series(timestamp '2019-02-01' -- switched to lower bound
, timestamp '2019-02-01' -- and see below
, interval '1 month') mon(dt)
LEFT JOIN LATERAL (
SELECT li.amount_cents
FROM leases l
JOIN line_items li ON l.id = li.lease_id
WHERE l.unit_id = u.id
AND l.community_id = u.community_id -- redundant?
AND li.name = 'RNT'
AND li.start < mon.dt + interval '1 month'
AND li.start >= mon.dt -- also required, see below
ORDER BY li.start DESC -- NULLS LAST ?
LIMIT 1
) last_rent ON true
WHERE u.community_id IN (X) -- ?
ORDER BY mon.dt, u.id;AND li.start >= mon.dtCREATE INDEX foo ON leases (unit_id, id);
CREATE INDEX bar ON line_items (lease_id, start DESC, amount_cents)
WHERE name = 'RNT';CREATE INDEX bar ON line_items (lease_id, start DESC) INCLUDING (amount_cents)
WHERE name = 'RNT';Context
StackExchange Database Administrators Q#230516, answer score: 3
Revisions (0)
No revisions yet.