HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Slow left join lateral in subquery

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
leftjoinslowlateralsubquery

Problem

I have three tables: units (apartment units), leases, and line_items (a better name would have been recurring rents). A unit has many leases and lease has many line items (associated with each rent change). My goal is, given a set of a months, to list each unit with its most recent rent to it.

```
Table public.units
Column | Type | Collation | Nullable | Default
----------------+-----------------------------+-----------+----------+-----------------------------------
id | integer | | not null | nextval('units_id_seq'::regclass)
number | character varying | | |
bedrooms | integer | | |
bathrooms | integer | | |
square_footage | integer | | |
community_id | integer | | |
building_id | integer | | |
created_at | timestamp without time zone | | not null |
updated_at | timestamp without time zone | | not null |
slug | character varying | | |
status | integer | | not null | 0
note | text | | |
half_bathrooms | integer | | |
display_number | character varying | | |
Indexes:
units_pkey PRIMARY KEY, btree (id)
index_units_on_building_id btree (building_id)
index_units_on_community_id btree (community_id)
index_units_on_slug btree (slug)
Foreign-key constraints:
fk_rails_5850136a38 FOREIGN KEY (building_id) REFERENCES buildings(id)
fk_rails_b860cf198b FOREIGN KEY (community_id) REFERENCES communities(id)
Referenced by:
TABLE leases CONSTRAINT fk_rails_29210439a5 FOREIGN KEY (unit

Solution

You have a lot of redundant work in your query. The first CTE last_rent retrieves the "first" row per month for every unit in the whole table and materializes the result, while only a fraction of that is reused in the later CTE unit_rent_month. Plus, you repeat the work there.

You can basically remove the first CTE last_rent completely. And while being at it, remove all CTEs. All you actually need is a single LATERAL subquery. This radically rewritten query should do the same, much faster (untested):

SELECT u.id AS unit_id, u.community_id, u.bedrooms, u.bathrooms
     , last_rent.amount_cents
     , date_trunc('month', mon.dt)::date AS period
FROM   units u
CROSS  JOIN generate_series(timestamp '2019-02-01'  -- switched to lower bound
                          , timestamp '2019-02-01'  -- and see below
                          , interval '1 month') mon(dt)
LEFT   JOIN LATERAL (
   SELECT li.amount_cents
   FROM   leases     l
   JOIN   line_items li ON l.id = li.lease_id
   WHERE  l.unit_id = u.id
   AND    l.community_id = u.community_id  -- redundant?
   AND    li.name  = 'RNT'
   AND    li.start = mon.dt  -- also required, see below
   ORDER  BY li.start DESC    -- NULLS LAST ?
   LIMIT  1
   ) last_rent ON true
WHERE  u.community_id IN (X)  -- ?
ORDER  BY mon.dt, u.id;


Your example produces only a single month. The intention obviously is to optionally produce several consecutive months at once. I changed the generate_series() expression subtly - here is why:

  • Generating time series between two dates in PostgreSQL



I also switched to providing lower bounds like you did in your updated query. That's slightly more convenient.

The way you repeated the condition community_id IN (X) could result in different community_id matching between units and leases if the IN clause has more than one expression. I tightened that to match exactly with AND l.community_id = u.community_id. Adapt if that's not as desired. Seems like the column leases.community_id is redundant to begin with, then you can drop this predicate completely.

This predicate makes a logical difference:

AND    li.start >= mon.dt


You dropped it in your updated query, but maybe you should keep it to always get the latest amount_cents for the month of the result row - NULL if there is none. It's what your original query did.

If you drop it, you get the latest amount_cents in the given date range, possibly from a previous month. I.e. you can get a different result for a month if you change the lower bound of the date range.

Related:

  • PostgreSQL: Generate a series of dates for each group in a table



  • Optimize GROUP BY query to retrieve latest record per user



Indexes

Depending on undisclosed details, you might get faster results, yet, with a different query style, or by adding one or more multicolumn or partial indexes. Shots in the dark, this might get you very fast index-only scans:

CREATE INDEX foo ON leases (unit_id, id);

CREATE INDEX bar ON line_items (lease_id, start DESC, amount_cents)
WHERE name = 'RNT';


Or, slightly better yet, in Postgres 11:

CREATE INDEX bar ON line_items (lease_id, start DESC) INCLUDING (amount_cents)
WHERE name = 'RNT';


WHERE name = 'RNT' typically only pays if 'RNT' is not too common.

Related (consider the preconditions mentioned there):

  • Can Postgres use an index-only scan for this query with joined tables?

Code Snippets

SELECT u.id AS unit_id, u.community_id, u.bedrooms, u.bathrooms
     , last_rent.amount_cents
     , date_trunc('month', mon.dt)::date AS period
FROM   units u
CROSS  JOIN generate_series(timestamp '2019-02-01'  -- switched to lower bound
                          , timestamp '2019-02-01'  -- and see below
                          , interval '1 month') mon(dt)
LEFT   JOIN LATERAL (
   SELECT li.amount_cents
   FROM   leases     l
   JOIN   line_items li ON l.id = li.lease_id
   WHERE  l.unit_id = u.id
   AND    l.community_id = u.community_id  -- redundant?
   AND    li.name  = 'RNT'
   AND    li.start <  mon.dt + interval '1 month'
   AND    li.start >= mon.dt  -- also required, see below
   ORDER  BY li.start DESC    -- NULLS LAST ?
   LIMIT  1
   ) last_rent ON true
WHERE  u.community_id IN (X)  -- ?
ORDER  BY mon.dt, u.id;
AND    li.start >= mon.dt
CREATE INDEX foo ON leases (unit_id, id);

CREATE INDEX bar ON line_items (lease_id, start DESC, amount_cents)
WHERE name = 'RNT';
CREATE INDEX bar ON line_items (lease_id, start DESC) INCLUDING (amount_cents)
WHERE name = 'RNT';

Context

StackExchange Database Administrators Q#230516, answer score: 3

Revisions (0)

No revisions yet.