HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Joining timestamped records with the most-recent-prior records from another timestamped table

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
therecentwithtablerecordstimestampedanotherpriorfrommost

Problem

Say I have 2 tables:

Table: sales
  timestamp   timestamp without time zone
  amount      numeric

Table: prices
  timestamp   timestamp without time zone
  price       numeric


How can I get a result set that tells me, for each record in sales, the last price from before that sale took place?

Once I have that, I can multiply amount by the most-recent-prior price to get the total sale price.

(I did not design these tables, so I don't have the option of storing the price on each sale.)

I'm using PostgreSQL 9.3.

Edit: I can write something that works using nested subqueries that require O(n^2) space and/or time, but the sales table has millions of rows, so I'm hoping there's a better way to do it. Maybe some clever use of window functions or the Postgres-specific lead or lag?

Solution

Guess I don't need to tell you that this is a seriously awkward data model. Anyway, I think this query would do what you're looking for:

SELECT subq2.sales_time, subq2.num_sold, subq2.effective_price,
   subq2.effective_price * subq2.num_sold AS total_sale_price
FROM (

  SELECT subq1.sales_time, subq1.num_sold, subq1.max_price_ts,
    (SELECT price FROM prices
     WHERE timestamp = subq1.max_price_ts) AS effective_price
  FROM
    (SELECT sales.timestamp AS sales_time, sales.amount AS num_sold,
      (SELECT MAX(prices.timestamp) FROM prices
       WHERE prices.timestamp <=  sales.timestamp) AS max_price_ts
    FROM sales
    ) AS subq1

  ) AS subq2

ORDER BY subq2.sales_time DESC;


There's probably a more concise way to write the above, perhaps using DISTINCT ON and ORDER BY to save having to fetch the price for the MAX time in a separate subquery, but I'll leave that as an exercise for the reader.

EDIT Alright, here's a simplified version which I think should work out to be equivalent but much faster.

SELECT DISTINCT ON (sales.timestamp)
       sales.timestamp AS sales_time,
       sales.amount AS num_sold,
       prices.price AS effective_price,
       prices.price * sales.amount AS total_sale_price

      FROM sales
INNER JOIN prices
        ON prices.timestamp <= sales.timestamp
  ORDER BY sales.timestamp DESC, prices.price DESC


If that doesn't work for you, it would be helpful for you to post a lot more information such as a minimal testcase and EXPLAIN ANALYZE showing how slow the query is for you.

Code Snippets

SELECT subq2.sales_time, subq2.num_sold, subq2.effective_price,
   subq2.effective_price * subq2.num_sold AS total_sale_price
FROM (

  SELECT subq1.sales_time, subq1.num_sold, subq1.max_price_ts,
    (SELECT price FROM prices
     WHERE timestamp = subq1.max_price_ts) AS effective_price
  FROM
    (SELECT sales.timestamp AS sales_time, sales.amount AS num_sold,
      (SELECT MAX(prices.timestamp) FROM prices
       WHERE prices.timestamp <=  sales.timestamp) AS max_price_ts
    FROM sales
    ) AS subq1

  ) AS subq2

ORDER BY subq2.sales_time DESC;
SELECT DISTINCT ON (sales.timestamp)
       sales.timestamp AS sales_time,
       sales.amount AS num_sold,
       prices.price AS effective_price,
       prices.price * sales.amount AS total_sale_price

      FROM sales
INNER JOIN prices
        ON prices.timestamp <= sales.timestamp
  ORDER BY sales.timestamp DESC, prices.price DESC

Context

StackExchange Database Administrators Q#74391, answer score: 3

Revisions (0)

No revisions yet.