snippetMinor
How to join by the most recent preceding date?
Viewed 0 times
theprecedingrecentjoindatehowmost
Problem
Imagine I have a database with historic price data and a second table with dates
A new row was entered into the price history table every time the price changed.
How can I (efficiently) join the two tables so that I can get the price each customer had to pay on that day?
I need the syntax for teradata, thanks!
A new row was entered into the price history table every time the price changed.
How can I (efficiently) join the two tables so that I can get the price each customer had to pay on that day?
I need the syntax for teradata, thanks!
Solution
I'm afraid that I cannot guarantee 100% Teradata compatibility for my answer as there are, to my knowledge, no Teradata fiddles available. However, I have used PostgreSQL, one of the more standards compliant systems and there are no PostgreSQL specific constructs used in my answer (see the fiddle
I glanced at the Teradata documentation for Window functions (
First thing I did was
Another small note, it's not a good idea to name your fields using
and
I then ran the following queries - I've left in extraneous fields which aren't required for the final calculation so that my thought processes could be followed - using
If you don't have a version with
Result:
Notice the use of the defaults
Then, I ran:
Result (again, cull fields as you see fit):
Which is, AFAICT, the correct answer. You may want to test with edge cases which I may have overlooked - if there's a problem, get back to me. +1 for an interesting and challenging question and welcome to the forum!
here), so I think we're good to go! I glanced at the Teradata documentation for Window functions (
LAG and LEAD) and for Common Table Expressions (CTEs) and comparing it with PostgreSQL's syntax (Window functions, CTEs), it all seems fairly vanilla.First thing I did was
CREATE and populate the tables. As an aside, you should avoid using images on StackExchange for the reasons outlined in this link. Use text for DDL and DML or, better yet, provide a fiddle - if you can't find a Teradata one (I looked and couldn't), use something like PostgreSQL which should get you most of the way there.Another small note, it's not a good idea to name your fields using
SQL keywords. Liberal use of the _ (underscore) character is a big help here.CREATE TABLE price
(
item_id TEXT NOT NULL,
item_price SMALLINT NOT NULL,
change_date DATE NOT NULL
);
INSERT INTO price
VALUES
('A', 249, '2014-09-01'),
('B', 129, '2014-09-01'),
('A', 299, '2014-08-25'),
('B', 139, '2014-09-25');and
CREATE TABLE purchase
(
customer_id SMALLINT NOT NULL,
item_id TEXT NOT NULL,
purchase_date DATE NOT NULL
);
INSERT INTO purchase
VALUES
(1, 'A', '2014-08-27'), (2, 'A', '2014-09-02');I then ran the following queries - I've left in extraneous fields which aren't required for the final calculation so that my thought processes could be followed - using
LAG and LEAD can be tricky and it's easy to mix up which field you should be comparing with which other one, so my policy is to leave them all in and let the OP (Original Poster - i.e. you) decide which you find useful and want to keep and which you wish to eliminate.WITH pr_cte AS
(
SELECT
p.item_id,
LAG(p.item_price, 1)
OVER (PARTITION BY p.item_id ORDER BY p.change_date) AS lag_price,
p.item_price,
LEAD(p.item_price, 1)
OVER (PARTITION BY p.item_id ORDER BY p.change_date) AS lead_price,
LAG(change_date, 1, '2000-01-01')
OVER (PARTITION BY p.item_id ORDER BY p.change_date) AS lag_date,
p.change_date,
LEAD(p.change_date, 1, '2038-01-01')
OVER (PARTITION BY p.item_id ORDER BY p.change_date) AS lead_date
FROM price p
-- ORDER BY p.item_id, p.change_date -- not allowed in Teradata according to @dnoeth
)
SELECT * FROM pr_cte;If you don't have a version with
CTEs, you can always use a subquery. Result:
item_id lag_price item_price lead_price lag_date change_date lead_date
A 299 249 2000-01-01 2014-08-25 2014-09-01
A 299 249 2014-08-25 2014-09-01 2038-01-01
B 129 139 2000-01-01 2014-09-01 2014-09-25
B 129 139 2014-09-01 2014-09-25 2038-01-01Notice the use of the defaults
2000-01-01 and 2038-01-01 - I'm implicitly assuming that the prices before the specified dates in the price table go back to 2000 and that prices after will continue till 2038 when *nix time will explode! Obviously, you can change these for values more suited to your use case. I've left it up to you to provide default prices, again as per your use case.Then, I ran:
WITH pr_cte AS
(
SELECT
p.item_id,
LAG(p.item_price, 1)
OVER (PARTITION BY p.item_id ORDER BY p.change_date) AS lag_price,
p.item_price,
LEAD(p.item_price, 1)
OVER (PARTITION BY p.item_id ORDER BY p.change_date) AS lead_price,
LAG(change_date, 1, '2000-01-01')
OVER (PARTITION BY p.item_id ORDER BY p.change_date) AS lag_date,
p.change_date,
LEAD(p.change_date, 1, '2038-01-01')
OVER (PARTITION BY p.item_id ORDER BY p.change_date) AS lead_date
FROM price p
-- ORDER BY p.item_id, p.change_date -- see above and comment by @dnoeth
)
SELECT
pr.item_id, pr.item_price,
pur.customer_id, pur.item_id, pur.purchase_date
FROM pr_cte pr
JOIN purchase pur
ON pr.item_id = pur.item_id
AND pur.purchase_date >= pr.change_date
AND pur.purchase_date < pr.lead_date;Result (again, cull fields as you see fit):
item_id item_price customer_id item_id purchase_date
A 299 1 A 2014-08-27
A 249 2 A 2014-09-02Which is, AFAICT, the correct answer. You may want to test with edge cases which I may have overlooked - if there's a problem, get back to me. +1 for an interesting and challenging question and welcome to the forum!
Code Snippets
CREATE TABLE price
(
item_id TEXT NOT NULL,
item_price SMALLINT NOT NULL,
change_date DATE NOT NULL
);
INSERT INTO price
VALUES
('A', 249, '2014-09-01'),
('B', 129, '2014-09-01'),
('A', 299, '2014-08-25'),
('B', 139, '2014-09-25');CREATE TABLE purchase
(
customer_id SMALLINT NOT NULL,
item_id TEXT NOT NULL,
purchase_date DATE NOT NULL
);
INSERT INTO purchase
VALUES
(1, 'A', '2014-08-27'), (2, 'A', '2014-09-02');WITH pr_cte AS
(
SELECT
p.item_id,
LAG(p.item_price, 1)
OVER (PARTITION BY p.item_id ORDER BY p.change_date) AS lag_price,
p.item_price,
LEAD(p.item_price, 1)
OVER (PARTITION BY p.item_id ORDER BY p.change_date) AS lead_price,
LAG(change_date, 1, '2000-01-01')
OVER (PARTITION BY p.item_id ORDER BY p.change_date) AS lag_date,
p.change_date,
LEAD(p.change_date, 1, '2038-01-01')
OVER (PARTITION BY p.item_id ORDER BY p.change_date) AS lead_date
FROM price p
-- ORDER BY p.item_id, p.change_date -- not allowed in Teradata according to @dnoeth
)
SELECT * FROM pr_cte;item_id lag_price item_price lead_price lag_date change_date lead_date
A 299 249 2000-01-01 2014-08-25 2014-09-01
A 299 249 2014-08-25 2014-09-01 2038-01-01
B 129 139 2000-01-01 2014-09-01 2014-09-25
B 129 139 2014-09-01 2014-09-25 2038-01-01WITH pr_cte AS
(
SELECT
p.item_id,
LAG(p.item_price, 1)
OVER (PARTITION BY p.item_id ORDER BY p.change_date) AS lag_price,
p.item_price,
LEAD(p.item_price, 1)
OVER (PARTITION BY p.item_id ORDER BY p.change_date) AS lead_price,
LAG(change_date, 1, '2000-01-01')
OVER (PARTITION BY p.item_id ORDER BY p.change_date) AS lag_date,
p.change_date,
LEAD(p.change_date, 1, '2038-01-01')
OVER (PARTITION BY p.item_id ORDER BY p.change_date) AS lead_date
FROM price p
-- ORDER BY p.item_id, p.change_date -- see above and comment by @dnoeth
)
SELECT
pr.item_id, pr.item_price,
pur.customer_id, pur.item_id, pur.purchase_date
FROM pr_cte pr
JOIN purchase pur
ON pr.item_id = pur.item_id
AND pur.purchase_date >= pr.change_date
AND pur.purchase_date < pr.lead_date;Context
StackExchange Database Administrators Q#256821, answer score: 3
Revisions (0)
No revisions yet.