principlesqlMinor
A costly approach to filtering on a calculated date
Viewed 0 times
dateapproachcostlycalculatedfiltering
Problem
This builds off my previously wonderfully answered question Running a complex query for every date in a range I've got an expanded problem and I'm certainly not solving it in a performant way.
The order table setup is the same. A table of repeating orders that can be overridden. There's another table of order items, each order item has a quantity, a product, lead time in days and relates to the order table.
The problem was to figure out which orders were "active" on a range of ship dates allowing "temporary" orders to override "standing orders" on each specific date. We solved that with a query like this.
```
INSERT INTO orders
(id, client_id, start_date, end_date, order_type) VALUES
(1, 1, '2014-02-05', NULL, 'standing'),
(2, 2, '2014-07-16', '2015-07-19', 'standing'),
(3, 3, '2015-04-01', NULL, 'standing'),
(4, 3, '2015-07-18', '2015-07-18', 'temporary'),
(5, 4, '2015-04-01', NULL, 'standing'),
(6, 4, '2015-07-18', '2015-07-18', 'temporary');
SELECT DISTINCT
ON (client_id) *
FROM
orders
WHERE
start_date = DATE '2015-07-18'
)
ORDER BY
client_id,
order_type DESC
| id | client_id | start_date | end_date | order_type |
|----|-----------|----------------------------|------------------------|------------|
| 1 | 1 | February, 05 2014 00:00:00 | (null) | standing |
| 2 | 2 | July, 16 2014 00:00:00 | July, 19 2015 00:00:00 | standing |
| 4 | 3 | July, 18 2015 00:00:00 | July, 18 2015 00:00:00 | temporary |
| 6 | 4 | July, 18 2015 00:00:00 | July, 18 2015 00:00:00 | temporary
The order table setup is the same. A table of repeating orders that can be overridden. There's another table of order items, each order item has a quantity, a product, lead time in days and relates to the order table.
CREATE TABLE orders (
id integer NOT NULL,
client_id integer NOT NULL,
start_date date NOT NULL,
end_date date,
order_type character varying NOT NULL
);
CREATE TABLE order_items (
id integer NOT NULL,
order_id integer NOT NULL,
product_id integer NOT NULL,
quantity integer NOT NULL,
lead_days integer NOT NULL
);The problem was to figure out which orders were "active" on a range of ship dates allowing "temporary" orders to override "standing orders" on each specific date. We solved that with a query like this.
```
INSERT INTO orders
(id, client_id, start_date, end_date, order_type) VALUES
(1, 1, '2014-02-05', NULL, 'standing'),
(2, 2, '2014-07-16', '2015-07-19', 'standing'),
(3, 3, '2015-04-01', NULL, 'standing'),
(4, 3, '2015-07-18', '2015-07-18', 'temporary'),
(5, 4, '2015-04-01', NULL, 'standing'),
(6, 4, '2015-07-18', '2015-07-18', 'temporary');
SELECT DISTINCT
ON (client_id) *
FROM
orders
WHERE
start_date = DATE '2015-07-18'
)
ORDER BY
client_id,
order_type DESC
| id | client_id | start_date | end_date | order_type |
|----|-----------|----------------------------|------------------------|------------|
| 1 | 1 | February, 05 2014 00:00:00 | (null) | standing |
| 2 | 2 | July, 16 2014 00:00:00 | July, 19 2015 00:00:00 | standing |
| 4 | 3 | July, 18 2015 00:00:00 | July, 18 2015 00:00:00 | temporary |
| 6 | 4 | July, 18 2015 00:00:00 | July, 18 2015 00:00:00 | temporary
Solution
WARNING! Depending on authors response, my answer could be way off. Awaiting his reponse.
reconbot, we need more info, because it seems like your entire subquery
is completely redundant. You appear to be re-checking for already checked conditions.
For example, try just running
and you get the exact same results. See the SQL Fiddle here. If you're still concerned about distinct
as in this SQL Fiddle.
So, I'm either misunderstanding, or your originally given data set doesn't reflect the significance of your sub-
IGNORE EVERYTHING BELOW HERE...
Until we get clarification from the author...
I think I've got your solution. At the very least, I've made a SQLFiddle with the results, and it appears that it will be much less costly.
I love chances when a
Take a look at the Postgres documentation on
The reason I looked into it for your case was: you were building the
OK, so I realize the explanation of my motivation isn't so hot, sorry! :P On to the results...
New Query
Without further ado, here it is:
Also, you can note that we get back identical results as those you are expecting, except for a single added column on the right side, as part of the
So, we don't have the big data set which you have, so that we can really test out the results of the new query versus the old. I'm relying on the
Using the Old SQLFiddle, we can see that the overall estimated cost is approximately 283,000 Postgres units. :P
Using the New SQLFiddle, we can see the much better estimated cost of only 204!!
reconbot, we need more info, because it seems like your entire subquery
SELECT DISTINCT ON (client_id) id
FROM orders
WHERE
start_date = ship_date)
AND client_id = items_to_work.client_id
ORDER BY client_id, order_type DESCis completely redundant. You appear to be re-checking for already checked conditions.
For example, try just running
SELECT *
FROM (
SELECT
order_items.id item_id,
order_items.product_id,
order_items.quantity,
order_items.lead_days,
orders.id order_id,
orders.client_id client_id,
(DATE '2014-07-18' + order_items.lead_days) ship_date
FROM order_items
INNER JOIN orders ON orders.id = order_items.order_id
WHERE
-- find all order items that have active orders on the production_date + lead time
DATE '2014-07-18' >= (orders.start_date - order_items.lead_days)
AND (
DATE '2014-07-18' <= (orders.end_date - order_items.lead_days)
OR orders.end_date IS NULL
)
) items_to_work;and you get the exact same results. See the SQL Fiddle here. If you're still concerned about distinct
client_ids, you could also useSELECT *
FROM (
SELECT DISTINCT ON (client_id)
order_items.id item_id,
order_items.product_id,
order_items.quantity,
order_items.lead_days,
orders.id order_id,
orders.client_id client_id,
(DATE '2014-07-18' + order_items.lead_days) ship_date
FROM order_items
INNER JOIN orders ON orders.id = order_items.order_id
WHERE
-- find all order items that have active orders on the production_date + lead time
DATE '2014-07-18' >= (orders.start_date - order_items.lead_days)
AND (
DATE '2014-07-18' <= (orders.end_date - order_items.lead_days)
OR orders.end_date IS NULL
)
) items_to_work;as in this SQL Fiddle.
So, I'm either misunderstanding, or your originally given data set doesn't reflect the significance of your sub-
SELECT in the original question.IGNORE EVERYTHING BELOW HERE...
Until we get clarification from the author...
I think I've got your solution. At the very least, I've made a SQLFiddle with the results, and it appears that it will be much less costly.
LATERAL sub-queries in PostgreSQLI love chances when a
LATERAL sub-query can be used to save some time in your query. Unfortunately, I think I'm pretty bad at explaining when and where it should be used, and I'm only OK at recognizing instances of when to use it. It just doesn't come up too often in my particular query designs.Take a look at the Postgres documentation on
LATERAL keyword for some ideas, and also I really like this SlideShare presentation by Markus Winand for helping to explain LATERAL a bit better. In essence, it has a flavor of a "for each" statement in typical pseudo-coding vernacular. The reason I looked into it for your case was: you were building the
items_to_work table, and then using the client_id attribute of items_to_work in your sub-SELECT, where you checked if items_to_work.order_id was IN the distinct returned set of orders.id values. Using items_to_work attributes in the WHERE clause of the sub-SELECT was the red flag for me.OK, so I realize the explanation of my motivation isn't so hot, sorry! :P On to the results...
New Query
Without further ado, here it is:
SELECT *
FROM (
SELECT
order_items.id item_id,
order_items.product_id,
order_items.quantity,
order_items.lead_days,
orders.id order_id,
orders.client_id client_id,
(DATE '2014-07-18' + order_items.lead_days) ship_date
FROM order_items
INNER JOIN orders ON orders.id = order_items.order_id
WHERE
-- find all order items that have active orders on the production_date + lead time
DATE '2014-07-18' >= (orders.start_date - order_items.lead_days)
AND (
DATE '2014-07-18' = ship_date)
AND items_to_work.order_id = orders.id) lat;Also, you can note that we get back identical results as those you are expecting, except for a single added column on the right side, as part of the
LATERAL selection, which is a replication of the orders.id. I'm not super experienced with LATERAL (it doesn't come up too often in my work), so if that's a problem for you, we can sort out a way to drop it. :PEXPLAIN resultsSo, we don't have the big data set which you have, so that we can really test out the results of the new query versus the old. I'm relying on the
EXPLAIN estimates here, but...Using the Old SQLFiddle, we can see that the overall estimated cost is approximately 283,000 Postgres units. :P
Using the New SQLFiddle, we can see the much better estimated cost of only 204!!
Code Snippets
SELECT DISTINCT ON (client_id) id
FROM orders
WHERE
start_date <= ship_date and (end_date is NULL OR end_date >= ship_date)
AND client_id = items_to_work.client_id
ORDER BY client_id, order_type DESCSELECT *
FROM (
SELECT
order_items.id item_id,
order_items.product_id,
order_items.quantity,
order_items.lead_days,
orders.id order_id,
orders.client_id client_id,
(DATE '2014-07-18' + order_items.lead_days) ship_date
FROM order_items
INNER JOIN orders ON orders.id = order_items.order_id
WHERE
-- find all order items that have active orders on the production_date + lead time
DATE '2014-07-18' >= (orders.start_date - order_items.lead_days)
AND (
DATE '2014-07-18' <= (orders.end_date - order_items.lead_days)
OR orders.end_date IS NULL
)
) items_to_work;SELECT *
FROM (
SELECT DISTINCT ON (client_id)
order_items.id item_id,
order_items.product_id,
order_items.quantity,
order_items.lead_days,
orders.id order_id,
orders.client_id client_id,
(DATE '2014-07-18' + order_items.lead_days) ship_date
FROM order_items
INNER JOIN orders ON orders.id = order_items.order_id
WHERE
-- find all order items that have active orders on the production_date + lead time
DATE '2014-07-18' >= (orders.start_date - order_items.lead_days)
AND (
DATE '2014-07-18' <= (orders.end_date - order_items.lead_days)
OR orders.end_date IS NULL
)
) items_to_work;SELECT *
FROM (
SELECT
order_items.id item_id,
order_items.product_id,
order_items.quantity,
order_items.lead_days,
orders.id order_id,
orders.client_id client_id,
(DATE '2014-07-18' + order_items.lead_days) ship_date
FROM order_items
INNER JOIN orders ON orders.id = order_items.order_id
WHERE
-- find all order items that have active orders on the production_date + lead time
DATE '2014-07-18' >= (orders.start_date - order_items.lead_days)
AND (
DATE '2014-07-18' <= (orders.end_date - order_items.lead_days)
OR orders.end_date IS NULL
)
) items_to_work,
LATERAL(
SELECT id FROM orders
WHERE orders.client_id = items_to_work.client_id
AND start_date <= ship_date and (end_date is NULL OR end_date >= ship_date)
AND items_to_work.order_id = orders.id) lat;Context
StackExchange Database Administrators Q#108767, answer score: 3
Revisions (0)
No revisions yet.