patternsqlModerate
SUM over distinct rows with multiple joins
Viewed 0 times
distinctrowswithmultiplesumoverjoins
Problem
Schema:
Data:
So, we have:
Now I want to get answers for the following questions:
With the following query (SQLFiddle):
```
SELECT
country AS "group_by",
COUNT(DISTINCT items.id) AS "item_count",
SUM(items.price) AS "cost",
SUM(payments.amount) AS "earned",
SUM(extras.amount) AS "extra_earned"
FROM items
LEFT OUTER JOIN payme
CREATE TABLE "items" (
"id" SERIAL NOT NULL PRIMARY KEY,
"country" VARCHAR(2) NOT NULL,
"created" TIMESTAMP WITH TIME ZONE NOT NULL,
"price" NUMERIC(11, 2) NOT NULL
);
CREATE TABLE "payments" (
"id" SERIAL NOT NULL PRIMARY KEY,
"created" TIMESTAMP WITH TIME ZONE NOT NULL,
"amount" NUMERIC(11, 2) NOT NULL,
"item_id" INTEGER NULL
);
CREATE TABLE "extras" (
"id" SERIAL NOT NULL PRIMARY KEY,
"created" TIMESTAMP WITH TIME ZONE NOT NULL,
"amount" NUMERIC(11, 2) NOT NULL,
"item_id" INTEGER NULL
);Data:
INSERT INTO items VALUES
(1, 'CZ', '2016-11-01', 100),
(2, 'CZ', '2016-11-02', 100),
(3, 'PL', '2016-11-03', 20),
(4, 'CZ', '2016-11-04', 150)
;
INSERT INTO payments VALUES
(1, '2016-11-01', 60, 1),
(2, '2016-11-01', 60, 1),
(3, '2016-11-02', 100, 2),
(4, '2016-11-03', 25, 3),
(5, '2016-11-04', 150, 4)
;
INSERT INTO extras VALUES
(1, '2016-11-01', 5, 1),
(2, '2016-11-02', 1, 2),
(3, '2016-11-03', 2, 3),
(4, '2016-11-03', 3, 3),
(5, '2016-11-04', 5, 4)
;So, we have:
- 3 items in CZ in 1 in PL
- 370 earned in CZ and 25 in PL
- 350 cost in CZ and 20 in PL
- 11 extra earned in CZ and 5 extra earned in PL
Now I want to get answers for the following questions:
- How many items we had last month in every country?
- What was the total earned amount (sum of payments.amounts) in every country?
- What was the total cost (sum of items.price) in every country?
- What was the total extra earnings (sum of extras.amount) in every country?
With the following query (SQLFiddle):
```
SELECT
country AS "group_by",
COUNT(DISTINCT items.id) AS "item_count",
SUM(items.price) AS "cost",
SUM(payments.amount) AS "earned",
SUM(extras.amount) AS "extra_earned"
FROM items
LEFT OUTER JOIN payme
Solution
Since there can be multiple
Consider the "fishmarket" example:
To be precise,
Oh, and since we do not multiply rows in
db<>fiddle here
Old sqlfiddle
But if I want to filter by
Addressing your comment.
It depends. Can we apply the same filter to
If yes, then just add the filters in the subqueries as well. (Does not seem likely in this case.)
If no, but we are still selecting most items, the above query would still be most efficient. Some of the aggregations in the subqueries are eliminated in the joins, but that's still cheaper than more complex queries.
If no, and we are selecting a small fraction of items, I suggest correlated subqueries or
payments and multiple extras per item, you run into a "proxy cross join" between those two tables. Aggregate rows per item_id before joining to item and it should all be correct:SELECT i.country AS group_by
, COUNT(*) AS item_count
, SUM(i.price) AS cost
, SUM(p.sum_amount) AS earned
, SUM(e.sum_amount) AS extra_earned
FROM items i
LEFT JOIN (
SELECT item_id, SUM(amount) AS sum_amount
FROM payments
GROUP BY 1
) p ON p.item_id = i.id
LEFT JOIN (
SELECT item_id, SUM(amount) AS sum_amount
FROM extras
GROUP BY 1
) e ON e.item_id = i.id
GROUP BY 1;Consider the "fishmarket" example:
- Two SQL LEFT JOINS produce incorrect result
To be precise,
SUM(i.price) would be incorrect after joining to a single n-table, which multiplies each price by the number of related rows. Doing it twice just makes it worse - and also potentially computationally expensive.Oh, and since we do not multiply rows in
items now, we can just use the cheaper count(*) instead of count(DISTINCT i.id). (id being NOT NULL PRIMARY KEY.)db<>fiddle here
Old sqlfiddle
But if I want to filter by
items.created?Addressing your comment.
It depends. Can we apply the same filter to
payments.created and extras.created?If yes, then just add the filters in the subqueries as well. (Does not seem likely in this case.)
If no, but we are still selecting most items, the above query would still be most efficient. Some of the aggregations in the subqueries are eliminated in the joins, but that's still cheaper than more complex queries.
If no, and we are selecting a small fraction of items, I suggest correlated subqueries or
LATERAL joins. Examples:- Multiple array_agg() calls in a single query
Code Snippets
SELECT i.country AS group_by
, COUNT(*) AS item_count
, SUM(i.price) AS cost
, SUM(p.sum_amount) AS earned
, SUM(e.sum_amount) AS extra_earned
FROM items i
LEFT JOIN (
SELECT item_id, SUM(amount) AS sum_amount
FROM payments
GROUP BY 1
) p ON p.item_id = i.id
LEFT JOIN (
SELECT item_id, SUM(amount) AS sum_amount
FROM extras
GROUP BY 1
) e ON e.item_id = i.id
GROUP BY 1;Context
StackExchange Database Administrators Q#154586, answer score: 16
Revisions (0)
No revisions yet.