patternsqlMinor
Grouping numbers based on closest sum to a specific value with Postgres
Viewed 0 times
postgresgroupingwithnumbersclosestvaluebasedsumspecific
Problem
I have a table like this:
How can I group those values in a way that every single group has the closest possible value to 60000?
For example (not probably the best fit):
v | c
---+-------
Z | 217
J | 620
U | 1891
F | 3751
A | 5673
Y | 5859
O | 7347
K | 9827
I | 11842
R | 11997
H | 14818
M | 18321
G | 18445
E | 19220
D | 22444
W | 22692
T | 24428
P | 26257
N | 35247
L | 41416
C | 42594
B | 43586
S | 59613How can I group those values in a way that every single group has the closest possible value to 60000?
For example (not probably the best fit):
Z + J + U + F + A + Y + O + K + I + R -- 59024
S -- 59613
B + H -- 58404
etc.Solution
A bit of research reveals that this question is known as the subset sum problem in computer science.
On stackoverflow, Lukas Eder provides an Oracle solution to a similar question, and a longer analysis on jooq's blog.
Here's a postgres version derived from his work:
The result with the sample data is:
Note that the execution time will grow exponentially with the row count in the base table.
On stackoverflow, Lukas Eder provides an Oracle solution to a similar question, and a longer analysis on jooq's blog.
Here's a postgres version derived from his work:
CREATE TABLE tab(v text, c int);
-- populate table
INSERT INTO tab(v,c) VALUES('Z', 217);
etc...
-- show 10 better results
WITH RECURSIVE sums (subset_sum, max_v, a) AS (
SELECT
c,v,array[v] as a
FROM
tab
UNION ALL
SELECT
c + subset_sum,
v,
array_append(a, v)
FROM
sums
JOIN
tab
ON sums.max_v < tab.v
)
SELECT subset_sum, a FROM sums WHERE subset_sum <= 60000
ORDER BY 60000-subset_sum
LIMIT 10;The result with the sample data is:
subset_sum | a
------------+-------------
59985 | {A,K,N,O,U}
59985 | {I,K,R,T,U}
59985 | {A,G,J,N}
59985 | {A,F,G,P,Y}
59985 | {C,K,O,Z}
59985 | {A,C,F,J,O}
59985 | {F,H,L}
59985 | {D,E,M}
59985 | {A,C,K,U}
59985 | {E,J,K,M,R}
(10 rows)Note that the execution time will grow exponentially with the row count in the base table.
Code Snippets
CREATE TABLE tab(v text, c int);
-- populate table
INSERT INTO tab(v,c) VALUES('Z', 217);
etc...
-- show 10 better results
WITH RECURSIVE sums (subset_sum, max_v, a) AS (
SELECT
c,v,array[v] as a
FROM
tab
UNION ALL
SELECT
c + subset_sum,
v,
array_append(a, v)
FROM
sums
JOIN
tab
ON sums.max_v < tab.v
)
SELECT subset_sum, a FROM sums WHERE subset_sum <= 60000
ORDER BY 60000-subset_sum
LIMIT 10;subset_sum | a
------------+-------------
59985 | {A,K,N,O,U}
59985 | {I,K,R,T,U}
59985 | {A,G,J,N}
59985 | {A,F,G,P,Y}
59985 | {C,K,O,Z}
59985 | {A,C,F,J,O}
59985 | {F,H,L}
59985 | {D,E,M}
59985 | {A,C,K,U}
59985 | {E,J,K,M,R}
(10 rows)Context
StackExchange Database Administrators Q#201674, answer score: 4
Revisions (0)
No revisions yet.