HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Grouping numbers based on closest sum to a specific value with Postgres

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresgroupingwithnumbersclosestvaluebasedsumspecific

Problem

I have a table like this:

v |   c   
---+-------
 Z |   217
 J |   620
 U |  1891
 F |  3751
 A |  5673
 Y |  5859
 O |  7347
 K |  9827
 I | 11842
 R | 11997
 H | 14818
 M | 18321
 G | 18445
 E | 19220
 D | 22444
 W | 22692
 T | 24428
 P | 26257
 N | 35247
 L | 41416
 C | 42594
 B | 43586
 S | 59613


How can I group those values in a way that every single group has the closest possible value to 60000?

For example (not probably the best fit):

Z + J + U + F + A + Y + O + K + I + R -- 59024
S -- 59613
B + H -- 58404
etc.

Solution

A bit of research reveals that this question is known as the subset sum problem in computer science.

On stackoverflow, Lukas Eder provides an Oracle solution to a similar question, and a longer analysis on jooq's blog.

Here's a postgres version derived from his work:

CREATE TABLE tab(v text, c int);

-- populate table
INSERT INTO tab(v,c) VALUES('Z', 217);
etc...

-- show 10 better results

WITH RECURSIVE sums (subset_sum, max_v, a) AS (
        SELECT
            c,v,array[v] as a
        FROM
            tab

        UNION ALL

        SELECT
            c + subset_sum,
            v,
            array_append(a, v)
        FROM
            sums
        JOIN
            tab
        ON sums.max_v < tab.v
    )
SELECT subset_sum, a FROM sums WHERE subset_sum <= 60000
ORDER BY 60000-subset_sum
LIMIT 10;


The result with the sample data is:

subset_sum |      a      
------------+-------------
      59985 | {A,K,N,O,U}
      59985 | {I,K,R,T,U}
      59985 | {A,G,J,N}
      59985 | {A,F,G,P,Y}
      59985 | {C,K,O,Z}
      59985 | {A,C,F,J,O}
      59985 | {F,H,L}
      59985 | {D,E,M}
      59985 | {A,C,K,U}
      59985 | {E,J,K,M,R}
(10 rows)


Note that the execution time will grow exponentially with the row count in the base table.

Code Snippets

CREATE TABLE tab(v text, c int);

-- populate table
INSERT INTO tab(v,c) VALUES('Z', 217);
etc...

-- show 10 better results

WITH RECURSIVE sums (subset_sum, max_v, a) AS (
        SELECT
            c,v,array[v] as a
        FROM
            tab

        UNION ALL

        SELECT
            c + subset_sum,
            v,
            array_append(a, v)
        FROM
            sums
        JOIN
            tab
        ON sums.max_v < tab.v
    )
SELECT subset_sum, a FROM sums WHERE subset_sum <= 60000
ORDER BY 60000-subset_sum
LIMIT 10;
subset_sum |      a      
------------+-------------
      59985 | {A,K,N,O,U}
      59985 | {I,K,R,T,U}
      59985 | {A,G,J,N}
      59985 | {A,F,G,P,Y}
      59985 | {C,K,O,Z}
      59985 | {A,C,F,J,O}
      59985 | {F,H,L}
      59985 | {D,E,M}
      59985 | {A,C,K,U}
      59985 | {E,J,K,M,R}
(10 rows)

Context

StackExchange Database Administrators Q#201674, answer score: 4

Revisions (0)

No revisions yet.