HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Algorithm to pick elements in one array that sum to a setpoint, and corresponding elements in other array to average to a setpoint?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
elementsarrayotheraveragealgorithmonethatsumandpick

Problem

Note: The input file is an excel file. However, I am only looking for help with the algorithm as I can then code it in VBA.

I need to scan both columns (shown below) to find any number of column 1 values that when summed match the setpoint (E2) and column 2 values that when averaged match the setpoint (E3). One constraint is that the chosen values in column 1 and column 2 must be from the same row (or same 'index', shown in the image highlighted in blue and green).

Explanation of the solution below:
What it should do is say 'A2 and A6 sum to 7 (match cell E2), and the corresponding B2 and B6 average to 3' (match cell E3). Therefore, these rows should be selected/highlighted in any appropriate way. (note that technically the first solution would be rows 2 and 3 as they fit the sum/average setpoints also).

Example that does not work as a solution:
It can't be A4 and A5 because although they add to 7, B4 and B5 average to 1.5, not the setpoint of 3.

Image of the problem (with a solution highlighted):

A solution for one of the cases (below) works only for the case where there are two rows that will sum/average to the setpoints. It will not work for cases where more than two rows are required. The solution should work for any number of rows required.

Example where three rows would be required (Solution is the first three rows):

Reason for this solution: B6:B8 sums to the setpoint of 8 and C6:C8 average to the setpoint of 3

I am looking for help with an algorithm. If the solution is a large number of rows required (>100), how can I iterate through the combinations efficienctly to find a set of rows that both sums/averages to the setpoints?

It may not be possible to find a solution to both setpoints, in this case finding a solution within an error is also ok

Note: If there is a more suitable stack exchange please advise.

Finding an algorithm is NP-Hard, therefore I am also looking for appropriate simplifications to avoid this. An error term may be an example

Solution

In general, no; this problem is NP-hard. Even matching just the sum (without getting into anything about the average) is NP-hard, as it is the subset sum problem. Thus, you should not expect any efficient algorithm for your problem that works for all possible inputs.

I notice that in your examples, the numbers are all small. For that special case, there are reasonably efficient algorithms using dynamic programming: see, e.g., https://en.wikipedia.org/wiki/Subset_sum_problem#Pseudo-polynomial_time_dynamic_programming_solution.

The dynamic programming algorithm in that link describes how to make the sum match, but you can adapt it to make both the sum and average match. In particular, you have an array $Q[t_1,t_2,i,k]$, which you will fill in with True if there exists a subset of $k$ out of the first $i$ rows whose first column sums to $t_1$ and whose second column sums to $t_2$. You can use dynamic programming to fill in this array (in order of increasing $i$) in $O(N^4M^2)$ time, where $N$ counts the number of rows in your spreadsheet and $M$ is the largest number in the spreadsheet. Then, once you've filled in this array, there will be a selection that sums to $s$ and averages to $a$ if and only if there is some $k$ such that $Q[s,ak,N,k]$ is True.

I recommend implementing this in a programming language (e.g., VBA, not Excel). Implementing it in Excel might be possible but it sounds painful. See https://cs.stackexchange.com/tags/dynamic-programming/info for our reference material on dynamic programming; if you're not familiar with that subject, you might want to read a textbook introduction to it.

Context

StackExchange Computer Science Q#107364, answer score: 2

Revisions (0)

No revisions yet.