HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

2-sum problem on a range in Python

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
pythonproblemrangesum

Problem

Here is the problem description (from an algorithm course: https://www.coursera.org/learn/algorithm-design-analysis):


The goal of this problem is to implement a variant of the 2-SUM
algorithm (covered in the Week 6 lecture on hash table applications).


The file contains 1 million integers, both positive and negative
(there might be some repetitions!). This is your array of integers,
with the \$i\$th row of the file specifying the \$i\$th entry of the array.


Your task is to compute the number of target values \$t\$ in the interval
[-10000,10000] (inclusive) such that there are distinct numbers \$x,y\$ in
the input file that satisfy \$x+y=t\$.

I wrote the following Python program. It runs already 25 minutes and just calculated 6200 of 20000 numbers.
This is too slow.

import sys

def has_two_sum(n,num_set,nums):
  res = any(((n-x) in num_set) and 2*x !=n for x in nums)
  return res

with open(sys.argv[1], 'r') as fp:
  nums = [int(line) for line in fp]
  num_set = set(nums)

two_sum = sum(1  for n in range(-10000,10001) if has_two_sum(n,num_set,nums))
print(two_sum)

Solution

Your code runs instantly on my system with 1 million randomly generated numbers. are you sure you are searching for n - x in num_set and not in nums?

Anyway, a couple of Python style pointers:

-
You can iterate over the set directly, no need to pass both the set and the list:

def has_two_sum(n, num_set):
    res = any(((n-x) in num_set) and 2*x != n for x in num_set)
    return res


-
Not much point in storing a value to immediuately return it:

def has_two_sum(n, num_set):
    return any(((n-x) in num_set) and 2*x != n for x in num_set)


-
Boolean values are converted to 0 or 1 when used in an integer setting, so your adding of values is typically written as:

two_sum = sum(has_two_sum(n, num_set) for n in range(-10000, 10001))


-
If you are going to have a function that contains a single liner, you better give it a great, descriptive name. I don't think that has_two_sum really fits the bill. Perhaps has_two_items_that_sum_to_n or has_two_that_sum_to_n? Alternatively, I think my preferred option here would be to get rid of the function altogether and let code speak by itself, writing the whole thing as two nested comprehensions:

two_sum = sum(any(n - x in num_set and 2*x != n for x in num_set)
              for n in range(-10000, 10001))


There is a school of thought that considers nested comprehensions to be confusing (e.g. the Google Python style guide doesn't allow them), but I think this case is simple enough, YMMV.

Code Snippets

def has_two_sum(n, num_set):
    res = any(((n-x) in num_set) and 2*x != n for x in num_set)
    return res
def has_two_sum(n, num_set):
    return any(((n-x) in num_set) and 2*x != n for x in num_set)
two_sum = sum(has_two_sum(n, num_set) for n in range(-10000, 10001))
two_sum = sum(any(n - x in num_set and 2*x != n for x in num_set)
              for n in range(-10000, 10001))

Context

StackExchange Code Review Q#135392, answer score: 8

Revisions (0)

No revisions yet.