HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

sparse vector dot product

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
productsparsevectordot

Problem

I'm trying to implement a sparse vector (most elements are zero) dot product calculation. My major idea is to represent each sparse vector as a list (which holds only non-zero dimensions), and each element in the list is a 2-dimensional tuple -- where first dimension is index of vector, and 2nd dimension is its related value.

Any comments about my code improvement, bugs, etc. are appreciated. I also welcome any good advice for in general how to best implement dot product for sparse vector using Python.

def dot_product(v1, v2):
    i1 = 0
    i2 = 0
    result = 0.0
    while i1  v2[i2][0]:
            i2 += 1
        else:
            i1 += 1

    return result
if __name__ == "__main__":
    v1 = [(0, 2), (1, 4), (5, 6)] # (2,4,0,0,0,6)
    v2 = [(1,3),(2,4),(5,7)] #(0,3,4,0,0,7)
    print dot_product(v1, v2)

Solution

Valid. But your code is messy. You are taking one step away from the original traditional problem, and that actually becomes a problem form your code.

Your actually better off with the one dimensional arrays then the two dimensional tuples. Tuples may be the wrong data type to use for your problem to resolve itself nicely.

Styling notes:
This:

def dot_product(v1, v2):
    ...


suggest to me that v1 and v2 are vectors, not a list of vectors, but by changing to:

def dot_product1(v1: [()], v2: [()]):
    ...


the obscurity magically disappears.

This method of commenting your code is called annotations and is commonly used in python, I like them a lot.

The

...
    return result
if __name__ == "__main__":
    ...


could be more readable if:

...
    return result

if __name__ == "__main__":
    ...


An analogue solution which will scale better is

def get_pointers(v: tuple):
    return {key: value for key, value in enumerate(v) if value}

def sparse_vector_dot(a: dict, b: dict):
    result = 0
    for key, value in a.items():
        if key in b:
            result += b[key] * value
    return result

def main():
    a = (2, 4, 0, 0, 0, 6)
    b = (0, 3, 4, 0, 0, 7)
    a, b = get_pointers(a), get_pointers(b)
    print(sparse_vector_dot(a, b))


and the function sparse_vector_dot could be written as:

def sparse_vector_dot_oneliner(a: dict, b: dict):
    return sum(b[key]*value for key, value in a.items() if key in b)


It will scale better since we are never checking what we don't have to check, the only thing we have to check is if there is a corresponding index in the other list (now dict).

Code Snippets

def dot_product(v1, v2):
    ...
def dot_product1(v1: [()], v2: [()]):
    ...
...
    return result
if __name__ == "__main__":
    ...
...
    return result


if __name__ == "__main__":
    ...
def get_pointers(v: tuple):
    return {key: value for key, value in enumerate(v) if value}


def sparse_vector_dot(a: dict, b: dict):
    result = 0
    for key, value in a.items():
        if key in b:
            result += b[key] * value
    return result


def main():
    a = (2, 4, 0, 0, 0, 6)
    b = (0, 3, 4, 0, 0, 7)
    a, b = get_pointers(a), get_pointers(b)
    print(sparse_vector_dot(a, b))

Context

StackExchange Code Review Q#148717, answer score: 4

Revisions (0)

No revisions yet.