
Applying a formula to 2D numpy arrays row-wise

Submitted by: @import:stackexchange-codereview

Problem

The user has two 2D input arrays A and B, and a given matrix S. He wants to apply a
complicated formula to these arrays row-wise to get C, something like $$C_i = f(S, A_i, B_i)$$
where f is some complicated function implemented by the user. That is, the user wants to supply his complicated formula in terms of the row vectors, plus whatever additional data that formula needs. The implementation of the formula must be a function.

For the sake of this example only, the complicated formula will be
the dot product, and the "additional data" for the formula will be the
identity matrix. The real application is a lot more complicated.

My question is: How can I express the line

C = np.fromiter(map(partial(users_formula, S), A, B), dtype=np.float64)


in a cleaner way in Numpy? Speed or memory consumption is not a major concern, but code readability is. I suspect that there is a better way to do it in Numpy.

```
from __future__ import print_function
from functools import partial
import numpy as np

def main():
    # Some dummy data just for testing purposes
    A = np.array([[-0.486978, 0.810468, 0.325568],
                  [-0.640856, 0.640856, 0.422618],
                  [-0.698328, 0.628777, 0.34202 ],
                  [-0.607665, 0.651641, 0.45399 ]])
    B = np.array([[ 0.075083, 0.41022 , -0.908891],
                  [-0.025583, 0.532392, -0.846111],
                  [ 0.014998, 0.490579, -0.871268],
                  [-0.231477, 0.401497, -0.886125]])
    S = np.identity(3)
    #---------------------------------------------------------------
    # The problematic line is below. What is the proper way to
    # express this in Numpy?
    C = np.fromiter(map(partial(users_formula, S), A, B), dtype=np.float64)
    assert np.allclose(C, 0.0, atol=1.0e-6), C
    print('Done!')

def users_formula(S, a, b):
    # a == A_i, b == B_i
    # In the real application, the user gives his complicated
    # formula here. The matrix
    # (The source is truncated at this point. The body below is
    # reconstructed from the problem statement: a dot product, with S
    # as the additional data.)
    return np.dot(a, np.dot(S, b))

if __name__ == '__main__':
    main()
```

Solution

Here are some alternatives using your arrays:

Yours, for reference:

In [19]: np.fromiter(map(partial(users_formula, S), A, B), dtype=np.float64)
Out[19]: 
array([  5.88698000e-07,  -1.11998000e-07,   1.87179000e-07,
         4.89032000e-07])


List comprehension with row indexing

In [20]: np.array([users_formula(S,A[i],B[i]) for i in range(A.shape[0])])
Out[20]: 
array([  5.88698000e-07,  -1.11998000e-07,   1.87179000e-07,
         4.89032000e-07])


Make the row indexing a bit more explicit - longer, but clearer

In [21]: np.array([users_formula(S,A[i,:],B[i,:]) for i in range(A.shape[0])])
Out[21]: 
array([  5.88698000e-07,  -1.11998000e-07,   1.87179000e-07,
         4.89032000e-07])


Replace indexing with good old Python zip (this is my personal favorite for readability).

In [22]: np.array([users_formula(S,a,b) for a,b in zip(A,B)])
Out[22]: 
array([  5.88698000e-07,  -1.11998000e-07,   1.87179000e-07,
         4.89032000e-07])
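If the zip version is needed in several places, it could be wrapped in a small helper. The following is only a sketch, not part of the original answer: `apply_rowwise` is a hypothetical name, `users_formula` is assumed to be the dot-product stand-in described in the question, and the data is a small illustrative example rather than the arrays above.

```python
import numpy as np

def users_formula(S, a, b):
    # Stand-in formula assumed from the question: a . (S . b)
    return np.dot(a, np.dot(S, b))

def apply_rowwise(formula, S, A, B):
    # Hypothetical helper: call formula(S, a, b) for each pair of rows.
    # zip() silently stops at the shorter input, so check the row counts first.
    if A.shape[0] != B.shape[0]:
        raise ValueError("A and B must have the same number of rows")
    return np.array([formula(S, a, b) for a, b in zip(A, B)])

# Small illustrative data, not the arrays from the question
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])
B = np.array([[0.0, 1.0, 0.0],
              [0.0, 3.0, 0.0]])
S = np.identity(3)

C = apply_rowwise(users_formula, S, A, B)
print(C)  # [0. 6.]
```

The explicit row-count check is a design choice: without it, a mismatch between A and B would go unnoticed because zip truncates to the shorter array.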


Even though I'm generally familiar with apply_along_axis, I'm having trouble applying it here. That's not a good sign: even if I get it right, it may be the most compact version, but it clearly won't be the clearest.

In [23]: np.apply_along_axis(partial(users_formula,S),0,A,B)
------------------
    ...
ValueError: matrices are not aligned


There are some other functions to explore, such as np.vectorize and np.frompyfunc. But they'll have the same problem: I'd have to study the docs and experiment to get a working example.
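As one possible starting point for the np.vectorize route, the sketch below uses its `signature` argument (available, to my knowledge, since NumPy 1.12) to declare that the function takes a matrix and two vectors and returns a scalar. It assumes `users_formula` is the dot-product stand-in from the question; the name `rowwise` is made up for this example.

```python
import numpy as np

def users_formula(S, a, b):
    # Stand-in formula assumed from the question: a . (S . b)
    return np.dot(a, np.dot(S, b))

# Arrays from the question
A = np.array([[-0.486978, 0.810468, 0.325568],
              [-0.640856, 0.640856, 0.422618],
              [-0.698328, 0.628777, 0.34202 ],
              [-0.607665, 0.651641, 0.45399 ]])
B = np.array([[ 0.075083, 0.41022 , -0.908891],
              [-0.025583, 0.532392, -0.846111],
              [ 0.014998, 0.490579, -0.871268],
              [-0.231477, 0.401497, -0.886125]])
S = np.identity(3)

# Core dimensions: S is (m,n), each row of A and B is (n), the result is a scalar.
# The leading axis of A and B is looped over; S has no loop axis, so it is reused.
rowwise = np.vectorize(users_formula, signature='(m,n),(n),(n)->()')
C = rowwise(S, A, B)
print(C.shape)  # (4,)
```

Note that np.vectorize is still a Python-level loop under the hood, so this is about notation, not speed. Whether the signature string reads more clearly than the zip comprehension is a matter of taste.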

The problem with apply_along_axis is that it's designed to iterate over one array, not several.

apply_along_axis(func1d,axis,arr,*args)
apply_along_axis(...,0, A, B)


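With axis 0 this iterates over the 1-D slices of A taken along that axis (its columns; axis 1 would give the rows), and it passes the whole B to every call, which is what triggers the alignment error. S could be passed as *args. But to use both A and B, I'd have to concatenate them into one array, and then change your function to handle 'rows' from that. MESSY.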
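For illustration, the concatenation workaround could look roughly like the sketch below. It is an assumption-laden example rather than a recommendation: `formula_on_joined_row` is a hypothetical adapter, `users_formula` is the dot-product stand-in from the question, A and B are assumed to have the same number of columns, and the data is a small illustrative example.

```python
import numpy as np

def users_formula(S, a, b):
    # Stand-in formula assumed from the question: a . (S . b)
    return np.dot(a, np.dot(S, b))

def formula_on_joined_row(row, S):
    # Hypothetical adapter: split a joined row [a, b] back into its halves.
    # Assumes a and b have equal length.
    n = row.shape[0] // 2
    a, b = row[:n], row[n:]
    return users_formula(S, a, b)

# Small illustrative data, not the arrays from the question
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])
B = np.array([[0.0, 1.0, 0.0],
              [0.0, 3.0, 0.0]])
S = np.identity(3)

AB = np.hstack([A, B])  # shape (2, 6): each row is [a, b]
# axis=1 hands each row of AB to the adapter; S is forwarded as an extra argument
C = np.apply_along_axis(formula_on_joined_row, 1, AB, S)
print(C)  # [0. 6.]
```

The extra adapter and the join/split round trip are exactly the mess referred to above, which is why the zip comprehension remains the more readable choice.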

Internally, apply_along_axis is just a generalization of:

outarray=np.empty(A.shape[0],A.dtype)
for i in range(A.shape[0]):
    outarray[i] = users_formula(S,A[i,:],B[i,:])

Context

StackExchange Code Review Q#109479, answer score: 4
