
Applying different equations to a Pandas DataFrame

Submitted by: @import:stackexchange-codereview

Problem

I wrote a task using pandas and I'm wondering whether the code can be optimized.

All the dataframes are 919 × 919. socio is a dataframe with many fields, and matrices is a dictionary that holds several DataFrame objects.

I apply a series of formulas and store the results in the result dataframe. The code goes roughly as follows:

import numpy as np
from pandas import DataFrame

# frame = list of zone ids, 1 to 919
frame = list(range(taz_start_id, taz_end_id + 1))

result = DataFrame(0.0, index=frame, columns=frame)

# Build a Series and add it to every column of the dataframe, aligned on the row index
temp = np.log(socio["ser_emp"] + 1) * 0.36310718
result = result.apply(lambda x:  x + temp[x.index], axis = 0)

# Divide two columns, apply a coefficient, and fill NaN with 0; add the result to the result dataframe
temp = (socio["hhpop"] / socio["acres"]) * -0.07379568
temp = temp.fillna(0)
result = result.apply(lambda x:  x + temp[x.index], axis = 0)    

result = (matrices['avgtt'].transpose() * -0.05689183) + result    

# add 1.5 where 1 < dist <= 2.5, otherwise add 0
result = ((matrices["dist"] > 1) & (matrices["dist"] <= 2.5)) * 1.5 + result

# leave cells that are exactly 0 as 0; replace every other cell with exp(value)
result = result.applymap(lambda x: 0 if x == 0 else exp(x))

Solution

You don't need an awkward lambda to do the addition. Use the add method with axis=0, which tells pandas to align the lower-dimensional argument (here, the Series) with the dataframe's row index.
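As a small illustration of that alignment, here is a minimal sketch on a made-up 3 × 3 frame. The names zones, temp, via_apply, via_add and via_default are hypothetical and only stand in for the real 919 × 919 data.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x3 stand-in for the 919x919 frame in the question.
zones = [1, 2, 3]
result = pd.DataFrame(0.0, index=zones, columns=zones)
temp = pd.Series([10.0, 20.0, 30.0], index=zones)

# Column-by-column version, as in the question.
via_apply = result.apply(lambda col: col + temp[col.index], axis=0)

# Vectorized version: axis=0 aligns the Series with the row index,
# so every cell in row z gets temp[z] added.
via_add = result.add(temp, axis=0)

# Without axis=0 the Series is aligned with the column labels instead,
# so every cell in column z gets temp[z] added.
via_default = result.add(temp)

print(np.allclose(via_apply.values, via_add.values))
print(via_add)
print(via_default)
```

On this toy data the apply version and the add(axis=0) version agree, while the default alignment produces the transpose.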

I bundled the two Series you are adding into one so you only have to broadcast across the index once.

Next, I just rewrote the following step to match my own style and make it more readable.

Finally, there is the step that leaves zeros alone and replaces every other cell with exp. I'm assuming exp is NumPy's exp, brought in via something like from numpy import exp. Otherwise, replace it with np.exp. This version is much more vectorized. result is a dataframe, and mask turns every cell where the passed condition evaluates to True into np.nan. I then chain a fillna that fills those cells from a vectorized exp(result) call.
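To make the two steps concrete, here is a minimal sketch on a made-up 2 × 2 frame (df and the other names are hypothetical). It also shows a single-call alternative using where, which is my own substitution and should give the same result under the same assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame with a mix of zero and non-zero cells.
df = pd.DataFrame({"a": [0.0, 1.0], "b": [-2.0, 0.0]})

# Step 1: mask replaces every cell where the condition is True with NaN,
# so only the zeros survive.
masked = df.mask(df != 0)

# Step 2: fillna with an aligned frame fills those NaN cells with exp(value).
filled = masked.fillna(np.exp(df))

# Alternative single step: keep cells where df == 0, take exp(value) elsewhere.
via_where = df.where(df == 0, np.exp(df))

print(masked)
print(filled)
print(np.allclose(filled.values, via_where.values))
```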

result = result.add(
    np.log(socio.ser_emp.add(1)).mul(.36310718).add(
        socio.hhpop.div(socio.acres).mul(-.07379568).fillna(0)
    ), axis=0)

avgtt, dist = matrices['avgtt'], matrices['dist']
result += avgtt.T * -0.05689183 + ((dist > 1) & (dist <= 2.5)) * 1.5

result = result.mask(result != 0).fillna(exp(result))
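If you want to convince yourself that the rewrite matches the original, a check along these lines may help. It is only a sketch: the 5-zone data is randomly generated and entirely made up, the function names original and vectorized are mine, exp is taken to be np.exp, and the final applymap is written as a per-column Series.map because newer pandas versions deprecate applymap.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
zones = list(range(1, 6))  # stand-in for zones 1..919

# Made-up inputs with the same column names as in the question.
socio = pd.DataFrame({
    "ser_emp": rng.integers(0, 100, len(zones)),
    "hhpop": rng.integers(1, 500, len(zones)).astype(float),
    "acres": [10.0, 0.0, 25.0, 40.0, 5.0],
}, index=zones)
socio.loc[2, "hhpop"] = 0.0  # 0 / 0 gives NaN, which fillna(0) then replaces
matrices = {
    "avgtt": pd.DataFrame(rng.uniform(1, 60, (5, 5)), index=zones, columns=zones),
    "dist": pd.DataFrame(rng.uniform(0, 5, (5, 5)), index=zones, columns=zones),
}


def original(socio, matrices):
    """The question's approach: apply with a lambda, then an element-wise exp."""
    result = pd.DataFrame(0.0, index=zones, columns=zones)
    temp = np.log(socio["ser_emp"] + 1) * 0.36310718
    result = result.apply(lambda x: x + temp[x.index], axis=0)
    temp = ((socio["hhpop"] / socio["acres"]) * -0.07379568).fillna(0)
    result = result.apply(lambda x: x + temp[x.index], axis=0)
    result = (matrices["avgtt"].transpose() * -0.05689183) + result
    result = ((matrices["dist"] > 1) & (matrices["dist"] <= 2.5)) * 1.5 + result
    # Series.map per column stands in for DataFrame.applymap here.
    return result.apply(lambda col: col.map(lambda x: 0 if x == 0 else np.exp(x)))


def vectorized(socio, matrices):
    """The answer's approach: add(axis=0), one combined update, mask + fillna."""
    result = pd.DataFrame(0.0, index=zones, columns=zones)
    result = result.add(
        np.log(socio.ser_emp.add(1)).mul(.36310718).add(
            socio.hhpop.div(socio.acres).mul(-.07379568).fillna(0)
        ), axis=0)
    avgtt, dist = matrices["avgtt"], matrices["dist"]
    result += avgtt.T * -0.05689183 + ((dist > 1) & (dist <= 2.5)) * 1.5
    return result.mask(result != 0).fillna(np.exp(result))


# Compare with a tolerance, since the additions happen in a different order.
same = np.allclose(original(socio, matrices).values,
                   vectorized(socio, matrices).values)
print(same)
```

Timing both functions on the real 919 × 919 inputs (for example with timeit) would show how much the vectorized version actually saves in your setup.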

Context

StackExchange Code Review Q#151806, answer score: 4
