Applying different equations to a Pandas DataFrame
Problem
I wrote a task using pandas and I'm wondering if the code can be optimized. The code pretty much goes as:
All the dataframes are 919 * 919.
socio is a dataframe with many fields. matrices is a dictionary that holds different dataframe objects. I apply a series of formulas and store the results in the result dataframe.
import numpy as np
frame = [zone for zone in range(taz_start_id, taz_end_id +1)]
#frame = list from 1 to 919
result = DataFrame(0.0, index=frame, columns=frame)
# Get a Series and add it to every column of the dataframe, aligned by index
temp = np.log(socio["ser_emp"] + 1) * 0.36310718
result = result.apply(lambda x: x + temp[x.index], axis = 0)
# Divide two columns, apply a coefficient, and fill all NaN with 0; add the results to the result dataframe
temp = (socio["hhpop"] / socio["acres"]) * -0.07379568
temp = temp.fillna(0)
result = result.apply(lambda x: x + temp[x.index], axis = 0)
result = (matrices['avgtt'].transpose() * -0.05689183) + result
# set a 1.5 value if dist is between values or 0 if not
result =((matrices["dist"] > 1) & (matrices["dist"] <= 2.5)) * 1.5 + result
# leave each cell that is 0 as 0; otherwise set its value to exp(value)
result = result.applymap(lambda x: 0 if x == 0 else exp(x))
Solution
You don't want to use an awkward lambda to do the addition. You can use the add method with the parameter axis=0, which specifies the axis along which to match the lower-dimensional argument.
I bundled the two Series you are adding into one so you only have to broadcast across the index once.
Next, I just changed the code to match my style and improve readability.
Finally, for finding where the values are not zero and filling in with exp: I'm assuming exp is numpy's exp via some import such as from numpy import exp. Otherwise, replace it with np.exp. This is much more vectorized. result is a dataframe, and mask turns anything for which the passed condition evaluates to True into np.nan. I then chain a fillna with a vectorized call to np.exp(result).
result = result.add(
    np.log(socio.ser_emp.add(1)).mul(.36310718).add(
        socio.hhpop.div(socio.acres).mul(-.07379568).fillna(0)
    ), axis=0)
avgtt, dist = matrices['avgtt'], matrices['dist']
result += avgtt.T * -0.05689183 + ((dist > 1) & (dist <= 2.5)) * 1.5
result = result.mask(result != 0).fillna(exp(result))
Code Snippets
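As a rough check that the vectorized version matches the original logic, the sketch below runs both on a made-up 4 * 4 example in place of the real 919 * 919 data. The sample values and the function names original_style and vectorized are assumptions for illustration only; the coefficients come from the question. Series.map stands in for applymap here, since newer pandas releases deprecate applymap.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 4 zones instead of 919, values invented for the demo.
zones = [1, 2, 3, 4]
socio = pd.DataFrame(
    {
        "ser_emp": [0.0, 10.0, 25.0, 3.0],
        "hhpop": [0.0, 40.0, 50.0, 20.0],
        "acres": [0.0, 8.0, 5.0, 4.0],  # zone 1 gives 0 / 0 = NaN, exercising fillna(0)
    },
    index=zones,
)
avgtt = pd.DataFrame(
    [[0, 6, 9, 12], [5, 0, 4, 7], [9, 4, 0, 3], [12, 7, 3, 0]],
    index=zones, columns=zones, dtype=float,
)
dist = pd.DataFrame(
    [[0, 2, 4, 6], [2, 0, 1.5, 3], [4, 1.5, 0, 1], [6, 3, 1, 0]],
    index=zones, columns=zones, dtype=float,
)


def original_style(socio, avgtt, dist, zones):
    """Column-by-column lambdas, following the structure of the question's code."""
    result = pd.DataFrame(0.0, index=zones, columns=zones)
    temp = np.log(socio["ser_emp"] + 1) * 0.36310718
    result = result.apply(lambda col: col + temp.loc[col.index], axis=0)
    temp = ((socio["hhpop"] / socio["acres"]) * -0.07379568).fillna(0)
    result = result.apply(lambda col: col + temp.loc[col.index], axis=0)
    result = (avgtt.T * -0.05689183) + result
    result = ((dist > 1) & (dist <= 2.5)) * 1.5 + result
    # Python-level loop over every cell: keep zeros, exponentiate the rest.
    return result.apply(lambda col: col.map(lambda x: 0.0 if x == 0 else np.exp(x)))


def vectorized(socio, avgtt, dist, zones):
    """The answer's approach: one broadcast add, then mask + fillna."""
    result = pd.DataFrame(0.0, index=zones, columns=zones)
    row_term = np.log(socio["ser_emp"].add(1)).mul(0.36310718).add(
        socio["hhpop"].div(socio["acres"]).mul(-0.07379568).fillna(0)
    )
    # axis=0 aligns the Series with the row index and broadcasts it across columns.
    result = result.add(row_term, axis=0)
    result += avgtt.T * -0.05689183 + ((dist > 1) & (dist <= 2.5)) * 1.5
    # mask() blanks the non-zero cells to NaN; fillna() refills them from exp(result).
    return result.mask(result != 0).fillna(np.exp(result))


slow = original_style(socio, avgtt, dist, zones)
fast = vectorized(socio, avgtt, dist, zones)
print(np.allclose(slow.to_numpy(dtype=float), fast.to_numpy(dtype=float)))
```

Cell (1, 1) is built to stay exactly zero in this sample, so it exercises the "leave zeros alone" branch in both versions.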
result = result.add(
    np.log(socio.ser_emp.add(1)).mul(.36310718).add(
        socio.hhpop.div(socio.acres).mul(-.07379568).fillna(0)
    ), axis=0)
avgtt, dist = matrices['avgtt'], matrices['dist']
result += avgtt.T * -0.05689183 + ((dist > 1) & (dist <= 2.5)) * 1.5
result = result.mask(result != 0).fillna(exp(result))
Context
StackExchange Code Review Q#151806, answer score: 4