patternpythonMinor
Subtract multiple columns in PANDAS DataFrame by a series (single column)
pandas, columns, subtract, column, series, single, multiple, dataframe
Problem
Background
I have many very large pandas DataFrames that need to be normalized with the following operation, where the mean is taken across each row: log2(data) - mean(log2(data))
Example Data
The example DataFrame
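A minimal sketch for rebuilding the example frame, assuming the values are simply typed in from the printed output that follows. This is a reconstruction for illustration, not the code the question originally used to generate the data.

```python
import pandas as pd

# Hypothetical reconstruction: the numbers are copied from the printed frame,
# so they carry only the six decimals shown there.
my_df = pd.DataFrame(
    {
        "iovrrx": [0.987654, 0.706397, 0.043139, 0.316080, 0.828839],
        "nfinsu": [0.206104, 0.860083, 0.729435, 0.461547, 0.177670],
        "mvdfjc": [0.802920, 0.939230, 0.597488, 0.844540, 0.610833],
        "idjges": [0.011157, 0.436194, 0.700998, 0.510143, 0.328238],
        "fubmrg": [0.860618, 0.557081, 0.974193, 0.908475, 0.327697],
        "lvuhfv": [0.575871, 0.706964, 0.917758, 0.877330, 0.689756],
    }
)
print(my_df)
```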
my_df looks like this:
     iovrrx    nfinsu    mvdfjc    idjges    fubmrg    lvuhfv
0  0.987654  0.206104  0.802920  0.011157  0.860618  0.575871
1  0.706397  0.860083  0.939230  0.436194  0.557081  0.706964
2  0.043139  0.729435  0.597488  0.700998  0.974193  0.917758
3  0.316080  0.461547  0.844540  0.510143  0.908475  0.877330
4  0.828839  0.177670  0.610833  0.328238  0.327697  0.689756
Question
I have tried to perform the normalization above in many different ways, but the following code snippet is the only one that I have gotten to work:
log_div_ave = my_df.apply(np.log2).values.T - my_df.apply(np.log2).mean(axis=1).values
log_div_ave = pd.DataFrame(log_div_ave.T,columns=my_df.columns)
print(log_div_ave)
     iovrrx    nfinsu    mvdfjc    idjges    fubmrg    lvuhfv
0  1.667378 -0.593258  1.368628 -4.800610  1.468744  0.889117
1  0.056992  0.340988  0.467991 -0.638518 -0.285601  0.058149
2 -3.467018  0.612699  0.324830  0.555330  1.030127  0.944032
3 -0.941776 -0.395590  0.476099 -0.251165  0.581380  0.531053
4  0.933714 -1.288174  0.493400 -0.402633 -0.405015  0.668708
As you can see, I'm converting the DataFrame to a numpy array and transposing it just so I can subtract the mean of the data. I then have to transpose the resulting array and reconstitute it as a DataFrame. Is there a simpler way to do all of this?
Solution
There's no need to transpose. You can subtract along any axis you want on a DataFrame using its subtract method.
First, take the log base 2 of your dataframe. apply is fine, but you can pass a DataFrame directly to numpy functions.
Store the log base 2 dataframe so you can use its subtract method. You can also reuse this dataframe when you take the mean of each row.
Finally, subtract along the index axis: for each column of the log2 dataframe, subtract the matching mean.
log2df = np.log2(my_df)
log2mean = log2df.mean(axis='columns')
log_div_ave = log2df.subtract(log2mean, axis='index')
Code Snippets
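The subtract-based approach can be checked against the transpose-based version from the question. The sketch below does this on stand-in random data; the seed, the value range, and the column names a to f are assumptions for illustration and do not come from the original post.

```python
import numpy as np
import pandas as pd

# Stand-in data (an assumption): random positive values, not the question's frame.
rng = np.random.default_rng(0)
my_df = pd.DataFrame(rng.uniform(0.01, 1.0, size=(5, 6)), columns=list("abcdef"))

# Transpose-based version, as written in the question.
old = my_df.apply(np.log2).values.T - my_df.apply(np.log2).mean(axis=1).values
old = pd.DataFrame(old.T, columns=my_df.columns)

# subtract-based version, as written in the answer.
log2df = np.log2(my_df)                         # numpy ufuncs accept a DataFrame
log2mean = log2df.mean(axis="columns")          # one mean per row
new = log2df.subtract(log2mean, axis="index")   # align the means on the row index

# The two results should agree, and every row should now average to zero.
assert np.allclose(old.values, new.values)
assert np.allclose(new.mean(axis="columns"), 0.0)
```

The subtract version also keeps the original index and column labels, so no DataFrame has to be rebuilt afterwards.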
log2df = np.log2(my_df)
log2mean = log2df.mean(axis='columns')
log_div_ave = log2df.subtract(log2mean, axis='index')
Context
StackExchange Code Review Q#156447, answer score: 3