How to add a new column to an existing DataFrame
Problem
I have the following indexed DataFrame with named columns and non-contiguous row numbers:
          a         b         c         d
2  0.671399  0.101208 -0.181532  0.241273
3  0.446172 -0.243316  0.051767  1.577318
5  0.614758  0.075793 -0.451460 -0.012493
I would like to add a new column, 'e', to the existing DataFrame without changing anything else in it (i.e., the new column always has the same length as the DataFrame).
0   -0.335485
1   -1.166658
2   -0.385571
dtype: float64
I tried different versions of join, append, and merge, but I did not get the result I wanted; at most I got errors.
How can I add column e to the above example?
Solution
Edit 2017
As indicated in the comments and by @Alexander, currently the best method to add the values of a Series as a new column of a DataFrame may be to use assign:
df1 = df1.assign(e=pd.Series(np.random.randn(sLength)).values)
Edit 2015
Some reported getting the SettingWithCopyWarning with this code.
However, the code still runs perfectly with the current pandas version 0.16.1.
>>> import numpy as np
>>> import pandas as pd
>>> sLength = len(df1['a'])
>>> df1
          a         b         c         d
6 -0.269221 -0.026476  0.997517  1.294385
8  0.917438  0.847941  0.034235 -0.448948
>>> df1['e'] = pd.Series(np.random.randn(sLength), index=df1.index)
>>> df1
          a         b         c         d         e
6 -0.269221 -0.026476  0.997517  1.294385  1.757167
8  0.917438  0.847941  0.034235 -0.448948  2.228131
>>> pd.version.short_version
'0.16.1'
The SettingWithCopyWarning aims to inform you of a possibly invalid assignment on a copy of the DataFrame. It doesn't necessarily mean you did it wrong (it can trigger false positives), but from 0.13.0 on it lets you know there are more adequate methods for the same purpose. If you get the warning, just follow its advice: Try using .loc[row_index,col_indexer] = value instead
>>> df1.loc[:,'f'] = pd.Series(np.random.randn(sLength), index=df1.index)
>>> df1
          a         b         c         d         e         f
6 -0.269221 -0.026476  0.997517  1.294385  1.757167 -0.050927
8  0.917438  0.847941  0.034235 -0.448948  2.228131  0.006109
In fact, this is currently the more efficient method, as described in the pandas docs.
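One situation in which the warning commonly shows up is when df1 was itself selected from a larger frame. That is an assumption here, since the session above does not show how df1 was created. The sketch below uses a hypothetical parent frame (the names parent and new_values are illustrative, not from the answer) and takes an explicit copy before adding the column with .loc:

```python
import numpy as np
import pandas as pd

# Hypothetical parent frame; `parent` is an illustrative name, not from the answer.
parent = pd.DataFrame(
    np.random.randn(5, 4),
    index=[2, 3, 5, 6, 8],
    columns=list("abcd"),
)

# An explicit copy makes df1 an independent frame, so adding a column
# to it cannot be mistaken for a write into `parent`.
df1 = parent[parent.index > 5].copy()

# Build the new column on df1's own index, then add it with .loc.
new_values = pd.Series(np.random.randn(len(df1)), index=df1.index)
df1.loc[:, "f"] = new_values

print(df1)
print("f" in parent.columns)  # the parent frame is left unchanged
```

Whether the warning appears without the copy depends on the pandas version and on how the selection was made, so treat this as one possible way to stay clear of it rather than the only one.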
Original answer:
Use the original df1 index to create the Series:
df1['e'] = pd.Series(np.random.randn(sLength), index=df1.index)
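The reason the index matters is that pandas aligns a Series on index labels when it is assigned as a column. A minimal sketch, assuming a frame shaped like the one in the question (index 2, 3, 5) and a Series with the default 0, 1, 2 index; the variable names misaligned, aligned and positional are illustrative:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.random.randn(3, 4),
    index=[2, 3, 5],
    columns=list("abcd"),
)
s = pd.Series(np.random.randn(len(df1)))  # default index 0, 1, 2

# Direct assignment aligns on labels: only label 2 exists in both,
# so rows 3 and 5 end up as NaN.
misaligned = df1.copy()
misaligned["e"] = s

# Reusing df1's index makes the labels match row for row.
aligned = df1.copy()
aligned["e"] = pd.Series(s.values, index=df1.index)

# Passing the raw values skips alignment and assigns by position.
positional = df1.assign(e=s.values)

print(misaligned["e"].isna().tolist())  # [False, True, True]
print(aligned["e"].equals(positional["e"]))
```

Both the index=df1.index form and the .values form give the same column here; they differ only in whether the labels are rebuilt or bypassed.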
Context
Stack Overflow Q#12555323, score: 1322