How do I select rows from a DataFrame based on column values?
Problem
How can I select rows from a DataFrame based on values in some column in Pandas?
In SQL, I would use:
```sql
SELECT *
FROM table
WHERE column_name = some_value
```
Solution
To select rows whose column value equals a scalar,
`some_value`, use `==`:

```python
df.loc[df['column_name'] == some_value]
```

To select rows whose column value is in an iterable, `some_values`, use `isin`:

```python
df.loc[df['column_name'].isin(some_values)]
```

Combine multiple conditions with `&`:

```python
df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]
```

Note the parentheses. Due to Python's operator precedence rules, `&` binds more tightly than `<=` and `>=`. Thus, the parentheses in the last example are necessary. Without the parentheses

```python
df['column_name'] >= A & df['column_name'] <= B
```

is parsed as

```python
df['column_name'] >= (A & df['column_name']) <= B
```

which is a chained comparison. Python evaluates it as `(df['column_name'] >= (A & df['column_name'])) and ((A & df['column_name']) <= B)`, and the implicit `and` calls `bool()` on a boolean Series, which results in a `Truth value of a Series is ambiguous` error.
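A quick way to check the three selectors and the precedence pitfall is to run them on a small frame. The sketch below reuses the example DataFrame from later in this answer; `low` and `high` are arbitrary, illustrative stand-ins for `A` and `B`. The exact wording of the `ValueError` message differs between pandas versions, so the sketch only captures it rather than asserting its full text.

```python
import numpy as np
import pandas as pd

# The example frame used later in this answer.
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# Illustrative bounds standing in for A and B in the range example.
low, high = 2, 5

equal = df.loc[df['A'] == 'foo']                         # scalar equality
member = df.loc[df['B'].isin(['one', 'three'])]          # membership in an iterable
in_range = df.loc[(df['C'] >= low) & (df['C'] <= high)]  # parenthesized conditions

print(equal.index.tolist())     # [0, 2, 4, 6, 7]
print(member.index.tolist())    # [0, 1, 3, 6, 7]
print(in_range.index.tolist())  # [2, 3, 4, 5]

# Without parentheses, `low & df['C']` is evaluated first, leaving the chained
# comparison df['C'] >= (low & df['C']) <= high. Its implicit `and` calls
# bool() on a boolean Series, which pandas rejects with a ValueError.
error_message = None
try:
    df.loc[df['C'] >= low & df['C'] <= high]
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```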
To select rows whose column value does not equal `some_value`, use `!=`:

```python
df.loc[df['column_name'] != some_value]
```

`isin` returns a boolean Series, so to select rows whose value is *not* in `some_values`, negate the boolean Series using `~`:

```python
df = df.loc[~df['column_name'].isin(some_values)]  # .loc returns a new DataFrame; it does not filter df in place
```

For example,
```python
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})
print(df)
#      A      B  C   D
# 0  foo    one  0   0
# 1  bar    one  1   2
# 2  foo    two  2   4
# 3  bar  three  3   6
# 4  foo    two  4   8
# 5  bar    two  5  10
# 6  foo    one  6  12
# 7  foo  three  7  14

print(df.loc[df['A'] == 'foo'])
```

yields
```
     A      B  C   D
0  foo    one  0   0
2  foo    two  2   4
4  foo    two  4   8
6  foo    one  6  12
7  foo  three  7  14
```

If you have multiple values you want to include, put them in a list (or, more generally, any iterable) and use `isin`:

```python
print(df.loc[df['B'].isin(['one', 'three'])])
```

yields
```
     A      B  C   D
0  foo    one  0   0
1  bar    one  1   2
3  bar  three  3   6
6  foo    one  6  12
7  foo  three  7  14
```

Note, however, that if you wish to do this many times, it is more efficient to make an index first, and then use `df.loc`:

```python
df = df.set_index(['B'])
print(df.loc['one'])
```

yields
```
       A  C   D
B
one  foo  0   0
one  bar  1   2
one  foo  6  12
```

or, to include multiple values from the index, use `df.index.isin`:

```python
print(df.loc[df.index.isin(['one', 'two'])])
```

yields
```
       A  C   D
B
one  foo  0   0
one  bar  1   2
two  foo  2   4
two  foo  4   8
two  bar  5  10
one  foo  6  12
```
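The negation selectors and the label lookup on a `B` index can be cross-checked the same way. This minimal sketch rebuilds the example frame from scratch (it does not rely on the reassigned `df` above) and confirms that `!=`, `~isin`, and `set_index('B')` followed by `.loc` pick the rows described in this answer, and that `.loc` leaves the original frame untouched. It checks correctness only and says nothing about the relative speed of the two approaches; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# Negation: != for a single value, ~ on the isin mask for several values.
not_one = df.loc[df['B'] != 'one']
not_one_or_three = df.loc[~df['B'].isin(['one', 'three'])]
print(not_one.index.tolist())           # [2, 3, 4, 5, 7]
print(not_one_or_three.index.tolist())  # [2, 4, 5]

# .loc returns a new DataFrame; df still has all 8 rows.
print(len(df))  # 8

# Label lookup on a 'B' index selects the same rows as the boolean mask.
indexed = df.set_index('B')
by_label = indexed.loc['one']
by_mask = df.loc[df['B'] == 'one'].set_index('B')
print(by_label.equals(by_mask))  # True

# Several labels at once; rows keep their original order.
by_labels = indexed.loc[indexed.index.isin(['one', 'two'])]
print(by_labels['C'].tolist())  # [0, 1, 2, 4, 5, 6]
```

`set_index('B')` and the answer's `set_index(['B'])` produce the same single-level index; the list form only matters when indexing on several columns.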
Context
Stack Overflow Q#17071871, score: 6647