How are iloc and loc different?
Problem
Can someone explain how these two methods of slicing are different? I've seen the docs and I've seen previous similar questions (1, 2), but I still find myself unable to understand how they are different. To me, they seem interchangeable in large part, because they are at the lower levels of slicing.

For example, say we want to get the first five rows of a DataFrame. How is it that these two work?

df.loc[:5]
df.iloc[:5]

Can someone present cases where the distinction in use is clearer?

Once upon a time, I also wanted to know how these two functions differed from df.ix[:5], but ix has been removed from pandas 1.0, so I don't care anymore.

Solution
Label vs. Location
The main distinction between the two methods is:

- loc gets rows (and/or columns) with particular labels.
- iloc gets rows (and/or columns) at integer locations.

To demonstrate, consider a series s of characters with a non-monotonic integer index:

>>> import pandas as pd
>>> s = pd.Series(list("abcdef"), index=[49, 48, 47, 0, 1, 2])
>>> s
49    a
48    b
47    c
0     d
1     e
2     f
>>> s.loc[0]     # value at index label 0
'd'
>>> s.iloc[0]    # value at index location 0
'a'
>>> s.loc[0:1]   # rows at index labels 0 through 1 (both endpoints included)
0    d
1    e
>>> s.iloc[0:1]  # rows at index locations from 0 up to, but not including, 1
49    a
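The same rule explains the df.loc[:5] / df.iloc[:5] pair from the question. On a DataFrame with the default RangeIndex, labels and positions coincide, so the two calls look interchangeable; the only visible difference is that loc includes the end label 5 while iloc stops before position 5. A minimal sketch, assuming a small made-up frame (the names df_demo, val and reordered are hypothetical, chosen only for illustration):

```python
import pandas as pd

# Hypothetical 10-row frame with the default RangeIndex 0..9,
# so index labels and integer positions happen to coincide.
df_demo = pd.DataFrame({"val": range(10, 20)})

by_label = df_demo.loc[:5]      # label slice: the end label 5 is included
by_position = df_demo.iloc[:5]  # positional slice: position 5 is excluded
print(len(by_label), len(by_position))  # 6 5

# Reorder the rows so that labels and positions no longer line up.
reordered = df_demo.loc[[3, 7, 5, 0, 9, 1, 8, 2, 6, 4]]
print(reordered.loc[:5].index.tolist())   # [3, 7, 5]: rows up to and including label 5
print(reordered.iloc[:5].index.tolist())  # [3, 7, 5, 0, 9]: the first five rows
```

In other words, df.loc[:5] means "every row up to label 5" and df.iloc[:5] means "the first five rows"; they only agree when the index happens to be 0..n-1.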
Here are some of the differences/similarities between s.loc and s.iloc when passed various objects:

| <object> | description | s.loc[<object>] | s.iloc[<object>] |
|---|---|---|---|
| 0 | single item | Value at index label 0 (the string 'd') | Value at index location 0 (the string 'a') |
| 0:1 | slice | Two rows (labels 0 and 1) | One row (first row at location 0) |
| 1:47 | slice whose end lies beyond the last location | Zero rows (empty Series) | Five rows (location 1 onwards) |
| 1:47:-1 | slice with negative step | Three rows (labels 1 back to 47) | Zero rows (empty Series) |
| [2, 0] | integer list | Two rows with given labels | Two rows with given locations |
| s > 'e' | Bool Series (indicating which values have the property) | One row (containing 'f') | NotImplementedError |
| (s > 'e').values | Bool array | One row (containing 'f') | Same as loc |
| 999 | int object not in index | KeyError | IndexError (out of bounds) |
| -1 | int object not in index | KeyError | Returns last value in s |
| lambda x: x.index[3] | callable applied to series (here returning the item at position 3 in the index) | s.loc[s.index[3]] | s.iloc[s.index[3]] |
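The rows of the table can be reproduced directly. The sketch below rebuilds the same s and exercises each row. The exception raised by s.iloc[s > 'e'] depends on the pandas version and on the mask's index type, so the snippet catches both NotImplementedError and ValueError rather than assuming one of them.

```python
import pandas as pd

s = pd.Series(list("abcdef"), index=[49, 48, 47, 0, 1, 2])

# Slices: loc resolves the endpoints as labels, iloc as positions.
print(s.loc[1:47].tolist())      # [] (label 1 sits after label 47)
print(s.iloc[1:47].tolist())     # ['b', 'c', 'd', 'e', 'f']
print(s.loc[1:47:-1].tolist())   # ['e', 'd', 'c'] (labels 1, 0, 47)
print(s.iloc[1:47:-1].tolist())  # []

# Integer lists are labels for loc and positions for iloc.
print(s.loc[[2, 0]].tolist())    # ['f', 'd']
print(s.iloc[[2, 0]].tolist())   # ['c', 'a']

# A boolean NumPy array works with both; a boolean Series only with loc.
mask = s > "e"
print(s.loc[mask].tolist())          # ['f']
print(s.iloc[mask.values].tolist())  # ['f']
try:
    s.iloc[mask]
except (NotImplementedError, ValueError) as exc:
    print("iloc with a boolean Series raises", type(exc).__name__)

# Keys that are not labels, or not valid positions.
for key in (999, -1):
    for name, indexer in (("loc", s.loc), ("iloc", s.iloc)):
        try:
            print(f"s.{name}[{key}] ->", indexer[key])
        except (KeyError, IndexError) as exc:
            print(f"s.{name}[{key}] raises", type(exc).__name__)

# A callable receives the Series itself; x.index[3] is the label 0.
print(s.loc[lambda x: x.index[3]], s.iloc[lambda x: x.index[3]])  # d a
```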
loc's label-querying capabilities extend well beyond integer indexes, and it's worth highlighting a couple of additional examples. Here's a Series where the index contains string objects:

>>> s2 = pd.Series(s.index, index=s.values)
>>> s2
a    49
b    48
c    47
d     0
e     1
f     2

Since loc is label-based, it can fetch the first value in the Series using s2.loc['a']. It can also slice with non-integer objects:

>>> s2.loc['c':'e']  # all rows lying between 'c' and 'e' (inclusive)
c    47
d     0
e     1
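A related consequence of label-based slicing: when the index is sorted, the slice endpoints do not have to exist in the index, because pandas can work out where they would fall. A minimal sketch; the endpoints 'bb' and 'dd' and the shuffled copy s2_unsorted are hypothetical, made up for illustration:

```python
import pandas as pd

s = pd.Series(list("abcdef"), index=[49, 48, 47, 0, 1, 2])
s2 = pd.Series(s.index, index=s.values)  # index 'a'..'f' is sorted

# 'bb' and 'dd' are not labels, but on a sorted index loc can locate
# where they would fall, so the slice covers 'c' and 'd'.
print(s2.loc["bb":"dd"].index.tolist())  # ['c', 'd']

# With an unsorted index a missing endpoint cannot be placed, so loc raises.
s2_unsorted = s2.loc[["c", "a", "f", "b", "e", "d"]]
try:
    s2_unsorted.loc["bb":"dd"]
except KeyError as exc:
    print("KeyError for missing slice endpoint:", exc)

# Endpoints that do exist still work, following the index's current order.
print(s2_unsorted.loc["a":"e"].index.tolist())  # ['a', 'f', 'b', 'e']
```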
For datetime indexes (DatetimeIndex), we don't need to pass the exact date/time to fetch by label. For example:

>>> s3 = pd.Series(list('abcde'), pd.date_range('now', periods=5, freq='M'))  # newer pandas spells month-end 'ME'
>>> s3
2021-01-31 16:41:31.879768    a
2021-02-28 16:41:31.879768    b
2021-03-31 16:41:31.879768    c
2021-04-30 16:41:31.879768    d
2021-05-31 16:41:31.879768    e

Then to fetch the row(s) for March/April 2021 we only need:

>>> s3.loc['2021-03':'2021-04']
2021-03-31 16:41:31.879768    c
2021-04-30 16:41:31.879768    d
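Because s3 is built from 'now', its output changes every time it runs. A reproducible variant of the same partial-string lookup, assuming a hypothetical daily series s4 over fixed dates (freq='D', an alias that has not been renamed across pandas versions):

```python
import pandas as pd

# Hypothetical daily series over fixed dates: Feb 25 .. Mar 6, 2021.
days = pd.date_range("2021-02-25", periods=10, freq="D")
s4 = pd.Series(range(10), index=days)

march = s4.loc["2021-03"]                   # every row whose timestamp falls in March 2021
window = s4.loc["2021-02-27":"2021-03-02"]  # partial strings as slice endpoints, end included
first_three = s4.iloc[:3]                   # iloc ignores the dates: just the first three rows

print(len(march), march.index[0].date())    # 6 2021-03-01
print(window.tolist())                      # [2, 3, 4, 5]
print(first_three.index.strftime("%Y-%m-%d").tolist())
```

The iloc line is only there for contrast: partial-string matching is a label feature, so it is available through loc but not iloc.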
Rows and Columns

loc and iloc work the same way with DataFrames as they do with Series. It's useful to note that both methods can address columns and rows together.

When given a tuple, the first element is used to index the rows and, if it exists, the second element is used to index the columns.
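A minimal sketch of the tuple form, using the same 5x5 DataFrame that is constructed in the next example (rebuilt here so the snippet runs on its own). The variable names sub_loc and sub_iloc are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape(5, 5),
                  index=list("abcde"),
                  columns=["x", "y", "z", 8, 9])

# Scalar row key + scalar column key -> a single value.
print(df.loc["b", "y"], df.iloc[1, 1])  # 6 6

# Row key only (no second element) -> the whole row as a Series.
print(df.loc["b"].tolist())             # [5, 6, 7, 8, 9]

# Lists in both positions -> a DataFrame with exactly those rows and columns.
sub_loc = df.loc[["a", "c"], ["x", 9]]  # by label
sub_iloc = df.iloc[[0, 2], [0, 4]]      # by position
print(sub_loc.equals(sub_iloc))         # True
```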
Consider the DataFrame defined below:

>>> import numpy as np
>>> df = pd.DataFrame(np.arange(25).reshape(5, 5),
...                   index=list('abcde'),
...                   columns=['x', 'y', 'z', 8, 9])
>>> df
    x   y   z   8   9
a   0   1   2   3   4
b   5   6   7   8   9
c  10  11  12  13  14
d  15  16  17  18  19
e  20  21  22  23  24

Then for example:
>>> df.loc['c':, :'z']  # rows 'c' and onwards AND columns up to and including 'z'
    x   y   z
c  10  11  12
d  15  16  17
e  20  21  22
>>> df.iloc[:, 3]       # all rows, but only the column at index location 3
a     3
b     8
c    13
d    18
e    23
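This DataFrame deliberately mixes string and integer column labels, which is exactly where the label/location distinction bites: the column labelled 8 sits at location 3. A short sketch of that pitfall, plus the related detail that a scalar key returns a Series while a one-element list keeps a DataFrame (the name by_label is illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape(5, 5),
                  index=list("abcde"),
                  columns=["x", "y", "z", 8, 9])

# loc treats 8 as a column *label* (the column at location 3) ...
by_label = df.loc[:, 8]
print(by_label.tolist())  # [3, 8, 13, 18, 23]

# ... while iloc treats 8 as a *location*, and there are only 5 columns.
try:
    df.iloc[:, 8]
except IndexError as exc:
    print("IndexError:", exc)

# A scalar column key gives a Series; a one-element list keeps a DataFrame.
print(type(df.iloc[:, 3]).__name__, type(df.iloc[:, [3]]).__name__)  # Series DataFrame
```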
Sometimes we want to mix label and positional indexing methods for the rows and columns, somehow combining the capabilities of loc and iloc.

For example, using the same DataFrame df, how best to slice the rows up to and including 'c' and take the first four columns?

We can achieve this result using iloc with the help of another method:

>>> df.iloc[:df.index.get_loc('c') + 1, :4]
    x   y   z   8
a   0   1   2   3
b   5   6   7   8
c  10  11  12  13

get_loc() is an index method meaning "get the position of the label in this index". Note that since slicing with iloc excludes its endpoint, we must add 1 to this value if we want row 'c' as well.
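The get_loc() approach converts a label into a position. The conversion can also run the other way, turning positions into labels for loc. A sketch comparing three equivalent spellings; the option numbering and variable names are illustrative, not an established idiom:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape(5, 5),
                  index=list("abcde"),
                  columns=["x", "y", "z", 8, 9])

# Option 1 (as in the text): convert the row label to a position, then use iloc.
via_iloc = df.iloc[:df.index.get_loc("c") + 1, :4]

# Option 2: convert the column positions to labels, then use loc.
via_loc = df.loc[:"c", df.columns[:4]]

# Option 3: chain the two indexers (label step first, then positional step).
via_chain = df.loc[:"c"].iloc[:, :4]

print(via_iloc.equals(via_loc), via_iloc.equals(via_chain))  # True True
```

Options 1 and 2 do the selection in a single indexing call. Option 3 is fine for reading, but avoid it when assigning values, because chained indexing may write to a temporary copy rather than to df.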
Context
Stack Overflow Q#31593201, score: 1608