patternpythonMinor

Excel's SUMIFS implemented using PANDAS, the Python Data Analysis Library

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

thepandasexcelimplementedanalysisusingpythonlibrarydatasumifs

Problem

I've implemented Excel's SUMIFS function in Pandas using the following code. Is there a better — more Pythonic — implementation?

from pandas import Series, DataFrame
import pandas as pd
df = pd.read_csv('data.csv')
# pandas equivalent of Excel's SUMIFS function
df.groupby('PROJECT').sum().ix['A001']

One concern I have with this implementation is that I'm not explicitly specifying the column to be summed.

Data File

Here's an example CSV data file (data.csv), although I'm displaying | instead of commas to improve the visual appearance.

DATE     | EMPLOYEE     | PROJECT | HOURS
02/01/14 | Smith, John  | A001    | 4.0
02/01/14 | Smith, John  | B002    | 4.0
02/01/14 | Doe, Jane    | A001    | 3.0
02/01/14 | Doe, Jane    | C003    | 5.0
02/02/14 | Smith, John  | B002    | 2.0
02/02/14 | Smith, John  | C003    | 6.0
02/02/14 | Doe, Jane    | A001    | 8.0

Equivalent Excel SUMIFS Function

If I were to open data.csv in Excel and wanted to determine how many hours were worked on project A001, I would use the SUMIFS formula as follows:

=SUMIFS($D2:$D8, $C2:$C8, "A001")

Where the SUMIFS function syntax is:

=SUMIFS(sum_range, criteria_range1, criteria1, [criteria_range2,
        criteria2], …)

Solution

The usual approach -- if you want all the projects -- would be

>>> df.groupby("PROJECT")["HOURS"].sum()
PROJECT
A001       15
B002        6
C003       11
Name: HOURS, dtype: float64

This only applies the sum on the desired column, as this constructs an intermediate SeriesGroupBy object:

>>> df.groupby("PROJECT")["HOURS"]

If you're only interested in the total hours of a particular project, then I suppose you could do

>>> df.loc[df.PROJECT == "A001", "HOURS"].sum()
15.0

or if you dislike the repetition of df:

>>> df.query("PROJECT == 'A001'")["HOURS"].sum()
15.0

but I find that I almost always want to be able to access more than one sum, so these are pretty rare patterns in my code.

Aside: .ix has fallen out of favour as it has some confusing behaviour. These days it's recommended to use .loc or .iloc to be explicit.

Code Snippets

>>> df.groupby("PROJECT")["HOURS"].sum()
PROJECT
A001       15
B002        6
C003       11
Name: HOURS, dtype: float64

>>> df.groupby("PROJECT")["HOURS"]
<pandas.core.groupby.SeriesGroupBy object at 0xa94f8cc>

>>> df.loc[df.PROJECT == "A001", "HOURS"].sum()
15.0

>>> df.query("PROJECT == 'A001'")["HOURS"].sum()
15.0

Context

StackExchange Code Review Q#42695, answer score: 5

Revisions (0)

No revisions yet.