HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Averaging a set of percentages between certain important levels

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
averagingbetweenlevelsimportantcertainsetpercentages

Problem

I have a spreadsheet, called "To Do", with set of percentages in column E and importance levels in column A:

On another spreadsheet, I'm entering a formula to calculate the average of these percentages, but only those between certain importance levels:

See it on googledocs

This is the formula in question, indented for easier reading:

=TO_PERCENT(
     DIVIDE(
         SUMIF('To Do'!A3:A, "<2000", 'To Do'!E3:E) 
       - SUMIF('To Do'!A3:A, "<=1000", 'To Do'!E3:E), 
         MAX(
             COUNTIF('To Do'!A3:A, "<2000") 
           - COUNTIF('To Do'!A3:A, "<=1000"), 
             1
         )
     )
 )


Is there any better way to do this?

I originally wanted to use SUMIFS and COUNTIFS, but it said:


error: Unknown function name

Solution

As @200_success pointed out in his comment, averaging percentages is a suspicious mathematical technique.

Each task in the TODO list could have its own weight, or relative value - a number that represents a chunk of progress towards "done"; then you can calculate a percentage by adding up the relative weights of all completed tasks (or more accurately, of a value derived from that task's %completed and weight), and dividing by the sum of the weights of all tasks.

I haven't played much with google-sheets, but I know excel pretty well; the formula you've come up with is exactly the kind of formula I'd have used in Excel 2003, before SUMIFS and COUNTIFS were introduced in Excel 2007.

When I enter a formula in a spreadsheet, I want to be able to copy that formula over to the next cell, without having to modify it. This involves a number of principles:

  • Don't hard-code cell references. In Excel I would have used names and/or tables - not sure google-sheets supports that, but in any case if none of that is supported you can still, and should, use absolute cell references - refer to 'To Do'!$A:$A and 'To Do'!$E:$E. Not sure what A3:A refers to.



  • Don't hard-code your variables. If each column is going to use a different set of [Priority] values, "



Lastly, I find

MAX(COUNTIF('To Do'!A3:A, "<2000") - COUNTIF('To Do'!A3:A, "<=1000"), 1)


is easier to read as

MAX(1, COUNTIF('To Do'!A3:A, "<2000") - COUNTIF('To Do'!A3:A, "<=1000"))


But the reason you're taking the
MAX here, is to avoid a division by zero; by dividing by 1 whenever that's the case, you're showing mathematically incorrect results.

In Excel I'd wrap the division with an
IFERROR, and return a string such as "-" when I'm dividing by zero. This accurately reports "this category is irrelevant", rather than "this category is 100% done".

Another simple option would be to add a column in the TODO list, to identify the actual priority level (which you currently have as a wasted row between each group); then a simple
SUMIF / COUNTIF can do the trick: you only account for rows with a given [PriorityCode] value.

That's actually a "smell": your
[Priority]` values currently encode two values: the priority group, and the priority level, within that group.

Code Snippets

MAX(COUNTIF('To Do'!A3:A, "<2000") - COUNTIF('To Do'!A3:A, "<=1000"), 1)
MAX(1, COUNTIF('To Do'!A3:A, "<2000") - COUNTIF('To Do'!A3:A, "<=1000"))

Context

StackExchange Code Review Q#55759, answer score: 7

Revisions (0)

No revisions yet.