
Pythonic way to count if a value is 'NaN' in a dictionary of dictionaries

Submitted by: @import:stackexchange-codereview

Problem

I have a dictionary of dictionaries with several values. I want to count all the entries that do not contain 'NaN' (Not a Number in a string).

for each in enron_data:
    if enron_data[each]["salary"] !='NaN':
        counter += 1

    if enron_data[each]['email_address']!='NaN':
        valid_email +=1


I'm not familiar with list comprehensions, but I'm pretty sure there is a more Pythonic way to achieve this.

Can anyone share pythonic advice?

Thank you

Solution

If you want to use a comprehension here, you need to get the sum of each each in enron_data, where each's salary is not 'NaN'. As this word exercise highlights, each is probably not the best variable name. Compare the following sentence:


you need to get the sum of each boat in boats, where the boat's cost is not 'NaN'.

That is easier to read. The same holds in Python, which is why good variable names are advised.

And so you'd want to do:

counter += sum(enron_data[each]["salary"] != 'NaN' for each in enron_data)
valid_email += sum(enron_data[each]["email_address"] != 'NaN' for each in enron_data)


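If enron_data provides a method like dict.values, or better, dict.itervalues, use that instead. The latter is a version of the former with better memory usage: in Python 2, values builds a full list, while itervalues returns a lazy iterator. (In Python 3, itervalues no longer exists, and values itself returns a lightweight view.)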

And so you could instead use:

counter += sum(each["salary"] != 'NaN' for each in enron_data.itervalues())
valid_email += sum(each["email_address"] != 'NaN' for each in enron_data.itervalues())
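
The lines above are Python 2. On Python 3 the same idea could be written with values(), roughly as follows. The sample dictionary and the name person are assumptions made for illustration:

```python
# Hypothetical sample data; Python 3 is assumed here.
enron_data = {
    "PERSON A": {"salary": 250000, "email_address": "NaN"},
    "PERSON B": {"salary": "NaN", "email_address": "b@example.com"},
    "PERSON C": {"salary": 90000, "email_address": "c@example.com"},
}

# dict.values() returns a view in Python 3, so no list copy is made.
counter = sum(person["salary"] != "NaN" for person in enron_data.values())
valid_email = sum(
    person["email_address"] != "NaN" for person in enron_data.values()
)

print(counter, valid_email)  # 2 2
```

Iterating over the values directly also avoids the repeated enron_data[each] lookup.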


I don't think this approach is much better than your current code. You could wrap it in a function to reduce code duplication, but whether that is an improvement ultimately comes down to how you're using it.
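
One way such a function might look is sketched below. The name count_valid, its parameters, and the sample data are all hypothetical, and Python 3 is assumed:

```python
def count_valid(records, field, missing="NaN"):
    """Count the records whose `field` is not the `missing` placeholder."""
    return sum(record[field] != missing for record in records.values())


# Hypothetical sample data shaped like enron_data.
enron_data = {
    "PERSON A": {"salary": 100000, "email_address": "a@example.com"},
    "PERSON B": {"salary": "NaN", "email_address": "NaN"},
}

counter = count_valid(enron_data, "salary")
valid_email = count_valid(enron_data, "email_address")
print(counter, valid_email)  # 1 1
```

This only pays off if several fields are counted; for a single count, the inline sum is arguably just as clear.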

But it's definitely better than the answer you provided. There's no need to create a new dictionary: doing so changes memory usage from $O(1)$ to $O(n)$, is harder to read, and is slower, since building a dictionary takes more work than summing booleans.


Context

StackExchange Code Review Q#157590, answer score: 4
