Pythonic way to count if a value is 'NaN' in a dictionary of dictionaries
Problem
I have a dictionary of dictionaries with several values. I want to count all the entries that do not contain 'NaN' (Not a Number in a string).
I'm not familiar with list comprehensions, but I'm pretty sure there is a more Pythonic way to achieve this.
Can anyone share pythonic advice?
Thank you
for each in enron_data:
    if enron_data[each]["salary"] != 'NaN':
        counter += 1
    if enron_data[each]['email_address'] != 'NaN':
        valid_email += 1
Solution
If you want this to use a comprehension, you need to get the
sum of each each in enron_data, where each's salary is not 'NaN'. Spelling it out in words like this should make you notice that each is probably not the best variable name. Compare the following example: you need to get the sum of each boat in boats, where the boat's cost is not 'NaN'. That is easier to read. It's the same in Python, which is why good variable names are advised.

And so you'd want to do:

counter += sum(enron_data[each]["salary"] != 'NaN' for each in enron_data)
valid_email += sum(enron_data[each]["email_address"] != 'NaN' for each in enron_data)

If enron_data has a method like dict.values, or better dict.itervalues, then you'd want to use that instead. The latter is simply a version of the former with better memory usage (this applies to Python 2; in Python 3, dict.values() already returns a lazy view and itervalues no longer exists). And so you could instead use:

counter += sum(each["salary"] != 'NaN' for each in enron_data.itervalues())
valid_email += sum(each["email_address"] != 'NaN' for each in enron_data.itervalues())

I don't think this approach is that much better than your current one, but you could make it a function to reduce code duplication. Whether it's better ultimately comes down to how you're using it.

But it's definitely better than the answer you provided. There's no need to create a new dictionary: doing so changes memory usage from O(1) to O(n), is harder to read, and is slower, as it takes more effort to create a dictionary than to sum booleans.
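The suggestion of wrapping the count in a function could be sketched roughly as follows. This is only an illustration: the helper name count_valid, its missing parameter, and the sample records are hypothetical, and the sketch assumes Python 3, where dict.values() takes the place of itervalues(). It relies on the fact that sum() treats True as 1 and False as 0.

```python
def count_valid(records, field, missing="NaN"):
    """Count the inner dictionaries whose `field` is not the `missing` marker."""
    # Each comparison yields a bool; the generator is consumed lazily,
    # so no intermediate list or dictionary is built.
    return sum(person[field] != missing for person in records.values())


# Hypothetical sample data, shaped like the question's dictionary of dictionaries.
enron_data = {
    "PERSON A": {"salary": 1000, "email_address": "a@example.com"},
    "PERSON B": {"salary": "NaN", "email_address": "b@example.com"},
    "PERSON C": {"salary": "NaN", "email_address": "NaN"},
}

counter = count_valid(enron_data, "salary")
valid_email = count_valid(enron_data, "email_address")
print(counter, valid_email)
```

Using a descriptive loop variable such as person instead of each follows the naming advice above, and the field name becomes the only thing that changes between the two counts.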
Code Snippets
counter += sum(enron_data[each]["salary"] != 'NaN' for each in enron_data)
valid_email += sum(enron_data[each]["email_address"] != 'NaN' for each in enron_data)

counter += sum(each["salary"] != 'NaN' for each in enron_data.itervalues())
valid_email += sum(each["email_address"] != 'NaN' for each in enron_data.itervalues())
Context
StackExchange Code Review Q#157590, answer score: 4