Plotting Frequency of Vowels in a Paragraph

Submitted by: @import:stackexchange-codereview

Problem

I wanted to play around with the following idea:


Given a paragraph, I wanted to find out the relative frequency of usage
of each of the five vowels. I wanted to plot a pie chart depicting
this.

Here is my code:

from sys import stdin
from re import sub
import pylab
DataToRead = stdin.read()
VowelData = sub(r"[^aeiou]","",DataToRead) #Take Out Everything which is NOT a vowel
VowelList = ['a','e','i','o','u']
VowelCount = [sum(map(lambda x: x==Vowel,VowelData)) for Vowel in VowelList] # Number of times each vowel appears
print VowelCount
VowelPerc = [x*100.0/sum(VowelCount) for x in VowelCount] #Find the percentage of each
pylab.pie(VowelPerc,None,VowelList,autopct='%1.1f%%')
pylab.show()


It does what it needs to (I think), but I feel there is a better way of doing this (algorithmically speaking). I like functional programming, so I threw in the map, but there is no other reason for it.

Solution

I strongly advise against CamelCase for your variable names; consider snake_case instead. Why? For consistency with existing Python code: PEP 8 recommends lowercase names with underscores for variables and functions.

I also suggest better spacing around your operators, e.g. x == vowel instead of x==Vowel and pylab.pie(vowel_perc, None, vowel_list, autopct='%1.1f%%') instead of pylab.pie(VowelPerc,None,VowelList,autopct='%1.1f%%'). In general, consistent, even spacing makes code easier to read.

Avoid map here, or more precisely, avoid lambdas. Python's list comprehensions are exactly as powerful as map and filter, but are generally considered more readable. Here, your line

VowelCount = [sum(map(lambda x: x==Vowel,VowelData)) for Vowel in VowelList]


would become

count = [sum(1 for x in vowel_data if x == vowel) for vowel in vowel_list]
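
As a minimal runnable sketch of this comprehension-based count (the function name and sample sentence are my own, purely for illustration): it adds 1 for every matching character, because summing the characters themselves would raise a TypeError.

```python
def count_vowels_per_pass(text, vowels="aeiou"):
    """Count each vowel by scanning the whole text once per vowel."""
    # Add 1 for every character equal to the current vowel.
    return [sum(1 for ch in text if ch == vowel) for vowel in vowels]


print(count_vowels_per_pass("the quick brown fox jumps over the lazy dog"))
# [1, 3, 1, 4, 2]
```

This still scans the text once for each of the five vowels.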


Of course, that is far from optimal as we make five passes through the data. If we use a dictionary, we can reduce this to a single pass:

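vowels = "aeiou"
counts = dict.fromkeys(vowels, 0)
# data must already be filtered down to vowels (like VowelData in the question);
# any other character would raise a KeyError here
for x in data:
    counts[x] += 1
# assumes data is non-empty; otherwise this divides by zero
percentages = [counts[vowel] * 100.0 / len(data) for vowel in vowels]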
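
The single-pass idea can also be sketched as a small function that tolerates unfiltered and empty input. The name vowel_percentages and the choice to return zeros when no vowels are found are assumptions of mine, not part of the original code. Like the regex in the question, it counts lowercase vowels only.

```python
def vowel_percentages(text, vowels="aeiou"):
    """Return the percentage share of each vowel, in the order given."""
    counts = dict.fromkeys(vowels, 0)
    for ch in text:
        if ch in counts:  # skip anything that is not one of the vowels
            counts[ch] += 1
    total = sum(counts.values())
    if total == 0:
        # Assumption: report all zeros rather than dividing by zero.
        return [0.0] * len(vowels)
    return [counts[v] * 100.0 / total for v in vowels]


print(vowel_percentages("banana"))
# [100.0, 0.0, 0.0, 0.0, 0.0]
```

Because the membership check skips non-vowels, this version does not need the regex filtering step first, at the cost of using sum over the counts rather than len(data).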


Note that I got rid of the unnecessary vowel_ prefix here, and that I replaced sum(VowelCount) with the likely cheaper len(data). The two are equal only because data contains nothing but vowels. As with all “optimizations”, the improvement should be benchmarked, not guessed.
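
One way such a benchmark might look, using the standard-library timeit module. The input size and repetition count are arbitrary choices of mine, and the timings will vary by machine and Python version, so no results are claimed here.

```python
import timeit

data = "aeiou" * 20000  # hypothetical vowel-only input
counts = [data.count(v) for v in "aeiou"]

# Time each way of obtaining the total number of vowels.
sum_time = timeit.timeit(lambda: sum(counts), number=10000)
len_time = timeit.timeit(lambda: len(data), number=10000)

print("sum(counts): %.4f s" % sum_time)
print("len(data):   %.4f s" % len_time)
```

Either call is likely negligible next to the pass that does the counting, which is itself a reason to measure before optimizing.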

As per the pie-plot documentation, you should consider setting the aspect ratio of your axes, so that the resulting chart doesn't look distorted.
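
A sketch of setting the aspect ratio, written against matplotlib.pyplot rather than pylab. The function name plot_vowel_pie and the sample percentages are made up for illustration, and the Agg backend is selected only so the sketch runs without a display. Newer matplotlib releases may already draw pies as circles by default, in which case the set_aspect call is harmless.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt


def plot_vowel_pie(percentages, labels="aeiou"):
    """Draw a pie chart of vowel percentages with an equal aspect ratio."""
    fig, ax = plt.subplots()
    ax.pie(percentages, labels=list(labels), autopct="%1.1f%%")
    ax.set_aspect("equal")  # keep the pie circular instead of elliptical
    return fig, ax


fig, ax = plot_vowel_pie([10.0, 30.0, 10.0, 40.0, 10.0])
```

From here, fig.savefig(...) writes the chart to a file, or plt.show() displays it when an interactive backend is in use.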

Context

StackExchange Code Review Q#43118, answer score: 6
