patternpythonMinor
Plotting Frequency of Vowels in a Paragraph
Viewed 0 times
frequencyplottingparagraphvowels
Problem
I wanted to play around with the following idea:
Give a paragraph, I wanted to find out the relative frequency of usage
of each of the five vowels. I wanted to plot a pie chart depicting
this.
Here is my code:
It does what it needs to (I think) but I feel there is a better way of doing this (algorithmically speaking). I like Functional Programming so I threw in the Mapper but there is no other reason for that.
Give a paragraph, I wanted to find out the relative frequency of usage
of each of the five vowels. I wanted to plot a pie chart depicting
this.
Here is my code:
from sys import stdin
from re import sub
import pylab
DataToRead = stdin.read()
VowelData = sub(r"[^aeiou]","",DataToRead) #Take Out Everything which is NOT a vowel
VowelList = ['a','e','i','o','u']
VowelCount = [sum(map(lambda x: x==Vowel,VowelData)) for Vowel in VowelList] # Number of times each vowel appears
print VowelCount
VowelPerc = [x*100.0/sum(VowelCount) for x in VowelCount] #Find the percentage of each
pylab.pie(VowelPerc,None,VowelList,autopct='%1.1f%%')
pylab.show()It does what it needs to (I think) but I feel there is a better way of doing this (algorithmically speaking). I like Functional Programming so I threw in the Mapper but there is no other reason for that.
Solution
I strongly advise against
I also suggest better spacing around your operators, e.g.
Do not use
would become
Of course, that is far from optimal as we make five passes through the data. If we use a dictionary, we can reduce this to a single pass:
Note that I got rid of the unnecessary
As per the pie-plot documentation, you should consider setting the aspect ratio of your axes, so that the resulting chart doesn't look distorted.
CamelCase for your variable names – consider snake_case instead. Why? Because consistency with existing Python code.I also suggest better spacing around your operators, e.g.
x == vowel instead of x==Vowel and pylab.pie(vowel_perc, None, vowel_list, autopct='%1.1f%%') instead of pylab.pie(VowelPerc,None,VowelList,autopct='%1.1f%%'). In general, consistent, even spacing makes code easier to read.Do not use
map, or more precisely: do not use lambdas. Python has list comprehensions which are exactly as powerful as map and filter, but are considered to be more readable. Here, your lineVowelCount = [sum(map(lambda x: x==Vowel,VowelData)) for Vowel in VowelList]would become
count = [sum([x for x in vowel_data if x == vowel]) for vowel in vowel_list]Of course, that is far from optimal as we make five passes through the data. If we use a dictionary, we can reduce this to a single pass:
vowels = "aeiou"
counts = dict([(vowel, 0) for vowel in vowels])
for x in data:
counts[x] += 1
percentages = [counts[vowel] * 100.0 / len(data) for vowel in vowels]Note that I got rid of the unnecessary
vowel_ prefix here, and that I replaced sum(VowelCount) with the likely cheaper len(data) (but with all “optimizations”, the improvement should be benchmarked, not guessed).As per the pie-plot documentation, you should consider setting the aspect ratio of your axes, so that the resulting chart doesn't look distorted.
Code Snippets
VowelCount = [sum(map(lambda x: x==Vowel,VowelData)) for Vowel in VowelList]count = [sum([x for x in vowel_data if x == vowel]) for vowel in vowel_list]vowels = "aeiou"
counts = dict([(vowel, 0) for vowel in vowels])
for x in data:
counts[x] += 1
percentages = [counts[vowel] * 100.0 / len(data) for vowel in vowels]Context
StackExchange Code Review Q#43118, answer score: 6
Revisions (0)
No revisions yet.