patternpythonMinor
Word frequency generator in Python
Viewed 0 times
frequencywordpythongenerator
Problem
Please critique my word frequency generator. I am a beginner programmer so any criticism are welcome.
Original Code: http://pastebin.com/rSRfbnCt
Here is my revised code after the feedback:
Original Code: http://pastebin.com/rSRfbnCt
Here is my revised code after the feedback:
import string, time, webbrowser, collections, urllib2
def timefunc(function):
def wrapped(*args):
start = time.time()
data = function(*args)
end = time.time()
timetaken = end - start
print "Function: "+function.__name__+"\nTime taken:",timetaken
return data
return wrapped
@timefunc
def process_text(text_file):
words = text_file.read().lower().split()
words = [word.strip(string.punctuation+string.whitespace) for word in words]
words = [word for word in words if word]#skips ''(None) elements
return words
@timefunc
def create_freq_dict(wordlist):
freq_dict = collections.Counter(wordlist)
return freq_dict
@timefunc
def create_sorted_list(freqdict):
sorted_list = [(value,key) for key,value in list(freqdict.items())]#list() provides python 3 compatibility
sorted_list.sort(reverse=True)
return sorted_list
@timefunc
def write_results(sorted_list):
text_file = open('wordfreq.txt','w')
text_file.write('Word Frequency List\n\n')
rank = 0
for word in sorted_list:
rank += 1
write_str = "[{0}] {1:-10}\n".format(rank, word[1],word[0])
text_file.write(write_str)
text_file.close()
## The Brothers Grimm
## This file can be obtained from Project Gutenberg:
## http://www.gutenberg.org/cache/epub/5314/pg5314.txt
web_file = urllib2.urlopen('http://www.gutenberg.org/cache/epub/5314/pg5314.txt')
wordlist = process_text(web_file)
freqdict = create_freq_dict(wordlist)
sorted_list = create_sorted_list(freqdict)
results = write_results(sorted_list)
webbrowser.open('wordfreq.txt')
print "END"Solution
import string, time, math, webbrowser
def timefunc(function, *args):
start = time.time()
data = function(*args)
end = time.time()
timetaken = end - startI'd recommend calling this
time_taken as its slightly easier to read.print "Function: "+function.__name__+"\nTime taken:",timetakenPrint already introduces newlines and combines different pieces. Take advantage of that.
print "Function: ", function.__name__
print "Time Taken: ", time_takenThat's easier to follow
return data
def process_text(filename):You never use filename in here, but you do use fin which is the same thing. typo?
t = []Not a very descriptive name. I suggest coming up with something clearer
for line in fin:
for i in line.split():i usually means index which its not hereword = i.lower()
word = word.strip(string.punctuation)
if word != '':
t.append(word)
return tI'd do this as
words = fin.read().lower().split()
words = [word.strip() for word in words]
words = [word for word in words if word]
return wordsI think its easier to follow and probably more efficient
def create_freq_dict(wordlist):
d = dict()d is not a very good name. Usually dicts are created with {} not dict(). No difference, but the first is generally prefferedfor word in wordlist:
if word not in d.keys():
d[word] = 1
else:
d[word] += 1Use
d = collections.defaultdict(int) or d = collections.Counter(). Both will make it easier to count up like this. See the python documentation for collections. You should actually be able to write this function in one linereturn d
def sort_dict(in_dict):
t = []
for key,value in in_dict.items():
t.append((value, key))in_dict.items() is a list already, there is no reason to copy the elements into the list. (NOTE: in Python 3.x in_dict.items() is no longer a list). Even if it wasn't a list you could do:t = list(in_dict.items())Which would do the same thing your code does
t.sort(reverse=True)
return tI'd implement this function as
return sorted(in_dict.items())The sorted function takes anything sufficiently list-like and produces a sorted list from it.
def write_results(sorted_list):sorted_list isn't a great name. It would be better to give an indication of what's in the list.
fout = open('wordfreq.txt','w')
fout.write('Word Frequency List\n\n')
r = 0
for i in sorted_list:
r += 1Use
for r, (key, value) in enumerate(sortedlist): That way you don't need to manage r yourself, and you can refer the value as value rather then the harder to read i[1].fillamount = 20 - (len(i[1]) + len(str(r)))Some of the those parens are unnecessary.
write_str = str(r)+': '+i[1]+' '+('-' * (fillamount-2))+' '+str(i[0])+'\n'Multiplication has precedence, you don't need the parens to make that happen. Also, python has a method ljust which does this for you
write_str = (str(r) + ': ' + i[1]).ljust('-', 20) + str(i[0]) + '\n'You may also want to consider using string formatting rather then adding strings
write_str = ('%d: %s' % (r, i[1])).just('-', 20) + '%d\n' % i[0]I think its easier to follow, although I'd probably split across several lines
fout.write(write_str)
fout.close()
## The Brothers Grimm
## This file can be obtained from Project Gutenberg:
## http://www.gutenberg.org/cache/epub/5314/pg5314.txt
fin = open('c:\Python27\My Programs\wx\grimm.txt')fin presumable stands for file in. Give a name that indicates what's actually in it ike
grim_textwordlist = timefunc(process_text,fin)
freqdict = timefunc(create_freq_dict,wordlist)
sorted_list = timefunc(sort_dict,freqdict)
results = timefunc(write_results,sorted_list)Very nice
webbrowser.open('wordfreq.txt')That's a slightly unusual use of a webbrowser
print "END"Code Snippets
import string, time, math, webbrowser
def timefunc(function, *args):
start = time.time()
data = function(*args)
end = time.time()
timetaken = end - startprint "Function: "+function.__name__+"\nTime taken:",timetakenprint "Function: ", function.__name__
print "Time Taken: ", time_takenreturn data
def process_text(filename):for line in fin:
for i in line.split():Context
StackExchange Code Review Q#8648, answer score: 7
Revisions (0)
No revisions yet.