Histogram of a string
Problem
I'm teaching myself Python and when a friend posted this sentence
Only the fool would take trouble to verify that his sentence was
composed of ten a's, three b's, four c's, four d's, forty-six e's,
sixteen f's, four g's, thirteen h's, fifteen i's, two k's, nine l's,
four m's, twenty-five n's, twenty-four o's, five p's, sixteen r's,
forty-one s's, thirty-seven t's, ten u's, eight v's, eight w's, four
x's, eleven y's, twenty-seven commas, twenty-three apostrophes, seven
hyphens and, last but not least, a single !
I thought, as a fool, I would try to verify it by plotting a histogram. This is my code:
import matplotlib.pyplot as plt
import numpy as np
sentence = "Only the fool would take trouble to verify that his sentence was composed of ten a's, three b's, four c's, four d's, forty-six e's, sixteen f's, four g's, thirteen h's, fifteen i's, two k's, nine l's, four m's, twenty-five n's, twenty-four o's, five p's, sixteen r's, forty-one s's, thirty-seven t's, ten u's, eight v's, eight w's, four x's, eleven y's, twenty-seven commas, twenty-three apostrophes, seven hyphens and, last but not least, a single !".lower()
# Convert the string to an array of integers
numbers = np.array([ord(c) for c in sentence])
u = np.unique(numbers)
# Make the integers range from 0 to n so there are no gaps in the histogram
# [0][0] was a hack to make sure `np.where` returned an int instead of an array.
ind = [np.where(u==n)[0][0] for n in numbers]
bins = range(0,len(u)+1)
hist, bins = np.histogram(ind, bins)
plt.bar(bins[:-1], hist, align='center')
plt.xticks(np.unique(ind), [str(unichr(n)) for n in set(numbers)])
plt.grid()
plt.show()
Which generates
[Image: the resulting histogram]
Please let me know how to improve my code. Also, please let me know what I did wrong with `plt.xticks` that resulted in the gaps at the beginning and the end (or is that just a case of incorrect axis limits?).
Solution
Your code is pretty good! I have only one substantive suggestion and a few stylistic ones.
Style
- Since `sentence` is a hard-coded constant, Python convention (PEP 8) is that its name should be in all uppercase, i.e. `SENTENCE` is a better variable name.
- What are `u` and `n` in your code? It's hard to figure out what those variables mean. Could you be more descriptive with your naming?
- Your call to `.lower()` on `sentence` is hidden after the very long sentence. For readability I wouldn't hide any function calls at the end of very long strings.
- Python has multi-line string support using the `"""` delimiters. Using it makes the sentence and the code more readable, although at the expense of introducing newline (`\n`) characters that would show up on the histogram if they are not removed. In my code below I use the `"""` delimiter and remove the `\n` characters I introduced to break the string into screen-width-sized chunks. PEP 8 convention is that code lines shouldn't be more than about 80 characters long.
- You should consider breaking this code up into two functions, one to generate the data and one to make the graph, but we can leave that for another time.
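As a rough illustration of the last point, here is one way the split into two functions might look. The names `count_characters` and `plot_histogram` are made up for this sketch, it assumes the `Counter`-based counting suggested under Substance, and sorting the bars alphabetically is my own choice rather than anything the original code requires.

```python
from collections import Counter


def count_characters(text):
    """Return a Counter mapping each character of text to its frequency.

    The text is lower-cased and newline characters are dropped before
    counting, so line breaks in a multi-line string are not counted.
    """
    return Counter(text.lower().replace('\n', ''))


def plot_histogram(char_counts):
    """Draw a bar chart with one bar per character in char_counts."""
    # Imported here so count_characters() can be used without matplotlib.
    import matplotlib.pyplot as plt

    letters = sorted(char_counts)                # alphabetical bar order
    counts = [char_counts[c] for c in letters]   # heights in the same order
    positions = list(range(len(letters)))

    plt.bar(positions, counts, align='center')
    plt.xticks(positions, letters)
    plt.grid()
    plt.show()
```

With these two pieces, the whole script reduces to `plot_histogram(count_characters(SENTENCE))`, and the counting step can be tested without opening a plot window.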
Substance
- Since your sentence is a Python string (not a NumPy character array), you can generate the data for your histogram quite easily by using the `Counter` data type that is available in the `collections` module. It's designed for exactly this kind of application. Doing so lets you entirely avoid the complications of bin edges vs. bin centers that stem from using `np.histogram`.
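To give a feel for what `Counter` does, here is a minimal sketch on a short made-up string (`sample` is a stand-in I chose for illustration, not the sentence from the question):

```python
from collections import Counter

# Hypothetical short input; the real SENTENCE works the same way.
sample = "Banana, banana!"

# Counter iterates over the string and tallies each character.
char_counts = Counter(sample.lower())

print(char_counts['a'])            # 6
print(char_counts[','])            # 1
print(char_counts['z'])            # 0: characters never seen count as zero
print(char_counts.most_common(2))  # [('a', 6), ('n', 4)]
```

Because the result maps each character directly to its count, there are no bins to define, which is what makes it a good fit for verifying the claims in the sentence one character at a time.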
Putting all these ideas together:
import matplotlib.pyplot as plt
import numpy as np
from collections import Counter
SENTENCE = """Only the fool would take trouble to verify that his sentence was composed of ten a's, three b's, four c's,
four d's, forty-six e's, sixteen f's, four g's, thirteen h's, fifteen i's, two k's, nine l's, four m's, twenty-five n's,
twenty-four o's, five p's, sixteen r's, forty-one s's, thirty-seven t's, ten u's, eight v's, eight w's, four x's,
eleven y's, twenty-seven commas, twenty-three apostrophes, seven hyphens and, last but not least, a single !"""
# generate histogram
letters_hist = Counter(SENTENCE.lower().replace('\n', ''))
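# Wrap in list() so this also works on Python 3, where values() and
# keys() return dict views rather than lists
counts = list(letters_hist.values())
letters = list(letters_hist.keys())
# graph data
bar_x_locations = np.arange(len(counts))
plt.bar(bar_x_locations, counts, align='center')
plt.xticks(bar_x_locations, letters)
plt.grid()
plt.show()
Other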
It wasn't anything you did with `plt.xticks` that led to the gaps. That's the matplotlib default. If you want a "tight" border to the graph, try adding a `plt.xlim(-0.5, len(counts) - 0.5)` before the `plt.show()`.
Context
StackExchange Code Review Q#129412, answer score: 5