snippetcMinor
K&R (C): generate histogram of word-lengths
Viewed 0 times
generatewordhistogramlengths
Problem
From "The C Programming Language" (K&R):
Exercise 1-13. Write a program to print a histogram of the lengths of words in its input. It is easy to draw the histogram with the bars horizontal; a vertical orientation is more challenging.
I wrote a program to generate a vertically-oriented histogram. It doesn't accurately handle punctuation (punctuation characters get counted as word characters), but given the limited tools covered in chapter 1 of the book, I felt this was reasonable. I've tried to limit myself to tools covered up until this point in the book (I had to resist the urge to use ternaries in a couple spots).
A small sample run against a quote by Benjamin Franklin:
In wine there is wisdom, in beer there is Fre
Exercise 1-13. Write a program to print a histogram of the lengths of words in its input. It is easy to draw the histogram with the bars horizontal; a vertical orientation is more challenging.
I wrote a program to generate a vertically-oriented histogram. It doesn't accurately handle punctuation (punctuation characters get counted as word characters), but given the limited tools covered in chapter 1 of the book, I felt this was reasonable. I've tried to limit myself to tools covered up until this point in the book (I had to resist the urge to use ternaries in a couple spots).
#include
#define IN_WORD 1 /* point in word */
#define OUT_WORD 0 /* point out of word */
#define MAX_LEN 20 /* max word length to consider */
int main()
{
int i, j; /* iterators */
int c; /* character */
int len; /* word length */
int state; /* in/out of word */
int top_count;
int histogram[MAX_LEN];
i = j = c = len = top_count = 0;
state = OUT_WORD;
for (i = 0; i top_count)
top_count = histogram[j-1];
}
/* reset state */
len = 0;
state = OUT_WORD;
} else {
/* keep track of current word length */
++len;
state = IN_WORD;
}
}
/* print histogram from top to bottom */
for (i = top_count; i > 0; --i) {
/* print Y-axis */
printf("%3d|", i);
/* print entries for each word length */
for (j = 0; j < MAX_LEN; ++j) {
if (histogram[j] < i)
printf(" ");
else
printf(" .");
}
putchar('\n');
}
printf(" ");
/* print X-axis */
for (i = 0; i < MAX_LEN; ++i)
printf("--");
putchar('\n');
return 0;
}A small sample run against a quote by Benjamin Franklin:
In wine there is wisdom, in beer there is Fre
Solution
Initial reaction to your program is that it seems to be good and clean C code. So the overall impression is good, but there are things to review, even though some comments may be a bit of personal reference.
Refactored code
That is enough rambling, and here is my suggestion to refactor code addressing most of the issues I've commented upon:
``
#include
#define IN_WORD 1 / point in word /
#define OUT_WORD 0 / point out of word /
#define MAX_LEN 20 / max word length to consider /
/*
* Reset the histogram to all 0's too clear it out
*/
void init_histogram(int histogram[], size_t histogram_length)
{
int i;
for (i = 0; i 0 when we hit a non-character
*/
void create_histogram_from_input(int histogram[], size_t histogram_length)
{
int len=0; / length of current word /
int c; / current character /
/ Keep reading character until end of file /
while ((c = getchar()) != EOF) {
// Check if it is an character ...
if ( ( c >= 'a' && c = 'A' && c 0) {
/ increment histogram entry for current word length /
if (len >= histogram_length) {
len = histogram_length;
}
histogram[len - 1]++;
len = 0; / Reset length of current word /
}
}
}
}
/* Pretty print the histogram vertically, with numbered x-
* and y-axis. Only print as many columns as we have word lengths
*/
void print_histogram(int histogram[], size_t histogram_length)
{
int i, j, c;
const int y_axis_width = 4;
int top_count = 0; / Highest count of a given length /
int max_length = 0; / The longest word in histogram /
/ Find the highest count, and longest word /
for (i = histogram_length; i > 0; --i) {
/* If max_length is not set, and we find a positive length
* whilst going backwards, it is the longest word
*/
if (max_length == 0 && histogram[i-1] > 0) {
max_length = i;
}
/ Scan entire histogram to find the highest count of that length /
if (histogram[i-1] > top_count) {
top_count = histogram[i-1];
}
}
/ print histogram from top to bottom /
for (i = top_count; i > 0; i--) {
/ print Y-axis /
printf("%3d|", i); / If this changes, change y_axis_width /
/ print entries for each word length /
for (j = 0; j < max_length; j++) {
if (histogram[j] < i) {
- Punctuation char or not – The code includes check against single characters, so the extension to check against the ranges from
c > 'a' && c
- Split main method in multiple functions – This might not be covered yet, but I include it as it is good to start doing this as early as possible. However this does introduce the icky subject of passing arrays with/without size. Sorry about that!
- Use of state vs simple length of current word – The naming of the states was a little unclear to me, and whilst reviewing the logic it hit me that if you are not counting the length of a word (i.e. the character is a legal word character), you can simply test if the length of the word, len
, is larger than0to match your currentIN_WORDstate. As such, the state variable can be replaced with a testlen > 0
- Make the most common (shortest?) block come first – In your code you start of the input processing by verifying length of something and adding to the histogram. But of what? Then comes the block actually increasing the length and reading the words. To me, it is more natural to switch these blocks, so that you know why len
has a length
- Stay consistent with bracing – For most of the code you've chosen to have start braces at preceding line, and that is a style choice. You've also chosen to let one-liners be without braces, this I would advise against. At some point in time this cause you some issue. My strong suggestion is to always enclose for
orif(or similar) blocks with braces always. It will save you some grief further down the line.
- Include number on both axes – Kind of strange not have numbers on the x-axis. To allow for two digits in the length of word, you can increase width of each column to three characters
- Skip empty columns – No need to print the empty columns, is there?
- Consider changing columnn marker – To me, the .
is a little invisible, and I feel the#or*is more intuitive to use to mark the count
- Consider labeling the axes – Consider adding labels like Count or Occurences, and Word length for the axes. It does however impose a smaller problem of where to put them...
- Why the putchar, when printf works? – The usage of putchar('\n');
seems unmotivated, when you can do aprintf("\n");. Is there a reason behind that?
- Comment on comments – You've added suitable comments in most places, and when going to functions this becomes even more importantprocessing
Refactored code
That is enough rambling, and here is my suggestion to refactor code addressing most of the issues I've commented upon:
``
#include
#define IN_WORD 1 / point in word /
#define OUT_WORD 0 / point out of word /
#define MAX_LEN 20 / max word length to consider /
/*
* Reset the histogram to all 0's too clear it out
*/
void init_histogram(int histogram[], size_t histogram_length)
{
int i;
for (i = 0; i 0 when we hit a non-character
*/
void create_histogram_from_input(int histogram[], size_t histogram_length)
{
int len=0; / length of current word /
int c; / current character /
/ Keep reading character until end of file /
while ((c = getchar()) != EOF) {
// Check if it is an character ...
if ( ( c >= 'a' && c = 'A' && c 0) {
/ increment histogram entry for current word length /
if (len >= histogram_length) {
len = histogram_length;
}
histogram[len - 1]++;
len = 0; / Reset length of current word /
}
}
}
}
/* Pretty print the histogram vertically, with numbered x-
* and y-axis. Only print as many columns as we have word lengths
*/
void print_histogram(int histogram[], size_t histogram_length)
{
int i, j, c;
const int y_axis_width = 4;
int top_count = 0; / Highest count of a given length /
int max_length = 0; / The longest word in histogram /
/ Find the highest count, and longest word /
for (i = histogram_length; i > 0; --i) {
/* If max_length is not set, and we find a positive length
* whilst going backwards, it is the longest word
*/
if (max_length == 0 && histogram[i-1] > 0) {
max_length = i;
}
/ Scan entire histogram to find the highest count of that length /
if (histogram[i-1] > top_count) {
top_count = histogram[i-1];
}
}
/ print histogram from top to bottom /
for (i = top_count; i > 0; i--) {
/ print Y-axis /
printf("%3d|", i); / If this changes, change y_axis_width /
/ print entries for each word length /
for (j = 0; j < max_length; j++) {
if (histogram[j] < i) {
Code Snippets
#include <stdio.h>
#define IN_WORD 1 /* point in word */
#define OUT_WORD 0 /* point out of word */
#define MAX_LEN 20 /* max word length to consider */
/*
* Reset the histogram to all 0's too clear it out
*/
void init_histogram(int histogram[], size_t histogram_length)
{
int i;
for (i = 0; i < histogram_length; ++i) {
histogram[i] = 0;
}
}
/*
* Read input, character by character, and divide into
* words which are counted the length of (up until the max
* max length specified in histogram_length). If word is
* actually longer, the histogram records is at max length.
*
* Count characters, a-z and A-Z, and increase length of word,
* and add to histogram if length>0 when we hit a non-character
*/
void create_histogram_from_input(int histogram[], size_t histogram_length)
{
int len=0; /* length of current word */
int c; /* current character */
/* Keep reading character until end of file */
while ((c = getchar()) != EOF) {
// Check if it is an character ...
if ( ( c >= 'a' && c <= 'z') ||
( c >= 'A' && c <= 'Z') ||
c == '-' ) {
++len; /* Increase length of current word */
} else {
/* ... it was not a character, so check if we have
* a positive word length
*/
if (len > 0) {
/* increment histogram entry for current word length */
if (len >= histogram_length) {
len = histogram_length;
}
histogram[len - 1]++;
len = 0; /* Reset length of current word */
}
}
}
}
/* Pretty print the histogram vertically, with numbered x-
* and y-axis. Only print as many columns as we have word lengths
*/
void print_histogram(int histogram[], size_t histogram_length)
{
int i, j, c;
const int y_axis_width = 4;
int top_count = 0; /* Highest count of a given length */
int max_length = 0; /* The longest word in histogram */
/* Find the highest count, and longest word */
for (i = histogram_length; i > 0; --i) {
/* If max_length is not set, and we find a positive length
* whilst going backwards, it is the longest word
*/
if (max_length == 0 && histogram[i-1] > 0) {
max_length = i;
}
/* Scan entire histogram to find the highest count of that length */
if (histogram[i-1] > top_count) {
top_count = histogram[i-1];
}
}
/* print histogram from top to bottom */
for (i = top_count; i > 0; i--) {
/* print Y-axis */
printf("%3d|", i); /* If this changes, change y_axis_width */
/* print entries for each word length */
for (j = 0; j < max_length; j++) {
if (histogram[j] < i) {
printf(" ");
} else {
printf(" # ");
}
}
printf("\n");
Context
StackExchange Code Review Q#106542, answer score: 4
Revisions (0)
No revisions yet.