patterncMinor
Simple word counter
Viewed 0 times
wordsimplecounter
Problem
I've decided to do this by writing a simple word counter. The app gets all the params and outputs all the unique words, each one with a counter:
"Hello world Hello" would return "Hello: 2", "world: 1"
(not taking in consideration the actual output structure)
This program is the Python equivalent of:
Writing it in C is a bit different. I feel like I'm getting something utterly wrong with pointers, arrays of pointers and all that stuff.
"Hello world Hello" would return "Hello: 2", "world: 1"
(not taking in consideration the actual output structure)
This program is the Python equivalent of:
import sys
from collections import defaultdict
def main():
results = defaultdict(int)
for word in sys.argv[1:]:
results[word] += 1
print resultsWriting it in C is a bit different. I feel like I'm getting something utterly wrong with pointers, arrays of pointers and all that stuff.
#include
#include
// This is what a key-value pair:
typedef struct {
int counter;
unsigned char* word;
} hashmap;
// Checks if inside the array of results, hashmap->word is equals to word paramter
hashmap* get_word_from_results(hashmap* results[], int count, const char* word) {
int i;
hashmap* result;
for (i = 0; i word == (unsigned char *)word)
return result;
}
return NULL;
}
int main(int argc, const char *argv[])
{
hashmap* results;
int results_counter = 0;
int i;
const char* word;
for (i = 1; i counter++;
printf("INCREMENTED\n");
}
}
return 0;
}- Can anyone give me some advice?
- What am I doing wrong here?
- Are my pointers okay? I also think I've spotted a memory leak (see comments).
- Would anyone like to submit their version?
Solution
Well, firstly, your hashmap isn't a hash map. If that's what you want, it'd be better to go and implement one first. Remember, just because C doesn't have direct support for OO, doesn't mean you can't write real containers, or use real abstractions.
Try implementing the hash map in isolation first, then you can test the implementation, and then write your app on top.
Alternatively, if you want to focus on the app logic first,and then drill down into the implementation, either find an existing hashmap you can use, or switch to C++ and just use
OK, and a review of the actual code:
-
you're using
-
but
-
unless the re-allocation fails, in which case you've leaked the old block. This may be an unrecoverable error anyway, but Loki Astari's comment shows the correct approach:
-
the
For example, the public interface in
and the implementation file
For reference, here's a simple C++11 implementation, which is (hopefully) much closer to Python than you can get in C:
Try implementing the hash map in isolation first, then you can test the implementation, and then write your app on top.
Alternatively, if you want to focus on the app logic first,and then drill down into the implementation, either find an existing hashmap you can use, or switch to C++ and just use
std::unordered_mapOK, and a review of the actual code:
- you're switching between
charandunsigned charfor no reason. Stick withchar *for text strings
-
you're using
realloc wrongly; you need to pass the current value of results as the first argument if you want to grow the existing array (and have the existing values copied over)- eg.
results = realloc(NULL, n)just discards (and leaks) the old array and allocates a new, bigger one
-
but
results = realloc(results, n) moves the existing contents to a new, bigger block-
unless the re-allocation fails, in which case you've leaked the old block. This may be an unrecoverable error anyway, but Loki Astari's comment shows the correct approach:
hashmap *tmp = realloc(results, n);
if (tmp)
results = tmp; // reallocation succeeded
else {
// handle failure somehow?
}-
the
hashmap results[] argument to get_word_from_results is the wrong type. You're passing hashmap, and that's what it should take. The fact you're using it as an array doesn't mean you have to throw in random []- you can't compare string values using
if (result->word == word), that just checks whether they have the same address. It's exactly equivalent toif id(result.word) == id(word)in Python. Usestrcmpto compare the contents of the string (or, go with C++ and use std::string references, which you can compare with==)
For example, the public interface in
hashmap.h might look like:#ifndef HASHMAP_H
#define HASHMAP_H
/* nobody needs to see the contents except the implementation */
struct hashmap;
struct hashmap* create_hashmap();
void *lookup_hashmap(struct hashmap*, const char *key);
/* return NULL if not found? */
int insert_hashmap(struct hashmap*, const char *key, void *value);
/* return zero on success, -1 on collision? */
#endifand the implementation file
hashmap.c#include "hashmap.h"
struct bucket
{
const char *key;
void *value;
};
struct hashmap
{
int used; /* to calculate load factor */
int n_buckets;
struct bucket *buckets;
}
struct hashmap* create_hashmap() { /* allocate and initialize */ }
void *lookup_hashmap(struct hashmap *map, const char *key)
{
/* hash key, find bucket, return value */
}
int insert_hashmap(struct hashmap *map, const char *key, void *value)
{
/* hash key, find bucket, return -1 if it's in use?
otherwise store value and return 0
factor out bucket lookup from insert_ and lookup_?
...
*/
}For reference, here's a simple C++11 implementation, which is (hopefully) much closer to Python than you can get in C:
#include
#include
#include
int main(int argc, char *argv[]) {
// std::unordered_map is similar to dict
std::unordered_map results;
for (int i=1; i<argc; ++i) {
std::string word = argv[i];
results[word] += 1;
}
// the main complexity is that C++ library types don't have a
// built-in way to print themselves
std::cout << "results = {";
for (auto i:results) {
std::cout << i.first << ':' << i.second << ' ';
}
std::cout << "}\n";
}Code Snippets
hashmap *tmp = realloc(results, n);
if (tmp)
results = tmp; // reallocation succeeded
else {
// handle failure somehow?
}#ifndef HASHMAP_H
#define HASHMAP_H
/* nobody needs to see the contents except the implementation */
struct hashmap;
struct hashmap* create_hashmap();
void *lookup_hashmap(struct hashmap*, const char *key);
/* return NULL if not found? */
int insert_hashmap(struct hashmap*, const char *key, void *value);
/* return zero on success, -1 on collision? */
#endif#include "hashmap.h"
struct bucket
{
const char *key;
void *value;
};
struct hashmap
{
int used; /* to calculate load factor */
int n_buckets;
struct bucket *buckets;
}
struct hashmap* create_hashmap() { /* allocate and initialize */ }
void *lookup_hashmap(struct hashmap *map, const char *key)
{
/* hash key, find bucket, return value */
}
int insert_hashmap(struct hashmap *map, const char *key, void *value)
{
/* hash key, find bucket, return -1 if it's in use?
otherwise store value and return 0
factor out bucket lookup from insert_ and lookup_?
...
*/
}#include <string>
#include <unordered_map>
#include <iostream>
int main(int argc, char *argv[]) {
// std::unordered_map is similar to dict
std::unordered_map<std::string, int> results;
for (int i=1; i<argc; ++i) {
std::string word = argv[i];
results[word] += 1;
}
// the main complexity is that C++ library types don't have a
// built-in way to print themselves
std::cout << "results = {";
for (auto i:results) {
std::cout << i.first << ':' << i.second << ' ';
}
std::cout << "}\n";
}Context
StackExchange Code Review Q#13621, answer score: 6
Revisions (0)
No revisions yet.