HiveBrain v1.2.0
Get Started
← Back to all entries
patterncMinor

Simple word counter

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
wordsimplecounter

Problem

I've decided to do this by writing a simple word counter. The app gets all the params and outputs all the unique words, each one with a counter:

"Hello world Hello" would return "Hello: 2", "world: 1"

(not taking in consideration the actual output structure)

This program is the Python equivalent of:

import sys
from collections import defaultdict

def main():
    results = defaultdict(int)
    for word in sys.argv[1:]:
        results[word] += 1
    print results


Writing it in C is a bit different. I feel like I'm getting something utterly wrong with pointers, arrays of pointers and all that stuff.

#include 
#include 

// This is what a key-value pair: 
typedef struct {
    int counter;
    unsigned char* word;
} hashmap;

// Checks if inside the array of results, hashmap->word is equals to word paramter
hashmap* get_word_from_results(hashmap* results[], int count, const char* word) {
    int i;
    hashmap* result;
    for (i = 0; i word == (unsigned char *)word)
            return result;
    }
    return NULL;
}

int main(int argc, const char *argv[])
{
    hashmap* results;
    int results_counter = 0;

    int i;
    const char* word;
    for (i = 1; i counter++;
            printf("INCREMENTED\n");
        }
    }   
    return 0;
}


  • Can anyone give me some advice?



  • What am I doing wrong here?



  • Are my pointers okay? I also think I've spotted a memory leak (see comments).



  • Would anyone like to submit their version?

Solution

Well, firstly, your hashmap isn't a hash map. If that's what you want, it'd be better to go and implement one first. Remember, just because C doesn't have direct support for OO, doesn't mean you can't write real containers, or use real abstractions.

Try implementing the hash map in isolation first, then you can test the implementation, and then write your app on top.

Alternatively, if you want to focus on the app logic first,and then drill down into the implementation, either find an existing hashmap you can use, or switch to C++ and just use std::unordered_map

OK, and a review of the actual code:

  • you're switching between char and unsigned char for no reason. Stick with char * for text strings



-
you're using realloc wrongly; you need to pass the current value of results as the first argument if you want to grow the existing array (and have the existing values copied over)

  • eg. results = realloc(NULL, n) just discards (and leaks) the old array and allocates a new, bigger one



-
but results = realloc(results, n) moves the existing contents to a new, bigger block

-
unless the re-allocation fails, in which case you've leaked the old block. This may be an unrecoverable error anyway, but Loki Astari's comment shows the correct approach:

hashmap *tmp = realloc(results, n);
if (tmp)
  results = tmp; // reallocation succeeded
else {
  // handle failure somehow?
}


-
the hashmap results[] argument to get_word_from_results is the wrong type. You're passing hashmap, and that's what it should take. The fact you're using it as an array doesn't mean you have to throw in random []

  • you can't compare string values using if (result->word == word), that just checks whether they have the same address. It's exactly equivalent to if id(result.word) == id(word) in Python. Use strcmp to compare the contents of the string (or, go with C++ and use std::string references, which you can compare with ==)



For example, the public interface in hashmap.h might look like:

#ifndef HASHMAP_H
#define HASHMAP_H
/* nobody needs to see the contents except the implementation */
struct hashmap;

struct hashmap* create_hashmap();

void *lookup_hashmap(struct hashmap*, const char *key);
/* return NULL if not found? */

int insert_hashmap(struct hashmap*, const char *key, void *value);
/* return zero on success, -1 on collision? */
#endif


and the implementation file hashmap.c

#include "hashmap.h"

struct bucket
{
    const char *key;
    void *value;
};

struct hashmap
{
    int used; /* to calculate load factor */
    int n_buckets;
    struct bucket *buckets;
}

struct hashmap* create_hashmap() { /* allocate and initialize */ }

void *lookup_hashmap(struct hashmap *map, const char *key)
{
    /* hash key, find bucket, return value */
}

int insert_hashmap(struct hashmap *map, const char *key, void *value)
{
    /* hash key, find bucket, return -1 if it's in use?
       otherwise store value and return 0
       factor out bucket lookup from insert_ and lookup_?
       ...
    */
}


For reference, here's a simple C++11 implementation, which is (hopefully) much closer to Python than you can get in C:

#include 
#include 
#include 

int main(int argc, char *argv[]) {
    // std::unordered_map is similar to dict
    std::unordered_map results;
    for (int i=1; i<argc; ++i) {
        std::string word = argv[i];
        results[word] += 1;
    }
    // the main complexity is that C++ library types don't have a
    // built-in way to print themselves
    std::cout << "results = {";
    for (auto i:results) {
        std::cout << i.first << ':' << i.second << ' ';
    }
    std::cout << "}\n";
}

Code Snippets

hashmap *tmp = realloc(results, n);
if (tmp)
  results = tmp; // reallocation succeeded
else {
  // handle failure somehow?
}
#ifndef HASHMAP_H
#define HASHMAP_H
/* nobody needs to see the contents except the implementation */
struct hashmap;

struct hashmap* create_hashmap();

void *lookup_hashmap(struct hashmap*, const char *key);
/* return NULL if not found? */

int insert_hashmap(struct hashmap*, const char *key, void *value);
/* return zero on success, -1 on collision? */
#endif
#include "hashmap.h"

struct bucket
{
    const char *key;
    void *value;
};

struct hashmap
{
    int used; /* to calculate load factor */
    int n_buckets;
    struct bucket *buckets;
}

struct hashmap* create_hashmap() { /* allocate and initialize */ }

void *lookup_hashmap(struct hashmap *map, const char *key)
{
    /* hash key, find bucket, return value */
}

int insert_hashmap(struct hashmap *map, const char *key, void *value)
{
    /* hash key, find bucket, return -1 if it's in use?
       otherwise store value and return 0
       factor out bucket lookup from insert_ and lookup_?
       ...
    */
}
#include <string>
#include <unordered_map>
#include <iostream>

int main(int argc, char *argv[]) {
    // std::unordered_map is similar to dict
    std::unordered_map<std::string, int> results;
    for (int i=1; i<argc; ++i) {
        std::string word = argv[i];
        results[word] += 1;
    }
    // the main complexity is that C++ library types don't have a
    // built-in way to print themselves
    std::cout << "results = {";
    for (auto i:results) {
        std::cout << i.first << ':' << i.second << ' ';
    }
    std::cout << "}\n";
}

Context

StackExchange Code Review Q#13621, answer score: 6

Revisions (0)

No revisions yet.