patterncMinor
Unordered (hash) map for C
Viewed 0 times
unorderedmapforhash
Problem
I am still working on basic data structures for C. Now I came up with a hash map:
unordered_map.h:
```
#ifndef UNORDERED_MAP_H
#define UNORDERED_MAP_H
#include
#include
#ifdef __cplusplus
extern "C" {
#endif
typedef struct unordered_map_t unordered_map_t;
typedef struct unordered_map_iterator_t unordered_map_iterator_t;
/*
Allocates a new, empty map with given comparator function.
*/
unordered_map_t* unordered_map_t_alloc
(size_t initial_capacity,
float load_factor,
size_t (p_hash_function)(void),
bool (p_equals_function)(void, void*));
/*
If p_map contains the key p_key, associates it with value p_value and
returns the old value of that key.
*/
void unordered_map_t_put (unordered_map_t p_map,
void* p_key,
void* p_value);
/*
Returns a positive value if p_key is mapped to some value in this map.
*/
bool unordered_map_t_contains_key (unordered_map_t* p_map,
void* p_key);
/*
Returns the value associated with the p_key, or NULL if p_key is not
mapped in the map.
****
unordered_map.h:
```
#ifndef UNORDERED_MAP_H
#define UNORDERED_MAP_H
#include
#include
#ifdef __cplusplus
extern "C" {
#endif
typedef struct unordered_map_t unordered_map_t;
typedef struct unordered_map_iterator_t unordered_map_iterator_t;
/*
Allocates a new, empty map with given comparator function.
*/
unordered_map_t* unordered_map_t_alloc
(size_t initial_capacity,
float load_factor,
size_t (p_hash_function)(void),
bool (p_equals_function)(void, void*));
/*
If p_map contains the key p_key, associates it with value p_value and
returns the old value of that key.
*/
void unordered_map_t_put (unordered_map_t p_map,
void* p_key,
void* p_value);
/*
Returns a positive value if p_key is mapped to some value in this map.
*/
bool unordered_map_t_contains_key (unordered_map_t* p_map,
void* p_key);
/*
Returns the value associated with the p_key, or NULL if p_key is not
mapped in the map.
****
Solution
Pointer used after free
In your
Notice that after the entry is freed, you take the bucket head and move it forward by one element. But the head of that bucket is the entry you just freed. So when you access
Actually I wouldn't even bother advancing the bucket heads one by one. Since you are clearing the entire hash table, I would probably just
Double call to hash function
In your
Out of memory check
In your
Simplify list removal
In your
You can actually handle each of the head and tail cases separately like this:
Use common find function
Right now, you have four nearly identical copies of code to find an existing entry in
Note that the four uses aren't all the same.
Use size_t
You use
Out of date comment
The comment for the
const
A lot of your functions could take
Floating point
The Floating Point Police™ would like to point out that you could avoid using floating point. Instead of a floating point load factor, you could use a fixed point value. For example, the load factor could be a value between 1..256 and represent a fraction out of 256.
Why bother, you ask? When you use floating point:
Of course, you pro
In your
clear() function, there is this code:while (p_entry)
{
index = p_map->p_hash_function(p_entry->p_key) & p_map->mask;
p_next_entry = p_entry->p_next;
free(p_entry);
p_entry = p_next_entry;
if (p_map->p_table[index])
{
p_map->p_table[index] = p_map->p_table[index]->p_chain_next;
}
}Notice that after the entry is freed, you take the bucket head and move it forward by one element. But the head of that bucket is the entry you just freed. So when you access
p_chain_next, you are reading into the entry you just freed.Actually I wouldn't even bother advancing the bucket heads one by one. Since you are clearing the entire hash table, I would probably just
memset the entire table to zero. That way you don't have to call the hash function on each entry.Double call to hash function
In your
put() function, you end up calling the hash function twice if the element doesn't already exist, because you double check in case the table may have increased in size since the first call. Since the hash function may be slow, you should store the result of the hash in a local variable to avoid calling the hash function again. An alternative solution would be to add the entry first and then resize the table after.Out of memory check
In your
alloc() function, you're missing a NULL check for when you allocate p_table. That allocation may well fail because the user can pass in an arbitrarily large initial capacity.Simplify list removal
In your
remove() function, you use an if-else block of four cases to remove a node from a doubly linked list:/* Unlink from the global iteration chain. */
if (p_current_entry->p_prev && p_current_entry->p_next)
{
/* Once here, the current entry has both next and previous. */
p_current_entry->p_prev->p_next = p_current_entry->p_next;
p_current_entry->p_next->p_prev = p_current_entry->p_prev;
}
else if (!p_current_entry->p_prev && !p_current_entry->p_next)
{
/* Once here, the current entry
is the only entry in the chain. */
p_map->p_head = NULL;
p_map->p_tail = NULL;
}
else if (p_current_entry->p_next)
{
/* Once here, the current entry is the head of the chain. */
p_map->p_head = p_current_entry->p_next;
p_map->p_head->p_prev = NULL;
}
else
{
/* Once here, the current entry is the tail of the chain. */
p_map->p_tail = p_current_entry->p_prev;
p_map->p_tail->p_next = NULL;
}You can actually handle each of the head and tail cases separately like this:
/* Unlink from the global iteration chain. */
if (p_current_entry->p_prev) {
p_current_entry->p_prev->p_next = p_current_entry->p_next;
} else {
p_map->p_head = p_current_entry->p_next;
}
if (p_current_entry->p_next) {
p_current_entry->p_next->p_prev = p_current_entry->p_prev;
} else {
p_map->p_tail = p_current_entry->p_prev;
}Use common find function
Right now, you have four nearly identical copies of code to find an existing entry in
get(), contains_key(), put(), and remove(). I would recommend writing an internal find_entry() function that all four functions can use.Note that the four uses aren't all the same.
remove() also needs to know the previous entry pointer, and put() also needs to know the hash value if the entry didn't exist. So you'll have to figure out how you want to write your find_entry() function to make that happen.Use size_t
You use
size_t almost everywhere, which is good. But you missed two spots. The return values of your size() and has_next() functions should return size_t instead of int.Out of date comment
The comment for the
is_healthy function says something about an AVL-tree. It must have been copy and pasted from your previous project.const
A lot of your functions could take
const arguments since they do not modify the argument. For example:int unordered_map_t_size(const unordered_map_t* p_map);Floating point
The Floating Point Police™ would like to point out that you could avoid using floating point. Instead of a floating point load factor, you could use a fixed point value. For example, the load factor could be a value between 1..256 and represent a fraction out of 256.
Why bother, you ask? When you use floating point:
- The OS needs to start saving/restoring floating point state for your process every time you context switch.
- If the OS decides to lazily save/restore your floating point state, you will take an exception the first time you use floating point after a context switch. This exception tells the OS that it needs to swap in the floating point state now.
Of course, you pro
Code Snippets
while (p_entry)
{
index = p_map->p_hash_function(p_entry->p_key) & p_map->mask;
p_next_entry = p_entry->p_next;
free(p_entry);
p_entry = p_next_entry;
if (p_map->p_table[index])
{
p_map->p_table[index] = p_map->p_table[index]->p_chain_next;
}
}/* Unlink from the global iteration chain. */
if (p_current_entry->p_prev && p_current_entry->p_next)
{
/* Once here, the current entry has both next and previous. */
p_current_entry->p_prev->p_next = p_current_entry->p_next;
p_current_entry->p_next->p_prev = p_current_entry->p_prev;
}
else if (!p_current_entry->p_prev && !p_current_entry->p_next)
{
/* Once here, the current entry
is the only entry in the chain. */
p_map->p_head = NULL;
p_map->p_tail = NULL;
}
else if (p_current_entry->p_next)
{
/* Once here, the current entry is the head of the chain. */
p_map->p_head = p_current_entry->p_next;
p_map->p_head->p_prev = NULL;
}
else
{
/* Once here, the current entry is the tail of the chain. */
p_map->p_tail = p_current_entry->p_prev;
p_map->p_tail->p_next = NULL;
}/* Unlink from the global iteration chain. */
if (p_current_entry->p_prev) {
p_current_entry->p_prev->p_next = p_current_entry->p_next;
} else {
p_map->p_head = p_current_entry->p_next;
}
if (p_current_entry->p_next) {
p_current_entry->p_next->p_prev = p_current_entry->p_prev;
} else {
p_map->p_tail = p_current_entry->p_prev;
}int unordered_map_t_size(const unordered_map_t* p_map);Context
StackExchange Code Review Q#102575, answer score: 3
Revisions (0)
No revisions yet.