patterncppMinor
Hash table based key/value dictionary in C++ with murmur hash function, linked list buckets in 200 LOC
Viewed 0 times
bucketswithmurmur200functionvaluehashlocbaseddictionary
Problem
hashdic.h -- suggested for direct include or copy paste to your project.
```
#include "string.h" // memcpy/memcmp
typedef __uint32_t u32;
typedef __int8_t i8;
typedef __uint8_t u8;
u32 MurMur(const char* key, int count) {
#define muR(x, r) (((x) > (32 - r)))
u32 seed = 0x8e96a8f2;
int n = count >> 2;
u32 k, mur = seed, c1 = 0xcc9e2d51, c2 = 0x1b873593,
b = (u32) (key + n * 4);
for(int i = -n; i != 0; i++) {
k = (muR(b[i] c1, 15) c2);
mur = muR(mur^k, 13)*5+0xe6546b64;
}
u8 tail = (u8)(key + (n> 16) * 0x85ebca6b >> 13;
return mur ^ (mur * 0xc2b2ae35 >> 16);
#undef muR
}
#define hash MurMur
int initial_size = 1024;
float growth_threshold = 2.0; //when to resize, for example 0.5 means "if number of inserted keys is half of table length then resize". My experiments on english dictionary shows balanced performance/memory savings with 1.0.
float growth_factor = 10; // grow the size of hash table by N, suggested number is between 2 (conserve memory) and 10 (faster insertions).
int dups = 0, resizes = 0, mem_count = 0, slots=0; // count some stats
typedef bool (enumFunc)(void key, int count, int value, void user);
template struct jshash {
struct keynode {
char *key;
short len;
keynode *next;
value_t value;
keynode(char*k, int l) {
len = l;
key = new char[l];
memcpy(key, k, l);
next = 0;
value = -1;
}
~keynode() {
delete[] key;
mem_count += sizeof(keynode) + len;
if (next) delete next;
}
bool cmp(char*k, int l) {
if (len != l) return false;
return memcmp(key, k, l) == 0;
}
};
struct entry {
keynode *k;
entry():k(0) {}
~entry() {
mem_count += sizeof(entry);
if (k) delete k;
}
};
entry **table;
int length, count;
j
```
#include "string.h" // memcpy/memcmp
typedef __uint32_t u32;
typedef __int8_t i8;
typedef __uint8_t u8;
u32 MurMur(const char* key, int count) {
#define muR(x, r) (((x) > (32 - r)))
u32 seed = 0x8e96a8f2;
int n = count >> 2;
u32 k, mur = seed, c1 = 0xcc9e2d51, c2 = 0x1b873593,
b = (u32) (key + n * 4);
for(int i = -n; i != 0; i++) {
k = (muR(b[i] c1, 15) c2);
mur = muR(mur^k, 13)*5+0xe6546b64;
}
u8 tail = (u8)(key + (n> 16) * 0x85ebca6b >> 13;
return mur ^ (mur * 0xc2b2ae35 >> 16);
#undef muR
}
#define hash MurMur
int initial_size = 1024;
float growth_threshold = 2.0; //when to resize, for example 0.5 means "if number of inserted keys is half of table length then resize". My experiments on english dictionary shows balanced performance/memory savings with 1.0.
float growth_factor = 10; // grow the size of hash table by N, suggested number is between 2 (conserve memory) and 10 (faster insertions).
int dups = 0, resizes = 0, mem_count = 0, slots=0; // count some stats
typedef bool (enumFunc)(void key, int count, int value, void user);
template struct jshash {
struct keynode {
char *key;
short len;
keynode *next;
value_t value;
keynode(char*k, int l) {
len = l;
key = new char[l];
memcpy(key, k, l);
next = 0;
value = -1;
}
~keynode() {
delete[] key;
mem_count += sizeof(keynode) + len;
if (next) delete next;
}
bool cmp(char*k, int l) {
if (len != l) return false;
return memcmp(key, k, l) == 0;
}
};
struct entry {
keynode *k;
entry():k(0) {}
~entry() {
mem_count += sizeof(entry);
if (k) delete k;
}
};
entry **table;
int length, count;
j
Solution
I don't know the exact details about how hashtables work, neither do i know the murmur hash function, so i can't say anything about your algorithm but i'd like to comment your general code a little bit.
I personally would use the given typedefs in stdint.h. If you're used to those typedefs, there is nothing against it. But: Other programmers who read your source might not be used to it. Also, if you're not using a full IDE but a texteditor with highlighting (i love sublime), the highlighter does not analyze your code, this means it will not highlight u8, i8, u32 correctly, but it will highlight uint8_t, int8_t and uint32_t as types.
Use a function for that. Either declare it as inline, or just don't, the optimizer should inline it anyways. There is no (good) reason to use defines where a inlined function produces exactly the same code.
This could go horribly wrong, and trust me, if it does, you won't find that bug. Either use UPPERCASE or don't use that define at all. Do you really need it?
If you pass an invalid char, this might leak. (I am actually not sure if it will ever return from memcpy) Use a
TLDR: Raw pointers are hard to handle and it isn't necessary. std::vector, std::array, std::shared_ptr/unique_ptr are doing a great job handling this stuff for you.
I am almost sure, those c-like function typedefs don't allow a functor to be passed. Use a
One last notice:
Every pointer is implicit convertible to a void in C++. It might warn you or error out, because a string-literal is a const char, let your method take a const void if it doesn't change the value. This helps to remove the unneeded casts to remove the const and allows better optimisation (the optimizer might see it itself that you don't change this value...). Also, C++ has its own casts. A
typedef __uint32_t u32;
typedef __int8_t i8;
typedef __uint8_t u8;I personally would use the given typedefs in stdint.h. If you're used to those typedefs, there is nothing against it. But: Other programmers who read your source might not be used to it. Also, if you're not using a full IDE but a texteditor with highlighting (i love sublime), the highlighter does not analyze your code, this means it will not highlight u8, i8, u32 correctly, but it will highlight uint8_t, int8_t and uint32_t as types.
#define muR(x, r) (((x) > (32 - r)))Use a function for that. Either declare it as inline, or just don't, the optimizer should inline it anyways. There is no (good) reason to use defines where a inlined function produces exactly the same code.
#define hash MurMurThis could go horribly wrong, and trust me, if it does, you won't find that bug. Either use UPPERCASE or don't use that define at all. Do you really need it?
keynode(char*k, int l) {
len = l;
key = new char[l];
memcpy(key, k, l);
next = 0;
value = -1;
}If you pass an invalid char, this might leak. (I am actually not sure if it will ever return from memcpy) Use a
vector instead of the char:vector key;
keynode(char*k, int l) {
len = l;
key.resize(l);
memcpy(&key[0], k, l);
next = 0;
value = -1;
}TLDR: Raw pointers are hard to handle and it isn't necessary. std::vector, std::array, std::shared_ptr/unique_ptr are doing a great job handling this stuff for you.
typedef bool (*enumFunc)(void *key, int count, int *value, void *user);I am almost sure, those c-like function typedefs don't allow a functor to be passed. Use a
std::function instead to allow something like this:H.forEach([&](void* key,int count,int* value, void* user){
//Do something. Since i pass all local variables as reference
// (with [&]) i can access every local variable from outside of this functor
}, (char*)"for each: ");One last notice:
H.add((void*)"hello", 5);Every pointer is implicit convertible to a void in C++. It might warn you or error out, because a string-literal is a const char, let your method take a const void if it doesn't change the value. This helps to remove the unneeded casts to remove the const and allows better optimisation (the optimizer might see it itself that you don't change this value...). Also, C++ has its own casts. A
(T)something to remove a const becomes a const_cast(something). Not only is this the recommended way to cast in C++, also it lets the reader see exactly what you're doing. (Are you casting a void to a T? reinterpret_cast Tells you, you're changing the pointer type. Or just removing the constness? const_cast tells you, you're only removing the constness but keep the same type)Code Snippets
typedef __uint32_t u32;
typedef __int8_t i8;
typedef __uint8_t u8;#define muR(x, r) (((x) << r) | ((x) >> (32 - r)))#define hash MurMurkeynode(char*k, int l) {
len = l;
key = new char[l];
memcpy(key, k, l);
next = 0;
value = -1;
}vector<char> key;
keynode(char*k, int l) {
len = l;
key.resize(l);
memcpy(&key[0], k, l);
next = 0;
value = -1;
}Context
StackExchange Code Review Q#97334, answer score: 6
Revisions (0)
No revisions yet.