debugcppMinor
Implementation of fixed size hash map
Viewed 0 times
mapsizehashfixedimplementation
Problem
I'm trying to brush up on my algorithms and data structures and wrote a fixed sized hash map assuming both key and value will be only integers. This is a significant simplification of the real case and is supposed to be so. Any comments regarding bugs, best practices, style is gladly appreciated.
```
#include
#include
#include
#include
#include
#define SIZE 10
class HashEntry {
public:
int _k;
int _v;
HashEntry(int k, int v) : _k(k), _v(v) {};
};
class HashMap {
private:
std::vector buckets[SIZE];
int hashFunc(int key);
std::vector& getBucket(int key);
public:
bool keyExists(int k);
HashEntry get(int k);
bool put(int k, int v);
bool remove(int k);
};
int HashMap::hashFunc(int key) {
return key % SIZE;
}
std::vector& HashMap::getBucket(int key) {
if (key = 0");
}
return buckets[hashFunc(key)];
}
HashEntry HashMap::get(int k) {
std::vector& bucket = getBucket(k);
for (HashEntry entry : bucket) {
if (entry._k == k) {
return entry;
}
}
throw std::runtime_error("Key not found");
}
bool HashMap::keyExists(int k) {
std::vector& bucket = getBucket(k);
for (HashEntry entry : bucket) {
if (entry._k == k) {
// key already exists
return true;
}
}
return false;
}
bool HashMap::put(int k, int v) {
std::vector& bucket = getBucket(k);
if (keyExists(k)) return false;
bucket.push_back(HashEntry(k, v));
return true;
}
bool HashMap::remove(int k) {
std::vector& bucket = getBucket(k);
if (!(keyExists(k))) return false;
for (auto itr = bucket.begin(); itr != bucket.end(); ++itr) {
HashEntry entry = static_cast(*itr);
if (entry._k == k) {
bucket.erase(itr);
return true;
break;
}
}
return false;
}
int main() {
HashMap map;
for (int i=0; i < 10; ++i) {
assert(map.put(i, i*10));
}
for (int i=0; i < 10; ++
```
#include
#include
#include
#include
#include
#define SIZE 10
class HashEntry {
public:
int _k;
int _v;
HashEntry(int k, int v) : _k(k), _v(v) {};
};
class HashMap {
private:
std::vector buckets[SIZE];
int hashFunc(int key);
std::vector& getBucket(int key);
public:
bool keyExists(int k);
HashEntry get(int k);
bool put(int k, int v);
bool remove(int k);
};
int HashMap::hashFunc(int key) {
return key % SIZE;
}
std::vector& HashMap::getBucket(int key) {
if (key = 0");
}
return buckets[hashFunc(key)];
}
HashEntry HashMap::get(int k) {
std::vector& bucket = getBucket(k);
for (HashEntry entry : bucket) {
if (entry._k == k) {
return entry;
}
}
throw std::runtime_error("Key not found");
}
bool HashMap::keyExists(int k) {
std::vector& bucket = getBucket(k);
for (HashEntry entry : bucket) {
if (entry._k == k) {
// key already exists
return true;
}
}
return false;
}
bool HashMap::put(int k, int v) {
std::vector& bucket = getBucket(k);
if (keyExists(k)) return false;
bucket.push_back(HashEntry(k, v));
return true;
}
bool HashMap::remove(int k) {
std::vector& bucket = getBucket(k);
if (!(keyExists(k))) return false;
for (auto itr = bucket.begin(); itr != bucket.end(); ++itr) {
HashEntry entry = static_cast(*itr);
if (entry._k == k) {
bucket.erase(itr);
return true;
break;
}
}
return false;
}
int main() {
HashMap map;
for (int i=0; i < 10; ++i) {
assert(map.put(i, i*10));
}
for (int i=0; i < 10; ++
Solution
I would nest HashEntry inside of HashMap as just Entry since it's not very likely to ever stand on its own.
The
I would call them
It probably doesn't matter in practice since HashEntry's are fairly cheap, but this creates a copy of each bucket entry. If HashEntry weren't guaranteed to be such a trivial type, this could be devastating to performance. It should be
Any method that doesn't permute that map should be
Why does main return 1? It seems the flow that reaches main is a successful execution, which means that the return value should be 0 (or
You do a lot of the same "loop over and find the correct bucket slot" logic. That should be pulled into a function. That can be done by either making a
When possible, constants in the form of macros should typically be avoided in favor of const constants.
Macros do not have a type, and their names are typically lost as debugging symbols. A const variable has either of those drawbacks, and it also doesn't sacrifice anything.
Since
Try to avoid doing the same work twice. For example, consider your implementation of
While I'm being a bit performance paranoid, it would also be better to call
Is this cast necessary? I can't imagine that it is. Also,
I also wouldn't bother deferencing the iterator:
A
In the spirit of more performance paranoia, it would be nice to allow a single operation for both checking the existence of a key and retrieving the value of it if it does exist. This allows you to avoid the same loop over the bucket twice.
Something like
If you had the concept of an iterator, you could also do this in the form of having a
The
_ prefix is a bit dangerous as the standard reserves the underscore prefix in a few situations (though not this situation). Still though, it's a bit awkward for it to be there not only in case of bad habits, but also because it serves no purpose. And, map.get(3)._k reads a little oddly.I would call them
key and value. It's just a few more characters to type, and it makes the code read a lot clearer (map.get(3).key).hashFunc is a bit of an odd name. I would just call it hash instead since it's clearly a function.std::invalid_argument fits a bit better in getBucket than runtime_error does. The more the type of an exception can tell you, the better (in general -- don't take it to an extreme).for (HashEntry entry : bucket) {It probably doesn't matter in practice since HashEntry's are fairly cheap, but this creates a copy of each bucket entry. If HashEntry weren't guaranteed to be such a trivial type, this could be devastating to performance. It should be
for (const HashEntry& entry : bucket) {.Any method that doesn't permute that map should be
const (keyExists, get, getBucket). Const correctness is an important part of C++, and it should always be kept in mind when designing/implementing classes.Why does main return 1? It seems the flow that reaches main is a successful execution, which means that the return value should be 0 (or
EXIT_SUCCESS).You do a lot of the same "loop over and find the correct bucket slot" logic. That should be pulled into a function. That can be done by either making a
Bucket type that has the concept searching built in, or, more simply, just write a function that loops over a vector and returns an iterator to the HashEntry (or end()).When possible, constants in the form of macros should typically be avoided in favor of const constants.
const int SIZE = 10 -- or, since it's being used an array size/index, you might find const std::size_t SIZE = 10; more fitting (I would just stick with an int).Macros do not have a type, and their names are typically lost as debugging symbols. A const variable has either of those drawbacks, and it also doesn't sacrifice anything.
Since
getBucket is a very straightforward constant time operation, it doesn't really matter, but containers activate my super picky mode with regards to performance :).Try to avoid doing the same work twice. For example, consider your implementation of
put. It calls getBucket and it calls keyExists, which behind the scenes calls getBucket. This implies that it might be good to have an additional method behind the scenes and leverage that for checking if keys exists:bool HashMap::keyExists(const std::vector& bucket, int k) const {
for (const HashEntry& entry : bucket) {
if (entry.key == k) {
return true;
}
}
return false;
}
bool HashMap::keyExists(int k) const {
return keyExists(getBucket(k), k);
}
bool HashMap::put(int k, int v) {
std::vector& bucket = getBucket(k);
if (keyExists(bucket, k)) return false;
bucket.push_back(HashEntry(k, v));
return true;
}While I'm being a bit performance paranoid, it would also be better to call
getBucket after keyExists in your current implementation. That way, for the case when keys don't exist, you don't bother doing the work of getting the bucket.HashEntry entry = static_cast(*itr);Is this cast necessary? I can't imagine that it is. Also,
HashEntry entry = static_cast(itr) should be const HashEntry& entry = ... (and really it should be const HashEntry& entry = itr; or const auto& entry = *itr;).I also wouldn't bother deferencing the iterator:
if (itr->key == k) {
bucket.erase(itr);
return true;
}A
break; immediately following a return is just noise. Since the function has already been exited, the loop will have been too.In the spirit of more performance paranoia, it would be nice to allow a single operation for both checking the existence of a key and retrieving the value of it if it does exist. This allows you to avoid the same loop over the bucket twice.
Something like
bool find(int key, int& value); or std::pair find(int key) would do the trick (provided it was implemented to take advantage of the ability to do only a single pass).If you had the concept of an iterator, you could also do this in the form of having a
find() that returns an iterator either to the element or to the end(). It's actually a bit icky to do this without iterators since you don't really have anything meaningful to return if you want to avoid a reference parameter. You would have to have either a std::pair will a null value, or you'd have to have a junk value in the int component of a std::pair. With an int, the junk value approach works pretty well since you can just throw a 0 in it, butCode Snippets
for (HashEntry entry : bucket) {bool HashMap::keyExists(const std::vector<HashEntry>& bucket, int k) const {
for (const HashEntry& entry : bucket) {
if (entry.key == k) {
return true;
}
}
return false;
}
bool HashMap::keyExists(int k) const {
return keyExists(getBucket(k), k);
}
bool HashMap::put(int k, int v) {
std::vector<HashEntry>& bucket = getBucket(k);
if (keyExists(bucket, k)) return false;
bucket.push_back(HashEntry(k, v));
return true;
}HashEntry entry = static_cast<HashEntry>(*itr);if (itr->key == k) {
bucket.erase(itr);
return true;
}Context
StackExchange Code Review Q#67625, answer score: 6
Revisions (0)
No revisions yet.