patternpythonMinor
Implementing clearTable, grow and shrink on a hash table
Viewed 0 times
hashshrinkcleartablegrowandimplementingtable
Problem
I've gone through previous questions about implementing hash tables in python, but I have something little more specific. I am handling collisions with linear probing, and I want to grow/shrink my table when I go below/above a given loading factor. My put/get methods are heavily based from this source.
I would like to get some feedback on my grow/shrink methods. I'd also like to know if my method for clearing the table (clearTable) is approached correctly. My main question stems from the fact that
Please find my implementation below:
```
import copy
class HashTable:
def __init__(self, size = 11):
self.MAXLOADFACTOR = 0.75
self.MINLOADFACTOR = 0.25
self.MINSIZE = size
self.size = size
self.totalItems = 0
self.slots = [None] * self.size
self.data = [None] * self.size
def put(self, key, data):
hashvalue = self.hashfunction(key)
if self.slots[hashvalue] == None: # new data
self.slots[hashvalue] = key
self.data[hashvalue] = data
self.totalItems += 1
self.checkGrow()
else:
if self.slots[hashvalue] == key:
self.data[hashvalue] = data # replace
else: #collision has occured
nextslot = self.rehash(hashvalue, len(self.slots))
while self.slots[nextslot] != None and self.slots[nextslot] != key:
nextslot = self.rehash(nextslot,len(self.slots))
if self.slots[nextslot] == None: #new data
self.slots[nextslot] = key
self.data[nextslot] = data
self.totalItems += 1
self.checkGrow()
else:
self.data[nextslot] = data # replace
# assume key will always be int
I would like to get some feedback on my grow/shrink methods. I'd also like to know if my method for clearing the table (clearTable) is approached correctly. My main question stems from the fact that
del newHashTable might not be deleting the contents of the hash table, just like the references, so could that cause a memory problem? (Excuse my lack of vocabulary.)Please find my implementation below:
```
import copy
class HashTable:
def __init__(self, size = 11):
self.MAXLOADFACTOR = 0.75
self.MINLOADFACTOR = 0.25
self.MINSIZE = size
self.size = size
self.totalItems = 0
self.slots = [None] * self.size
self.data = [None] * self.size
def put(self, key, data):
hashvalue = self.hashfunction(key)
if self.slots[hashvalue] == None: # new data
self.slots[hashvalue] = key
self.data[hashvalue] = data
self.totalItems += 1
self.checkGrow()
else:
if self.slots[hashvalue] == key:
self.data[hashvalue] = data # replace
else: #collision has occured
nextslot = self.rehash(hashvalue, len(self.slots))
while self.slots[nextslot] != None and self.slots[nextslot] != key:
nextslot = self.rehash(nextslot,len(self.slots))
if self.slots[nextslot] == None: #new data
self.slots[nextslot] = key
self.data[nextslot] = data
self.totalItems += 1
self.checkGrow()
else:
self.data[nextslot] = data # replace
# assume key will always be int
Solution
Some very quick notes.
-
The algorithm in
This is a common mistake! In The Art of Computer Programming, Knuth comments, "Many computer programmers have great faith in algorithms, and they are surprised to find that the obvious way to delete records from a hash table doesn't work." (Vol. III, p. 533.)
If you want to get this right, Knuth gives an algorithm (pp. 533–4) for deletion in a open hash table with linear probing.
But a simpler approach is to replace the deleted key with a special key meaning "there was a key here but it was deleted" which you later remove when growing or shrinking the table.
Update: To create a unique key that you can be sure won't collide with any key passed by the user, just call
-
The use of
-
It looks as though I can't use
Update: You asked for use cases. Well, consider using a hash table to memoize a function that might take
It's quite straightforward to implement. Wherever the code has
-
The
-
The
-
The
-
-
There's lots of duplicated code between the
-
The algorithm in
remove is incorrect — it can leave keys unfindable.This is a common mistake! In The Art of Computer Programming, Knuth comments, "Many computer programmers have great faith in algorithms, and they are surprised to find that the obvious way to delete records from a hash table doesn't work." (Vol. III, p. 533.)
If you want to get this right, Knuth gives an algorithm (pp. 533–4) for deletion in a open hash table with linear probing.
But a simpler approach is to replace the deleted key with a special key meaning "there was a key here but it was deleted" which you later remove when growing or shrinking the table.
Update: To create a unique key that you can be sure won't collide with any key passed by the user, just call
object:_DELETED = object() # slot had a key but it was deleted-
The use of
copy.deepcopy in the grow and shrink methods is wrong — this will make deep copies of the keys and values in the hash table, which isn't what you want (the caller will expect to be able to retrieve the actual item they stored in the table, not some copy of it) and might invalidate the hashes of the keys. You need a shallow copy instead.-
It looks as though I can't use
None as a key. This seems unsatisfactory.Update: You asked for use cases. Well, consider using a hash table to memoize a function that might take
None as an argument (like functools.lru_cache). Or using a hash table to count the number of occurrences of each element of an iterable (like collections.Counter). Also it's nice to be able to document that any hashable object can be used as a key, without having to mention any special cases.It's quite straightforward to implement. Wherever the code has
None to mean an empty slot, use some other unique object instead:_EMPTY = object() # slot is empty-
The
rehash method is always called with len(self.slots) as the second argument, so there's no need for that argument — the method can just as well compute the size.-
The
get and put methods are redundant: you could just use __getitem__ and __setitem__.-
The
grow and shrink methods are almost identical. This common code could be shared.-
del newHashTable is unnecessary — when the method returns, the last reference to newHashTable disappears and it will be collected in any case.-
There's lots of duplicated code between the
__init__ and clearTable methods — why not have __init__ call clearTable?Code Snippets
_DELETED = object() # slot had a key but it was deleted_EMPTY = object() # slot is emptyContext
StackExchange Code Review Q#130063, answer score: 3
Revisions (0)
No revisions yet.