patternpythonMinor

Implementing clearTable, grow and shrink on a hash table

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

hashshrinkcleartablegrowandimplementingtable

Problem

I've gone through previous questions about implementing hash tables in python, but I have something little more specific. I am handling collisions with linear probing, and I want to grow/shrink my table when I go below/above a given loading factor. My put/get methods are heavily based from this source.

I would like to get some feedback on my grow/shrink methods. I'd also like to know if my method for clearing the table (clearTable) is approached correctly. My main question stems from the fact that del newHashTable might not be deleting the contents of the hash table, just like the references, so could that cause a memory problem? (Excuse my lack of vocabulary.)

Please find my implementation below:

```
import copy

class HashTable:
def __init__(self, size = 11):
self.MAXLOADFACTOR = 0.75
self.MINLOADFACTOR = 0.25
self.MINSIZE = size
self.size = size
self.totalItems = 0
self.slots = [None] * self.size
self.data = [None] * self.size

def put(self, key, data):
hashvalue = self.hashfunction(key)

if self.slots[hashvalue] == None: # new data
self.slots[hashvalue] = key
self.data[hashvalue] = data
self.totalItems += 1
self.checkGrow()
else:
if self.slots[hashvalue] == key:
self.data[hashvalue] = data # replace
else: #collision has occured
nextslot = self.rehash(hashvalue, len(self.slots))
while self.slots[nextslot] != None and self.slots[nextslot] != key:
nextslot = self.rehash(nextslot,len(self.slots))
if self.slots[nextslot] == None: #new data
self.slots[nextslot] = key
self.data[nextslot] = data
self.totalItems += 1
self.checkGrow()
else:
self.data[nextslot] = data # replace

# assume key will always be int

Solution

Some very quick notes.

-
The algorithm in remove is incorrect — it can leave keys unfindable.

This is a common mistake! In The Art of Computer Programming, Knuth comments, "Many computer programmers have great faith in algorithms, and they are surprised to find that the obvious way to delete records from a hash table doesn't work." (Vol. III, p. 533.)

If you want to get this right, Knuth gives an algorithm (pp. 533–4) for deletion in a open hash table with linear probing.

But a simpler approach is to replace the deleted key with a special key meaning "there was a key here but it was deleted" which you later remove when growing or shrinking the table.

Update: To create a unique key that you can be sure won't collide with any key passed by the user, just call object:

_DELETED = object()  # slot had a key but it was deleted

-
The use of copy.deepcopy in the grow and shrink methods is wrong — this will make deep copies of the keys and values in the hash table, which isn't what you want (the caller will expect to be able to retrieve the actual item they stored in the table, not some copy of it) and might invalidate the hashes of the keys. You need a shallow copy instead.

-
It looks as though I can't use None as a key. This seems unsatisfactory.

Update: You asked for use cases. Well, consider using a hash table to memoize a function that might take None as an argument (like functools.lru_cache). Or using a hash table to count the number of occurrences of each element of an iterable (like collections.Counter). Also it's nice to be able to document that any hashable object can be used as a key, without having to mention any special cases.

It's quite straightforward to implement. Wherever the code has None to mean an empty slot, use some other unique object instead:

_EMPTY = object()    # slot is empty

-
The rehash method is always called with len(self.slots) as the second argument, so there's no need for that argument — the method can just as well compute the size.

-
The get and put methods are redundant: you could just use __getitem__ and __setitem__.

-
The grow and shrink methods are almost identical. This common code could be shared.

-
del newHashTable is unnecessary — when the method returns, the last reference to newHashTable disappears and it will be collected in any case.

-
There's lots of duplicated code between the __init__ and clearTable methods — why not have __init__ call clearTable?

Code Snippets

_DELETED = object()  # slot had a key but it was deleted

_EMPTY = object()    # slot is empty

Context

StackExchange Code Review Q#130063, answer score: 3

Revisions (0)

No revisions yet.