patterncMinor

Hash-consing: running slow

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

codereview c stackoverflow optimization algorithm performance

runningslowhashconsing

Problem

I've isolated this function, cons, for hash-consing in C. It is where the bottleneck of my program is. How could its performance be improved?

var32 cons(var32 x, var32 y){
    var32 ptr;
    uid id, hashedId;
    map newMap, currentMap;
    u32 colls;

    #ifdef __DEBUG__
    ++consCount;
    colls = 0;
    #endif

    // Before anything, compute the unique id of that node.
    id = (((uid)(x.ptr))<<32) | (uid)y.ptr;

    // Try to find the node. If it already exists, return its ptr.
    hashedId = id % MAP_HASH_SIZE;
    currentMap = mapByHashedId[hashedId];
    if (currentMap) do {
        if (mapId[currentMap] == id)
            return mapPtr[currentMap];
        #ifdef __DEBUG__
        ++colls;
        #endif
    } while ((currentMap = mapNext[currentMap]));

    #ifdef __DEBUG__
    if (colls<16) ++collisionCount[colls];
    ++allocCount;
    #endif

    // If does not exist, then alloc a new ptr to it.
    ptr = nodeNewPtr[nodeNewPtrIndex--];

    // Add that new ptr to the map.
    newMap = mapNewPtr[mapNewPtrIndex--];
    if (mapByHashedId[hashedId]) 
        mapNext[newMap] = mapByHashedId[hashedId];
    mapId[newMap] = id;
    mapPtr[newMap] = ptr;
    mapByHashedId[hashedId] = newMap;

    // Fill that new ptr.
    nodeLeft[GETPTR(ptr)] = x;
    nodeRight[GETPTR(ptr)] = y;

    return ptr;
};

Here is a link for complete source with tests!

Solution

Your code here is hard to follow because your code style and format are inconvenient.

Structures like this:

currentMap = mapByHashedId[hashedId];
if (currentMap) do {
    if (mapId[currentMap] == id)
        return mapPtr[currentMap];
    #ifdef __DEBUG__
    ++colls;
    #endif
} while ((currentMap = mapNext[currentMap]));

are .... irregular. a better format would be:

currentMap = mapByHashedId[hashedId];
while (currentMap)
{ 
    if (mapId[currentMap] == id)
        return mapPtr[currentMap];

    #ifdef __DEBUG__
    ++colls;
    #endif

    currentMap = mapNext[currentMap]
}

Additionally, this is the only loop in your method. All the other code are simple array and pointer manipulations. This method can only be slow because of three reasons:

it is called many, many times

the while loop iterates many many times

there's something else you're not showing.

Assuming the number of calls are appropriate, and you're not missing anything, the only alternative is that the while loop loops often. This indicates there are many collisions. Collisions happen for two reasons:

your MAP_HASH_SIZE is too small

your hashing function is irregular

I am going to go with both of those. In fact, I am going to guess that MAP_HASH_SIZE is something like 8192, or some other power of 2. The reason this is my (educated) guess, is that your hash function is vulnerable to inconsistencies.

Consider how you build your function, using two 32-bit values with the x value shifted and or'd with the y value. Let's also use some test data:

x      y
---    ---
0x1    0x1
0x2    0x1
0x3    0x1
0x4    0x1

Now, in your function you would produce the ID by shifting the X, and the hash, by the modulo of the table size. Let's assume the table size is 32768 (a nice big value that should eliminate collisions, right)?

x      y          id   hashedId
---    ---  ----------   --------
0x1    0x1  0x10000001        0x1
0x2    0x1  0x20000001        0x1
0x3    0x1  0x30000001        0x1
0x4    0x1  0x40000001        0x1

Because your MAP_HASH_SIZE is likely a power-of-2, the % MAP_HASH_SIZE is essentially the same as a bitmask of the low-order bits (or 0x7fff for size 32768)

If all your y values are similar, you will end up with many collisions in just a few buckets in the hash table.

For the hash function I would consider guaranteeing a minimum size (say 16 bits, or 64K), and then having a function which at least shifts some of the x value in to the range, and does an Xor....

hashedId = ((x << 7) ^ (y * 31)) % MAP_HASH_SIZE;

The above function will give you better distribution (perhaps not perfect) even if the hash size is a power of 2 > about 16 bits.

Code Snippets

currentMap = mapByHashedId[hashedId];
if (currentMap) do {
    if (mapId[currentMap] == id)
        return mapPtr[currentMap];
    #ifdef __DEBUG__
    ++colls;
    #endif
} while ((currentMap = mapNext[currentMap]));

currentMap = mapByHashedId[hashedId];
while (currentMap)
{ 
    if (mapId[currentMap] == id)
        return mapPtr[currentMap];

    #ifdef __DEBUG__
    ++colls;
    #endif

    currentMap = mapNext[currentMap]
}

x      y
---    ---
0x1    0x1
0x2    0x1
0x3    0x1
0x4    0x1

x      y          id   hashedId
---    ---  ----------   --------
0x1    0x1  0x10000001        0x1
0x2    0x1  0x20000001        0x1
0x3    0x1  0x30000001        0x1
0x4    0x1  0x40000001        0x1

hashedId = ((x << 7) ^ (y * 31)) % MAP_HASH_SIZE;

Context

StackExchange Code Review Q#57002, answer score: 6

Revisions (0)

No revisions yet.