HiveBrain v1.2.0
Get Started
← Back to all entries
patterncMinor

Cache aware, Key/value dictionary based on 4-bit(16-slot) Trie (Prefix Tree), with deferred allocation

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
bitslotwithvaluecachebasedtrieprefixdictionaryallocation

Problem

Deferred allocation is a memory optimizing trick of using the Node** as Node* when there is only one item in the "bucket". Thus "item" member holds either the index of such lonely item, or a negative value if the bucket is fully allocated.

Cache-awareness here is merely adding a compiler's __builtin_prefetch() in the right place to allow about 30% speed up.

trie4d.h

#ifndef TRIE4D_H

#include 
#include 
#include 

/* define your own value type if needed */
#define VALUE_TYPE int

struct Node {
    struct Node **N;
    VALUE_TYPE value;
    char item;
};

struct Node* newNode();
int addNode(struct Node *C, void *key, int count, VALUE_TYPE value);
struct Node* findNode(struct Node * C, void *key, int count);
void freeNode(struct Node* C);

#endif // TRIE4D_H


trie4d.c

```
#include "trie4d.h"

struct Node* newNode() {
struct Node *node;
node = malloc(sizeof (struct Node));
node->value = -1;
node->item = -1;
return node;
}

int addNode(struct Node C, void key, int count, VALUE_TYPE value) {
int pos = 0;
struct Node* tmp;
while (1) {
char u;
int bpos = pos >> 1;
if (bpos >= count) {
if (C->value == -1) {
C->value = value;
return 1; / value added /
}
C->value = value;
return 0; / value replaced /
}
unsigned char b = ((unsigned char*)key)[bpos];
if (pos++ & 1) u = (b >>= 4);
else u = b & 0xf;
if (C->item == -1) {
C->item = u;
C->N = (struct Node**)newNode();
C = (struct Node*) C->N;
}
else if (C->item >= 0) {
if (C->item == u) {
C = (struct Node*)C->N;
}
else {
tmp = (struct Node*) C->N;
C->N = malloc(sizeof (void) 16);
memset(C->N, 0, sizeof(void*) N[C->item] = tmp;
C->item = -2;
C = C->N[u]

Solution

General comments

Overall I found your code to be quite reasonable. You are using each 4 bits of a variable length key as a "character" of a "trie".

-
I think you should have defined these constants instead of using magic numbers:

#define ITEM_EMPTY       -1
#define ITEM_ALLOCATED   -2
#define VALUE_UNSET      -1


  • When you allocate a new node, you set node->item to -1, but you do not initialize node->N to anything. This works because there is an implicit assumption in your code that if node->item == -1, then node->N is unusable. But just for your own sanity when debugging, I suggest that you set node->N = NULL anyways.



-
In these two lines of code:

C->N = malloc(sizeof (void*) * 16);
           memset(C->N, 0, sizeof(void*) << 4);


First of all, you are inconsistent in that you use * 16 one place and

-
Your if/else style and braces are inconsistent. Compare:

if (C->item == u) {
            C = (struct Node*)C->N;
        }
        else {


with:

if (C->item == -1) return 0;
        else if (C->item >= 0) {
            if (u == C->item) C = (struct Node*)C->N;
            else return 0;
        } else {


-
Your variable names are a bit cryptic. I eventually figured out that
bpos was "byte position" and pos was "nibble position". I still don't know what C or u stand for, though. I think count should be renamed keyLength.

Loading each byte twice

In both your
addNode() and findNode() functions, you load a byte from the key and either use its upper or lower half (4 bits per trie depth). So you end up loading the byte from the key twice (once for each half). You might be able to do better by saving the value of that byte and on each 2nd iteration, you can use the saved value to get the upper half instead of reloading that byte from memory.

Prefetch

As written, I don't think your
__builtin_prefetch() call will do anything. You put that prefetch call right before you are about to dereference the pointer, so even if the compiler inserted a prefetch instruction there, it wouldn't help speed anything up. I compiled your program with and without that line and both versions generated the same assembly code (with no prefetch instruction). The way to get benefit out of builtin_prefetch()` is to call it some amount of time before you are going to actually use that address. For example, at the top of the while loop, you can do:

if (C->item != -1)
    __builtin_prefetch(C->N);


I'm not sure that you will get much of a benefit from this though. You'll have to run some tests to see whether this actually helps or hurts.

Code Snippets

#define ITEM_EMPTY       -1
#define ITEM_ALLOCATED   -2
#define VALUE_UNSET      -1
C->N = malloc(sizeof (void*) * 16);
           memset(C->N, 0, sizeof(void*) << 4);
C->N = calloc(16, sizeof(void *));
if (C->item == u) {
            C = (struct Node*)C->N;
        }
        else {
if (C->item == -1) return 0;
        else if (C->item >= 0) {
            if (u == C->item) C = (struct Node*)C->N;
            else return 0;
        } else {

Context

StackExchange Code Review Q#97169, answer score: 2

Revisions (0)

No revisions yet.