HiveBrain v1.2.0
Get Started
← Back to all entries
patterncMinor

Compile-time data structure generator

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
compiletimegeneratorstructuredata

Problem

In response to another recent question I mentioned that one mechanism to avoid runtime overhead for creating a data structure was to create it at compile time and use it directly. Since there was some interest in that concept, I thought I would write code to show how that might be done and submit it for review here.

The theory

There are two pieces to this code.

The first is code that is used to construct an AVL tree of arbitrary data. For illustration purposes, the data inserted into the tree are the names of the months (in English) and the number of days in each month. The month name is used as the key for lookup and the number of days is the associated data retrieved. This first piece of code constructs the tree and then emits it as a C source code representing the resulting structure.

The second piece of code is the part that would be compiled and linked with the object file resulting from the first piece.

makeavl.c

```
#include
#include
#include

typedef struct node {
char *key;
int data;

struct node *left;
struct node *right;
} node;

node new_node(char key, int data)
{
node p = malloc(sizeof(p));
if (p != NULL) {
p->key = key;
p->data = data;
p->left = NULL;
p->right = NULL;
}
return p;
}

int max(int a, int b)
{
return a > b ? a : b;
}

int nodecount(node *p)
{
int count = 0;
if (p != NULL) {
++count;
count += nodecount(p->left);
count += nodecount(p->right);
}
return count;
}

int relheight(node *p, int count)
{
if (p == NULL)
return count;
return max(relheight(p->left, count+1), relheight(p->right, count+1));
}

int height(node * p)
{
return relheight(p, 0);
}

node rotate_right(node p)
{
node *q = p->left;

p->left = q->right;
q->right = p;

return q;
}

node rotate_left(node p)
{
node *q = p->right;
p->right = q->left;
q->left = p;

return q;
}

node balance(node p)
{

Solution

It's a little tricky to provide feedback on this code, while at the same time remembering that it's just an example. (As nhgrif mentioned in the comments, "What if it's a leap year?") In real life, we'd never want to use a binary search tree; at least we'd prefer to store the data in a sorted array and use binary search — i.e., we'd strength-reduce "dereferencing operations on pointers" to "arithmetic operations on indices", because the latter are much cheaper on our usual hardware. In fact, for just 12 pieces of data, we'd be more likely to use linear search, or perhaps a perfect hash table.

However, let's say we had a whole ton of data to store in this data structure, and for some reason we really needed it to be stored in a binary search tree with real pointers. (I can't think of such an application, but I'm willing to stipulate that one exists.)

Then your program organization, IMHO, still leaves something to be desired.

int main(void)
{
#include "staticavl.c"

    node *n = search(caltree, "February");
    printf("There are %d days in %s.\n", n->data, n->key);
}


I would greatly prefer this to be written as

#include "month_data.h"

int main(void)
{
    month_data *n = get_month_data("February");
    printf("There are %d days in %s.\n", n->days, n->name);
}


where "month_data.c" would consist of the same pieces of code you wrote, just rearranged to put the module boundary in the right place:

// month_data.h
typedef struct month_data {
    const char *name;
    int days;
} month_data;

month_data *get_month_data(const char *name);

// month_data.c
struct node {
    struct month_data data;    
    struct node *left;
    struct node *right;
};

static struct node *search(struct node *p, const char *key)
{
    if (p == NULL)
        return NULL;

    int keycmp = strcmp(key, p->data.name);
    if (keycmp left, key);
    else if (keycmp > 0)
        return search(p->right, key);
    else
        return p;
}

static struct node caltree[] = {
/* 0 */ { "March", 31, &caltree[1], &caltree[8] },
/* 1 */ { "January", 31, &caltree[2], &caltree[6] },
/* 2 */ { "August", 31, &caltree[3], &caltree[4] },
/* 3 */ { "April", 30, NULL, NULL },
/* 4 */ { "February", 28, &caltree[5], NULL },
/* 5 */ { "December", 31, NULL, NULL },
/* 6 */ { "June", 30, &caltree[7], NULL },
/* 7 */ { "July", 31, NULL, NULL },
/* 8 */ { "October", 31, &caltree[9], &caltree[11] },
/* 9 */ { "May", 31, NULL, &caltree[10] },
/* 10 */ { "November", 30, NULL, NULL },
/* 11 */ { "September", 30, NULL, NULL },
};

month_data *get_month_data(const char *name)
{
     node *n = search(caltree, name);
     return n ? &n->data : NULL;
}


The advantage of this organization is that we've put the module boundary at a natural place: the month_data module is now concerned solely with getting the data for a particular month. The notion of "searching a tree" is encapsulated into it as an implementation detail, one that the user of the module doesn't have to worry about. Nor does the compiler have to worry about it, because we're no longer passing a struct node * across the module boundary; the compiler is free to inline and optimize our use of the now-static data structure.

And we're free to change the internal implementation of get_month_data; if profiling proves that a linear search in a lookup table would be faster, then we can make that change, without even requiring the client to recompile their main.o module. Using your old code, they'd have to not only recompile useavl.o, but rewrite it, unless you went out of your way to reuse the same struct node interface. (And a good engineer would also want to rename useavl.o, since it would then no longer be using an AVL tree. That's a lot of needless code-shoveling, which we could have avoided by using the right abstractions from the beginning.)

Code Snippets

int main(void)
{
#include "staticavl.c"

    node *n = search(caltree, "February");
    printf("There are %d days in %s.\n", n->data, n->key);
}
#include "month_data.h"

int main(void)
{
    month_data *n = get_month_data("February");
    printf("There are %d days in %s.\n", n->days, n->name);
}
// month_data.h
typedef struct month_data {
    const char *name;
    int days;
} month_data;

month_data *get_month_data(const char *name);

// month_data.c
struct node {
    struct month_data data;    
    struct node *left;
    struct node *right;
};

static struct node *search(struct node *p, const char *key)
{
    if (p == NULL)
        return NULL;

    int keycmp = strcmp(key, p->data.name);
    if (keycmp < 0)
        return search(p->left, key);
    else if (keycmp > 0)
        return search(p->right, key);
    else
        return p;
}

static struct node caltree[] = {
/* 0 */ { "March", 31, &caltree[1], &caltree[8] },
/* 1 */ { "January", 31, &caltree[2], &caltree[6] },
/* 2 */ { "August", 31, &caltree[3], &caltree[4] },
/* 3 */ { "April", 30, NULL, NULL },
/* 4 */ { "February", 28, &caltree[5], NULL },
/* 5 */ { "December", 31, NULL, NULL },
/* 6 */ { "June", 30, &caltree[7], NULL },
/* 7 */ { "July", 31, NULL, NULL },
/* 8 */ { "October", 31, &caltree[9], &caltree[11] },
/* 9 */ { "May", 31, NULL, &caltree[10] },
/* 10 */ { "November", 30, NULL, NULL },
/* 11 */ { "September", 30, NULL, NULL },
};

month_data *get_month_data(const char *name)
{
     node *n = search(caltree, name);
     return n ? &n->data : NULL;
}

Context

StackExchange Code Review Q#98167, answer score: 6

Revisions (0)

No revisions yet.