HiveBrain v1.2.0
Get Started
← Back to all entries
patterncppMinor

Managing IDs using AVL trees

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
managingidstreesavlusing

Problem

In a txt file there are IDs and next to them there are their neighbours like this:

1  1
1  2
1  3
2  1
2  4
2  8
3  5


Using AVL trees I store the IDs in one AVL tree and for every ID in this AVL
I make another AVL tree in which I store its neighbours.In the end I make another txt file that in every line has an ID , its number of neighbours and then its neighbours.Using the above example the output txt is :

1 3 1 2 3
2 2 1 4 8
3 1 5


I managed to make it work but the problem is that when I have many IDs the program gets very slow because I suspect there are too many IDs in the first AVL
tree to search insert and balance everytime.Is there any shortcut to make it faster?
Some of the functions I used :

```
Node AVLTree::search(Node node, int ID)
{
if(node == NULL)
{
return NULL;
}
if(node->ID == ID)
{
return node;
}
else if(node->ID right, ID);
}
else
{
return search(node->left, ID);
}
}

void AVLTree::balance(Node *&node)
{
if (node == NULL)
{
return;
}
int height_diff = heightDifference(node);
if (height_diff > 1)
{
if( heightDifference(node->left) > 0)
{
node = left_rotation(node);
}
else
{
node = left_right_rotation(node);
}
}
else if (height_diff right) > 0)
{
node = right_left_rotation(node);
}
else
{
node = right_rotation(node);
}
}
balance(node->left);
balance(node->right);
}

Node AVLTree::insert(Node node, int ID)
{
if(node == NULL)
{
node = newNode(ID);
size++;
return node;
}
else if(node->ID > ID)
{
node->left = insert(node->left, ID);
balance(node);
}
else if(node->ID right = insert(node->right, ID);
balance(node);
}
return node;
}

int AVLTree::height(Node *node)
{
if (node != NULL)
{

Solution

Calling the number next to the ids "neighbours" is confusing, actually the ids are more like keys and the second parameter can be considered a value?

You have an AVL tree listing all the keys, and for each key, an AVL storing all the values.

You have 350000 ids with 50 values for each key. 50 is a small number, making it not completely necessary to use an AVL tree. A simple array or vector could work, it would remove a lot of the overhead of dynamically allocating nodes for each new value.

AVL trees are supposed to do every basic operation in \$O(log(n))\$, meaning insert, search, delete. Meaning, individual operations should still be very fast even after 350000 IDs are inserted (for reference, \$log_2(350000)\$ ~ 18).

The reason your program slows down should be the insert method in the AVL tree. It balances the tree after every insertion, starting from the root. Which in turn calculates the height difference and the height of both subtrees recursively:

int AVLTree::height(Node *node)
{
    if (node != NULL)
    {
        int leftHeight = height(node->left);
        int rightHeight = height(node->right);
        int maxHeight = max(leftHeight, rightHeight);
        return maxHeight + 1;
    }
    else
    {
        return 0;
    }
}


This function is actually \$O(n)\$ since it goes recursively through each child of node. Meaning, each time you get the height of a node when rebalancing the tree, it will be slow.

The solution is to keep track of the height of each node. It could look something like this for the insert function:

Node* AVLTree::insert(Node *node, int ID)
{
    if(!node) //if you want to use a comparison to null here, use nullptr instead of NULL
    {
        node = newNode(ID); //node->height should be 0 at init
        size++;
        return node;
    }

    if (node->ID == ID) {
        return node;
    }

    if(node->ID > ID)
    {
        node->left = insert(node->left, ID);
    }
    else if(node->ID right = insert(node->right, ID);
    }

    updateHeight(node);
    balance(node);

    return node;
}


Then the updateHeight function could be like this:

void AVLTree::updateHeight(Node *node) {
    if (!node) {
        return;
    }

    node->height = max(height(node->left), height(node->right)) + 1;
}


And the height function would be simplified and non-recursive:

int AVLTree::height(Node *node)
{
    if (!node) {
        //since the value of a leaf is 0, the value of a non existing-node is -1
        //if you want 0 here the value for a new node should be 1
        return -1; 
    }
    return node->height;
}


The node's height is only updated for nodes that are traversed during the insert function, so this operation only has a complexity of \$O(log(n))\$.

Of course you also have to update the height manually when rotating the nodes or deleting the nodes, by calling updateHeight() on the appropriate nodes. (Note that children have to have their heights updated before the parents).

There is also another area in the balance function that introduces complexity:

balance(node->left);
balance(node->right);


It will traverse the whole subtree and be applied to every node in the tree, causing a complexity of \$O(n)\$. In the insert function balance is already applied to all the ancestry of a node that gets a new child, so in the balance function you should only do recursive calls when necessary.

I'm not sure the calls to balance the children are necessary if the rotations are properly done (maybe they are) but at least they are not necessary when the current node is already balanced, so changing the beginning of the balance function to this will reduce its complexity to \$O(log(n))\$:

void AVLTree::balance(Node *&node)
{
    if (!node) 
    {
        return;
    }
    int height_diff = heightDifference(node);
    if (abs(height_diff) <= 1)
    {
        return;
    }

    //...


As it will stop going down the subtree if it is balanced.

No comments on the rest of your code, it's very readable and I wouldn't sacrifice that over small optimizations when they are not needed.

PS: If you keep this:

balance(node->left);
balance(node->right);


At the end of the balance function, don't forget to call updateHeight(node) as the height of the children have potentially changed. But those two calls seem weird to me as balancing children can change their height and potentially unbalance the current node again.

Code Snippets

int AVLTree::height(Node *node)
{
    if (node != NULL)
    {
        int leftHeight = height(node->left);
        int rightHeight = height(node->right);
        int maxHeight = max(leftHeight, rightHeight);
        return maxHeight + 1;
    }
    else
    {
        return 0;
    }
}
Node* AVLTree::insert(Node *node, int ID)
{
    if(!node) //if you want to use a comparison to null here, use nullptr instead of NULL
    {
        node = newNode(ID); //node->height should be 0 at init
        size++;
        return node;
    }

    if (node->ID == ID) {
        return node;
    }

    if(node->ID > ID)
    {
        node->left = insert(node->left, ID);
    }
    else if(node->ID < ID)
    {
        node->right = insert(node->right, ID);
    }

    updateHeight(node);
    balance(node);

    return node;
}
void AVLTree::updateHeight(Node *node) {
    if (!node) {
        return;
    }

    node->height = max(height(node->left), height(node->right)) + 1;
}
int AVLTree::height(Node *node)
{
    if (!node) {
        //since the value of a leaf is 0, the value of a non existing-node is -1
        //if you want 0 here the value for a new node should be 1
        return -1; 
    }
    return node->height;
}
balance(node->left);
balance(node->right);

Context

StackExchange Code Review Q#129567, answer score: 2

Revisions (0)

No revisions yet.