HiveBrain v1.2.0
Get Started
← Back to all entries
principleMinor

What is the advantage of leaf trees vs. node trees?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
thewhatnodetreesleafadvantage

Problem

In the book "Advanced Data Structures", Chapter 2 ("Search Trees"), the author, Peter Braß, mentions two versions of binary trees (emphases in the quoted text are mine):


"...two different models of search trees, either of which can be
combined with much of the following material, but one of which is
strongly preferable.


If we compare in each node the query key with the key contained in the
node and follow the left branch if the query key is smaller and the
right branch if the query key is larger, then what happens if they are
equal? The two models of search trees are as follows:



-
Take left branch if query key is smaller than node key; otherwise take the right branch, until you reach a leaf of the tree. The keys in
the interior node of the tree are only for comparison; all the objects
are in the leaves.

-
Take left branch if query key is smaller than node key; take the right branch if the query key is larger than the node key; and take
the object contained in the node if they are equal."


The author calls these Model 1 and Model 2 respectively. A figure from the book gives an example of each:

The author talks some more about the difference between these types of trees, including

  • The maximum number of keys each type can store as a function of


height h ($2^h$ vs. $2^{h+1}-1$ respectively in model 1 and 2 respectively)

  • The number of comparisons on the key while traversing an interior node (1 vs 2 respectively)



  • Each key appearing up to two times in a tree (once in an interior node and once as a leaf node) in model 1 vs the key appearing exactly once in model 2.



He then says:


Model 2 became the preferred textbook version because in most
textbooks the distinction between object and its key is not made: the
key is the object. Then it becomes unnatural to duplicate the key in
the tree structure. But in all real applications, the distinction
between key and object is quite important. One almost never wishes

Solution

You don't quote the reasoning of P. Braß, if he gives any, so we can only guess.

I would like to distinguish two cases.

  • Data are stored as values, i.e. in place.



  • Data are stored by reference.



In the first case, the size of the "inner" tree would blow up arbitrarily. For every step in the search, we would have to load a potentially big node into the cache. That would increase the real cost of the search considerably (and arbitrarily, depending on the data type)! In such a scenario, using type 1 is indeed preferable as the search data structure itself contains only the information needed for performing the search.

In the second case -- which is the default in reference-based languages like e.g. Java -- this effect is present but minor: instead of two pointers, every node stores three. In such a scenario, I would prefer type 2 because it requires less storage in total.

Implementation-related side note: If you assign data (references) to inner nodes, they probably have some generic type Node (in modern languages). That means that the compiler may not be able optimize the search code as much as it could if the exact type were statically known. The specifics depend on the language, type system and compiler, though.

Context

StackExchange Computer Science Q#37157, answer score: 3

Revisions (0)

No revisions yet.