HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Term for binary search tree using hashes?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
searchtermbinaryforusinghashestree

Problem

I was looking for a way to easily store and access a symbol table using the least memory and code as possible and I went with a BST. Symbols, however, tend to be defined in order as in foo0, foo1, foo2... so tree balance is an issue.

I considered balancing and auto-balancing algorithms, but then I realized that the hash for a key given a good hash function should have a 50% average chance for left/right. And I already had a hash function so that meant no additional code. So I would get a balanced tree for free(edit: at no point did I want to imply a perfectly balanced tree, this is emergent behavior).

I wanted to check if my theory was correct. When I search for BSTs, hashes, and hash trees, however, no relevant information turns up.

What is the term for a binary search tree that balances itself by using a hash?

edit:
I am going to accept the answer because if something doesn't have a name it just doesn't, but I tested this because some comments and asides imply that it is worse than it actually is.

I tested a sorted list of 468685 words containing both English words and "elite" password cracking words.

balanced-tree:    worst(19)     average(18.84)
edlin-tree:       worst(47)     average(22.72)
edlin-tree(SHA2): worst(46)     average(23.25)
regular BST:      worst(468685) average(234342.50)


So it's not going to replace your language library algorithms but, as stated in the accepted answer, it is obviously logarithmic.

As for the hash function(Which for reference is add+xorshift as in:

do{h+=*s;h^=h>right;h^=h<<left2;}while(*s);


), I have also tested how bad it is by comparing collisions to the expected value.

```
8 bits:
expected : 468428.99
axs : 468429
SHA2-256 : 468429

16 bits:
expected : 403200.35
axs : 403189
SHA2-256 : 403198

24 bits:
expected : 6485.99
axs : 6451
SHA2-256 : 6433

31 bits:
expected : 51.14
axs : 59
SHA2-256 : 47

32 bits:
expe

Solution

It's not that the analysis in the previous answer is wrong regarding the weight balance, but the answer I was looking for was that this algorithm is basically equivalent to a http://en.wikipedia.org/wiki/Random_binary_tree .

The longest path is $$4.311 \times \log_{e} n$$

EDIT: I'll update this answer because the information below was relatively hard to come by.

The magic number comes from:

Reed, Bruce (2003), "The height of a random binary search tree", Journal of the ACM 50 (3): 306–332, doi:10.1145/765568.765571.


The author shows that the expected height of a random binary tree is $E(H_n ) = \alpha \log_e n − \beta\log_e \log_e n + O(1)$, where $α = 4.311\ldots$ and $β = 1.953\ldots$. The magic number represents the upper bound.

The expected height happens to be $\approx\log_{\frac{4}{3}} n$, which is somewhat intuitive when you consider that each parent node represents a random pivot partitioning the node search space like in random quicksort.

Finally, the average access time for any given node is actually $\approx\log_{\frac{4}{3}} \sqrt{n}$(I can't find a reference for this).

That makes using hash as the key $\approx1.205\log_2n$ or 20% slower on average than a self-balancing binary tree.*

Similar structures with various trade-offs include the hash-as-priority treap, hash+b-tree, and hash+suffix tree. The suffix tree is only a bit less lazy than the proposed structure and provides expected $\log_2n$ access time.

[*] Many balanced tree algorithms also have a worst-case height upper bound of $\log_\phi n$

Code Snippets

Reed, Bruce (2003), "The height of a random binary search tree", Journal of the ACM 50 (3): 306–332, doi:10.1145/765568.765571.

Context

StackExchange Computer Science Q#28194, answer score: 4

Revisions (0)

No revisions yet.