HiveBrain v1.2.0
Get Started
← Back to all entries
patternModerate

Generation of random binary trees

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
randomgenerationbinarytrees

Problem

Given n, I want to randomly generate a binary tree (unlabelled) that has n end nodes. Could someone kindly provide a reference containing an algorithm for doing that?

I attempted to do as follows: From a PRNG obtain n PRNs in [0.0, 1.0) as (relative) frequencies of n symbols for generating a Huffman tree (used in data compression). But, if the PRNs used are uniform, then I think this would highly favour generation of those Huffman trees that are more flat and Huffman trees corresponding to widely different frequencies of the symbols would be highly suppressed in the generation process. If this is correct, how could one do better? Thanks in advance.

Solution

Binary trees are counted by Catalan numbers: the number of "full" binary trees (each node has either 0 or 2 children) with $n$ leaves is $C_{n-1}$. The Catalan numbers are given by an explicit formula, and also satisfy the recurrence
$$ C_{n+1} = \sum_{i+j = n} C_i C_j, $$
which follows directly from the interpretation as counting binary trees. You can use this formula to decode a number in $[0,C_{n-1}) = \{0,\ldots,C_{n-1}-1\}$ to a binary tree. If the index is chosen randomly, you obtain a random binary tree.

The decoding procedure is recursive. First, we partition $[0,C_{n-1})$ into intervals of lengths $C_0 C_{n-2}, C_1 C_{n-3}, \ldots, C_{n-2} C_0$. If the number falls in the interval of length $C_i C_j$, then the left subtree will have $i+1$ leaves, and the right subtree will have $j+1$ leaves. Within the interval, we have an index in the range $[0,C_i C_j)$, which we decode to a pair of indices $x \in [0,C_i)$ and $y \in [0,C_j)$. We recursively compute which tree with $i+1$ leaves is encoded by $x$ and which tree with $j+1$ leaves is encoded by $y$. The recursion stops when $n = 1$, in which case the tree consists of just the root.

Context

StackExchange Computer Science Q#22098, answer score: 12

Revisions (0)

No revisions yet.