HiveBrain v1.2.0
Get Started
← Back to all entries
gotchaMinor

Why does a suffix tree have a linear number of nodes (relative to input string size)?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
suffixwhylinearnumbernodesrelativesizeinputdoesstring

Problem

Aren't there $n^2$ unique substrings of a string (irrespective of the alphabet size)? Perhaps the number of unique suffix substrings is less than the number of unique substrings of a string.

Solution

For a text of length $n$ we have up to $1+{ n+1 \choose 2}$ different substrings, however there are only $n+1$ suffixes (for every suffix you can pick the position where it starts).

I assume you consider the compressed suffix tree (edge labels are words). This is a tree with $n+1$ leaves and every internal node has at least two children. Thus we have less interior nodes than leaves an therefore the tree has size $O(n)$.

Notice that in the uncompressed version (edge labels a characters) with a large alphabet, you can have super-linear suffix trees. For example, consider the text abcdefghijk....

Context

StackExchange Computer Science Q#6943, answer score: 9

Revisions (0)

No revisions yet.