HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Maximum number of different substrings in big string

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
substringsnumbermaximumdifferentbigstring

Problem

I'm trying to find the maximum number of different substrings of string consisting only of lowercase letters 'a' - 'z', if the length of string can be up to $5000$

Firstly, I was thinking that the number of substrings is $\frac{n\cdot(n-1)}{2}$ but then I noticed that a lot of the substrings will be the same, and also because of the length of the string of most $5000$ we cannot put all the possible strings with length 3 for example.

What is the maximum number of different substrings we can get in one string of length $5000$

Solution

Here is a general solution for an alphabet of size $d \geq 3$ and a string of length $n$.

Every string of length $n$ has $n-\ell+1$ substrings of length $\ell$. Hence the number of different substrings, of any length, is at most
$$
\sum_{\ell=1}^n \min(n-\ell+1,d^\ell).
$$
Consider now an infinite de Bruijn sequence, in which each prefix of size $d^r + r - 1$ is a non-cyclic de Bruijn sequence for length $r$. When $d \geq 3$, such sequences exist, as shown by Becher and Heiber, On extending de Bruijn sequences. Truncate such a sequence to length $n$. We leave the reader to show that the truncated sequence achieves the bound stated above.

Context

StackExchange Computer Science Q#85932, answer score: 5

Revisions (0)

No revisions yet.