HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Complexity of a naive algorithm for finding the longest Fibonacci substring

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
thenaivefibonaccisubstringalgorithmforfindinglongestcomplexity

Problem

Given two symbols $\text{a}$ and $\text{b}$, let's define the $k$-th Fibonacci string as follows:

$$ F(k) = \begin{cases} \text{b} &\mbox{if } k = 0 \\
\text{a} &\mbox{if } k = 1 \\
F(k-1) \star F(k-2) &\mbox{else} \end{cases} $$

with $\star$ denoting string concatenation.

Thus we'll have:

  • $F(0) = \text{b}$



  • $F(1) = \text{a}$



  • $F(2) = F(1) \star F(0) = \text{ab}$



  • $F(3) = F(2) \star F(1) = \text{aba}$



  • $F(4) = F(3) \star F(2) = \text{abaab}$



  • ...



Given a string $S$ formed by $n$ symbols, we define a Fibonacci substring as any substring of $S$ which is also a Fibonacci string for a suitable choice of $\text{a}$ and $\text{b}$.

The problem

Given $S$, we want to find its longest Fibonacci substring.

A trivial algorithm

For each position $i$ of the string $S$, suppose that $F(2)$ starts there (it's enough to check that the $i$-th and $(i+1)$-th symbols are distinct). If that's the case, check if it can be extended to $F(3)$, then $F(4)$, and so on. After that, start again from the position $i+1$. Repeat until you reach the position $n$.

We must look at each symbol at least once, so it's $\Omega(n)$. There are only two for loops involved, so we can furthermore say that it's $O(n^2)$.

However (somewhat unsurprisingly) this naive algorithm performs much better than usual quadratic algorithms (if it does a lot of work on the $i$-th position, it won't do a lot of work in the next positions).

How can I use Fibonacci properties to find tighter bounds for the execution time of this algorithm?

Solution

Say that $F(n)$ occurs at some position if the substring starting at that position is compatible with either $F(n)$ or its complementation. How close can occurrences of $F(n)$ be? Take as an example $F(4) = abaab$. If $F(4)$ occurs at position $p$ then it cannot occur at position $p+1$ or $p+2$, but it can appear at position $p+3$. We let $\ell(n)$ be the smallest number such that two occurrences of $F(\ell)$ can occur at distance of $\ell$. You can prove by induction that for $n \geq 4$ we have $\ell(n) = |F(n-1)|$ (for example, $\ell(4) = 3$).

Given a string of length $N$, for each $n$ let $P(n)$ be the set of positions at which $F(n)$ occurs. We can upper bound the running time of your procedure roughly by $\sum_n |P(n)| |F(n)|$, where the sum runs over all $n$ such that $|F(n-1)| \leq N$ (say). Since occurrences of $F(n)$ are separated by at least $|F(n-1)|$, we see that the running time is bounded by order of
$$ \sum_n |F(n)| \left(\frac{N}{|F(n-1)|} + 1\right). $$
Since the lengths of the Fibonacci words increase exponentially, $\sum_n |F(n)| = O(N)$. The remaining term is $\sum_n O(N) = O(N\log N)$, since the sum contains $\log N$ many terms. We conclude that the running time is $O(N\log N)$.

Conversely, the running time on $F_n$ is $\Omega(|F_n|\log|F_n|)$, as can be proved by induction. We conclude that the worst case running time on strings of length $N$ is $\Theta(N\log N)$.

Context

StackExchange Computer Science Q#28865, answer score: 5

Revisions (0)

No revisions yet.