patternMinor
Sum of squares of length of longest common prefix for all pairs in given strings
Viewed 0 times
pairsalllengthgivensquaresforsumprefixlongeststrings
Problem
Given a list of unique strings, how can I find the sum of squares of longest common prefixes length for all pairs?
I think a trie will be useful but I'm not sure how to calculate the final result.
For example, if given strings are and, anacron, anagram, build and buffer
Then possible pairs are (and, anagram) which will give 2 as the length of common prefix and squaring it we will get 4
similarly, for (anagram, anacron) we will get 3 common prefixes and we will add 9 and (and, anacron) will give 4, (build, buffer) will give 2
Other possible pairs will have 0 common prefixes hence they do not contribute.
So the final answer will be 4+9+4+4 = 21.
I assume adding the length of squares and not the original length is a key to calculate the result.
I think a trie will be useful but I'm not sure how to calculate the final result.
For example, if given strings are and, anacron, anagram, build and buffer
Then possible pairs are (and, anagram) which will give 2 as the length of common prefix and squaring it we will get 4
similarly, for (anagram, anacron) we will get 3 common prefixes and we will add 9 and (and, anacron) will give 4, (build, buffer) will give 2
Other possible pairs will have 0 common prefixes hence they do not contribute.
So the final answer will be 4+9+4+4 = 21.
I assume adding the length of squares and not the original length is a key to calculate the result.
Solution
I'm assuming you are interested in going over all unordered pairs of distinct words in your given set of words $S$, a collection I denote by $S_2$.
You can augment a trie for $S$ with a count of the words extending each node. Suppose that a node $v$ corresponds to a prefix of length $|v|$, and has $n(v)$ words extending it. Using $v \preceq x$ to denote "$v$ is a prefix of $x$" and $c(x,y)$ to denote the size of the largest common prefix of $x,y$, for any function $f\colon \mathbb{N} \to \mathbb{R}$ we have
$$
\sum_v f(|v|) \binom{n(v)}{2} = \sum_v \sum_{\substack{\{x,y\} \in S_2 \\ v \preceq x,y}} f(|v|) = \sum_{\{x,y\} \in S_2} \sum_{v \preceq x,y} f(|v|) = \sum_{\{x,y\} \in S_2} \sum_{\ell=0}^{c(x,y)} f(\ell).
$$
In your case, you are interested in finding a function $f$ such that for all $n \in \mathbb{N}$,
$$
\sum_{\ell=0}^n f(\ell) = n^2.
$$
Clearly $f(0) = 0$ and for $\ell > 0$, $f(\ell) = \ell^2 - (\ell-1)^2 = 2\ell-1$.
You can augment a trie for $S$ with a count of the words extending each node. Suppose that a node $v$ corresponds to a prefix of length $|v|$, and has $n(v)$ words extending it. Using $v \preceq x$ to denote "$v$ is a prefix of $x$" and $c(x,y)$ to denote the size of the largest common prefix of $x,y$, for any function $f\colon \mathbb{N} \to \mathbb{R}$ we have
$$
\sum_v f(|v|) \binom{n(v)}{2} = \sum_v \sum_{\substack{\{x,y\} \in S_2 \\ v \preceq x,y}} f(|v|) = \sum_{\{x,y\} \in S_2} \sum_{v \preceq x,y} f(|v|) = \sum_{\{x,y\} \in S_2} \sum_{\ell=0}^{c(x,y)} f(\ell).
$$
In your case, you are interested in finding a function $f$ such that for all $n \in \mathbb{N}$,
$$
\sum_{\ell=0}^n f(\ell) = n^2.
$$
Clearly $f(0) = 0$ and for $\ell > 0$, $f(\ell) = \ell^2 - (\ell-1)^2 = 2\ell-1$.
Context
StackExchange Computer Science Q#142341, answer score: 2
Revisions (0)
No revisions yet.