patternMinor
What properties must a string satisfy to be a possible output of the Burrows-Wheeler transform?
Viewed 0 times
themustwhatpropertiesburrowswheeleroutputsatisfypossibletransform
Problem
The Buttows-Wheeler transform is a function that takes a string $S$ to a new string $S'$ and an index $I$. The new string $S'$ is always an anagram (rearrangement) of the input string $S$.
However, if I pick a random string $S'$ and index $I$, the inverse Burrows-Wheeler transform usually fails. For example, if $S'$ is
Clearly, none of the rows is an anagram of
What properties must the string $S'$ satisfy so that the block contains at least one anagram of $S'$? (I suspect that if one of them is an anagram, they all are, but I don’t know how to prove it.)
However, if I pick a random string $S'$ and index $I$, the inverse Burrows-Wheeler transform usually fails. For example, if $S'$ is
"QUICK", the (intuitive, block-sorting) inverse method results in the following block:CQCQC
IKUIK
KUIKU
QCQCQ
UIKUIClearly, none of the rows is an anagram of
"QUICK", no matter the choice of $I$.What properties must the string $S'$ satisfy so that the block contains at least one anagram of $S'$? (I suspect that if one of them is an anagram, they all are, but I don’t know how to prove it.)
Solution
Consider a string with repeats of a letter, for example
Now imagine we label every occurrence of
Observe that the
Now imagine we take an arbitrary string $S'$ of length $n$ and label the letters in alphabetical order, but identical letters in order of occurrence. For example, for BANANA the labeling would be:
This list of digits from $1$ to $n$ can be interpreted as a permutation (rearrangement).
In order to be a possible output of the Burrows-Wheeler transform, the string $S'$ must have one of the following properties:
For example, the word “QUICK” does not work because Q and C form a separate cycle from the rest of the letters:
By contrast, “QUIKC” works because the permutation of letters now forms a single complete cycle:
This property of $S'$ is necessary because without it, the inverse BWT couldn’t reconstruct the string $S$: instead of visiting every character, it would visit only the characters contained in one of the cycles (and visit them multiple times).
This property of $S'$ is also sufficient because once it is met, the inverse BWT can be performed on it to produce a string $S_i$ (with any index $i \in \left\{0..n-1\right\}$). The result of this is a string which will reconstitute $S'$ when run through BWT again.
BANANA. The forward transform will create this block:ABANAN
ANABAN
ANANAB
BANANA
NABANA
NANABANow imagine we label every occurrence of
A with 1, 2 or 3 depending on which A in the original word it is.Observe that the
A’s in the leftmost column are in the same order as in the rightmost column. This is because they are both sorted by whatever text comes after them in the original string.Now imagine we take an arbitrary string $S'$ of length $n$ and label the letters in alphabetical order, but identical letters in order of occurrence. For example, for BANANA the labeling would be:
BANANA
415263
AAABNN (sorted)
123456This list of digits from $1$ to $n$ can be interpreted as a permutation (rearrangement).
In order to be a possible output of the Burrows-Wheeler transform, the string $S'$ must have one of the following properties:
- all letters in the string are the same; or
- the permutation is a full cycle (i.e. it has order $n$).
For example, the word “QUICK” does not work because Q and C form a separate cycle from the rest of the letters:
By contrast, “QUIKC” works because the permutation of letters now forms a single complete cycle:
This property of $S'$ is necessary because without it, the inverse BWT couldn’t reconstruct the string $S$: instead of visiting every character, it would visit only the characters contained in one of the cycles (and visit them multiple times).
This property of $S'$ is also sufficient because once it is met, the inverse BWT can be performed on it to produce a string $S_i$ (with any index $i \in \left\{0..n-1\right\}$). The result of this is a string which will reconstitute $S'$ when run through BWT again.
Code Snippets
ABANAN
ANABAN
ANANAB
BANANA
NABANA
NANABABANANA
415263
AAABNN (sorted)
123456Context
StackExchange Computer Science Q#150126, answer score: 2
Revisions (0)
No revisions yet.