patternMinor
Finding Shift-Invariant "Interesting" Elements In Words
Viewed 0 times
shiftelementswordsinvariantinterestingfinding
Problem
I want to find "interesting" elements in a sequence of symbols that are independent of their absolute position in that sequence.
Formally, I have a window size $s \in \mathbb{N}$ and a finite alphabet $\Sigma$.
I'm looking for a function $f: \Sigma^s \to \{0, \; ..., \; s\}$ that points to the first interesting position of the input word that stays interesting even if the first character is removed and an arbitrary character is appended to the end.
This means:
$\forall w \in \Sigma^s, c \in \Sigma: f(w_0...w_{s-1}) > 0 \Rightarrow f(w_1...w_{s-1}c) = f(w_0...w_{s-1}) - 1$
Of course a trivial $f$ would always return $0$.
However, I'm most interested in a function $f$ that maximizes the average result, so that interesting positions are as far away from each other as possible.
Thus my question is: What is the $f$ that maximizes $\Sigma^{w \in \Sigma^s} f(w)$ (= the score of $f$)?
One idea would be to sort the alphabet and return the first index $i \leq s/2$ of the smallest symbol in $w_0...w_{i+i}$.
That way, after every shift of $w$, the index is decreased by $1$ until it reaches $0$.
What is an
Ideally, this function $f$ should also be fast to compute and not depend on brute forcing all possible $f$ first (there are only finitely many!).
One observation is that if $w = v^iv_0...v_j$ for some $i, j \in \mathbb{N}$ and $v \in \Sigma^*$, $j < |v|$, then $f(w) < s/i$. In particular $f(c...c) = 0$ for $c \in \Sigma$.
Formally, I have a window size $s \in \mathbb{N}$ and a finite alphabet $\Sigma$.
I'm looking for a function $f: \Sigma^s \to \{0, \; ..., \; s\}$ that points to the first interesting position of the input word that stays interesting even if the first character is removed and an arbitrary character is appended to the end.
This means:
$\forall w \in \Sigma^s, c \in \Sigma: f(w_0...w_{s-1}) > 0 \Rightarrow f(w_1...w_{s-1}c) = f(w_0...w_{s-1}) - 1$
Of course a trivial $f$ would always return $0$.
However, I'm most interested in a function $f$ that maximizes the average result, so that interesting positions are as far away from each other as possible.
Thus my question is: What is the $f$ that maximizes $\Sigma^{w \in \Sigma^s} f(w)$ (= the score of $f$)?
One idea would be to sort the alphabet and return the first index $i \leq s/2$ of the smallest symbol in $w_0...w_{i+i}$.
That way, after every shift of $w$, the index is decreased by $1$ until it reaches $0$.
What is an
f with higher score?Ideally, this function $f$ should also be fast to compute and not depend on brute forcing all possible $f$ first (there are only finitely many!).
One observation is that if $w = v^iv_0...v_j$ for some $i, j \in \mathbb{N}$ and $v \in \Sigma^*$, $j < |v|$, then $f(w) < s/i$. In particular $f(c...c) = 0$ for $c \in \Sigma$.
Solution
Using an answer to write down a nice insight, inspired by the comments:
Definition. A function $f: \Sigma^s \to \{0, ..., s-1\}$ is called shift-invariant if $\forall w \in \Sigma^s, c \in \Sigma: f(w_0...w_{s-1}) > 0 \Rightarrow f(w_1...w_{s-1}c) = f(w_0...w_{s-1}) - 1$.
Definition. For a finite set of patterns $P \subseteq \Sigma^*$, define $f_P: \Sigma^s \to \{0, ..., s-1\} $ to be the function that returns the smallest position of where some $p \in P$ occurs in the input word, or $0$ if no pattern matches.
Theorem. Let $f: \Sigma^s \to \{0, ..., s-1\}$ be a shift-invariant function. Then there exists a finite set $P \subseteq \Sigma^*$ so that $f = f_P$.
Proof:
Set $P_f := P := \{ p \in \Sigma^ \; | \; \exists v \in \Sigma^: |vp| = s \land f(vp) = |v| \}$.
Let $w \in \Sigma^s$ be an arbitray input word.
Set $vp := w$ with $|v| = f(w)$. Then $p \in P$ by definition of $P$.
Because (with $p$) at least some pattern in $P$ matches $w = vp$ at or before position $|v|$, by definition of $f_P$, there exists $p' \in P$ and $x, y \in \Sigma^*$ with $xp'y = w$, $|x| = f_P(w)$ and $|x| \leq |v|$.
Because $p' \in P$, there exists $v'$ so that $f(v'p') = |v'|$. Shift-invariance of $f$ now implies $f(p'u) = 0$ for any $u \in \Sigma^{|v'|}$.
Thus, $|v| = f(xp'y) \leq |x|$ (otherwise, shifting $|x|$ times doesn't reach $0$).
Finally we have $f(w) = |v| = |x| = f_P(w)$, which finishes the proof.
Note. If no element of $P$ is a substring of another, then $f_P$ is shift-invariant.
Conjecture. For a shift-invariant function $f$ it is $f = f_{P_f} = f_{P^{\mathrm{min}}_f}$ , with $P^{\mathrm{min}}_f$ being all minimal elements (regarding substrings) of $P_f$.
Open Question. What is $P$ such that $f_P$ is shift-invariant and $\Sigma_{w \in \Sigma^s} f_P(w)$ is maximal?
Definition. A function $f: \Sigma^s \to \{0, ..., s-1\}$ is called shift-invariant if $\forall w \in \Sigma^s, c \in \Sigma: f(w_0...w_{s-1}) > 0 \Rightarrow f(w_1...w_{s-1}c) = f(w_0...w_{s-1}) - 1$.
Definition. For a finite set of patterns $P \subseteq \Sigma^*$, define $f_P: \Sigma^s \to \{0, ..., s-1\} $ to be the function that returns the smallest position of where some $p \in P$ occurs in the input word, or $0$ if no pattern matches.
Theorem. Let $f: \Sigma^s \to \{0, ..., s-1\}$ be a shift-invariant function. Then there exists a finite set $P \subseteq \Sigma^*$ so that $f = f_P$.
Proof:
Set $P_f := P := \{ p \in \Sigma^ \; | \; \exists v \in \Sigma^: |vp| = s \land f(vp) = |v| \}$.
Let $w \in \Sigma^s$ be an arbitray input word.
Set $vp := w$ with $|v| = f(w)$. Then $p \in P$ by definition of $P$.
Because (with $p$) at least some pattern in $P$ matches $w = vp$ at or before position $|v|$, by definition of $f_P$, there exists $p' \in P$ and $x, y \in \Sigma^*$ with $xp'y = w$, $|x| = f_P(w)$ and $|x| \leq |v|$.
Because $p' \in P$, there exists $v'$ so that $f(v'p') = |v'|$. Shift-invariance of $f$ now implies $f(p'u) = 0$ for any $u \in \Sigma^{|v'|}$.
Thus, $|v| = f(xp'y) \leq |x|$ (otherwise, shifting $|x|$ times doesn't reach $0$).
Finally we have $f(w) = |v| = |x| = f_P(w)$, which finishes the proof.
Note. If no element of $P$ is a substring of another, then $f_P$ is shift-invariant.
Conjecture. For a shift-invariant function $f$ it is $f = f_{P_f} = f_{P^{\mathrm{min}}_f}$ , with $P^{\mathrm{min}}_f$ being all minimal elements (regarding substrings) of $P_f$.
Open Question. What is $P$ such that $f_P$ is shift-invariant and $\Sigma_{w \in \Sigma^s} f_P(w)$ is maximal?
Context
StackExchange Computer Science Q#161072, answer score: 2
Revisions (0)
No revisions yet.