patternMajor
Determining capabilities of a min-heap (or other exotic) state machines
Viewed 0 times
determiningcapabilitiesexoticheapminstatemachinesother
Problem
See the end of this post for some clarification on the definition(s) of min-heap automata.
One can imagine using a variety of data structures for storing information for use by state machines. For instance, push-down automata store information in a stack, and Turing machines use a tape. State machines using queues, and ones using two multiple stacks or tapes, have been shown to be equivalent in power to Turing machines.
Imagine a min-heap machine. It works exactly like a push-down automaton, with the following exceptions:
This machine can accept all regular languages, simply by not using the heap. It can also accept the language $\displaystyle \{a^{n}b^{n} \in \{a, b\}^{} \mid n \ge 0\}$ by adding $a$'s to the heap, and removing $a$'s from the heap when it reads $b$'s. It can accept a variety of other context-free languages. However, it cannot accept, for instance, $\displaystyle \{w \in \{a, b\}^{} \mid w = w^{R}\}$ (stated without proof). EDIT: or can it? I don't think it can, but I've been surprised before, and I'm sure I'll keep being surprised when my assumptions to keep making of me an... well.
Can it accept any context-sensitive or Turing-complete languages?
More generally, what research, if any, has been pursued in this direction? What results are there, if any? I am also interested in other varieties of exotic state machines, possibly tho
One can imagine using a variety of data structures for storing information for use by state machines. For instance, push-down automata store information in a stack, and Turing machines use a tape. State machines using queues, and ones using two multiple stacks or tapes, have been shown to be equivalent in power to Turing machines.
Imagine a min-heap machine. It works exactly like a push-down automaton, with the following exceptions:
- Instead of getting to look at the last thing you added to the heap, you only get to look at the smallest element (with the ordering defined on a per-machine basis) currently on the heap.
- Instead of getting to remove the last thing you added to the heap, you only get to remove one of the smallest element (with the ordering defined on a per-machine basis) currently on the heap.
- Instead of getting to add an element to the top of the heap, you can only add an element to the heap, with its position being determined according to the other elements in the heap (with the ordering defined on a per-machine basis).
This machine can accept all regular languages, simply by not using the heap. It can also accept the language $\displaystyle \{a^{n}b^{n} \in \{a, b\}^{} \mid n \ge 0\}$ by adding $a$'s to the heap, and removing $a$'s from the heap when it reads $b$'s. It can accept a variety of other context-free languages. However, it cannot accept, for instance, $\displaystyle \{w \in \{a, b\}^{} \mid w = w^{R}\}$ (stated without proof). EDIT: or can it? I don't think it can, but I've been surprised before, and I'm sure I'll keep being surprised when my assumptions to keep making of me an... well.
Can it accept any context-sensitive or Turing-complete languages?
More generally, what research, if any, has been pursued in this direction? What results are there, if any? I am also interested in other varieties of exotic state machines, possibly tho
Solution
You can recognize the canonical non-context-free (but context-sensitive) language $\{ a^n b^n c^n\ |\ n \geq 1 \}$ with this type of state machine. The crux is that you add tokens to the heap for every $a$ character, and while parsing the $b$ characters, you add 'larger' tokens to the heap, so they only end up at the bottom of the heap when you have parsed all the $b$ characters.
Heap symbols are $a$ and $b$, where $a < b$. We consume all the $a$ symbols on the input and add $a$ symbols to the heap. If we encounter a $b$, we switch strategies: for every $b$ we encounter subsequently we remove an $a$ from the heap and add a $b$ to the heap. When we encounter a $c$ we should have run out of $a$s to remove, and then for every $c$ in the remaining input we remove a $b$ from the heap. If the heap is empty at the end, the string is in the language. Obviously, we reject if something goes wrong.
Update:
The language $EPAL = \{ ww^R | w \in \{a, b\}^* \}$ can not be recognized by min-heap automata. Suppose that we do have a min-heap automaton that can recognize $EPAL$. We look at the 'state' the automaton is in after reading $w$ (the first part of the input, so $w^R$ is next). The only state we have are the contents of the heap and the particular state of the automaton it is in. This means that after recognizing $w$, this 'state' needs to hold enough information to match $w^R$.
In particular, in order to do this, there must be $2^n$ possible different 'state's (where $n = |w|$), as there are $2^n$ possible words consisting of $a$ and $b$ characters. As there are only a finite number of states and only a finite number of heap characters, this implies that there exists some word $w$ for which the heap contains an exponential number of some heap character, say $x$.
We first prove the theorem for deterministic min-heap automata, and then extend this proof to non-deterministic min-heap automata. In particular, deterministic automata that recognize some language will not put themselves in an infinite loop, which is a useful property.
We shall prove that the heap can only contain at most a number of heap tokens that is linear in the number of characters read from the input. This immediately rules out that $x$ appears an exponential number of times on the heap, which completes the proof that $EPAL$ can not be recognized by min-heap automata.
Because we only have a finite number of states in our automaton and because a deterministic automaton will not put itself into an infinite loop, on reading an input signal it will add at most a constant number of heap characters onto the heap. Similarly, on consuming some heap symbol $y$, it can only add at most a constant number of heap characters that are strictly larger than $y$ and it can only decrease the number of $y$ symbols on the stack (otherwise we get an infinite loop).
Consuming heap symbols may therefore cause an (enormous) buildup of larger heap symbols, but as there are only a constant number of different types of heap symbols, this is only a constant number not dependent on $n$. This implies that the number of heap symbols is at most some (large) constant times the number of input symbols read so far. This completes the proof for the deterministic case.
In the non-deterministic case, the proof is similar, but a bit trickier: instead of adding at most some constant number of heap tokens to the heap, it adds some arbitrary number of heap tokens to the heap. However, the crucial point is that this number does not depend on $n$. In particular, if we can non-deterministically get exactly the right heap symbols on the heap after recognizing $w$ (right for recognizing $w^R$), we can also non-deterministically choose the heap symbols that match some other word $w'$, and thereby recognize $w w'^R$, thus contradicting that the min-heap automaton recognizes exactly $EPAL$.
Update 3: I'll make the last argument (about non-determinism) rigorous. By the above argument, there must exist an infinite set of words $W \subseteq \{a,b\}^*$ such that for every $w \in W$, after recognizing $w$, the heap contains $\omega(|w|)$ elements (note that we can talk about $O(f(|w|))$ as we have an infinite set of words). As we cannot get that many elements on the heap through deterministic means, we must have had some form of a loop in which we first non-deterministically chose to add more elements to the heap (without consuming input), and later chose to exit this loop, and we must have traversed this loop $\omega(1)$ times.
Take the set of all such loops used by $W$. As there are only $O(1)$ states, the size of this set is $O(1)$, and the set of all its subsets is also $O(1)$. Now note that the 'deterministic' part of the execution paths can only contribute to $O(|w|)$ of the tokens, which means that a lot of the exponential number of different words must have execution paths whose 'deterministic' parts contribute the same tokens to the heap. In particular, the only way to get more tokens
Heap symbols are $a$ and $b$, where $a < b$. We consume all the $a$ symbols on the input and add $a$ symbols to the heap. If we encounter a $b$, we switch strategies: for every $b$ we encounter subsequently we remove an $a$ from the heap and add a $b$ to the heap. When we encounter a $c$ we should have run out of $a$s to remove, and then for every $c$ in the remaining input we remove a $b$ from the heap. If the heap is empty at the end, the string is in the language. Obviously, we reject if something goes wrong.
Update:
The language $EPAL = \{ ww^R | w \in \{a, b\}^* \}$ can not be recognized by min-heap automata. Suppose that we do have a min-heap automaton that can recognize $EPAL$. We look at the 'state' the automaton is in after reading $w$ (the first part of the input, so $w^R$ is next). The only state we have are the contents of the heap and the particular state of the automaton it is in. This means that after recognizing $w$, this 'state' needs to hold enough information to match $w^R$.
In particular, in order to do this, there must be $2^n$ possible different 'state's (where $n = |w|$), as there are $2^n$ possible words consisting of $a$ and $b$ characters. As there are only a finite number of states and only a finite number of heap characters, this implies that there exists some word $w$ for which the heap contains an exponential number of some heap character, say $x$.
We first prove the theorem for deterministic min-heap automata, and then extend this proof to non-deterministic min-heap automata. In particular, deterministic automata that recognize some language will not put themselves in an infinite loop, which is a useful property.
We shall prove that the heap can only contain at most a number of heap tokens that is linear in the number of characters read from the input. This immediately rules out that $x$ appears an exponential number of times on the heap, which completes the proof that $EPAL$ can not be recognized by min-heap automata.
Because we only have a finite number of states in our automaton and because a deterministic automaton will not put itself into an infinite loop, on reading an input signal it will add at most a constant number of heap characters onto the heap. Similarly, on consuming some heap symbol $y$, it can only add at most a constant number of heap characters that are strictly larger than $y$ and it can only decrease the number of $y$ symbols on the stack (otherwise we get an infinite loop).
Consuming heap symbols may therefore cause an (enormous) buildup of larger heap symbols, but as there are only a constant number of different types of heap symbols, this is only a constant number not dependent on $n$. This implies that the number of heap symbols is at most some (large) constant times the number of input symbols read so far. This completes the proof for the deterministic case.
In the non-deterministic case, the proof is similar, but a bit trickier: instead of adding at most some constant number of heap tokens to the heap, it adds some arbitrary number of heap tokens to the heap. However, the crucial point is that this number does not depend on $n$. In particular, if we can non-deterministically get exactly the right heap symbols on the heap after recognizing $w$ (right for recognizing $w^R$), we can also non-deterministically choose the heap symbols that match some other word $w'$, and thereby recognize $w w'^R$, thus contradicting that the min-heap automaton recognizes exactly $EPAL$.
Update 3: I'll make the last argument (about non-determinism) rigorous. By the above argument, there must exist an infinite set of words $W \subseteq \{a,b\}^*$ such that for every $w \in W$, after recognizing $w$, the heap contains $\omega(|w|)$ elements (note that we can talk about $O(f(|w|))$ as we have an infinite set of words). As we cannot get that many elements on the heap through deterministic means, we must have had some form of a loop in which we first non-deterministically chose to add more elements to the heap (without consuming input), and later chose to exit this loop, and we must have traversed this loop $\omega(1)$ times.
Take the set of all such loops used by $W$. As there are only $O(1)$ states, the size of this set is $O(1)$, and the set of all its subsets is also $O(1)$. Now note that the 'deterministic' part of the execution paths can only contribute to $O(|w|)$ of the tokens, which means that a lot of the exponential number of different words must have execution paths whose 'deterministic' parts contribute the same tokens to the heap. In particular, the only way to get more tokens
Context
StackExchange Computer Science Q#110, answer score: 25
Revisions (0)
No revisions yet.