patternMinor
If regex describes FSAs, what string formats describe Turing machines?
Viewed 0 times
whatfsasdescribeformatsmachinesturingregexstringdescribes
Problem
(Topic summary under the line.)
Regex, at least the formal definition featuring only | and *, is used to describe words accepted by a given FSA, but it can be transformed into the corresponding state diagram (by constructing a NFSA with epsilon transitions from "modules" that correspond to the concatenations, alternations and Kleene stars in the regex, and then deriving the DFSA from that). Thus, it is as if regex captures the "essence" of state diagrams, which are just oriented graphs; a way to encode the entire graph in a single string.
A DFSA needs to have as many outgoing transitions per each state as there are symbols in the chosen alphabet. Let's assume two for our alphabet L: 0 and 1. These symbols can be understood as if-else statements, the machine takes one path if it sees a 0, and a different path for 1. We can provide substitutions of the form 0->0A, where A is some set of tape instructions. Choosing the right syntax semantics for the context-free aspects of these strings then allows us to omit the 0's and 1's entirely. For example, replace ()* with [], so that [A]B reads "while 1, A, else end while, B". Similarly, since |L|=2, we can ultimately reduce every alternation to the form (A|B) (so for example no expressions of the "(A|B|C)" kind), and thus we can rephrase this as "if 1, A, else B, end if". Finally, for pesky cases where the while condition is inverted, we can define "![]" for now, which reads "while 0...".
Example Busy Beaver-esque automaton (state 1 = initial, state 0 = halting):
State 1: if ( read 1 ) { A; state = 2; } else { B; state = 0; }
State 2: if ( read 1 ) { C; state = 2; } else { D; state = 1; }
Where A, B, C, and D are instructions for the tape-manipulating machinery.
Compact code for this state-transition table: 2021
String representation of the instructions themselves: [A[C]D]B
Informal proof of the correctness of the above string: If the first read symbol is 0, the first [] won't execute and B will execute instead; the
Regex, at least the formal definition featuring only | and *, is used to describe words accepted by a given FSA, but it can be transformed into the corresponding state diagram (by constructing a NFSA with epsilon transitions from "modules" that correspond to the concatenations, alternations and Kleene stars in the regex, and then deriving the DFSA from that). Thus, it is as if regex captures the "essence" of state diagrams, which are just oriented graphs; a way to encode the entire graph in a single string.
A DFSA needs to have as many outgoing transitions per each state as there are symbols in the chosen alphabet. Let's assume two for our alphabet L: 0 and 1. These symbols can be understood as if-else statements, the machine takes one path if it sees a 0, and a different path for 1. We can provide substitutions of the form 0->0A, where A is some set of tape instructions. Choosing the right syntax semantics for the context-free aspects of these strings then allows us to omit the 0's and 1's entirely. For example, replace ()* with [], so that [A]B reads "while 1, A, else end while, B". Similarly, since |L|=2, we can ultimately reduce every alternation to the form (A|B) (so for example no expressions of the "(A|B|C)" kind), and thus we can rephrase this as "if 1, A, else B, end if". Finally, for pesky cases where the while condition is inverted, we can define "![]" for now, which reads "while 0...".
Example Busy Beaver-esque automaton (state 1 = initial, state 0 = halting):
State 1: if ( read 1 ) { A; state = 2; } else { B; state = 0; }
State 2: if ( read 1 ) { C; state = 2; } else { D; state = 1; }
Where A, B, C, and D are instructions for the tape-manipulating machinery.
Compact code for this state-transition table: 2021
String representation of the instructions themselves: [A[C]D]B
Informal proof of the correctness of the above string: If the first read symbol is 0, the first [] won't execute and B will execute instead; the
Solution
There is indeed an extension of regular expressions that accepts the language of two-stack push-down automata, which has the same expressive power as Turing machines.
First off, to make the algebra more obvious, I'm going to define the operators using Kleene algebra notation.
$0$ is the empty set and $1$ is the zero-length string. Juxtaposition, or $\cdot$, means string concatenation, and $+$ means set union. The algebra satisfies the usual non-commutative semiring axioms:
$$A + B = B + A$$
$$A + (B + C) = (A + B) + C$$
$$A + 0 = 0 + A = A$$
$$A \cdot (B \cdot C) = (A \cdot B) \cdot C$$
$$A \cdot 1 = 1 \cdot A = A$$
$$A\cdot(B + C) = A \cdot B + A \cdot C$$
$$(A + B) \cdot C = A \cdot C + B \cdot C$$
$$0 \cdot A = A \cdot 0 = 0$$
Plus the axiom that addition (i.e. set union) is idempotent:
$$A + A = A$$
Plus, of course, the Kleene closure operator $A^*$, which I won't axiomatise here, but you know what it means.
Now let's think about single-stack PDAs. Probably the most straightforward way to express this is using Dirac notation. Intuitively, the "bra" $\left$ pops the stack symbol $\psi$.
It satisfies a few axioms. First off, brackets commute with terminal symbols:
$$a \left = \left| \psi \right> a$$
When a bra meets a ket, they form a kind of orthonormal inner product:
$$\left = 1$$
$$\left = 0$$
Intuitively, pushing a stack symbol followed by popping the same symbol is equivalent to doing nothing, whereas pushing a stack symbol followed by popping a different symbol is not allowed, and so the language accepted by that sequence of operations is the null set.
Once you have those operators, you can do things like give a "PDA expression" which accepts the language $a^n b^n$:
$$ \left\right)^* \left| 0 \right>$$
To implement two-stack PDAs, and hence Turing machines, you simply need two kinds of push and pop, say $\left< \phi \right|_1$ and $\left< \phi \right|_2$. Coming up with interesting "Turing expressions" is left as an exercise.
And just to go down the rabbit hole, the Dirac notation is suggestive of how you could generalise this even further. "Classical" stacks give you an orthonormal inner product whose only values are $0$ and $1$. If you allowed other values of the inner product, this would give you a regular expression-like notation which can represent quantum Turing machines.
First off, to make the algebra more obvious, I'm going to define the operators using Kleene algebra notation.
$0$ is the empty set and $1$ is the zero-length string. Juxtaposition, or $\cdot$, means string concatenation, and $+$ means set union. The algebra satisfies the usual non-commutative semiring axioms:
$$A + B = B + A$$
$$A + (B + C) = (A + B) + C$$
$$A + 0 = 0 + A = A$$
$$A \cdot (B \cdot C) = (A \cdot B) \cdot C$$
$$A \cdot 1 = 1 \cdot A = A$$
$$A\cdot(B + C) = A \cdot B + A \cdot C$$
$$(A + B) \cdot C = A \cdot C + B \cdot C$$
$$0 \cdot A = A \cdot 0 = 0$$
Plus the axiom that addition (i.e. set union) is idempotent:
$$A + A = A$$
Plus, of course, the Kleene closure operator $A^*$, which I won't axiomatise here, but you know what it means.
Now let's think about single-stack PDAs. Probably the most straightforward way to express this is using Dirac notation. Intuitively, the "bra" $\left$ pops the stack symbol $\psi$.
It satisfies a few axioms. First off, brackets commute with terminal symbols:
$$a \left = \left| \psi \right> a$$
When a bra meets a ket, they form a kind of orthonormal inner product:
$$\left = 1$$
$$\left = 0$$
Intuitively, pushing a stack symbol followed by popping the same symbol is equivalent to doing nothing, whereas pushing a stack symbol followed by popping a different symbol is not allowed, and so the language accepted by that sequence of operations is the null set.
Once you have those operators, you can do things like give a "PDA expression" which accepts the language $a^n b^n$:
$$ \left\right)^* \left| 0 \right>$$
To implement two-stack PDAs, and hence Turing machines, you simply need two kinds of push and pop, say $\left< \phi \right|_1$ and $\left< \phi \right|_2$. Coming up with interesting "Turing expressions" is left as an exercise.
And just to go down the rabbit hole, the Dirac notation is suggestive of how you could generalise this even further. "Classical" stacks give you an orthonormal inner product whose only values are $0$ and $1$. If you allowed other values of the inner product, this would give you a regular expression-like notation which can represent quantum Turing machines.
Context
StackExchange Computer Science Q#141500, answer score: 3
Revisions (0)
No revisions yet.