patternMinor

What are appropriate isomorphisms between formal languages?

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

whatareformallanguagesappropriatebetweenisomorphisms

Problem

A formal language $L$ over an alphabet $\Sigma$ is a subset of $\Sigma^*$, that is, a set of words over that alphabet. Two formal languages $L$ and $L'$ are equal, if the corresponding sets are extensionally equal as subsets of $L\cup L'$. One can use languages in complexity theory to formalize the concept of a "problem." One could complain that "in general" extensional equality is undecidable, but I believe this would be misguided.

I'm pondering the following problem since some time: Two languages $L$ and $L'$ over alphabets $\Sigma=\{a,b\}$ and $\Sigma'=\{c,d\}$ (where $a$, $b$, $c$ and $d$ are distinct letters) can never be equal, even if they describe "exactly" the same "problem." But they should be isomorphic, if they really describe "exactly" the same "problem." What I would like to know are possible notions of isomorphism appropriate for complexity theory. I initially thought a computationally weak "translator" like a finite-state machine could be used to define the allowed isomorphisms, but this already seems to break down for trivial syntactic translations between equivalent logical formulas. (See for example this table with the syntactic definition of the dual $A^\bot$ in linear logic.)

Today I had the following idea: A definition of the language corresponding to a certain "decision problem" often has two parts: (1) An encoding of the allowed problem instances as finite strings of symbols, and (2) a definition of the "accepted" problem instances which belong to the language. If the checking whether a given finite string of symbols is an encoding of an allowed problem instance already requires a machine computationally stronger than a finite-state machine, then this stronger machine should also be used for the definition of the allowed isomorphisms.

Questions: Has this line of reasoning any chance of "resolving" my problem? Has my problem already been solved, so that I just need to read the right references? Does my problem itself makes any sense, or

Solution

Has my problem already been solved, so that I just need to read the right references?

The theory of abstract family of languages is relevant. For example, the morphisms defined by finite state transducers lead to the cone family. Eilenberg's short ICM talk from 1970 nicely explains this framework, see also chapter 11 "Closure properties of families of languages" from Introduction to Automata Theory, Languages, and Computation (1st ed.) by J. Hopcroft and J. Ullman from 1979. However, only nondeterministic languages fit into this framework1. In the end, the book Theory of codes by J. Berstel and D. Perrin from 1985 helped me to come up with reasonable solutions for my problem. Codes and Automata by J. Berstel, D. Perrin and C. Reutenauer from 2009 is a major revision of this book with a much broader coverage.

Has this line of reasoning any chance of "resolving" my problem? Does my problem itself makes any sense, or is this just as misguided as ...?

The assumption that there is one correct category for modeling isomorphisms between languages to "formalize the concept of a problem" is misguided. There are many different categories which can be interesting in the context of formal languages.

Here are three interesting categories related to many-one reductions, which will be referred to as total, partial, and relational. The objects of the categories are pairs $(\Sigma, L)$ of a finite alphabet $\Sigma$ and a language $L\subset\Sigma^$ of words over $\Sigma$. For total, the morphisms between the source object $(\Sigma, L)$ and the target object $(\Sigma', L')$ are total functions $f : \Sigma^ \to \Sigma'^$ with $L=f^{-1}(L')$. For partial, the morphisms are partial functions $f : \Sigma^ \to \Sigma'^$ with $L=f^{-1}(L')$, where two partial functions $f$, $g$ are considered equal (as morphisms) if $f(x)=g(x)$ for all $x\in L$. For relational, the morphisms are relations $R\subset \Sigma^ \times \Sigma'^*$ with $L=R^{-1}(L')$, and any two morphisms between the same source and target are considered to be equal. The set of allowed functions or relations can be restricted to various simple "translators" to get categories with interesting isomorphisms.

The monoid homomorphisms from $\Sigma^$ to $\Sigma'^$ give a very basic total category. The isomorphisms of this category are basically just the bijections between $\Sigma$ and $\Sigma'$. Any reasonable family of languages should better respect these isomorphisms, i.e. be closed under inverse homomorphisms.

The partial functions defined by deterministic log-space Turing machine translators give a quite natural partial category. It is able to perform many trivial syntactic transformations (like applying De Morgan's laws to move negations to the atoms), includes the morphism defined by functional finite state transducers1, and also can sort. Still it won't identify two completely unrelated languages as isomorphic, because equality of the composition of two morphisms to an identity morphism is a much stronger requirement than just existence of many-one reductions in both directions.

The relations defined by nondeterministic log-space Turing machine translators give an interesting relational category. SAT is isomorphic to HORNSAT in this category, but it is an open question whether TAUTOLOGY or any other co-NP-complete problem is isomorphic to HORNSAT.

Two languages $L$ and $L'$ over alphabets $\Sigma=\{a,b\}$ and $\Sigma'=\{c,d\}$ (where $a$, $b$, $c$ and $d$ are distinct letters) can never be equal, even if they describe "exactly" the same "problem." But they should be isomorphic, if they really describe "exactly" the same "problem."

The very basic total category described above solves this problem.

The problem becomes more interesting if "exactly the same" is replaced by "nearly the same for most practical purposes": Let $L$ be a language over $\Sigma=\{U,C,A,G\}$ and let $L'$ be the language over $\Sigma'=\{0,1\}$ obtained from $L$ by the substitution $U \to 00$, $C \to 01$, $A \to 10$, and $G \to 11$. Note that in any total category, $L$ and $L'$ are not isomorphic for $L=\Sigma^*$. The same would be true for partial categories, if the part "where two partial functions $f$, $g$ are considered equal (as morphisms) if $f(x)=g(x)$ for all $x\in L$" were omitted from the definition.

The quite natural partial category described above is sufficient for making $L$ and $L'$ isomorphic. It would be nice to have a more basic (i.e. more restrictive) category that makes them isomorphic. The following (successively more restrictive) categories look reasonable to me:

The partial functions realized by unambiguous finite state transducers2 where the only accepting state is the initial state. The isomorphisms of this partial category are (a subset of the) bijections between recognizable variable length codes.

The partial functions realized by deterministic finite state transducers where the only accepting state is the initial state. The isomorphisms

Context

StackExchange Computer Science Q#24574, answer score: 6

Revisions (0)

No revisions yet.