patternMinor

Asymptotics of the number of words in a regular language of given length

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

combinatorics formal-languages cs stackoverflow regular-languages reference-request asymptotics

numberthelengthwordsregularlanguageasymptoticsgiven

Problem

For a regular language $L$, let $c_n(L)$ be the number of words in $L$ of length $n$. Using Jordan canonical form (applied to the unannotated transition matrix of some DFA for $L$), one can show that for large enough $n$,
$$ c_n(L) = \sum_{i=1}^k P_i(n) \lambda_i^n, $$
where $P_i$ are complex polynomials and $\lambda_i$ are complex "eigenvalues". (For small $n$, we may have additional terms of the form $C_k[n=k]$, where $[n=k]$ is $1$ if $n=k$ and $0$ otherwise. These correspond to Jordan blocks of size at least $k+1$ with eigenvalue $0$.)

This representation seems to imply that if $L$ is infinite then asymptotically, $c_n(L) \sim C n^k \lambda^n$ for some $C,\lambda>0$. However, this is patently false: for the language $L$ over $\{0,1\}$ of all words of even length, $c_{2n}(L) = 2^{2n}$ but $c_{2n+1}(L) = 0$. This suggests that for some $d$ and for all $a \in \{0,\ldots,d-1\}$, either $c_{dm+a}(L) = 0$ for large enough $m$ or $c_{dm+a} \sim C_a (dm+a)^{k_a} \lambda_a^{dm+a}$. This is proved in Flajolet & Sedgewick (Theorem V.3), who attribute the proof to Berstel.

The proof provided by Flajolet and Sedgewick is somewhat technical; so technical, in fact, that they only sketch it. I attempted a more elementary proof using Perron-Frobenius theory. We can regard the transition graph of the DFA as a digraph. If the digraph is primitive then the result follows almost directly from the Perron-Frobenius theorem. If the digraph is irreducible but imprimitive with index $r$, then by considering the "$r$th power" of the DFA (each transition corresponds to $r$ symbols), we get the same result. The difficult case is when the digraph is reducible. We can reduce to the case of a path of strongly connected components, and then we get the result by estimating sums of the form
$$ \sum_{m_1+\cdots+m_k=m} \prod_{i=1}^k \lambda_i^{m_i}. $$
(Each such sum corresponds to a particular way of accepting a word, going through the different components in a certain way.) This sum, in turn, c

Solution

The argument you have sketched appears to be in line with Richard Stanley's treatment of the Transfer-Matrix Method in Enumerative Combinatorics, Volume 1 (link: pp 573; print: pp 500).

He starts with the generating function, and unpacks it by considering digraphs and permissible and prohibited factors. He then abstracts to free monoids, where he uses a refined version of the sums you gave to prove:

4.7.11 Proposition Let $B$ be a subset of $A^$ that freely generates $B$. Then $B^(\lambda)=(I-B(\lambda))^{-1}$

After working through some applications, he likewise closes the section by discussing Hadamard products in relation to horizontally-convex polyominoes.

Context

StackExchange Computer Science Q#11350, answer score: 3

Revisions (0)

No revisions yet.