patternModerate

Why isn't it simple to count the number of words in a regular language?

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

whysimplethenumberwordsregularlanguagecountisn

Problem

Given a DFA, A, let L(A) denote the number of words A accepts.
I think it's easy to calculate L(A):
Translate the encoding of A into a regular expression.
If the Kleene star appears anywhere in the expression - the language is infinite.
Else:
Go through and count all the combinations of words that are possible to make using the expression (basically if there is a + operator on the expression, multiply the amount of legal words by the amount of strings connected by the +..)

Is this wrong?
Thanks in advance

Solution

Yep, this is wrong, because of ambiguity.

Consider the following language:
$(a + aa) + a(a + \epsilon)$.

With your method, we see 4 words, $a, aa, aa, a$. But we have duplicates! There are multiple ways to make the same word within the given regular expression.

A better method is to use dynamic programming on an minimal DFA for your language, with no "dead" states. If the minimal DFA is cyclic, the language is infinte, so we can assume there's no cycles. Using a DFA is key, because the determinism means there's exactly one path through the DFA for each word.

What you do is build up a recurrence for the number of words that end at a given state:

1 words ends at the start state: $\epsilon$

For each state $q$, the number of words ending there is the sum of the number of words ending at each state with a transition into $q$.

The total number of words is then the sum of the number of words ending at each final state.

Context

StackExchange Computer Science Q#71371, answer score: 12

Revisions (0)

No revisions yet.