patternMinor

Finding the number of distinct strings in regular expression

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

regular-expressions parsers cs finite-automata stackoverflow regular-languages lexical-analysis

expressiondistinctnumbertheregularfindingstrings

Problem

Given the regular expression $(1 + \varepsilon + 0 )(1 + \varepsilon + 0 )(1 + \varepsilon + 0 )(1 + \varepsilon + 0 )$, how many distinct strings are in the language? How do you determine this from looking at the regular expression? Do I have to generate a table of all possible combinations or is there a more straightforward way?

Solution

In your example, think of the result as having filled four slots: _ _ _ _, each of which can take one or three substrings, namely 0, 1, or the empty string. Ignoring the empty strings, it's clear that there are sixteen possible results: 0000, 0001, 0010, ... , 1111.

With the empty strings, though, since we could make 10 by $(\epsilon)(\epsilon)(1)(0)$, or by $(\epsilon)(1)(\epsilon)(0)$, or four other arrangements. What now? Well, if we realize that the six possibilities we just had all correspond to the string 10, we've got it: we'll have all strings over $\{0,1\}$ of length zero, one, two, three, and four, namely

$\epsilon$ (1 of them)

$0, 1$ (2 of these)

$00, 01, 10, 11$ (4)

$000, 001, 010, \dots, 111$ (8)

$0000, 0001, 0010, \dots 1111$ (16)

For a total of $1+2+4+8+16=31 = 2^5-1$. This is a particularly nice example; in general it'll be far more complicated, like, say the the expression $(1 + 01)1(1(0+\epsilon) + (101(10+101)))$. Sad to say, there's no simple rule governing all cases.

By the way, welcome to the site!

Context

StackExchange Computer Science Q#116179, answer score: 7

Revisions (0)

No revisions yet.