patternMinor

Can't Understand this Subset Construction Proof

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

thiscanconstructionunderstandsubsetproof

Problem

This proof is from Introduction to Automata Theory Languages and Computation by Hopcroft & Ullman and is regarding a 'bad case' for subset construction.

The NFA which is being converted is as follows:

The argument for this then goes as follows:

Consider the NFA N of Fig. 2.15. $L(N)$ is the set of all strings of
$0's$ and $1's$ such that the nth symbol from the end is $1$.

Intuitively, a DFA D that accepts this language must remember the last
$n$ symbols it has read. Since any of $2^n$ subsets of the last $n$
symbols could have been $1$, if D has fewer than $2^n$ states, then
there would be some state $q$ such that D can be in state $q$ after
reading two different sequences of $n$ bits, say $a_1a_2…a_n$ and
$b_1b_2…b_n$.

Since the sequences are different, they must differ in some position,
say $a_i \neq b_i$. Suppose (by symmetry) that $a_i = 1$ and $b_i = 0$.
If $i = 1$, then q must be both an accepting state and a
nonaccepting state, since $a_1a_2…a_n$ is accepted (the $n^{th}$
symbol from the end is 1) and $b_1b_2…b_n$ is not.

If $i > 1$, then consider the state $p$ that D enters after reading $i - 1$
$0$'s. Then p must be both accepting and nonaccepting, since $a_ia_{i+1}…a_n00…0$ is accepted and $b_ib_{i+1}…b_n00…0$ is not.

What's confusing me are these two lines,

Since any of $2^n$ subsets of the last $n$ symbols could have been
$1$, ...

and

If $i > 1$, then consider the state $p$ that D enters after reading $i - 1$
$0$'s. Then p must be both accepting and nonaccepting, since $a_ia_{i+1}…a_n00…0$ is accepted and $b_ib_{i+1}…b_n00…0$ is not.

I understand the case where $i = 1$, but I don't understand how the $i \gt 1$ works, and what they mean by any of the $2^n$ subsets being $1$. How can the subsets be $1$?

Solution

Since any of $2^n$ subsets of the last $n$ symbols could have been $1$, ...

This is just referring to the fact that there are $2^n$ possible sequences of length $n$ made up of $0$'s and $1$'s. That is, if you have $n$ symbols, you can think of "a subset of those symbols being $1$" as taking a subset of the indices from 1 to $n$ and making the symbols at those indices be $1$.

If $i>1$, then consider the state $p$ that D enters after reading $(i−1)$ $0$'s. Then $p$ must be both accepting and nonaccepting, since $a_ia_{i+1}\ldots a_n00 \ldots 0$ is accepted and $b_ib_{i+1}\ldots b_n00 \ldots 0$ is not.

So you have two sequences of characters $a_1a_2\ldots a_n$ and $b_1b_2\ldots b_n$ which end up in the same state. You know that $a_i=1$ and $b_i=0$. Now take both of those strings and append $(i-1)$ $0$'s to each one. This gives you the following two strings $x$ and $y$:

$x= a_1a_2\ldots a_i\underbrace{a_{i+1}\ldots a_n}_{n-i\text{ symbols}}\underbrace{00 \ldots 0}_{i-1\text{ zeros}}$

which we know is a string such that the $n$th symbol from the end is $1$ ($a_i = 1$ and it is the $n$th symbol from the end).

$y= b_1b_2\ldots b_i\underbrace{b_{i+1}\ldots b_n}_{n-i\text{ symbols}}\underbrace{00 \ldots 0}_{i-1\text{ zeros}}$

which we know is a string such that the $n$th symbol from the end is $0$ ($b_i = 0$ and it is the $n$th symbol from the end).

Now think about what happens when you run these two strings through D. It's actually quite similar to the $i=1$ case: since the two strings ended up in the same state $q$ after reading $a_1a_2\ldots a_n$ and $b_1b_2\ldots b_n$, they also have to end up in the same state $p$ after reading $(i-1)$ more $0$'s. Thus this state $p$ has to be both accepting and nonaccepting since our string $x$ should be accepted but $y$ should not be accepted.

Context

StackExchange Computer Science Q#89404, answer score: 4

Revisions (0)

No revisions yet.