patternMinor
Context free grammar for strings with more $a$'s than $b$'s
Viewed 0 times
freewiththanmoregrammarforcontextstrings
Problem
I would like to prove that the grammar $G$ with the rules
$$
S \to SS \mid aSb \mid bSa \mid a \mid \varepsilon
$$
generates the language $L = \{w \mid \text{$w$ has at least as many $a$'s as $b$'s}\}$.
Question
My proof is below, but it feels rather clunky and unintuitive. Does anyone have an alternative proof and/or suggestions for how I can improve what I have?
Proof
Step 1. First, we will show that $\mathcal{L}(G) \subseteq L$. This is immediate from the fact that no rule generates more $b$'s than $a$'s. Let $w \in \mathcal{L}(G)$, generated by $G$ via $S = u_1 \Rightarrow \dots \Rightarrow u_n = w$. In each step where a $b$ is added, an $a$ must also be added. Because $w_1$ has an equal number of $a$'s and $b$'s, each subsequent $w_i$ must have at least many $a$'s than $b$'s. Therefore, $w_n \in L$, and so $\mathcal{L}(G) \subseteq L$.
Step 2. Next, we will show that $G$ generates all strings with an equal number of $a$'s and $b$'s. Let $K$ be the set of all such strings. We argue by induction to show that $K \subseteq \mathcal{L}(G)$.
Base step. Let $w \in K$ such that $|w| = 0$. Then $w$ can be generated by $G$ via $S \Rightarrow \varepsilon$.
Inductive step. Suppose $G$ can generate any string in $K$ length less than or equal to $n$. Let $w \in K$ such that $|w| = n + 1$. We distinguish several cases:
-
$w = aub$ for some string $u$. Because $w$ has an equal number of $a$'s and $b$'s, $u$ must also have an equal number of $a$'s and $b$'s. This implies $u \in K$, and by the inductive hypothesis, $S \Rightarrow^ u$. Thus, $w$ is generated by $G$ via $S \Rightarrow aSb \Rightarrow^ aub$.
-
$w = bua$ for some string $u$. This is the same as the previous case.
-
$w$ starts and ends with $a$. But because $w$ has the same number of $a$'s and $b$'s, there must be a point where the number of $b$'s “catches up” with the number of $a$'s. This implies $w = uv$, where $u, v \in K$ are non-empty. By the inductive hypothesis, $S \Rightarrow^* u$ and $S \Righta
$$
S \to SS \mid aSb \mid bSa \mid a \mid \varepsilon
$$
generates the language $L = \{w \mid \text{$w$ has at least as many $a$'s as $b$'s}\}$.
Question
My proof is below, but it feels rather clunky and unintuitive. Does anyone have an alternative proof and/or suggestions for how I can improve what I have?
Proof
Step 1. First, we will show that $\mathcal{L}(G) \subseteq L$. This is immediate from the fact that no rule generates more $b$'s than $a$'s. Let $w \in \mathcal{L}(G)$, generated by $G$ via $S = u_1 \Rightarrow \dots \Rightarrow u_n = w$. In each step where a $b$ is added, an $a$ must also be added. Because $w_1$ has an equal number of $a$'s and $b$'s, each subsequent $w_i$ must have at least many $a$'s than $b$'s. Therefore, $w_n \in L$, and so $\mathcal{L}(G) \subseteq L$.
Step 2. Next, we will show that $G$ generates all strings with an equal number of $a$'s and $b$'s. Let $K$ be the set of all such strings. We argue by induction to show that $K \subseteq \mathcal{L}(G)$.
Base step. Let $w \in K$ such that $|w| = 0$. Then $w$ can be generated by $G$ via $S \Rightarrow \varepsilon$.
Inductive step. Suppose $G$ can generate any string in $K$ length less than or equal to $n$. Let $w \in K$ such that $|w| = n + 1$. We distinguish several cases:
-
$w = aub$ for some string $u$. Because $w$ has an equal number of $a$'s and $b$'s, $u$ must also have an equal number of $a$'s and $b$'s. This implies $u \in K$, and by the inductive hypothesis, $S \Rightarrow^ u$. Thus, $w$ is generated by $G$ via $S \Rightarrow aSb \Rightarrow^ aub$.
-
$w = bua$ for some string $u$. This is the same as the previous case.
-
$w$ starts and ends with $a$. But because $w$ has the same number of $a$'s and $b$'s, there must be a point where the number of $b$'s “catches up” with the number of $a$'s. This implies $w = uv$, where $u, v \in K$ are non-empty. By the inductive hypothesis, $S \Rightarrow^* u$ and $S \Righta
Solution
Here is a somewhat simpler proof for $L \subseteq \mathcal{L}(G)$.
Apply induction directly to the claim that any string in $L$ can be generated by $G$.
As before, the base case is done by $S \rightarrow \varepsilon$.
For a nonempty string, counting $a$ as +1 and $b$ as -1, consider the partial sums:
If any intermediate (i.e., not first or last) partial sum is 0, break the string at that point, and apply $S \rightarrow SS$, and apply the induction hypothesis to both parts.
If no intermediate partial sum is 0 and the last partial sum (which is the full sum) is 0, then the first and last symbols must be different. Apply $S \rightarrow aSb$ or $S \rightarrow bSa$ as appropriate, and apply the induction hypothesis to the middle (the string with the first and last symbols removed).
If no intermediate partial sum is 0 and the last partial sum (which is the full sum) is positive (it cannot be negative), then the first symbol must be $a$. Apply $S \rightarrow SS$ and then apply $S \rightarrow a$ to the first of the two $S$; apply the induction hypothesis to the string with the starting $a$ removed.
Apply induction directly to the claim that any string in $L$ can be generated by $G$.
As before, the base case is done by $S \rightarrow \varepsilon$.
For a nonempty string, counting $a$ as +1 and $b$ as -1, consider the partial sums:
If any intermediate (i.e., not first or last) partial sum is 0, break the string at that point, and apply $S \rightarrow SS$, and apply the induction hypothesis to both parts.
If no intermediate partial sum is 0 and the last partial sum (which is the full sum) is 0, then the first and last symbols must be different. Apply $S \rightarrow aSb$ or $S \rightarrow bSa$ as appropriate, and apply the induction hypothesis to the middle (the string with the first and last symbols removed).
If no intermediate partial sum is 0 and the last partial sum (which is the full sum) is positive (it cannot be negative), then the first symbol must be $a$. Apply $S \rightarrow SS$ and then apply $S \rightarrow a$ to the first of the two $S$; apply the induction hypothesis to the string with the starting $a$ removed.
Context
StackExchange Computer Science Q#143004, answer score: 3
Revisions (0)
No revisions yet.