patternMinor

Are the Before and After sets for context-free grammars always context-free?

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

closure-properties formal-languages formal-grammars cs stackoverflow context-free

afterthefreearealwaysgrammarsforcontextandbefore

Problem

Let $G$ be a context-free grammar. A string of terminals and nonterminals of $G$ is said to be a sentential form of $G$ if you can obtain it by applying productions of $G$ zero or more times to the start symbol of $S$. Let $\operatorname{SF}(G)$ be the set of sentential forms of $G$.

Let $\alpha \in \operatorname{SF}(G)$ and let $\beta$ be a substring of $\alpha$ - we call $\beta$ a fragment of $\operatorname{SF}(G)$. Now let

$\operatorname{Before}(\beta) = \{ \gamma \ |\ \exists \delta . \gamma \beta \delta \in \operatorname{SF}(G) \}$

and

$\operatorname{After}(\beta) = \{ \delta \ |\ \exists \gamma . \gamma \beta \delta \in \operatorname{SF}(G) \}$.

Are $\operatorname{Before}(\beta)$ and $\operatorname{After}(\beta)$ context-free languages? What if $G$ is unambiguous? If $G$ is unambiguous, are $\operatorname{Before}(\beta)$ and $\operatorname{After}(\beta)$ also describable by an unambiguous context-free language?

This is a followup to my earlier question, after an earlier attempt to make my question easier to answer failed. A negative answer will make the encompassing question I'm working on very hard to answer.

Solution

Yes, $\mbox{Before}(\beta)$ and $\mbox{After}(\beta)$ are context-free languages. Here's how I would prove it. First, a lemma (which is the crux). If $L$ is CF then:

$\mbox{Before}(L,\beta) = \{ \gamma \ |\ \exists \delta . \gamma \beta \delta \in L \}$

and

$\mbox{After}(L,\beta) = \{ \gamma \ |\ \exists \delta . \delta \beta \gamma \in L \}$

are CF.

Proof? For $\mbox{Before}(L,\beta)$ construct a non-deterministic finite-state transducer $T_{\beta}$ that scans a string, outputting every input symbol it sees and simultaneously searches non-deterministically for $\beta$. Whenever $T_{\beta}$ sees the first symbol of $\beta$ it forks non-deterministically and ceases outputting symbols until either it finishes seeing $\beta$ or it sees sees a symbol that deviates from $\beta$, stopping in either case. If $T_{\beta}$ sees $\beta$ in full, it accepts upon stopping, which is the only way it accepts. If it sees a deviation from $\beta$, it rejects.

The lemma can be jiggered to handle cases where $\beta$ could overlap with itself (like $abab$ -- keep looking for $\beta$ even while in the midst of scanning for a prior $\beta$) or appears multiple times (actually, the original non-determinisic forking already handles that).

It's fairly clear that $T_\beta(L) = \mbox{Before}(L,\beta)$, and since the CFLs are closed under finite-state transduction, $\mbox{Before}(L,\beta)$ is therefore CF.

A similar argument goes for $\mbox{After}(L,\beta)$, or it could be done with string reversals from $\mbox{Before}(L,\beta)$ , CFLs also being closed under reversal:

$\mbox{After}(L,\beta) = \mbox{rev}(\mbox{Before}(\mbox{rev}(L),\mbox{rev}(\beta)))$

Actually, now that I see the reversal argument, it would be even easier to start with $\mbox{After}(L,\beta)$, since the transducer for that is simpler to describe and verify -- it outputs the empty string while looking for a $\beta$. When it finds $\beta$ it forks non-deterministically, one fork continuing to look for further copies of $\beta$, the other fork copying all subsequent characters verbatim from input to output, accepting all the while.

What remains is to make this work for sentential forms as well as CFLs. But that is pretty straightforward, since the language of sentential forms of a CFG is itself a CFL. You can show that by replacing every non-terminal $X$ throughout $G$ by say $X^\prime$, declaring $X$ to be a terminal, and adding all productions $X^\prime \rightarrow X$ to the grammar.

I'll have to think about your question on unambiguity.

Context

StackExchange Computer Science Q#802, answer score: 9

Revisions (0)

No revisions yet.