snippetMinor

How do I calculate the Nth result of a context-free grammar?

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

formal-languages formal-grammars cs stackoverflow

nthresultthefreegrammarcalculatehowcontext

Problem

Given a context-free grammar and a maximum depth, how do I directly compute the Nth sentence without calculating or caching intermediary sentences?

Take as an example the following grammar:

(from http://www.nltk.org/howto/generate.html)

S -> NP VP
NP -> Det N
PP -> P NP
VP -> 'slept'
VP -> 'saw' NP
VP -> 'walked' PP
Det -> 'the'
Det -> 'a'
N -> 'man'
N -> 'park'
N -> 'dog'
P -> 'in'
P -> 'with'

Given a maximum depth of 4, it will deterministically generate the following 6 sentences:

0 the man slept
1 the park slept
2 the dog slept
3 a man slept
4 a park slept
5 a dog slept

So the brute force way of finding the Nth sentence is to compute sentences 0..N, discard them, then return the Nth. But with a more complex grammar it's wasteful to recompute it every time and it's not possible to hold the current state in memory.

But my question is, is there a way to directly compute the Nth sentence?

Also, is it possible to do the reverse: given a the grammar, depth, and sentence, determent what's the sentences index of all possible sentences.
Something like:

>>> find_index_in_grammar("a man slept", grammar, depth=4)
3

Update

To clarify: traversal order, lexicographical order, or ordering by length is not important as long as it's deterministic. The example above is just one possible ordering that is deterministic and would work.

What I've tried so far is to consider the grammar as a different numeric "base", with its sentences as specific numbers in that base. With the key difference being the "base" varies with the number of possible replacements at each branch in the grammar.

```
def grammar_index(grammar, index):
sentence = [grammar.start()]
while set(sentence) & set(grammar._lhs_index): # any non-terminals?
for i, symbol in enumerate(sentence): # loop for the non terminal
possible_replacements = grammar._lhs_index.get(symbol)
if possible_replacements: # if there is a replacement

Solution

The problem is likely hard, by reduction from SAT.

Let $\varphi$ be a CNF. For each clause $C$, we can construct a production which generates all truth assignments falsifying $C$. By going over all clauses, we can construct a grammar for all nonsatisfying truth assignments. For example, if the CNF is $(x_1 \lor x_2) \land (\lnot x_1 \lor x_3)$, the grammar is
$$
\begin{align}
&S \to 00B \mid 1B0 \\ &B \to 0 \mid 1
\end{align}
$$
This grammar generates $2^n$ words iff $\varphi$ is unsatisfiable. Therefore, by querying the $2^n$'th word, we will be able to tell whether $\varphi$ is satisfiable or not, and so solve SAT.

Context

StackExchange Computer Science Q#116321, answer score: 5

Revisions (0)

No revisions yet.