HiveBrain v1.2.0
Get Started
← Back to all entries
patternModerate

Correct name for a recursive descent parser that uses loops to handle left recursion?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
leftrecursionhandleparserloopsusesrecursivenameforcorrect

Problem

This grammar is left recursive:

Expression  ::= AdditionExpression

AdditionExpression  ::=
    MultiplicationExpression
        | AdditionExpression '+' MultiplicationExpression
        | AdditionExpression '-' MultiplicationExpression

MultiplicationExpression    ::=
    Term
        | MultiplicationExpression '*' Term
        | MultiplicationExpression '/' Term

Term    ::=
    Number
        | '(' AdditionExpression ')'

Number  ::=
    [+-]?[0-9]+(\.[0-9]+)?


So in theory, recursive descent won't work. But by exploiting the properties of the grammar that each left-recursive rule corresponds to a specific precedence level, and that lookahead of a single token is enough to choose the correct production, the left-recursive rules can be individually parsed with while loops.

For example, to parse the AdditionExpression non-terminal, this pseudocode suffices:

function parse_addition_expression() {
    num = parse_multiplication_expression()
    while (has_token()) {
        get_token()
        if (current_token == PLUS)
            num += parse_multiplication_expression()
        else if (current_token == MINUS)
            num -= parse_multiplication_expression()
        else {
            unget_token()
            return num
        }
    }
    return num
}


What is the correct name for this type of parser? This informative article only refers to it as the "Classic Solution": https://www.engr.mun.ca/~theo/Misc/exp_parsing.htm

There must be a proper name for this type of parser.

Solution

It is just an LL(1) parser implemented with recursive descent.

Starts with:

AdditionExpression  ::=
    MultiplicationExpression
        | AdditionExpression '+' MultiplicationExpression
        | AdditionExpression '-' MultiplicationExpression


apply left-recursion removal to get an LL(1) grammar:

AdditionExpression  ::= 
    MultiplicationExpression AdditionExpressionTail

AdditionExpressionTail ::=
        | '+' MultiplicationExpression AdditionExpressionTail
        | '-' MultiplicationExpression AdditionExpressionTail


write the corresponding functions:

function parse_AdditionExpression() {
    parse_MultiplicationExpression()
    parse_AdditionExpressionTail()
}

function parse_AdditionExpressionTail() {
    if (has_token()) {
        get_token()
        if (current_token == PLUS) {
            parse_MultiplicationExpression()
            parse_AdditionExpressionTail()
        } else if (current_token == MINUS) {
            parse_MultiplicationExpression()
            parse_AdditionExpressionTail()
        } else {
            unget_token()
        }
    }
}


remove tail recursion:

function parse_AdditionExpressionTail() {
    while (has_token()) {
        get_token()
        if (current_token == PLUS)
            parse_MultiplicationExpression()
        else if (current_token == MINUS)
            parse_MultiplicationExpression()
        else {
            unget_token()
            return
        }
    }
}


inline:

function parse_AdditionExpression() {
    parse_MultiplicationExpression()
    while (has_token()) {
        get_token()
        if (current_token == PLUS)
            parse_MultiplicationExpression()
        else if (current_token == MINUS)
            parse_MultiplicationExpression()
        else {
            unget_token()
            return
        }
    }
}


and you have just to add the semantic processing to get your function.

Code Snippets

AdditionExpression  ::=
    MultiplicationExpression
        | AdditionExpression '+' MultiplicationExpression
        | AdditionExpression '-' MultiplicationExpression
AdditionExpression  ::= 
    MultiplicationExpression AdditionExpressionTail

AdditionExpressionTail ::=
        | '+' MultiplicationExpression AdditionExpressionTail
        | '-' MultiplicationExpression AdditionExpressionTail
function parse_AdditionExpression() {
    parse_MultiplicationExpression()
    parse_AdditionExpressionTail()
}

function parse_AdditionExpressionTail() {
    if (has_token()) {
        get_token()
        if (current_token == PLUS) {
            parse_MultiplicationExpression()
            parse_AdditionExpressionTail()
        } else if (current_token == MINUS) {
            parse_MultiplicationExpression()
            parse_AdditionExpressionTail()
        } else {
            unget_token()
        }
    }
}
function parse_AdditionExpressionTail() {
    while (has_token()) {
        get_token()
        if (current_token == PLUS)
            parse_MultiplicationExpression()
        else if (current_token == MINUS)
            parse_MultiplicationExpression()
        else {
            unget_token()
            return
        }
    }
}
function parse_AdditionExpression() {
    parse_MultiplicationExpression()
    while (has_token()) {
        get_token()
        if (current_token == PLUS)
            parse_MultiplicationExpression()
        else if (current_token == MINUS)
            parse_MultiplicationExpression()
        else {
            unget_token()
            return
        }
    }
}

Context

StackExchange Computer Science Q#74460, answer score: 11

Revisions (0)

No revisions yet.