16. ParserPratt Parsing (Expressions)

18. Parser: Pratt Parsing

While Recursive Descent is highly effective for statements (the top-level control flow), it struggles when parsing expressions. Standard recursive descent handles operator precedence by nesting functions (parse_equality() calls parse_addition(), which calls parse_multiplication()). This creates a massive, slow call stack.

To solve this, the parser yields to a Top-Down Operator Precedence architecture, commonly known as a Pratt Parser, whenever it encounters an expression.

The Token-Driven Architecture

Instead of defining expressions via deep grammar rules, Pratt parsing assigns parsing logic directly to the tokens themselves. Every operator token is assigned three properties:

  1. Binding Power (Precedence): An integer dictating how tightly the operator binds to its operands (e.g., * has a higher binding power than +).
  2. NUD (Null Denotation): A handler for when a token appears at the start of an expression. This handles literals, variables, and prefix operators like unary minus (-a).
  3. LED (Left Denotation): A handler for when a token appears in the middle of an expression. This handles infix operators (a + b). The LED handler receives the previously parsed left-hand expression as an argument.

The Core Loop

The entirety of expression parsing is condensed into a single loop. The function parse_expression(current_binding_power) operates as follows:

Expression parse_expression(int binding_power) {
    Token token = consume();
    Expression left = token.nud();
 
    while (binding_power < peek().binding_power) {
        token = consume();
        left = token.led(left);
    }
    return left;
}

Parsing Mechanics: 1 + 2 * 3

To understand how this loop resolves precedence without nested functions, trace the execution of 1 + 2 * 3, assuming + has a binding power of 10 and * has a binding power of 20:

  1. parse_expression(0) is called.
  2. It consumes 1. The NUD handler for integers simply returns an AST node: [1].
  3. The loop checks the next token: +. Its binding power (10) is greater than the current context (0), so the loop executes.
  4. It consumes + and calls its LED handler, passing [1] as the left argument.
  5. Inside the + LED handler, it needs the right-hand operand, so it calls parse_expression(10).
  6. This inner call consumes 2. Its NUD returns [2].
  7. The inner loop checks the next token: *. Its binding power (20) is greater than the current context (10).
  8. It consumes * and calls its LED handler, passing [2] as the left argument.
  9. Inside the * LED handler, it calls parse_expression(20). This consumes 3, returning [3].
  10. The * handler constructs the node [2 * 3] and returns it to the inner loop.
  11. The inner loop sees the end of the expression, terminates, and returns [2 * 3] to the + handler.
  12. The + handler constructs [1 + [2 * 3]] and returns it to the outer loop, which also terminates.

This $O(n)$ loop dynamically resolves all precedence and associativity rules based purely on token binding power, returning a perfectly structured AST node back to the recursive descent statement that requested it.


Next Module: Abstract Syntax Tree