18. Parser: Pratt Parsing
While Recursive Descent is highly effective for statements (the top-level control flow), it struggles when parsing expressions. Standard recursive descent handles operator precedence by nesting functions (parse_equality() calls parse_addition(), which calls parse_multiplication()). This creates a massive, slow call stack.
To solve this, the parser yields to a Top-Down Operator Precedence architecture, commonly known as a Pratt Parser, whenever it encounters an expression.
The Token-Driven Architecture
Instead of defining expressions via deep grammar rules, Pratt parsing assigns parsing logic directly to the tokens themselves. Every operator token is assigned three properties:
- Binding Power (Precedence): An integer dictating how tightly the operator binds to its operands (e.g.,
*has a higher binding power than+). - NUD (Null Denotation): A handler for when a token appears at the start of an expression. This handles literals, variables, and prefix operators like unary minus (
-a). - LED (Left Denotation): A handler for when a token appears in the middle of an expression. This handles infix operators (
a + b). TheLEDhandler receives the previously parsed left-hand expression as an argument.
The Core Loop
The entirety of expression parsing is condensed into a single loop. The function parse_expression(current_binding_power) operates as follows:
Expression parse_expression(int binding_power) {
Token token = consume();
Expression left = token.nud();
while (binding_power < peek().binding_power) {
token = consume();
left = token.led(left);
}
return left;
}Parsing Mechanics: 1 + 2 * 3
To understand how this loop resolves precedence without nested functions, trace the execution of 1 + 2 * 3, assuming + has a binding power of 10 and * has a binding power of 20:
parse_expression(0)is called.- It consumes
1. TheNUDhandler for integers simply returns an AST node:[1]. - The loop checks the next token:
+. Its binding power (10) is greater than the current context (0), so the loop executes. - It consumes
+and calls itsLEDhandler, passing[1]as the left argument. - Inside the
+LEDhandler, it needs the right-hand operand, so it callsparse_expression(10). - This inner call consumes
2. ItsNUDreturns[2]. - The inner loop checks the next token:
*. Its binding power (20) is greater than the current context (10). - It consumes
*and calls itsLEDhandler, passing[2]as the left argument. - Inside the
*LEDhandler, it callsparse_expression(20). This consumes3, returning[3]. - The
*handler constructs the node[2 * 3]and returns it to the inner loop. - The inner loop sees the end of the expression, terminates, and returns
[2 * 3]to the+handler. - The
+handler constructs[1 + [2 * 3]]and returns it to the outer loop, which also terminates.
This $O(n)$ loop dynamically resolves all precedence and associativity rules based purely on token binding power, returning a perfectly structured AST node back to the recursive descent statement that requested it.