16. ParserRecursive Descent & Panic Mode

17. Parser: Recursive Descent & Panic Mode

The foundation of the Syntax Analyzer is a hand-written, predictive Recursive Descent parser. While parser generators (like Bison or Yacc) or Parsing Expression Grammars (PEGs) can automate grammar rules, a hand-written LL(1) approach is a common choice for production compilers due to its strong error recovery capabilities.

The Parsing Mechanism

Recursive descent works by mapping each non-terminal in a formal grammar directly to a function in the host language. The parser maintains a pointer to the current token in the input stream and “descends” through the grammar by having these functions call one another.

Consider a simplified grammar rule for an if statement: IfStatement -> "if" Expression "{" Statement* "}"

The corresponding parser function physically mirrors this rule:

ASTNode* parse_if_statement() {
    // 1. Consume the expected starting token
    consume(TOKEN_IF);
    
    // 2. Delegate to other parsing functions for sub-rules
    ASTNode* condition = parse_expression();
    
    consume(TOKEN_LBRACE);
    
    // 3. Handle repetition (the '*' in the grammar)
    List<ASTNode*> body;
    while (!check(TOKEN_RBRACE) && !is_at_end()) {
        body.append(parse_statement());
    }
    
    consume(TOKEN_RBRACE);
    
    // 4. Construct and return the AST node
    return create_if_node(condition, body);
}

Because the grammar is predictive (LL(1)), the parser only ever needs to look exactly one token ahead (using a peek() or check() function) to determine which function to branch into. There is no backtracking; once the parser commits to an IfStatement, it expects the subsequent tokens to follow that exact structure.

Panic Mode Error Recovery

Generated parsers and PEGs historically struggle with failure states: when an unexpected token is encountered, they can produce cascading errors due to backtracking.

A hand-written recursive descent parser solves this using Panic Mode Synchronization:

  1. Detection: An unexpected token triggers a syntax error (e.g., consume(TOKEN_RBRACE) fails because it found a ;). Instead of crashing, the parser reports the error and enters “panic mode”.
  2. Discarding: The parser blindly consumes and discards subsequent tokens, suppressing any further error messages.
  3. Synchronization: The discarding stops only when the parser hits a known boundary token, typically a statement terminator like a NEWLINE or DEDENT.
  4. Resumption: The parser resets its internal state and resumes normal parsing.

This mechanism allows the compiler to isolate the malformed syntax and continue analyzing the rest of the file, providing the user with multiple accurate error diagnostics in a single compilation pass.


Next Module: Pratt Parsing (Expressions)