18. Semantic Analysis: Symbol Tables & Scoping

The parser guarantees that the code is syntactically valid (e.g., x = y + 5;), but it knows nothing about what x and y actually mean. Name Resolution is the process of binding these raw identifier strings to their actual semantic declarations. This is managed by the Symbol Table.

The Stack of Hash Maps

A common architecture for a symbol table is not a single global dictionary, but a Stack of Hash Maps. This structure naturally models lexical scoping.

  1. Global Scope: At the bottom of the stack is the global hash map, containing built-in types (int, float), standard library functions (print), and top-level declarations.
  2. Pushing Scope: Whenever Sema enters a new lexical block (like a function body or an if statement), it pushes a new, empty hash map onto the top of the stack.
  3. Popping Scope: When Sema exits the block, it pops the hash map off the stack, instantly discarding all local variables declared within that block.
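The push/pop lifecycle above can be sketched in a few lines of Python. This is a minimal illustration, not a real compiler's data structure: the class name SymbolTable, its methods, and the string "info" values stored for each symbol are all assumptions made for the example.

```python
class SymbolTable:
    """A stack of dicts; the last list element is the innermost scope."""

    def __init__(self):
        # Bottom of the stack: the global scope, pre-seeded with built-ins.
        self.scopes = [{"int": "type", "float": "type", "print": "function"}]

    def push_scope(self):
        # Entering a lexical block: start with an empty map.
        self.scopes.append({})

    def pop_scope(self):
        # Leaving a block: drop everything declared inside it.
        if len(self.scopes) == 1:
            raise RuntimeError("cannot pop the global scope")
        return self.scopes.pop()

    def declare(self, name, info):
        # New declarations always go into the innermost scope.
        self.scopes[-1][name] = info


table = SymbolTable()
table.declare("main", "function")   # top-level declaration, lands in the global map
table.push_scope()                  # entering the body of main
table.declare("x", "int")           # local variable
depth_inside = len(table.scopes)    # global map + one block map
table.pop_scope()                   # leaving the body discards x
```

After the final pop, only the global map remains, and it still holds main but no trace of x.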

Resolving Identifiers

When Sema encounters an identifier usage (e.g., evaluating y in x = y + 5;), it must resolve it:

  1. It searches the hash map at the top of the stack (the most local scope).
  2. If the symbol is found, the search terminates.
  3. If the symbol is not found, Sema traverses down the stack, checking each parent scope.
  4. If it reaches the bottom of the stack (global scope) and still cannot find the symbol, it reports a compile-time error: Undefined reference to 'y'.
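The four lookup steps above amount to walking the stack from the innermost scope outward. The sketch below assumes the stack is a plain Python list of dicts with the global scope first; the function name resolve and the exception type UndefinedSymbolError are hypothetical names chosen for illustration.

```python
class UndefinedSymbolError(Exception):
    """Raised when a name is not found in any enclosing scope."""


def resolve(scopes, name):
    # scopes: list of dicts, global scope first, innermost scope last.
    for scope in reversed(scopes):      # start at the top of the stack
        if name in scope:
            return scope[name]          # first hit wins; stop searching
    # Fell off the bottom of the stack: nothing declared this name.
    raise UndefinedSymbolError(f"Undefined reference to '{name}'")


scopes = [{"print": "builtin function"}, {"x": "int"}]

x_info = resolve(scopes, "x")           # found in the innermost scope
print_info = resolve(scopes, "print")   # found only after walking down to global

try:
    resolve(scopes, "y")
    message = None
except UndefinedSymbolError as err:
    message = str(err)
```

A real front end would typically record a diagnostic with a source location and keep going rather than raise, so that several errors can be reported in one compile.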

Shadowing

This top-down traversal supports variable shadowing with no special-case logic. If a global variable x exists and a local block declares a new x, the local x is inserted into the hash map at the top of the stack. Any use of x inside that block hits the local definition first and ends the search, so the local x “shadows” the global one. Once the block's scope is popped, the global x becomes visible again.
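A short sketch makes the shadowing behaviour concrete. It reuses the same list-of-dicts assumption as before; the resolve helper and the string values standing in for declarations are illustrative only.

```python
def resolve(scopes, name):
    # Search from the innermost scope (end of the list) outward.
    for scope in reversed(scopes):
        if name in scope:
            return scope[name]
    raise NameError(f"Undefined reference to '{name}'")


scopes = [{"x": "global x"}]     # global scope declares x

scopes.append({})                # enter a local block
scopes[-1]["x"] = "local x"      # the block declares its own x
inside = resolve(scopes, "x")    # the local declaration is found first

scopes.pop()                     # leave the block
outside = resolve(scopes, "x")   # the global declaration is visible again
```

Note that the global entry is never modified or removed; it is simply unreachable while a nearer scope holds the same name.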

Forward Declarations

In many modern languages, a function can call another function declared later in the file. To support this, Sema splits Pass 1 into two sub-passes:

  • Pass 1A (Signatures): Skips function bodies. It only scans top-level declarations and populates the global scope with function names and signatures.
  • Pass 1B (Bodies): Walks into the function bodies and resolves local variables. Because Pass 1A already registered all global signatures, a function can safely call another function defined beneath it without throwing an undefined reference error.
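The difference between the two sub-passes and a naive single pass can be sketched as follows. The mini-AST here is a hypothetical stand-in (each function is a tuple of name, parameter list, and the names it calls), and collect_signatures, check_bodies, and single_pass are illustrative names, not part of any real compiler.

```python
# Hypothetical mini-AST: (function name, parameter names, names called in the body).
program = [
    ("main", [], ["helper"]),        # calls helper, which is defined later
    ("helper", ["n"], ["print"]),
]


def collect_signatures(program, global_scope):
    # Pass 1A: register every top-level function without looking at its body.
    for name, params, _body in program:
        global_scope[name] = {"kind": "function", "arity": len(params)}


def check_bodies(program, global_scope):
    # Pass 1B: walk each body; every global signature is already registered.
    errors = []
    for _name, params, body in program:
        local_scope = {p: {"kind": "param"} for p in params}
        for callee in body:
            if callee not in local_scope and callee not in global_scope:
                errors.append(f"Undefined reference to '{callee}'")
    return errors


def single_pass(program, global_scope):
    # For contrast: declare and check each function in source order.
    errors = []
    for name, params, body in program:
        global_scope[name] = {"kind": "function", "arity": len(params)}
        for callee in body:
            if callee not in params and callee not in global_scope:
                errors.append(f"Undefined reference to '{callee}'")
    return errors


builtins = {"print": {"kind": "function", "arity": 1}}  # assumed built-in

global_scope = dict(builtins)
collect_signatures(program, global_scope)
errors = check_bodies(program, global_scope)

naive_errors = single_pass(program, dict(builtins))
```

With the two sub-passes, main resolves helper without complaint. The single-pass version reports helper as undefined, because it checks the body of main before helper has been registered.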
