12. Turing Machines and Decidability

The highly restricted Regular Languages (Type 3) and the more flexible Context-Free Languages (Type 2) form the foundation of parsing. To complete the Chomsky Hierarchy, we now examine the least restricted classes: Context-Sensitive Languages (Type 1) and Recursively Enumerable Languages (Type 0).

12.1 Type 1: Context-Sensitive Grammars (CSG)

A Context-Sensitive Grammar generates a Context-Sensitive Language. Its production rules depend on the context of the symbols. A non-terminal can only be replaced when surrounded by specific strings.

Formal Rule

Productions must be of the form: α X β → α γ β

X is a non-terminal.
α and β are strings of terminals/non-terminals (they form the context).
γ is a non-empty string.
(Exception: $S \to \epsilon$ is allowed if $S$ never appears on the right-hand side of any rule).

Linear Bounded Automaton (LBA)

Type 1 languages are recognized by a Linear Bounded Automaton (LBA). An LBA is essentially a non-deterministic Turing Machine, but with a restriction: the length of the working tape is strictly bounded by a linear function of the length of the input string. It cannot use infinite memory, only memory proportional to the input size.

12.2 Type 0: Unrestricted Grammars

At the top of the Chomsky Hierarchy are Unrestricted Grammars (or recursively enumerable grammars). There are no restrictions on the production rules other than the left-hand side must contain at least one non-terminal.

α → β (where $\alpha$ contains at least one non-terminal, and $\beta$ can be any string).

These grammars generate Recursively Enumerable Languages, which are processed by the Turing Machine.

12.3 Turing Machines (TM)

A Turing Machine (TM) is an abstract computational model invented by Alan Turing. It consists of a finite state machine, an infinitely long tape divided into cells, and a read/write head that can move left or right across the tape.

Unlike PDAs (which have a LIFO stack) or LBAs (which have bounded memory), a Turing Machine has unbounded memory and unrestricted access to that memory. It can simulate any computer algorithm; the Church-Turing thesis states that anything computable can be computed by a Turing Machine.

Formal Definition

A Turing Machine is formally defined as a 7-tuple: M = (Q, Σ, Γ, δ, q₀, B, F)

Q: A finite set of states.
Σ: The input alphabet (a subset of $\Gamma$, excluding $B$).
Γ (Gamma): The tape alphabet.
δ: The transition function mapping Q × Γ → Q × Γ × {L, R}. (Given a state and a tape symbol, it returns a new state, a symbol to write, and a direction to move the head: Left or Right).
q₀: The initial state ($q₀ \in Q$).
B: The blank symbol (the infinite tape is initialized with blanks).
F: The set of final (accepting) states ($F \subseteq Q$).

Determinism

Unlike Pushdown Automata, adding non-determinism to a Turing Machine does not increase its expressive power. Every Non-Deterministic Turing Machine (NTM) can be simulated by a Deterministic Turing Machine (DTM). They recognize the exact same class of languages.

12.4 Decidability and the Halting Problem

Because Turing Machines are so powerful, they bring us to a foundational question of computer science: Can every problem be solved algorithmically?

The answer is no. Problems are categorized based on their solvability. Note that these are not mutually exclusive buckets; rather, Decidable $\subset$ Semi-Decidable $\subset$ All Languages, and “Undecidable” simply means “not decidable” (the complement of the first set):

Decidable Problems: An algorithm exists that always halts and correctly answers “Yes” or “No” for every input. (e.g., “Is string $w$ accepted by DFA $M$?”)
Semi-Decidable (Turing-Recognizable): An algorithm exists that will halt and output “Yes” if the answer is yes, but might loop infinitely if the answer is “No”.
Undecidable Problems: No algorithm exists that can correctly solve the problem (halt and answer definitively) for all possible inputs.

The Halting Problem

The most famous undecidable problem is the Halting Problem. (Note: The Halting Problem is both Semi-Decidable and Undecidable — it can recognize a “Yes” by simulating the program until it halts, but cannot definitively answer “No”, proving these categories aren’t mutually exclusive).

Problem Statement: Is it possible to write a universal program H(P, I) that takes the source code of any program P and an input I, and perfectly predicts whether P will halt (finish execution) or run forever in an infinite loop?

Proof of Undecidability (by Contradiction): Assume such a machine H exists. We then build an adversary wrapper program, E(X):

function E(X):
    if H(X, X) == "halts":
        loop_forever()
    else:
        halt()

What happens if we feed E its own source code? We call E(E):

If H predicts that E will halt, E intentionally loops forever (contradicting H).
If H predicts that E will loop forever, E immediately halts (contradicting H).

In both cases, H is wrong. The contradiction proves that the Halting Problem is undecidable. No general algorithm can analyze arbitrary code to guarantee it will eventually terminate.

This concludes the theoretical foundations of Formal Languages and Automata. The constraints of these mathematical machines dictate how we build the practical components of a compiler.

Next: Lexer Architecture: The Sentinel Byte

11. Pushdown Automata 1. The Sentinel Byte