Recursive Descent Parser
First heard in syllabus of Dabeaz (David Beazley) Write a Compiler Course Notebook Also see ((How to Write a (Lisp) Interpreter (in Python)) (by Peter Norvig)(archived))
A recursive descent parser is a top-down parsing technique where each non-terminal in a grammar corresponds to a function in the parser. The parser starts at the top-level rule (usually the start symbol) and recursively calls functions for each sub-rule, consuming tokens from the input when they match expected patterns. - chatgpt
Uses a set of mutually recursive functions to process the input.
Limitations:
- left recursion
- Direct left recursion (A → Aα | β) causes infinite recursion and must be eliminated via transformation.
- Indirect left recursion (A → Bα, B → Aβ) also needs resolution.
Dabeaz implementation from Python Cookbook
David Beazley (Dabeaz) code in Python Cookbook:
You should start by having a formal specification of the grammar in the form of BNF (Backus-Naur Form (BNF)) or EBNF (Extended Backus-Naur Form (EBNF)). The grammar of algebraic expressions:
expr ::= expr + term
| expr - term
| term
term ::= term * factor
| term / factor
| factor
factor ::= ( expr )
| NUM
import re
import collections
# Token specification
NUM = r'(?P<NUM>\d+)'
PLUS = r'(?P<PLUS>\+)'
MINUS = r'(?P<MINUS>-)'
TIMES = r'(?P<TIMES>\*)'
DIVIDE = r'(?P<DIVIDE>/)'
LPAREN = r'(?P<LPAREN>\()'
RPAREN = r'(?P<RPAREN>\))'
WS = r'(?P<WS>\s+)'
master_pat = re.compile('|'.join([NUM, PLUS, MINUS, TIMES,
DIVIDE, LPAREN, RPAREN, WS]))
# Tokenizer
Token = collections.namedtuple('Token', ['type','value'])
def generate_tokens(text):
scanner = master_pat.scanner(text)
for m in iter(scanner.match, None):
tok = Token(m.lastgroup, m.group())
if tok.type != 'WS':
yield tok
# Parser
class ExpressionEvaluator:
'''
Implementation of a recursive descent parser. Each method
implements a single grammar rule. Use the ._accept() method
to test and accept the current lookahead token. Use the ._expect()
method to exactly match and discard the next token on on the input
(or raise a SyntaxError if it doesn't match).
'''
def parse(self,text):
self.tokens = generate_tokens(text)
self.tok = None # Last symbol consumed
self.nexttok = None # Next symbol tokenized
self._advance() # Load first lookahead token
return self.expr()
def _advance(self):
'Advance one token ahead'
self.tok, self.nexttok = self.nexttok, next(self.tokens, None)
def _accept(self,toktype):
'Test and consume the next token if it matches toktype'
if self.nexttok and self.nexttok.type == toktype:
self._advance()
return True
else:
return False
def _expect(self,toktype):
'Consume next token if it matches toktype or raise SyntaxError'
if not self._accept(toktype):
raise SyntaxError('Expected ' + toktype)
# Grammar rules follow
def expr(self):
"expression ::= term { ('+'|'-') term }*"
exprval = self.term()
while self._accept('PLUS') or self._accept('MINUS'):
op = self.tok.type
right = self.term()
if op == 'PLUS':
exprval += right
elif op == 'MINUS':
exprval -= right
return exprval
def term(self):
"term ::= factor { ('*'|'/') factor }*"
termval = self.factor()
while self._accept('TIMES') or self._accept('DIVIDE'):
op = self.tok.type
right = self.factor()
if op == 'TIMES':
termval *= right
elif op == 'DIVIDE':
termval /= right
return termval
def factor(self):
"factor ::= NUM | ( expr )"
if self._accept('NUM'):
return int(self.tok.value)
elif self._accept('LPAREN'):
exprval = self.expr()
self._expect('RPAREN')
return exprval
else:
raise SyntaxError('Expected NUMBER or LPAREN')
How it works on terminal: ```bash
e = ExpressionTreeBuilder() e.parse('2 + 3') ('+', 2, 3) e.parse('2 + 3 * 4') ('+', 2, ('', 3, 4)) e.parse('2 + (3 + 4) * 5') ('+', 2, ('', ('+', 3, 4), 5)) e.parse('2 + 3 + 4') ('+', ('+', 2, 3), 4)
Nevertheless, the overall idea of writing a recursive descent parser is generally simple. To start, you take every grammar rule and you turn it into a function or method. Thus, if your grammar looks like this:
expr ::= term { ('+'|'-') term }*
term ::= factor { ('*'|'/') factor }*
factor ::= '(' expr ')'
| NUM
You start by turning it into a set of methods like this:
python
class ExpressionEvaluator:
...
def expr(self):
...
def term(self):
...
def factor(self):
...
Key Characteristics
-
Top-Down Parsing
- It begins at the start symbol and expands productions recursively.
- Works best for LL(k) grammars (left-to-right parsing, leftmost derivation, k-token lookahead).
-
Backtracking (if naive)
- If a rule fails, the parser may revert to an earlier state and try another rule.
- Simple implementations suffer from exponential backtracking, making them impractical for ambiguous grammars.
-
Handling of Left Recursion
- Direct left recursion (A → Aα | β) causes infinite recursion and must be eliminated via transformation.
- Indirect left recursion (A → Bα, B → Aβ) also needs resolution.
-
Lookahead (LL(k))
- Typically uses one-token lookahead (LL(1)) but can be extended to LL(k) for more complex grammars.
- With predictive parsing, backtracking is avoided if the lookahead uniquely determines the next rule.
-
Mutual Recursion Between Functions
- Each grammar rule is implemented as a recursive function.
- Sub-rules are handled by calling appropriate functions.
-
Efficiency Considerations
- Left factoring is needed if multiple rules share common prefixes.
- Packrat parsing (memoization) can be applied to eliminate redundant computations.
- A Map of the Territory · Crafting Interpreters (archived) — First, let me establish a shorthand. Much of this book is about a language’s *implementation*, which is distinct from...
- Compilers Principles Techniques & Tools by Aho, Lam, Sethi, Ullman — - top-down ([[Recursive Descent Parser]], [[LL parser]])
- Compilers Principles Techniques & Tools by Aho, Lam, Sethi, Ullman (dragon book) — - top-down ([[Recursive Descent Parser]], [[LL parser]])
- Crafting Interpreters by Nystrom — - "This is a book that, if you can afford it, should be bought on general principle. There are very few books that sh...
- Dabeaz (David Beazley) Write a Compiler Course Notebook — His [[Recursive Descent Parser]] can be interesting to look
- David Beazley - Reinventing the Parser Generator - PyCon 2018 (archived) — Related to [[Dabeaz (David Beazley) Write a Compiler Course Notebook]] and [[Recursive Descent Parser]], which is a m...
- LL, LR, SLR, and LALR parsers — - [[Recursive Descent Parser]], [[LL(k) parsers]]
- Top-down and Bottom-up parsing — - typically implemented using [[Recursive Descent Parser]] or [[LL parser]]
- Write a Compiler (archived) — 2. **Parsing**. You need to parse programs by converting them from text to the data model. This involves tokenizing t...