Recursive Descent Parser
programming compilers programmingconcept
First heard in syllabus of [[Dabeaz (David Beazley) Write a Compiler Course Notebook]]
Also see [[((How to Write a (Lisp) Interpreter (in Python)) (by Peter Norvig)(archived))]]
A recursive descent parser is a top-down parsing technique where each non-terminal in a grammar corresponds to a function in the parser. The parser starts at the top-level rule (usually the start symbol) and recursively calls functions for each sub-rule, consuming tokens from the input when they match expected patterns.
Uses a set of [[mutually recursive functions]] to process the input.
Limitations:
- **Left recursion**
	- Direct left recursion (A → Aα | β) causes infinite recursion and must be eliminated via grammar transformation.
	- Indirect left recursion (A → Bα, B → Aβ) also needs resolution.
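To make the transformation concrete: a left-recursive rule like `expr ::= expr '-' NUM | NUM` becomes `expr ::= NUM { '-' NUM }*` in EBNF, and the repetition turns into a loop. A minimal sketch (the pre-split token list and function name are illustrative, not from the code below):

```python
def parse_expr(tokens):
    """Left-recursive rule:  expr ::= expr '-' NUM | NUM
    Transformed (EBNF):      expr ::= NUM { '-' NUM }*
    The loop replaces the recursion and keeps '-' left-associative."""
    it = iter(tokens)
    value = int(next(it))          # first NUM
    for tok in it:
        if tok != '-':
            raise SyntaxError("expected '-'")
        value -= int(next(it))     # fold each following NUM to the left
    return value
```

For example, `parse_expr(['7', '-', '2', '-', '1'])` evaluates as `(7 - 2) - 1`, preserving the left associativity the original left-recursive rule expressed.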
### Dabeaz implementation from Python Cookbook
[[David Beazley (Dabeaz)]] code in [[Python Cookbook]]:
You should start by having a formal specification of the grammar in the form of BNF ([[Backus-Naur Form (BNF)]]) or EBNF ([[Extended Backus-Naur Form (EBNF)]]). The grammar of algebraic expressions:
```
expr   ::= expr + term
         | expr - term
         | term
term   ::= term * factor
         | term / factor
         | factor
factor ::= ( expr )
         | NUM
```
```python
import re
import collections

# Token specification
NUM    = r'(?P<NUM>\d+)'
PLUS   = r'(?P<PLUS>\+)'
MINUS  = r'(?P<MINUS>-)'
TIMES  = r'(?P<TIMES>\*)'
DIVIDE = r'(?P<DIVIDE>/)'
LPAREN = r'(?P<LPAREN>\()'
RPAREN = r'(?P<RPAREN>\))'
WS     = r'(?P<WS>\s+)'

master_pat = re.compile('|'.join([NUM, PLUS, MINUS, TIMES,
                                  DIVIDE, LPAREN, RPAREN, WS]))

# Tokenizer
Token = collections.namedtuple('Token', ['type', 'value'])

def generate_tokens(text):
    scanner = master_pat.scanner(text)
    for m in iter(scanner.match, None):
        tok = Token(m.lastgroup, m.group())
        if tok.type != 'WS':
            yield tok
```
```python
# Parser
class ExpressionEvaluator:
    '''
    Implementation of a recursive descent parser. Each method
    implements a single grammar rule. Use the ._accept() method
    to test and accept the current lookahead token. Use the ._expect()
    method to exactly match and discard the next token on the input
    (or raise a SyntaxError if it doesn't match).
    '''
    def parse(self, text):
        self.tokens = generate_tokens(text)
        self.tok = None      # Last symbol consumed
        self.nexttok = None  # Next symbol tokenized
        self._advance()      # Load first lookahead token
        return self.expr()

    def _advance(self):
        'Advance one token ahead'
        self.tok, self.nexttok = self.nexttok, next(self.tokens, None)

    def _accept(self, toktype):
        'Test and consume the next token if it matches toktype'
        if self.nexttok and self.nexttok.type == toktype:
            self._advance()
            return True
        else:
            return False

    def _expect(self, toktype):
        'Consume next token if it matches toktype or raise SyntaxError'
        if not self._accept(toktype):
            raise SyntaxError('Expected ' + toktype)

    # Grammar rules follow
    def expr(self):
        "expression ::= term { ('+'|'-') term }*"
        exprval = self.term()
        while self._accept('PLUS') or self._accept('MINUS'):
            op = self.tok.type
            right = self.term()
            if op == 'PLUS':
                exprval += right
            elif op == 'MINUS':
                exprval -= right
        return exprval

    def term(self):
        "term ::= factor { ('*'|'/') factor }*"
        termval = self.factor()
        while self._accept('TIMES') or self._accept('DIVIDE'):
            op = self.tok.type
            right = self.factor()
            if op == 'TIMES':
                termval *= right
            elif op == 'DIVIDE':
                termval /= right
        return termval

    def factor(self):
        "factor ::= NUM | ( expr )"
        if self._accept('NUM'):
            return int(self.tok.value)
        elif self._accept('LPAREN'):
            exprval = self.expr()
            self._expect('RPAREN')
            return exprval
        else:
            raise SyntaxError('Expected NUMBER or LPAREN')
```
An interactive session (here using `ExpressionTreeBuilder`, a companion class from the same recipe that returns parse trees as nested tuples instead of evaluating):
```python
>>> e = ExpressionTreeBuilder()
>>> e.parse('2 + 3')
('+', 2, 3)
>>> e.parse('2 + 3 * 4')
('+', 2, ('*', 3, 4))
>>> e.parse('2 + (3 + 4) * 5')
('+', 2, ('*', ('+', 3, 4), 5))
>>> e.parse('2 + 3 + 4')
('+', ('+', 2, 3), 4)
```
Note how `2 + 3 * 4` groups the multiplication more tightly and `2 + 3 + 4` associates to the left.
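The `ExpressionTreeBuilder` in the session is defined in the Cookbook by subclassing `ExpressionEvaluator` and overriding `expr`, `term`, and `factor` to build tuples instead of computing values. A self-contained sketch of the same idea (tokenizer condensed; the method bodies are my reconstruction, not the book's verbatim code):

```python
import re
import collections

Token = collections.namedtuple('Token', ['type', 'value'])
master_pat = re.compile(
    r'(?P<NUM>\d+)|(?P<PLUS>\+)|(?P<MINUS>-)|(?P<TIMES>\*)|'
    r'(?P<DIVIDE>/)|(?P<LPAREN>\()|(?P<RPAREN>\))|(?P<WS>\s+)')

def generate_tokens(text):
    for m in iter(master_pat.scanner(text).match, None):
        if m.lastgroup != 'WS':
            yield Token(m.lastgroup, m.group())

class ExpressionTreeBuilder:
    '''Same control flow as ExpressionEvaluator, but each rule
    returns an (op, left, right) tuple instead of a value.'''
    def parse(self, text):
        self.tokens = generate_tokens(text)
        self.tok = self.nexttok = None
        self._advance()
        return self.expr()

    def _advance(self):
        self.tok, self.nexttok = self.nexttok, next(self.tokens, None)

    def _accept(self, toktype):
        if self.nexttok and self.nexttok.type == toktype:
            self._advance()
            return True
        return False

    def _expect(self, toktype):
        if not self._accept(toktype):
            raise SyntaxError('Expected ' + toktype)

    def expr(self):
        "expression ::= term { ('+'|'-') term }*"
        exprval = self.term()
        while self._accept('PLUS') or self._accept('MINUS'):
            op = '+' if self.tok.type == 'PLUS' else '-'
            exprval = (op, exprval, self.term())   # grow tree to the left
        return exprval

    def term(self):
        "term ::= factor { ('*'|'/') factor }*"
        termval = self.factor()
        while self._accept('TIMES') or self._accept('DIVIDE'):
            op = '*' if self.tok.type == 'TIMES' else '/'
            termval = (op, termval, self.factor())
        return termval

    def factor(self):
        "factor ::= NUM | ( expr )"
        if self._accept('NUM'):
            return int(self.tok.value)
        elif self._accept('LPAREN'):
            exprval = self.expr()
            self._expect('RPAREN')
            return exprval
        raise SyntaxError('Expected NUMBER or LPAREN')
```

Because each rule only changes what it *returns*, the same recursive skeleton can evaluate, build trees, or emit code.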
Nevertheless, the overall idea of writing a recursive descent parser is generally simple.
To start, you take every grammar rule and you turn it into a function or method. Thus,
if your grammar looks like this:
```
expr   ::= term { ('+'|'-') term }*
term   ::= factor { ('*'|'/') factor }*
factor ::= '(' expr ')'
         | NUM
```
You start by turning it into a set of methods like this:
```python
class ExpressionEvaluator:
    ...
    def expr(self):
        ...
    def term(self):
        ...
    def factor(self):
        ...
```
### Key Characteristics
- **Top-down parsing:** begins at the start symbol and expands productions recursively. Works best for LL(k) grammars (left-to-right scan, leftmost derivation, k tokens of lookahead).
- **Backtracking (if naive):** if a rule fails, the parser may revert to an earlier state and try another alternative. Naive implementations can suffer exponential backtracking, making them impractical for ambiguous grammars.
- **Handling of left recursion:** direct left recursion (A → Aα | β) causes infinite recursion and must be eliminated via grammar transformation; indirect left recursion (A → Bα, B → Aβ) also needs resolution.
- **Lookahead (LL(k)):** typically one token of lookahead (LL(1)), extendable to LL(k) for more complex grammars. With predictive parsing, backtracking is avoided when the lookahead uniquely determines the next rule.
- **Mutual recursion between functions:** each grammar rule is implemented as a recursive function; sub-rules are handled by calling the appropriate functions.
- **Efficiency considerations:** left factoring is needed when multiple rules share common prefixes; packrat parsing (memoization) can eliminate redundant computation.
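The backtracking point above can be sketched with tiny parser combinators: `alt` retries each alternative from the same saved position, which is exactly the naive save-and-restore behavior (all names here are illustrative, not from any library):

```python
def tok(t):
    # Match a single literal token, returning (value, next_index) or None
    def parse(toks, i):
        return (t, i + 1) if i < len(toks) and toks[i] == t else None
    return parse

def seq(*parsers):
    # Match parsers in order; fail as a whole if any piece fails
    def parse(toks, i):
        out = []
        for p in parsers:
            r = p(toks, i)
            if r is None:
                return None
            val, i = r
            out.append(val)
        return (out, i)
    return parse

def alt(*parsers):
    # Backtracking choice: every alternative restarts from the same i
    def parse(toks, i):
        for p in parsers:
            r = p(toks, i)
            if r is not None:
                return r
        return None
    return parse

# stmt ::= ID '=' NUM | ID '(' ')'
stmt = alt(seq(tok('ID'), tok('='), tok('NUM')),
           seq(tok('ID'), tok('('), tok(')')))
```

Parsing `['ID', '(', ')']` first tries the assignment alternative, fails at `'='`, then backtracks to position 0 and succeeds with the call alternative. Nested `alt`s like this are where naive backtracking can go exponential.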
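Predictive parsing works when the FIRST sets of competing alternatives are disjoint, so a single lookahead token picks the branch. A sketch that computes FIRST sets by fixpoint iteration for the expression grammar after left-recursion elimination (the dict encoding and rule names like `expr2` are my own, with `[]` standing for an empty, i.e. ε, production):

```python
def first_sets(grammar):
    # FIRST sets by fixpoint iteration; 'ε' marks a nullable production
    first = {nt: set() for nt in grammar}
    def first_of(sym):
        return first[sym] if sym in grammar else {sym}  # terminal: itself
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                if not prod:                       # empty production
                    if 'ε' not in first[nt]:
                        first[nt].add('ε'); changed = True
                    continue
                for sym in prod:
                    add = first_of(sym) - {'ε'}
                    if not add <= first[nt]:
                        first[nt] |= add; changed = True
                    if 'ε' not in first_of(sym):
                        break                      # sym not nullable: stop
                else:
                    if 'ε' not in first[nt]:       # whole production nullable
                        first[nt].add('ε'); changed = True
    return first

# Expression grammar with left recursion already eliminated
grammar = {
    'expr':   [['term', 'expr2']],
    'expr2':  [['PLUS', 'term', 'expr2'], ['MINUS', 'term', 'expr2'], []],
    'term':   [['factor', 'term2']],
    'term2':  [['TIMES', 'factor', 'term2'], ['DIVIDE', 'factor', 'term2'], []],
    'factor': [['LPAREN', 'expr', 'RPAREN'], ['NUM']],
}
```

Here `first_sets(grammar)['factor']` is `{'LPAREN', 'NUM'}`: the two alternatives of `factor` start with different tokens, which is why `_accept()` with one token of lookahead suffices in the evaluator above.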
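And packrat memoization in miniature: when two alternatives of a rule share a prefix, a naive parser re-parses that prefix after backtracking, while a cache keyed on (rule, position) turns the second attempt into a table lookup. A toy sketch (the token stream and rule names are invented for illustration):

```python
import functools

TOKS = ('N', '-', 'N')     # toy token stream
calls = {'primary': 0}     # counts actual (non-cached) parses

@functools.lru_cache(maxsize=None)
def primary(i):
    # primary ::= 'N' ; returns the next position on success, else None
    calls['primary'] += 1
    return i + 1 if i < len(TOKS) and TOKS[i] == 'N' else None

def stmt(i):
    # stmt ::= primary '+' primary | primary
    j = primary(i)                       # first alternative
    if j is not None and j < len(TOKS) and TOKS[j] == '+':
        k = primary(j + 1)
        if k is not None:
            return k
    # Backtrack: the second alternative asks for primary at the same i,
    # but lru_cache answers from the memo table without re-parsing.
    return primary(i)
```

`stmt(0)` invokes `primary(0)` twice in the source, yet `calls['primary']` ends up at 1; scaled to a full grammar, this memoization is what gives packrat parsers linear time.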
Incoming Internal References (2)
- A Map of the Territory · Crafting Interpreters (archived)
  "First, let me establish a shorthand. Much of this book is about a language’s *implementation*, which is distinct from the *language itself* in some sort of Platonic ideal form. Things like “stack”, “bytecode”, and “[[Recursive Descent Parser|recursive descent]]”, are nuts and bolts one particular implementation might use. From the user’s perspective, as long as the resulting contraption faithfully follows the language’s specification, it’s all implementation detail."
- Crafting Interpreters by Nystrom
  https://news.ycombinator.com/item?id=30332368
  "This is a book that, if you can afford it, should be bought on general principle. There are very few books that show how [[Recursive Descent Parser|recursive descent]] works and this one is incredibly well done. The fact he gives it away for free is amazing."