PEG parser in Python
Problem
Any suggestions to make this code clearer, more Pythonic, or otherwise better? I'm open to changes to the design as well as the code (but probably won't drop features or error checking since everything still in it has shown worth to me).
```
"""
Parsing with PEGs, or a minimal usable subset thereof.
Background at http://bford.info/packrat/
"""

import re

def _memo(f):
    """Return a function like f but caching its results. Its arguments
    must be hashable."""
    memos = {}
    def memoized(*args):
        try: return memos[args]
        except KeyError:
            result = memos[args] = f(*args)
            return result
    return memoized

_identifier = r'[A-Za-z_]\w*'

def Parser(grammar, **actions):
    r"""Make a parsing function from a PEG grammar. You supply the
    grammar as a string of rules like "a = b c | d". All the tokens
    making up the rules must be whitespace-separated. Each token
    (besides '=' and '|') is a regex, a rule name, or an action
    name. (Possibly preceded by '!' for negation: !foo successfully
    parses when foo fails to parse.)

    A regex token is either // or any non-identifier; an
    identifier that's not a defined rule or action name is an
    error. (So, an incomplete grammar gets you a BadGrammar exception
    instead of a wrong parse.)

    Results get added by regex captures and transformed by actions.
    (Use keyword arguments to bind the action names to functions.)

    The parsing function maps a string to a results tuple or raises
    Unparsable. (It can optionally take a rule name to start from, by
    default the first in the grammar.) It doesn't necessarily match
    the whole input, just a prefix.

    >>> parse_s_expression = Parser(r'''
    ... one_expr = _ expr !.
    ... _        = \s*
    ... expr     = \( _ exprs \) _ hug
    ...          | ([^()\s]+) _
    ... exprs    = expr exprs
    ...          | ''', hug = lambda *vals: vals)
    >>> parse_s_expression(' (hi (john mccarthy) (
```
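For readers who want the docstring's semantics in concrete form, here is a minimal sketch of sequence, ordered choice, `!` negation, and regex captures. It is not the code under review: the helper names `regex`, `seq`, `choice`, and `negate` are hypothetical, each parser is assumed to take `(text, i)` and return `(new_position, results_tuple)` or `None` on failure, and memoization is left out for brevity.

```python
import re

def regex(pattern):
    """Match `pattern` at position i; captured groups become the results."""
    compiled = re.compile(pattern)
    def parse(text, i):
        m = compiled.match(text, i)
        return None if m is None else (m.end(), m.groups())
    return parse

def seq(*parsers):
    """Run parsers one after another, concatenating their results."""
    def parse(text, i):
        results = ()
        for p in parsers:
            outcome = p(text, i)
            if outcome is None:
                return None
            i, vals = outcome
            results += vals
        return i, results
    return parse

def choice(*parsers):
    """Ordered choice: the first alternative that succeeds wins."""
    def parse(text, i):
        for p in parsers:
            outcome = p(text, i)
            if outcome is not None:
                return outcome
        return None
    return parse

def negate(parser):
    """!p: succeed without consuming input exactly when p fails."""
    def parse(text, i):
        return (i, ()) if parser(text, i) is None else None
    return parse

# Roughly the rule "word = ([a-z]+) !\d" in the grammar notation above.
word = seq(regex(r'([a-z]+)'), negate(regex(r'\d')))

prefix_match = word('abc def', 0)   # (3, ('abc',)): only a prefix is consumed
rejected = word('abc1', 0)          # None: the negated \d matched after 'abc'

# Ordered choice commits to the first success, even if a later
# alternative would have matched more input.
first_wins = choice(regex(r'(a)'), regex(r'(ab)'))('ab', 0)   # (1, ('a',))
```

A real packrat parser would additionally cache each rule's outcome per input position, which is the role `_memo` plays in the code above.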
Solution
Just last week, I was wondering what you were up to lately. :-)
I've been working on nice PEG parsing for Python lately. I've written an adaptation of OMeta, named Parsley. So far I haven't implemented regex tokens.

I wrote a custom `ParseError` exception class. I definitely think that the builtin `SyntaxError` should only be used for actual-Python syntax errors.

`Parser` reads a bit densely to me, both because of the regexes and because of the lack of newlines after colons.

Otherwise it looks reasonable for your goals. I ended up wrapping memoization and location tracking into an `InputStream` object rather than passing indices around directly; this was partly because I wanted to be able to apply Parsley rules to iterables other than strings.
Context
StackExchange Code Review Q#19034, answer score: 4