PyPy Parser
Overview
The PyPy parser includes a tokenizer and a recursive descent parser.
Tokenizer
At the moment, the tokenizer is implemented as a single function(generate_tokens
in pypy/interpreter/pyparser/pytokenizer.py) that buildsa list of tokens. The tokens are then fed to the parser.
Parser
The parser is a simple LL(1) parser that is similar to CPython’s.
Building the Python grammar
The python grammar is built at startup from the pristine CPython grammar file(see pypy/interpreter/pyparser/metaparser.py). The grammar builder firstrepresents the grammar as rules corresponding to a set of NondeterministicFinite Automatons (NFAs). It then converts them to a set of DeterministicFinite Automatons (DFAs). The difference between a NFA and a DFA is that a NFAmay have several possible next states for any given input while a DFA may onlyhave one. DFAs are therefore more limiting, but far more efficient to use inparsing. Finally, the assigns the grammar builder assigns each DFA state anumber and packs them into a list for the parser to use. The final product isan instance of the Grammar
class in pypy/interpreter/pyparser/parser.py.
Parser implementation
The workhorse of the parser is the add_token
method of the Parser
class.It tries to find a transition from the current state to another state based onthe token it receives as a argument. If it can’t find a transition, it checksif the current state is accepting. If it’s not, a ParseError
israised. When parsing is done without error, the parser has built a tree ofNode
.
Parsing Python
The glue code between the tokenizer and the parser as well as extra Pythonspecific code is in pypy/interpreter/pyparser/pyparse.py. Theparsesource
method takes a string of Python code and returns the parsetree. It also detects the coding cookie if there is one and decodes the source.Note that _future imports are handled before the parser is invoked bymanually parsing the source in pypy/interpreter/pyparser/future.py.
Compiler
The next step in generating Python bytecode is converting the parse tree into anAbstract Syntax Tree (AST).
Building AST
Python’s AST is described in pypy/interpreter/astcompiler/tools/Python.asdl.From this definition, pypy/interpreter/astcompiler/tools/asdl_py.py generatespypy/interpreter/astcompiler/ast.py, which RPython classes for the compileras well as bindings to application level code for the AST. Some customextensions to the AST classes are inpypy/interpreter/astcompiler/asthelpers.py.
pypy/interpreter/astcompiler/astbuilder.py is responsible for convertingparse trees into AST. It walks down the parse tree building nodes as it goes.The result is a toplevel mod
node.
AST Optimization
pypy/interpreter/astcompiler/optimize.py contains the AST optimizer. It doesconstant folding of expressions, and other simple transformations like making aload of the name “None” into a constant.
Symbol analysis
Before writing bytecode, a symbol table is built inpypy/interpreter/astcompiler/symtable.py. It determines if every name in thesource is local, implicitly global (no global declaration), explicitly global(there’s a global declaration of the name in the scope), a cell (the name inused in nested scopes), or free (it’s used in a nested function).
Bytecode generation
Bytecode is emitted in pypy/interpreter/astcompiler/codegen.py. Eachbytecode is represented temporarily by the Instruction
class inpypy/interpreter/astcompiler/assemble.py. After all bytecodes have beenemitted, it’s time to build the code object. Jump offsets and bytecodeinformation like the line number table and stack depth are computed. Finally,everything is passed to a brand new PyCode
object.