- Coding Guide
- Overview and motivation
- Wrapping rules
- Modules in PyPy
- Determining the location of a module implementation
- Module directories / Import order
- Modifying a CPython library module or regression test
- Implementing a mixed interpreter/application level Module
- application level definitions
- interpreter level definitions
- Testing modules in lib_pypy/
- Testing modules in pypy/module
- Testing modules in lib-python
- Naming conventions and directory layout
- Using the development bug/feature tracker
- Testing in PyPy
- Changing documentation and website
Coding Guide
This document describes coding requirements and conventions for working with the PyPy code base. Please read it carefully and ask about anything that is unclear. The document does not say much about coding style; we mostly follow PEP 8. If in doubt, follow the style that is already present in the code base.
Overview and motivation
We are writing a Python interpreter in Python, using Python’s well-known ability to step behind the algorithmic problems as a language. At first glance, one might think this achieves nothing but a better understanding of how the interpreter works. This alone would make it worth doing, but we have much larger goals.
CPython vs. PyPy
Compared to the CPython implementation, Python takes the role of the C code. We rewrite the CPython interpreter in Python itself. We could also aim at writing a more flexible interpreter at C level, but we want to use Python to give an alternative description of the interpreter.
The clear advantage is that such a description is shorter and simpler to read, and many implementation details vanish. The drawback of this approach is that this interpreter will be unbearably slow as long as it is run on top of CPython.
To get to a useful interpreter again, we need to translate our high-level description of Python to a lower-level one. One rather straightforward way is to do a whole-program analysis of the PyPy interpreter and create C source again. There are many other ways, but let’s stick with this somewhat canonical approach.
Application-level and interpreter-level execution and objects
Since Python is used for implementing all of our code base, there is a crucial distinction to be aware of: that between interpreter-level objects and application-level objects. The latter are the ones that you deal with when you write normal Python programs. Interpreter-level code, however, cannot invoke operations on or access attributes of application-level objects directly. You will immediately recognize any interpreter-level code in PyPy, because half the variable and object names start with a w_, which indicates that they are wrapped application-level values.
Let’s show the difference with a simple example. To sum the contents of two variables a and b, one would write the simple application-level a+b; in contrast, the equivalent interpreter-level code is space.add(w_a, w_b), where space is an instance of an object space, and w_a and w_b are typical names for the wrapped versions of the two variables.
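The contrast between a+b and space.add(w_a, w_b) can be made concrete with a small toy model. This is only a sketch: ToySpace and W_Int are hypothetical stand-ins invented for illustration, not PyPy's real object space or integer classes.

```python
# Toy model only: ToySpace and W_Int are hypothetical stand-ins,
# not PyPy's real object space classes.

class W_Int(object):
    """A wrapped (boxed) application-level integer."""
    def __init__(self, intval):
        self.intval = intval


class ToySpace(object):
    """A minimal object space supporting wrap, int_w and add."""

    def wrap(self, value):
        return W_Int(value)

    def int_w(self, w_obj):
        return w_obj.intval

    def add(self, w_a, w_b):
        # The space, not the caller, looks inside the wrapped objects.
        return W_Int(w_a.intval + w_b.intval)


space = ToySpace()
w_a = space.wrap(2)
w_b = space.wrap(3)
w_result = space.add(w_a, w_b)   # interpreter-level spelling of "a + b"
result = space.int_w(w_result)   # back to an interpreter-level int
```

The caller never touches intval itself; every operation on a wrapped value goes through the space.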
It helps to remember how CPython deals with the same issue: interpreter-level code, in CPython, is written in C, and thus typical code for the addition is PyNumber_Add(p_a, p_b), where p_a and p_b are C variables of type PyObject*. This is conceptually similar to how we write our interpreter-level code in Python.
Moreover, in PyPy we have to make a sharp distinction between interpreter- and application-level exceptions: application exceptions are always contained inside an instance of OperationError. This makes it easy to distinguish failures (or bugs) in our interpreter-level code from failures appearing in a Python application-level program that we are interpreting.
Application level is often preferable
Application-level code is substantially higher-level, and therefore correspondingly easier to write and debug. For example, suppose we want to implement the update method of dict objects. Programming at application level, we can write an obvious, simple implementation, one that looks like an executable definition of update, for example:
- def update(self, other):
-     for k in other.keys():
-         self[k] = other[k]
If we had to code only at interpreter level, we would have to write something much lower-level and more involved, say something like:
- def update(space, w_self, w_other):
-     w_keys = space.call_method(w_other, 'keys')
-     w_iter = space.iter(w_keys)
-     while True:
-         try:
-             w_key = space.next(w_iter)
-         except OperationError as e:
-             if not e.match(space, space.w_StopIteration):
-                 raise       # re-raise other app-level exceptions
-             break
-         w_value = space.getitem(w_other, w_key)
-         space.setitem(w_self, w_key, w_value)
This interpreter-level implementation looks much more similar to the C source code. It is still more readable than its C counterpart because it doesn’t contain memory management details and can use Python’s native exception mechanism.
In any case, it should be obvious that the application-level implementation is definitely more readable, more elegant and more maintainable than the interpreter-level one (and indeed, dict.update is really implemented at applevel in PyPy).
In fact, in almost all parts of PyPy, you find application-level code in the middle of interpreter-level code. Apart from some bootstrapping problems (application-level functions need a certain initialization level of the object space before they can be executed), application-level code is usually preferable. We have an abstraction (called the ‘Gateway’) which allows the caller of a function to remain ignorant of whether a particular function is implemented at application or interpreter level.
Our runtime interpreter is “RPython”
In order to make a C code generator feasible, all code on interpreter level has to restrict itself to a subset of the Python language, and we adhere to some rules which make translation to lower-level languages feasible. Code on application level can still use the full expressivity of Python.
Unlike source-to-source translations (like e.g. Starkiller or more recently ShedSkin) we start translation from live Python code objects which constitute our Python interpreter. When doing its work of interpreting bytecode, our Python implementation must behave in a static way, often referred to as “RPythonic”.
However, when the PyPy interpreter is started as a Python program, it can use all of the Python language until it reaches a certain point in time, from which on everything that is being executed must be static. That is, during initialization our program is free to use the full dynamism of Python, including dynamic code generation.
An example can be found in the current implementation, which is quite elegant: for the definition of all the opcodes of the Python interpreter, the module dis is imported and used to initialize our bytecode interpreter. (See initclass in pypy/interpreter/pyopcode.py.) This saves us from adding extra modules to PyPy. The import code is run at startup time, and we are allowed to use the CPython builtin import function.
After the startup code is finished, all resulting objects, functions, code blocks etc. must adhere to certain runtime restrictions which we describe further below. Here is some background for why this is so: during translation, a whole-program analysis (“type inference”) is performed, which makes use of the restrictions defined in RPython. This enables the code generator to emit efficient machine-level replacements for pure integer objects, for instance.
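The split between dynamic initialization and static run-time behaviour can be sketched with the standard dis module. The following is a hypothetical illustration, not the code in pypy/interpreter/pyopcode.py: ToyFrame, build_dispatch_table and execute are invented names, and only two opcodes are handled.

```python
import dis

# Hypothetical sketch: ToyFrame is not PyPy's frame class. It only shows
# dynamic setup at initialization time followed by static dispatch.

class ToyFrame(object):
    def __init__(self):
        self.stack = []

    def NOP(self, oparg):
        pass

    def POP_TOP(self, oparg):
        self.stack.pop()

    def unimplemented(self, oparg):
        raise NotImplementedError("opcode not handled in this sketch")


def build_dispatch_table(cls):
    """Initialization time: free use of getattr on names taken from dis."""
    table = {}
    for name, number in dis.opmap.items():
        table[number] = getattr(cls, name, cls.unimplemented)
    return table


DISPATCH = build_dispatch_table(ToyFrame)   # runs once, at startup


def execute(frame, opcode, oparg=0):
    """Run time: only a fixed table lookup and a call remain."""
    DISPATCH[opcode](frame, oparg)


frame = ToyFrame()
frame.stack.append("x")
execute(frame, dis.opmap["POP_TOP"])   # pops the single stack entry
```

All name-based lookups happen while the table is built; after that point the code only indexes a table that no longer changes, which is the kind of behaviour a whole-program analysis can follow.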
Wrapping rules
Wrapping
PyPy is made of Python source code at two levels: there is on the one hand application-level code that looks like normal Python code, and that implements some functionalities as one would expect from Python code (e.g. one can give a pure Python implementation of some built-in functions like zip()). There is also interpreter-level code for the functionalities that must more directly manipulate interpreter data and objects (e.g. the main loop of the interpreter, and the various object spaces).
Application-level code doesn’t see object spaces explicitly: it runs using an object space to support the objects it manipulates, but this is implicit. There is no need for particular conventions for application-level code. The sequel is only about interpreter-level code. (Ideally, no application-level variable should be called space or w_xxx to avoid confusion.)
The w_ prefixes so lavishly used in the example above indicate, by PyPy coding convention, that we are dealing with wrapped (or boxed) objects, that is, interpreter-level objects which the object space constructs to implement corresponding application-level objects. Each object space supplies wrap, unwrap, int_w, interpclass_w, etc. operations that move between the two levels for objects of simple built-in types; each object space also implements other Python types with suitable interpreter-level classes with some amount of internal structure.
For example, an application-level Python list is implemented by the standard object space as an instance of W_ListObject, which has an instance attribute wrappeditems (an interpreter-level list which contains the application-level list’s items as wrapped objects).
The rules are described in more detail below.
Naming conventions
- space: the object space is only visible at interpreter-level code, where it is by convention passed around by the name space.
- w_xxx: any object seen by application-level code is an object explicitly managed by the object space. From the interpreter-level point of view, this is called a wrapped object. The w_ prefix is used for any type of application-level object.
- xxx_w: an interpreter-level container for wrapped objects, for example a list or a dict containing wrapped objects. Not to be confused with a wrapped object that would be a list or a dict: these are normal wrapped objects, so they use the w_ prefix.
Operations on w_xxx
The core bytecode interpreter considers wrapped objects as black boxes. It is not allowed to inspect them directly. The allowed operations are all implemented on the object space: they are called space.xxx(), where xxx is a standard operation name (add, getattr, call, eq…). They are documented in the object space document.
A short warning: don’t do w_x == w_y or w_x is w_y! The rationale for this rule is that there is no reason that two wrappers are related in any way even if they contain what looks like the same object at application level. To check for equality, use space.is_true(space.eq(w_x, w_y)) or, even better, the shortcut space.eq_w(w_x, w_y), which directly returns an interpreter-level bool. To check for identity, use space.is_true(space.is_(w_x, w_y)) or, better, space.is_w(w_x, w_y).
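Why comparing wrappers directly goes wrong can be shown with a toy model. W_Int and ToySpace below are hypothetical classes made up for this sketch; only the method name eq_w is taken from the text above.

```python
# Toy model: W_Int and ToySpace are hypothetical, not PyPy classes.

class W_Int(object):
    def __init__(self, intval):
        self.intval = intval


class ToySpace(object):
    def wrap(self, value):
        return W_Int(value)

    def eq_w(self, w_x, w_y):
        """Application-level equality, returned as an interpreter-level bool."""
        return w_x.intval == w_y.intval


space = ToySpace()
w_x = space.wrap(1000)
w_y = space.wrap(1000)

same_wrapper = w_x is w_y            # False: two distinct wrapper objects
naive_equal = w_x == w_y             # False: default object comparison
app_equal = space.eq_w(w_x, w_y)     # True: the space compares the values
```

Both wrappers stand for the same application-level value, yet the interpreter-level comparisons know nothing about it; only the space does.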
Application-level exceptions
Interpreter-level code can use exceptions freely. However, all application-level exceptions are represented as an OperationError at interpreter level. In other words, all exceptions that are potentially visible at application level are internally an OperationError. This is the case for all errors reported by the object space operations (space.add() etc.).
To raise an application-level exception:
- from pypy.interpreter.error import oefmt
- raise oefmt(space.w_XxxError, "message")
- raise oefmt(space.w_XxxError, "file '%s' not found in '%s'", filename, dir)
- raise oefmt(space.w_XxxError, "file descriptor '%d' not open", fd)
To catch a specific application-level exception:
- try:
-     ...
- except OperationError as e:
-     if not e.match(space, space.w_XxxError):
-         raise
-     ...
This construct catches all application-level exceptions, so we have to match it against the particular w_XxxError we are interested in and re-raise other exceptions. The exception instance e holds two attributes that you can inspect: e.w_type and e.w_value. Do not use e.w_type to match an exception, as this will miss exceptions that are instances of subclasses.
Modules in PyPy
Modules visible from application programs are imported from interpreter- or application-level files. PyPy reuses almost all Python modules of CPython’s standard library, currently from version 2.7.8. We sometimes need to modify modules and, more often, regression tests because they rely on implementation details of CPython.
If we don’t just modify an original CPython module but need to rewrite it from scratch, we put it into lib_pypy/ as a pure application-level module.
When we need access to interpreter-level objects, we put the module into pypy/module. Such modules use a mixed module mechanism which makes it convenient to use both interpreter- and application-level parts for the implementation. Note that there is no extra facility for pure interpreter-level modules; you just write a mixed module and leave the application-level part empty.
Determining the location of a module implementation
You can interactively find out where a module comes from when running py.py. Here are examples for the possible locations:
- >>>> import sys
- >>>> sys.__file__
- '/home/hpk/pypy-dist/pypy/module/sys'
- >>>> import cPickle
- >>>> cPickle.__file__
- '/home/hpk/pypy-dist/lib_pypy/cPickle.py'
- >>>> import os
- >>>> os.__file__
- '/home/hpk/pypy-dist/lib-python/2.7/os.py'
- >>>>
Module directories / Import order
Here is the order in which PyPy looks up Python modules:
- pypy/module: mixed interpreter/app-level builtin modules, such as the sys and __builtin__ module.
- contents of PYTHONPATH: look up application-level modules in each of the :-separated list of directories specified in the PYTHONPATH environment variable.
- lib_pypy/: contains pure Python reimplementations of modules.
- lib-python/2.7/: the modified CPython library.
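The effect of this ordering is that an earlier location shadows a later one. A minimal sketch of that first-match rule follows; the per-location module names are made-up sample data (mytool is a hypothetical module), and find_location is not a PyPy function.

```python
# Hypothetical sketch: the module names per location are made-up sample
# data, chosen only to show that the first matching location wins.

SEARCH_ORDER = [
    ("pypy/module", {"sys", "__builtin__"}),
    ("PYTHONPATH", {"mytool"}),
    ("lib_pypy/", {"cPickle"}),
    ("lib-python/2.7/", {"os", "mytool"}),
]


def find_location(modname):
    """Return the first location, in lookup order, that provides modname."""
    for location, names in SEARCH_ORDER:
        if modname in names:
            return location
    return None


sys_location = find_location("sys")
mytool_location = find_location("mytool")   # PYTHONPATH shadows lib-python
```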
Modifying a CPython library module or regression test
Although PyPy is very compatible with CPython, we sometimes need to change modules contained in our copy of the standard library, often because PyPy works with all new-style classes by default and CPython has a number of places where it relies on some classes being old-style.
We maintain those changes in place. To see what has changed, we have a branch called vendor/stdlib which contains the unmodified CPython stdlib.
Implementing a mixed interpreter/application level Module
If a module needs to access PyPy’s interpreter level, then it is implemented as a mixed module.
Mixed modules are directories in pypy/module with an __init__.py file containing specifications of where each name in a module comes from. Only specified names will be exported to a Mixed Module’s applevel namespace.
Sometimes it is necessary to really write some functions in C (or whatever target language). See rffi details.
application level definitions
Application-level specifications are found in the appleveldefs dictionary found in __init__.py files of directories in pypy/module. For example, in pypy/module/__builtin__/__init__.py you find the following entry specifying where __builtin__.locals comes from:
- ...
- 'locals' : 'app_inspect.locals',
- ...
The app_ prefix indicates that the submodule app_inspect is interpreted at application level and the wrapped function value for locals will be extracted accordingly.
interpreter level definitions
Interpreter-level specifications are found in the interpleveldefs dictionary found in __init__.py files of directories in pypy/module. For example, in pypy/module/__builtin__/__init__.py the following entry specifies where __builtin__.len comes from:
- ...
- 'len' : 'operation.len',
- ...
The operation submodule lives at interpreter level and len is expected to be exposable to application level. Here is the definition for operation.len():
- def len(space, w_obj):
-     "len(object) -> integer\n\nReturn the number of items of a sequence or mapping."
-     return space.len(w_obj)
Exposed interpreter-level functions usually take a space argument and some wrapped values (see Wrapping rules).
You can also use a convenient shortcut in interpleveldefs dictionaries: namely an expression in parentheses to specify an interpreter-level expression directly (instead of pulling it indirectly from a file):
- ...
- 'None' : '(space.w_None)',
- 'False' : '(space.w_False)',
- ...
The interpreter-level expression has a space binding when it is executed.
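The two kinds of interpleveldefs entries shown above, a 'submodule.name' reference and a parenthesized expression, can be told apart mechanically. The sketch below is purely illustrative and is not PyPy's mixed module loader: resolve_spec, ToySpace, toy_len and the fake operation module are all invented for this example.

```python
import types

# Hypothetical sketch of how spec strings could be resolved. This is not
# PyPy's mixed module loader; resolve_spec and the fake objects are invented.

class ToySpace(object):
    w_None = "<wrapped None>"
    w_False = "<wrapped False>"


def toy_len(space, w_obj):
    return len(w_obj)


# Stand-in for the interpreter-level submodule "operation".
operation = types.ModuleType("operation")
operation.len = toy_len

interpleveldefs = {
    'len': 'operation.len',
    'None': '(space.w_None)',
    'False': '(space.w_False)',
}


def resolve_spec(spec, space, submodules):
    if spec.startswith('(') and spec.endswith(')'):
        # Parenthesized entry: evaluate the expression with "space" bound.
        return eval(spec, {'space': space})
    # Otherwise: "submodule.name", pulled from the named submodule.
    modname, attrname = spec.split('.', 1)
    return getattr(submodules[modname], attrname)


space = ToySpace()
exported = {}
for name, spec in interpleveldefs.items():
    exported[name] = resolve_spec(spec, space, {'operation': operation})
```

After the loop, exported maps each application-visible name to the object its spec string describes.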
Adding an entry under pypy/module (e.g. mymodule) entails automatic creation of a new config option (such as --withmod-mymodule and --withoutmod-mymodule, the latter being the default) for py.py and translate.py.
Testing modules in lib_pypy/
You can go to the pypy/module/test_lib_pypy/ directory and invoke the testing tool (“py.test” or “python ../../pypy/test_all.py”) to run tests against the lib_pypy hierarchy. This allows us to quickly test our python-coded reimplementations against CPython.
Testing modules in pypy/module
Simply change to pypy/module or to a subdirectory and run the tests as usual.
Testing modules in lib-python
In order to let CPython’s regression tests run against PyPy, you can switch to the lib-python/ directory and run the testing tool in order to start compliance tests. (XXX check windows compatibility for producing test reports).
Naming conventions and directory layout
Directory and File Naming
- directories/modules/namespaces are always lowercase
- never use plural names in directory and file names
- __init__.py is usually empty except for pypy/objspace/* and pypy/module/*/__init__.py.
- don’t use more than 4 directory nesting levels
- keep filenames concise and completion-friendly.
Naming of python objects
- class names are CamelCase
- functions/methods are lowercase and _ separated
- objectspace classes are spelled XyzObjSpace, e.g.
  - StdObjSpace
  - FlowObjSpace
- at interpreter level and in ObjSpace all boxed values have a leading w_ to indicate “wrapped values”. This includes w_self. Don’t use w_ in application-level, Python-only code.
Committing & Branching to the repository
- write good log messages because several people are reading the diffs.
- What was previously called trunk is called the default branch in mercurial. Branches in mercurial are always pushed together with the rest of the repository. To create a try1 branch (assuming that a branch named try1 doesn’t already exist) you should do:
- hg branch try1
The branch will be recorded in the repository only after a commit. To switch back to the default branch:
- hg update default
For further details use the help or refer to the official wiki:
- hg help branch
Using the development bug/feature tracker
We use https://foss.heptapod.net/pypy/pypy for issue tracking and pull requests.
Testing in PyPy
Our tests are based on the py.test tool, which lets you write unit tests without boilerplate. All tests of modules in a directory usually reside in a subdirectory test. There are basically two types of unit tests:
- Interpreter Level tests. They run at the same level as PyPy’s interpreter.
- Application Level tests. They run at application level, which means that they look like straight Python code but they are interpreted by PyPy.
Interpreter level tests
You can write test functions and methods like this:
- def test_something(space):
-     # use space ...
- class TestSomething(object):
-     def test_some(self):
-         # use 'self.space' here
Note that the prefix test for test functions and Test for test classes is mandatory. In both cases you can import Python modules at module global level and use plain ‘assert’ statements thanks to the usage of the py.test tool.
Application level tests
For testing the conformance and well-behavedness of PyPy it is often sufficient to write “normal” application-level Python code that doesn’t need to be aware of any particular coding style or restrictions. If we have a choice we often use application-level tests, which are in files whose name starts with the apptest_ prefix and look like this:
- # spaceconfig = {"usemodules":["array"]}
- def test_this():
-     # application level test code
These application-level test functions will run on top of PyPy, i.e. they have no access to interpreter details.
By default, they run on top of an untranslated PyPy which runs on top of the host interpreter. When passing the -D option, they run directly on top of the host interpreter, which is usually a translated pypy executable in this case:
- pypy3 -m pytest -D pypy/
Note that in interpreted mode, only a small subset of pytest’s functionality is available. To configure the object space, the host interpreter will parse the optional spaceconfig declaration. This declaration must be in the form of a valid JSON dict.
Mixed-level tests (deprecated)
Mixed-level tests are similar to application-level tests, the difference being that they’re just snippets of app-level code embedded in an interp-level test file, like this:
- class AppTestSomething(object):
-     def test_this(self):
-         # application level test code
You cannot use imported modules from global level because they are imported at interpreter level while your test code runs at application level. If you need to use modules you have to import them within the test function.
Data can be passed into the AppTest using the setup_class method of the AppTest. All wrapped objects that are attached to the class there and start with w_ can be accessed via self (but without the w_) in the actual test method. An example:
- class AppTestErrno(object):
-     def setup_class(cls):
-         cls.w_d = cls.space.wrap({"a": 1, "b": 2})
-     def test_dict(self):
-         assert self.d["a"] == 1
-         assert self.d["b"] == 2
Another possibility is to use cls.space.appexec, for example:
- class AppTestSomething(object):
-     def setup_class(cls):
-         arg = 2
-         cls.w_result = cls.space.appexec([cls.space.wrap(arg)], """(arg):
-             return arg ** 6
-             """)
-     def test_power(self):
-         assert self.result == 2 ** 6
which executes the code string function with the given arguments at app level. Note the use of w_result in setup_class but self.result in the test. Here is how to define an app-level class in setup_class that can be used in subsequent tests:
- class AppTestSet(object):
-     def setup_class(cls):
-         w_fakeint = cls.space.appexec([], """():
-             class FakeInt(object):
-                 def __init__(self, value):
-                     self.value = value
-                 def __hash__(self):
-                     return hash(self.value)
-                 def __eq__(self, other):
-                     if other == self.value:
-                         return True
-                     return False
-             return FakeInt
-             """)
-         cls.w_FakeInt = w_fakeint
-     def test_fakeint(self):
-         f1 = self.FakeInt(4)
-         assert f1 == 4
-         assert hash(f1) == hash(4)
Command line tool test_all
You can run almost all of PyPy’s tests by invoking:
- python test_all.py file_or_directory
which is a synonym for the general py.test utility located in the py/bin/ directory. For switches to modify test execution pass the -h option.
Coverage reports
In order to get coverage reports, the pytest-cov plugin is included. It adds some extra requirements (coverage and cov-core); once they are installed, coverage testing can be invoked via:
- python test_all.py --cov file_or_directory_to_cover file_or_directory
Test conventions
- adding features requires adding appropriate tests. (It often even makes sense to first write the tests so that you are sure that they actually can fail.)
- All over the pypy source code there are test/ directories which contain unit tests. Such scripts can usually be executed directly or are collectively run by pypy/test_all.py
Changing documentation and website
documentation/website files in your local checkout
Most of PyPy’s documentation is kept in pypy/doc. You can simply edit or add ‘.rst’ files, which contain ReST markup. Here is a ReST quickstart but you can also just look at the existing documentation and see how things work.
Note that the web site of http://pypy.org/ is maintained separately. It is in the repository https://foss.heptapod.net/pypy/pypy.org
Automatically test documentation/website changes
We automatically check referential integrity and ReST-conformance. In order to run the tests you need sphinx installed. Then go to the local checkout of the documentation directory and run the Makefile:
- cd pypy/doc
- make html
If you see no failures, chances are high that your modifications at least don’t produce ReST errors or wrong local references. Now you will have .html files in the documentation directory which you can point your browser to!
Additionally, if you also want to check for remote references inside the documentation, issue:
- make linkcheck
which will check that remote URLs are reachable.