Contributing Guidelines
PyPy is a very large project that has a reputation for being hard to dive into. Some of this fame is warranted, some of it is purely accidental. There are three important lessons that everyone willing to contribute should learn:
- PyPy has layers. There are many pieces of architecture that are very well separated from each other. More about this below, but often the manifestation of this is that things are at a different layer than you would expect them to be. For example, if you are looking for the JIT implementation, you will not find it in the implementation of the Python programming language.
- Because of the above, we are very serious about Test Driven Development. It’s not only what we believe in: PyPy’s architecture works very well with TDD in mind and not so well without it. Often development means progressing in an unrelated corner, one unit test at a time, and then flipping a giant switch, bringing it all together. (It generally works out of the box. If it doesn’t, then we didn’t write enough unit tests.) It’s worth repeating: PyPy’s approach is great if you do TDD, and not so great otherwise.
- PyPy uses an entirely different set of tools - most of them included in the PyPy repository. There is no Makefile, nor autoconf. More below.
The first thing to remember is that the PyPy project is very different from most projects out there. It’s also different from a classic compiler project, so academic courses about compilers often don’t apply or lead in the wrong direction. However, if you want to understand how designing & building a runtime works in the real world then this is a great project!
Getting involved
PyPy employs a relatively standard open-source development process. You are encouraged as a first step to join our pypy-dev mailing list and IRC channel, details of which can be found in our contact section. The folks there are very friendly, and can point you in the right direction.
We usually give out commit rights fairly liberally, so if you want to do something with PyPy, you can become a committer. We also run frequent coding sprints which are separately announced and often happen around Python conferences such as EuroPython or PyCon. Upcoming events are usually announced on the blog.
Further Reading: Contact
Your first contribution
The first and most important rule is how not to contribute to PyPy: “just hacking a feature”. This won’t work, and you’ll find your PR will typically require a lot of re-work. There are a few reasons why not:
- build times are large
- PyPy has very thick layer separation
- context of the CPython runtime is often required
Instead, reach out on the dev mailing list or the IRC channel, and we’re more than happy to help! :)
Some ideas for first contributions are:
- Documentation - this will give you an understanding of the PyPy architecture
- Test failures - find a failing test in the nightly builds, and fix it
- Missing language features - these are listed in our issue tracker
Source Control
PyPy’s main repositories are hosted here: https://foss.heptapod.net/pypy.
Heptapod is a friendly fork of GitLab Community Edition supporting Mercurial. https://foss.heptapod.net is a public instance for Free and Open-Source Software (more information here).
Thanks to Octobus and Clever Cloud for providing this service!
If you are new to Mercurial and Heptapod, you can read this short tutorial:
Get Access
The important take-away from that tutorial for experienced developers is that since the free hosting on foss.heptapod.net does not allow personal forks, you need permissions to push your changes directly to our repo. Once you sign in to https://foss.heptapod.net using either a new login or your GitHub or Atlassian logins, you can get developer status for pushing directly to the project (just ask by clicking the link at foss.heptapod.net/pypy just under the logo, and you’ll get it, basically). Once you have it you can rewrite your file .hg/hgrc so that the line default = ssh://hg@foss.heptapod.net/pypy/pypy appears in it. Your changes will then be pushed directly to the official repo, but (if you follow these rules) they are still on a branch, and we can still review the branches you want to merge. With developer status, you can push topic branches. If you wish to push long-lived branches, you will need to ask for higher permissions.
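The hgrc edit described above can also be scripted. What follows is only a minimal sketch, assuming that the clone’s .hg/hgrc is simple INI text that Python’s configparser can read (Mercurial’s own parser is more permissive) and that the default entry lives in the [paths] section, as in a standard Mercurial setup. The helper name set_default_path is hypothetical.

```python
import configparser
from pathlib import Path

PUSH_URL = "ssh://hg@foss.heptapod.net/pypy/pypy"


def set_default_path(hgrc_path, url=PUSH_URL):
    """Point the `default` path of a Mercurial clone at `url`.

    Hypothetical helper: assumes the hgrc is plain INI text.
    Note that configparser drops comments when it rewrites the file.
    """
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str  # keep option names exactly as written
    hgrc = Path(hgrc_path)
    if hgrc.exists():
        config.read(hgrc)
    if not config.has_section("paths"):
        config.add_section("paths")
    config.set("paths", "default", url)
    with hgrc.open("w") as f:
        config.write(f)
    return config
```

Calling set_default_path(".hg/hgrc") from the top of a clone would rewrite the file in place; editing it by hand works just as well.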
Clone
- Clone the PyPy repo to your local machine with the command hg clone https://foss.heptapod.net/pypy/pypy (this takes a minute or two, but only ever needs to be done once). See also http://pypy.org/download.html#building-from-source . If you already cloned the repo before, even if some time ago, then you can reuse the same clone by editing the file .hg/hgrc in your clone to contain the line default = https://foss.heptapod.net/pypy/pypy, and then do hg pull && hg up. If you already have such a clone but don’t want to change it, you can clone that copy with hg clone /path/to/other/copy, and then edit .hg/hgrc as above and do hg pull && hg up.
- Now you have a complete copy of the PyPy repo. Make a long-lived branch with a command like hg branch name_of_your_branch, or make a short-lived branch for a simple fix with a command like hg topic issueXXXX.
Edit
- Edit things. Use hg diff to see what you changed. Use hg add to make Mercurial aware of new files you added, e.g. new test files. Use hg status to see if there are such files. Write and run tests! (See the rest of this page.)
- Commit regularly with hg commit. A one-line commit message is fine. We love to have tons of commits; make one as soon as you have some progress, even if it is only some new test that doesn’t pass yet, or fixing things even if not all tests pass. Step by step, you are building the history of your changes, which is the point of a version control system. (There are commands like hg log and hg up that you should read about later, to learn how to navigate this history.)
- The commits stay on your machine until you do hg push to “push” them back to the repo named in the file .hg/hgrc. Repos are basically just collections of commits (a commit is also called a changeset): there is one repo per url, plus one for each local copy on each local machine. The commands hg push and hg pull copy commits around, with the goal that all repos in question end up with the exact same set of commits. By contrast, hg up only updates the “working copy” by reading the local repository, i.e. it makes the files that you see correspond to the latest (or any other) commit locally present.
- You should push often; there is no real reason not to. Remember that even if they are pushed, with the setup above, the commits are only in the branch you named. Yes, they are publicly visible, but don’t worry about someone walking around the many branches of PyPy saying “hah, look at the bad coding style of that person”. Try to get into the mindset that your work is not secret and it’s fine that way. We might not accept it as is for PyPy, asking you instead to improve some things, but we are not going to judge you unless you don’t write tests.
Merge Request
- The final step is to open a merge request, so that we know that you’d like to merge that branch back to the original pypy/pypy repo. This can also be done several times if you have interesting intermediate states, but if you get there, then we’re likely to proceed to the next stage, which is…
- If you get closer to the regular day-to-day development, you’ll notice that we generally push small changes as one or a few commits directly to the branch default or py3.6. Also, we often collaborate even if we are on other branches, which do not really “belong” to anyone. At this point you’ll need hg merge and learn how to resolve conflicts that sometimes occur when two people try to push different commits in parallel on the same branch. But it is likely an issue for later :-)
Architecture
PyPy has layers. Just like ogres or onions. Those layers help us keep the respective parts separated enough to be worked on independently and make the complexity manageable. This is, again, just a sanity requirement for such a complex project. For example, writing a new optimization for the JIT usually does not involve touching a Python interpreter at all or the JIT assembler backend or the garbage collector. Instead it requires writing small tests in rpython/jit/metainterp/optimizeopt/test/test_* and fixing files there. After that, you can just compile PyPy and things should just work.
Further Reading: architecture
Where to start?
PyPy is made from parts that are relatively independent of each other. You should start looking at the part that attracts you most (all paths are relative to the PyPy top level directory). You may look at our directory reference or start off at one of the following points:
- pypy/interpreter contains the bytecode interpreter: bytecode dispatcher in pypy/interpreter/pyopcode.py, frame and code objects in pypy/interpreter/eval.py and pypy/interpreter/pyframe.py, function objects and argument passing in pypy/interpreter/function.py and pypy/interpreter/argument.py, the object space interface definition in pypy/interpreter/baseobjspace.py, modules in pypy/interpreter/module.py and pypy/interpreter/mixedmodule.py. Core types supporting the bytecode interpreter are defined in pypy/interpreter/typedef.py.
- pypy/interpreter/pyparser contains a recursive descent parser, and grammar files that allow it to parse the syntax of various Python versions. Once the grammar has been processed, the parser can be translated by the above machinery into efficient code.
- pypy/interpreter/astcompiler contains the compiler. This contains a modified version of the compiler package from CPython that fixes some bugs and is translatable.
- pypy/objspace/std contains the Standard object space. The main file is pypy/objspace/std/objspace.py. For each type, the file xxxobject.py contains the implementation for objects of type xxx, as a first approximation. (Some types have multiple implementations.)
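The xxxobject.py naming rule above can be written down as a tiny helper for finding a starting point in the tree. This is a sketch of the convention only: the function name guess_implementation_file is hypothetical, and since the rule is a first approximation, the guessed file may be missing or may not be the only implementation of a given type.

```python
from pathlib import PurePosixPath

# Directory of the standard object space, relative to the PyPy top level.
STD_OBJSPACE_DIR = PurePosixPath("pypy/objspace/std")


def guess_implementation_file(type_name):
    """Return the path where objects of `type_name` are probably implemented.

    Hypothetical helper: applies the xxxobject.py convention blindly and
    does not check that the file exists.
    """
    return STD_OBJSPACE_DIR / (type_name + "object.py")


print(guess_implementation_file("int"))  # pypy/objspace/std/intobject.py
```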
Building
For building PyPy, we recommend installing a pre-built PyPy first (see Downloading and Installing PyPy). It is possible to build PyPy with CPython, but it will take a lot longer to run – depending on your architecture, between two and three times as long.
Further Reading: Build
Coding Guide
As well as the usual pep8 and formatting standards, there are a number of naming conventions and coding styles that are important to understand before browsing the source.
Further Reading: Coding Guide
Testing
Test driven development
We practice a lot of test driven development. This is partly because of very high quality requirements for compilers and partly because there is simply no other way to get around such a complex project that will keep you sane. There are probably people out there who are smart enough not to need it; we’re not among them. You may consider familiarizing yourself with pytest, since this is a tool we use for tests. This leads to the next issue:
py.test and the py lib
The py.test testing tool drives all our testing needs.
We use the py library for filesystem path manipulations, terminal writing, logging and some other support functionality.
You don’t necessarily need to install these two libraries because we also ship them inlined in the PyPy source tree.
Running PyPy’s unit tests
PyPy development always was and is still thoroughly test-driven. We use the flexible py.test testing tool which you can install independently and use for other projects.
The PyPy source tree comes with an inlined version of py.test which you can invoke by typing:
- python pytest.py -h
This is usually equivalent to using an installed version:
- py.test -h
If you encounter problems with the installed version, make sure you have the correct version installed, which you can find out with the --version switch.
You will need the build requirements to run tests successfully, since many of them compile little pieces of PyPy and then run the tests inside that minimal interpreter. The cpyext tests also require pycparser, and many tests build cases with hypothesis.
Now on to running some tests. PyPy has many different test directories and you can use shell completion to point at directories or files:
- py.test pypy/interpreter/test/test_pyframe.py
- # or for running tests of a whole subdirectory
- py.test pypy/interpreter/
See py.test usage and invocations for some more generic info on how you can run tests.
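If you have not used py.test before, the general shape of a test file is roughly the following. This is a generic sketch and not taken from the PyPy tree: the file name test_clamp.py and the clamp function are invented for illustration. py.test collects functions whose names start with test_ and reports any failing assert.

```python
# test_clamp.py (hypothetical file name; py.test collects test_*.py files)


def clamp(value, low, high):
    """Toy function under test (not part of PyPy)."""
    return max(low, min(value, high))


def test_clamp_inside_range():
    # A value already inside the range is returned unchanged.
    assert clamp(5, 0, 10) == 5


def test_clamp_below_and_above_range():
    # Values outside the range are pulled back to the nearest bound.
    assert clamp(-3, 0, 10) == 0
    assert clamp(42, 0, 10) == 10
```

Such a file would be run by passing its path to py.test, in the same way as the PyPy test files shown above.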
Beware trying to run “all” pypy tests by pointing to the root directory or even the top level subdirectory pypy. It takes hours and uses huge amounts of RAM and is not recommended.
To run CPython regression tests, you should start with a translated PyPy and run the tests as you would with CPython (see below). You can, however, also attempt to run the tests before translation, but be aware that it is done with a hack that doesn’t work in all cases and it is usually extremely slow: py.test lib-python/2.7/test/test_datetime.py. Usually, a better idea is to extract a minimal failing test of at most a few lines, and put it into one of our own tests in pypy/*/test/.
Testing After Translation
While the usual invocation of pytest runs app-level tests on an untranslated PyPy that runs on top of CPython, we have a test extension to run tests directly on the host python. This is very convenient for modules such as cpyext, to compare and contrast test results between CPython and PyPy.
App-level tests run directly on the host interpreter when passing -D or --direct-apptest to pytest:
- pypy3 -m pytest -D pypy/interpreter/test/apptest_pyframe.py
Mixed-level tests are invoked by using the -A or --runappdirect option to pytest:
- python2 pytest.py -A pypy/module/cpyext/test
where python2 can be either python2 or pypy2. On the py3 branch, the collection phase must be run with python2 so untranslated tests are run with:
- cpython2 pytest.py -A pypy/module/cpyext/test --python=path/to/pypy3
To run a test from the standard CPython regression test suite, use the regular Python way, i.e. (replace “pypy” with the exact binary name, if needed):
- pypy -m test.test_datetime
Tooling & Utilities
If you are interested in the inner workings of the PyPy Python interpreter, there are some features of the untranslated Python interpreter that allow you to introspect its internals.
Interpreter-level console
To start interpreting Python with PyPy, install a C compiler that is supported by distutils and use Python 2.7 or greater to run PyPy:
- cd pypy
- python bin/pyinteractive.py
After a few seconds (remember: this is running on top of CPython), you should be at the PyPy prompt, which is the same as the Python prompt, but with an extra “>”.
If you press <Ctrl-C> on the console you enter the interpreter-level console, a usual CPython console. You can then access internal objects of PyPy (e.g. the object space) and any variables you have created on the PyPy prompt with the prefix w_:
- >>>> a = 123
- >>>> <Ctrl-C>
- *** Entering interpreter-level console ***
- >>> w_a
- W_IntObject(123)
The mechanism works in both directions. If you define a variable with the w_ prefix on the interpreter-level, you will see it on the app-level:
- >>> w_l = space.newlist([space.wrap(1), space.wrap("abc")])
- >>> <Ctrl-D>
- *** Leaving interpreter-level console ***
- KeyboardInterrupt
- >>>> l
- [1, 'abc']
Note that the prompt of the interpreter-level console is only ‘>>>’ since it runs on CPython level. If you want to return to PyPy, press <Ctrl-D> (under Linux) or <Ctrl-Z>, <Enter> (under Windows).
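The name mapping between the two consoles can be pictured with a toy model. This is not how PyPy implements it: the real interpreter-level console holds wrapped objects such as W_IntObject created through the object space, whereas the sketch below uses a made-up Wrapped class and plain dictionaries purely to illustrate the w_ prefix convention in both directions.

```python
class Wrapped:
    """Stand-in for an interpreter-level wrapped object (hypothetical)."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return "Wrapped(%r)" % (self.value,)


def to_interp_level(app_namespace):
    """App-level name `a` shows up as interpreter-level name `w_a`."""
    return {"w_" + name: Wrapped(value)
            for name, value in app_namespace.items()}


def to_app_level(interp_namespace):
    """Interpreter-level names starting with `w_` become app-level names."""
    return {name[2:]: obj.value
            for name, obj in interp_namespace.items()
            if name.startswith("w_") and isinstance(obj, Wrapped)}


interp = to_interp_level({"a": 123})
print(interp["w_a"])                  # Wrapped(123)

# Defining a new w_ name on the interpreter level makes it visible
# on the app level once we switch back.
interp["w_l"] = Wrapped([1, "abc"])
print(to_app_level(interp)["l"])      # [1, 'abc']
```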
Also note that not all modules are available by default in this mode (for example: _continuation needed by greenlet); you may need to use one of the --withmod-… command line options.
You may be interested in reading more about the distinction betweeninterpreter-level and app-level.
pyinteractive.py options
To list the PyPy interpreter command line options, type:
- cd pypy
- python bin/pyinteractive.py --help
pyinteractive.py supports most of the options that CPython supports too (in addition to a large number of options that can be used to customize pyinteractive.py). As an example of using PyPy from the command line, you could type:
- python pyinteractive.py --withmod-time -c "from test import pystone; pystone.main(10)"
Alternatively, as with regular Python, you can simply give a script name on the command line:
- python pyinteractive.py --withmod-time ../../lib-python/2.7/test/pystone.py 10
The --withmod-xxx option enables the built-in module xxx. By default almost none of them are, because initializing them takes time. If you want to enable all built-in modules anyway, you can use --allworkingmodules.
See our configuration sections for details about what all the command-line options do.
Tracing bytecode and operations on objects
You can use a simple tracing mode to monitor the interpretation of bytecodes. To enable it, set __pytrace__ = 1 on the interactive PyPy console:
- >>>> __pytrace__ = 1
- Tracing enabled
- >>>> x = 5
- <module>: LOAD_CONST 0 (5)
- <module>: STORE_NAME 0 (x)
- <module>: LOAD_CONST 1 (None)
- <module>: RETURN_VALUE 0
- >>>> x
- <module>: LOAD_NAME 0 (x)
- <module>: PRINT_EXPR 0
- 5
- <module>: LOAD_CONST 0 (None)
- <module>: RETURN_VALUE 0
- >>>>
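For comparison, the standard dis module can list the bytecode of the same statement without running it. This is a different tool from __pytrace__ (static disassembly rather than a live trace), and the exact opcodes depend on the Python version used, so the listing may not match the trace above instruction for instruction.

```python
import dis

# Compile the same statement that was traced above, as module-level code.
code = compile("x = 5", "<module>", "exec")

# Collect the opcode names in order. The exact sequence varies between
# Python versions, so no expected output is shown here.
opnames = [instruction.opname for instruction in dis.get_instructions(code)]
print(opnames)

# Human-readable listing, similar in spirit to the trace output.
dis.dis(code)
```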
Demos
The example-interpreter repository contains an example interpreter written using the RPython translation toolchain.
graphviz & pygame for flow graph viewing (highly recommended)
graphviz and pygame are both necessary if you want to look at generated flow graphs:
graphviz: http://www.graphviz.org/Download.php