Differences between PyPy and CPython
This page documents the few differences and incompatibilities between the PyPy Python interpreter and CPython. Some of these differences are “by design”, since we think that there are cases in which the behaviour of CPython is buggy, and we do not want to copy bugs.
Differences that are not listed here should be considered bugs of PyPy.
Differences related to garbage collection strategies
The garbage collectors used or implemented by PyPy are not based on reference counting, so the objects are not freed instantly when they are no longer reachable. The most obvious effect of this is that files (and sockets, etc.) are not promptly closed when they go out of scope. For files that are opened for writing, data can be left sitting in their output buffers for a while, making the on-disk file appear empty or truncated. Moreover, you might reach your OS’s limit on the number of concurrently opened files.
If you are debugging a case where a file in your program is not closed properly, you can use the -X track-resources command line option. If it is given, a ResourceWarning is produced for every file and socket that the garbage collector closes. The warning will contain the stack trace of the position where the file or socket was created, to make it easier to see which parts of the program don’t close files explicitly.
Fixing this difference to CPython is essentially impossible without forcing a reference-counting approach to garbage collection. The effect that you get in CPython has clearly been described as a side-effect of the implementation and not a language design decision: programs relying on this are basically bogus. It would be too strong a restriction to try to enforce CPython’s behavior in a language spec, given that it has no chance to be adopted by Jython or IronPython (or any other port of Python to Java or .NET).
Even the naive idea of forcing a full GC when we’re getting dangerously close to the OS’s limit can be very bad in some cases. If your program leaks open files heavily, then it would work, but force a complete GC cycle every n’th leaked file. The value of n is a constant, but the program can take an arbitrary amount of memory, which makes a complete GC cycle arbitrarily long. The end result is that PyPy would spend an arbitrarily large fraction of its run time in the GC, slowing down the actual execution not by 10% nor 100% nor 1000% but by essentially any factor.
To the best of our knowledge this problem has no better solution than fixing the programs. If it occurs in 3rd-party code, this means going to the authors and explaining the problem to them: they need to close their open files in order to run on any non-CPython-based implementation of Python.
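In practice the fix is mechanical: open files in a with block (or close them in a try/finally), so the close happens at a deterministic point on every implementation. A minimal sketch (the filename is purely illustrative):

```python
# Close files explicitly instead of relying on reference counting.
# The "with" statement guarantees the file is flushed and closed when
# the block exits, on PyPy exactly as on CPython.
with open("output.txt", "w") as f:
    f.write("data is flushed and the file is closed on exit\n")
# Here f.closed is True on any Python implementation, and the data
# is on disk -- no waiting for the garbage collector.
```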
Here are some more technical details. This issue affects the precise time at which __del__ methods are called, which is not reliable or timely in PyPy (nor Jython nor IronPython). It also means that weak references may stay alive for a bit longer than expected. This makes “weak proxies” (as returned by weakref.proxy()) somewhat less useful: they will appear to stay alive for a bit longer in PyPy, and suddenly they will really be dead, raising a ReferenceError on the next access. Any code that uses weak proxies must carefully catch such a ReferenceError at any place that uses them. (Or, better yet, don’t use weakref.proxy() at all; use weakref.ref().)
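A minimal sketch of that recommendation: weakref.ref() makes the liveness check explicit, because calling the reference returns either the object or None, whereas a proxy can raise ReferenceError at any attribute access. The class name here is illustrative:

```python
import gc
import weakref

class Resource(object):
    pass

obj = Resource()
r = weakref.ref(obj)   # explicit: call r() to get the object, or None
assert r() is obj      # the target is still alive

del obj
gc.collect()           # needed on PyPy, where collection is delayed;
                       # on CPython the object already died via refcounting

target = r()           # returns None once the target is really gone
if target is None:
    print("resource is gone; no ReferenceError to catch")
```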
Note a detail in the documentation for weakref callbacks:
If callback is provided and not None, and the returned weakref object is still alive, the callback will be called when the object is about to be finalized.
There are cases where, due to CPython’s refcount semantics, a weakref dies immediately before or after the objects it points to (typically with some circular reference). If it happens to die just after, then the callback will be invoked. In a similar case in PyPy, both the object and the weakref will be considered as dead at the same time, and the callback will not be invoked. (Issue #2030)
There are a few extra implications from the difference in the GC. Most notably, if an object has a __del__, the __del__ is never called more than once in PyPy; but CPython will call the same __del__ several times if the object is resurrected and dies again (at least it is reliably so in older CPythons; newer CPythons try to call destructors not more than once, but there are counter-examples). The __del__ methods are called in “the right” order if they are on objects pointing to each other, as in CPython, but unlike CPython, if there is a dead cycle of objects referencing each other, their __del__ methods are called anyway; CPython would instead put them into the list garbage of the gc module. More information is available on the blog [1] [2].
Note that this difference might show up indirectly in some cases. For example, a generator left pending in the middle is, again, garbage-collected later in PyPy than in CPython. You can see the difference if the yield keyword it is suspended at is itself enclosed in a try: or a with: block. This shows up for example as issue 736.
Using the default GC (called minimark), the built-in function id() works like it does in CPython. With other GCs it returns numbers that are not real addresses (because an object can move around several times) and calling it a lot can lead to performance problems.
Note that if you have a long chain of objects, each with a reference to the next one, and each with a __del__, PyPy’s GC will perform badly. On the bright side, in most other cases, benchmarks have shown that PyPy’s GCs perform much better than CPython’s.
Another difference is that if you add a __del__ to an existing class it will not be called:

    >>>> class A(object):
    ....     pass
    ....
    >>>> A.__del__ = lambda self: None
    __main__:1: RuntimeWarning: a __del__ method added to an existing type will not be called
Even more obscure: the same is true, for old-style classes, if you attach the __del__ to an instance (even in CPython this does not work with new-style classes). You get a RuntimeWarning in PyPy. To fix these cases just make sure there is a __del__ method in the class to start with (even containing only pass; replacing or overriding it later works fine).
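The workaround can be sketched like this (the class name and the replacement finalizer are illustrative):

```python
import gc

class Tracked(object):
    def __del__(self):   # placeholder present from the start
        pass

# Because the class already had a __del__, replacing it later is
# honored by PyPy; adding a __del__ to a class that never had one
# would only produce a RuntimeWarning there.
log = []
Tracked.__del__ = lambda self: log.append("deleted")

t = Tracked()
del t
gc.collect()   # force the collection so the example works on PyPy too
```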
Last note: CPython tries to do a gc.collect() automatically when the program finishes; PyPy does not. (It is possible in both CPython and PyPy to design a case where several gc.collect() are needed before all objects die. This makes CPython’s approach only work “most of the time” anyway.)
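If a program does depend on a collection happening at shutdown, one portable workaround, offered here as a sketch rather than something the PyPy docs prescribe, is to register the collection explicitly:

```python
import atexit
import gc

# PyPy does not run gc.collect() automatically at program exit, so
# request one explicitly if finalizers must run before shutdown.
# (As noted above, even several collections may be needed in
# pathological cases, on CPython as well.)
atexit.register(gc.collect)
```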
Subclasses of built-in types
Officially, CPython has no rule at all for when exactly overridden methods of subclasses of built-in types get implicitly called or not. As an approximation, these methods are never called by other built-in methods of the same object. For example, an overridden __getitem__() in a subclass of dict will not be called by e.g. the built-in get() method.
The above is true both in CPython and in PyPy. Differences can occur about whether a built-in function or method will call an overridden method of another object than self. In PyPy, they are often called in cases where CPython would not. Two examples:
    class D(dict):
        def __getitem__(self, key):
            return "%r from D" % (key,)

    class A(object):
        pass

    a = A()
    a.__dict__ = D()
    a.foo = "a's own foo"
    print a.foo
    # CPython => a's own foo
    # PyPy => 'foo' from D

    glob = D(foo="base item")
    loc = {}
    exec "print foo" in glob, loc
    # CPython => base item
    # PyPy => 'foo' from D
Mutating classes of objects which are already used as dictionary keys
Consider the following snippet of code:
    class X(object):
        pass

    def __evil_eq__(self, other):
        print 'hello world'
        return False

    def evil(y):
        d = {X(): 1}
        X.__eq__ = __evil_eq__
        d[y]  # might trigger a call to __eq__?
In CPython, __evil_eq__ might be called, although there is no way to write a test which reliably calls it. It happens if y is not x and hash(y) == hash(x), where hash(x) is computed when x is inserted into the dictionary. If by chance the condition is satisfied, then __evil_eq__ is called.
PyPy uses a special strategy to optimize dictionaries whose keys are instances of user-defined classes which do not override the default __hash__, __eq__ and __cmp__: when using this strategy, __eq__ and __cmp__ are never called, but instead the lookup is done by identity, so in the case above it is guaranteed that __eq__ won’t be called.
Note that in all other cases (e.g., if you have a custom __hash__ and __eq__ in y) the behavior is exactly the same as CPython.
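A small sketch of the identity-strategy guarantee: for keys whose class leaves __hash__ and __eq__ at the defaults, lookup behaves identically on CPython and PyPy, because the default comparison is by identity anyway. The class name is illustrative:

```python
class Key(object):   # no custom __hash__ / __eq__ / __cmp__
    pass

k1 = Key()
k2 = Key()
d = {k1: "first"}

assert d[k1] == "first"   # same instance: found by identity
assert k2 not in d        # distinct instance: not found, never compared
# This holds on CPython and PyPy alike; PyPy merely guarantees that
# __eq__ is never called for such keys, while CPython may call the
# (default, identity-based) comparison internally.
```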
Ignored exceptions
In many corner cases, CPython can silently swallow exceptions. The precise list of when this occurs is rather long, even though most cases are very uncommon. The most well-known places are custom rich comparison methods (like __eq__); dictionary lookup; calls to some built-in functions like isinstance().

Unless this behavior is clearly present by design and documented as such (as e.g. for hasattr()), in most cases PyPy lets the exception propagate instead.
Object Identity of Primitive Values, is and id
Object identity of primitive values works by value equality, not by identity of the wrapper. This means that x + 1 is x + 1 is always true, for arbitrary integers x. The rule applies for the following types:

- int
- float
- long
- complex
- str (empty or single-character strings only)
- unicode (empty or single-character strings only)
- tuple (empty tuples only)
- frozenset (empty frozensets only)
- unbound method objects (for Python 2 only)
This change requires some changes to id as well. id fulfills the following condition: x is y <=> id(x) == id(y). Therefore id of the above types will return a value that is computed from the argument, and can thus be larger than sys.maxint (i.e. it can be an arbitrary long).
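The invariant itself can be checked portably; only the comments note where the implementations diverge:

```python
x = 42
y = 42
# The language-level invariant holds on both implementations:
# x is y  <=>  id(x) == id(y)
assert (x is y) == (id(x) == id(y))

n = 10 ** 30
m = 10 ** 30
# On PyPy "n is m" is True (identity follows value for integers) and
# id() is computed from the value, possibly larger than sys.maxint;
# on CPython both sides are typically False for big integers.
# Either way, the two sides always agree.
assert (n is m) == (id(n) == id(m))
```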
Note that strings of length 2 or greater can be equal without being identical. Similarly, x is (2,) is not necessarily true even if x contains a tuple and x == (2,). The uniqueness rules apply only to the particular cases described above. The str, unicode, tuple and frozenset rules were added in PyPy 5.4; before that, a test like if x is "?" or if x is () could fail even if x was equal to "?" or (). The new behavior added in PyPy 5.4 is closer to CPython’s, which caches precisely the empty tuple/frozenset, and (generally but not always) the strings and unicodes of length <= 1.
Note that for floats there “is” only one object per “bit pattern” of the float. So float('nan') is float('nan') is true on PyPy, but not on CPython because they are two objects; but 0.0 is -0.0 is always False, as the bit patterns are different. As usual, float('nan') == float('nan') is always False. When used in containers (as list items or in sets for example), the exact rule of equality used is “if x is y or x == y” (on both CPython and PyPy); as a consequence, because all nans are identical in PyPy, you cannot have several of them in a set, unlike in CPython. (Issue #1974). Another consequence is that cmp(float('nan'), float('nan')) == 0, because cmp checks with is first whether the arguments are identical (there is no good value to return from this call to cmp, because cmp pretends that there is a total order on floats, but that is wrong for NaNs).
C-API Differences
The external C-API has been reimplemented in PyPy as an internal cpyext module. We support most of the documented C-API, but sometimes internal C-abstractions leak out on CPython and are abused, perhaps even unknowingly. For instance, assignment to a PyTupleObject is not supported after the tuple is used internally, even by another C-API function call. On CPython this will succeed as long as the refcount is 1. On PyPy this will always raise a SystemError('PyTuple_SetItem called on tuple after use of tuple') exception (explicitly listed here for search engines).
Another similar problem is assignment of a new function pointer to any of the tp_as_* structures after calling PyType_Ready. For instance, overriding tp_as_number.nb_int with a different function after calling PyType_Ready on CPython will result in the old function being called for x.__int__() (via class __dict__ lookup) and the new function being called for int(x) (via slot lookup). On PyPy we will always call the new function, not the old; this quirky behaviour is unfortunately necessary to fully support NumPy.
Performance Differences
CPython has an optimization that can make repeated string concatenation not quadratic. For example, this kind of code runs in O(n) time:

    s = ''
    for string in mylist:
        s += string
In PyPy, this code will always have quadratic complexity. Note also that the CPython optimization is brittle and can break with slight variations in your code anyway. So you should replace the code with:

    parts = []
    for string in mylist:
        parts.append(string)
    s = "".join(parts)
Miscellaneous
- Hash randomization (-R) is ignored in PyPy. In CPython before 3.4 it has little point. Both CPython >= 3.4 and PyPy3 implement the randomized SipHash algorithm and ignore -R.
- You can’t store non-string keys in type objects. For example:

      class A(object):
          locals()[42] = 3

  won’t work.
- sys.setrecursionlimit(n) sets the limit only approximately, by setting the usable stack space to n * 768 bytes. On Linux, depending on the compiler settings, the default of 768KB is enough for about 1400 calls.
- Since the implementation of dictionaries is different, the exact number of times that __hash__ and __eq__ are called is different. Since CPython does not give any specific guarantees either, don’t rely on it.
- Assignment to __class__ is limited to the cases where it works on CPython 2.5. On CPython 2.6 and 2.7 it works in a bit more cases, which are not supported by PyPy so far. (If needed, it could be supported, but then it will likely work in many more cases on PyPy than on CPython 2.6/2.7.)
- The __builtins__ name is always referencing the __builtin__ module, never a dictionary as it sometimes is in CPython. Assigning to __builtins__ has no effect. (For usages of tools like RestrictedPython, see issue #2653.)
- Directly calling the internal magic methods of a few built-in types with invalid arguments may have a slightly different result. For example, [].__add__(None) and (2).__add__(None) both return NotImplemented on PyPy; on CPython, only the latter does, and the former raises TypeError. (Of course, [] + None and 2 + None both raise TypeError everywhere.) This difference is an implementation detail that shows up because of internal C-level slots that PyPy does not have.
- On CPython, [].__add__ is a method-wrapper, and list.__add__ is a slot wrapper. On PyPy these are normal bound or unbound method objects. This can occasionally confuse some tools that inspect built-in types. For example, the standard library inspect module has a function ismethod() that returns True on unbound method objects but False on method-wrappers or slot wrappers. On PyPy we can’t tell the difference, so ismethod([].__add__) == ismethod(list.__add__) == True.
- In CPython, the built-in types have attributes that can be implemented in various ways. Depending on the way, if you try to write to (or delete) a read-only (or undeletable) attribute, you get either a TypeError or an AttributeError. PyPy tries to strike some middle ground between full consistency and full compatibility here. This means that a few corner cases don’t raise the same exception, like del (lambda: None).__closure__.
- In pure Python, if you write class A(object): def f(self): pass and have a subclass B which doesn’t override f(), then B.f(x) still checks that x is an instance of B. In CPython, types written in C use a different rule. If A is written in C, any instance of A will be accepted by B.f(x) (and actually, B.f is A.f in this case). Some code that could work on CPython but not on PyPy includes: datetime.datetime.strftime(datetime.date.today(), …) (here, datetime.date is the superclass of datetime.datetime). Anyway, the proper fix is arguably to use a regular method call in the first place: datetime.date.today().strftime(…)
- Some functions and attributes of the gc module behave in a slightly different way: for example, gc.enable and gc.disable are supported, but “enabling and disabling the GC” has a different meaning in PyPy than in CPython. These functions actually enable and disable the major collections and the execution of finalizers.
- PyPy prints a random line from past #pypy IRC topics at startup in interactive mode. In a released version, this behaviour is suppressed, but setting the environment variable PYPY_IRC_TOPIC will bring it back. Note that downstream package providers have been known to totally disable this feature.
- PyPy’s readline module was rewritten from scratch: it is not GNU’s readline. It should be mostly compatible, and it adds multiline support (see multiline_input()). On the other hand, parse_and_bind() calls are ignored (issue #2072).
- sys.getsizeof() always raises TypeError. This is because a memory profiler using this function is most likely to give results inconsistent with reality on PyPy. It would be possible to have sys.getsizeof() return a number (with enough work), but that may or may not represent how much memory the object uses. It doesn’t even really make sense to ask how much one object uses, in isolation from the rest of the system. For example, instances have maps, which are often shared across many instances; in this case the maps would probably be ignored by an implementation of sys.getsizeof(), but their overhead is important in some cases if there are many instances with unique maps. Conversely, equal strings may share their internal string data even if they are different objects, or empty containers may share parts of their internals as long as they are empty. Even stranger, some lists create objects as you read them; if you try to estimate the size in memory of range(10**6) as the sum of all items’ sizes, that operation will by itself create one million integer objects that never existed in the first place. Note that some of these concerns also exist on CPython, just less so. For this reason we explicitly don’t implement sys.getsizeof().
- The timeit module behaves differently under PyPy: it prints the average time and the standard deviation, instead of the minimum, since the minimum is often misleading.
- The get_config_vars method of sysconfig and distutils.sysconfig is not complete. On POSIX platforms, CPython fishes configuration variables from the Makefile used to build the interpreter. PyPy should bake the values in during compilation, but does not do that yet.
- "%d" % x and "%x" % x and similar constructs, where x is an instance of a subclass of long that overrides the special methods __str__ or __hex__ or __oct__: PyPy doesn’t call the special methods; CPython does, but only if it is a subclass of long, not int. CPython’s behavior is really messy: e.g. for %x it calls __hex__(), which is supposed to return a string like -0x123L; then the 0x and the final L are removed, and the rest is kept. If you return an unexpected string from __hex__() you get an exception (or a crash before CPython 2.7.13).
- In PyPy, dictionaries passed as **kwargs can contain only string keys, even for dict() and dict.update(). CPython 2.7 allows non-string keys in these two cases (and only there, as far as we know). E.g. this code produces a TypeError, on CPython 3.x as well as on any PyPy: dict(**{1: 2}). (Note that dict(**d1) is equivalent to dict(d1).)
- PyPy3: __class__ attribute assignment between heaptypes and non heaptypes. CPython allows that for module subtypes, but not for e.g. int or float subtypes. Currently PyPy does not support the __class__ attribute assignment for any non heaptype subtype.
- In PyPy, module and class dictionaries are optimized under the assumption that deleting attributes from them is rare. Because of this, e.g. del foo.bar, where foo is a module (or class) that contains the function bar, is significantly slower than on CPython.
- Various built-in functions in CPython accept only positional arguments and not keyword arguments. That can be considered a long-running historical detail: newer functions tend to accept keyword arguments and older functions are occasionally fixed to do so as well. In PyPy, most built-in functions accept keyword arguments (help() shows the argument names). But don’t rely on it too much because future versions of PyPy may have to rename the arguments if CPython starts accepting them too.
- PyPy3: distutils has been enhanced to allow finding VsDevCmd.bat in the directory pointed to by the VS%0.f0COMNTOOLS (typically VS140COMNTOOLS) environment variable. CPython searches for vcvarsall.bat somewhere above that value.
- SyntaxErrors try harder to give details about the cause of the failure, so the error messages are not the same as in CPython.
- Dictionaries and sets are ordered on PyPy. On CPython < 3.6 they are not; on CPython >= 3.6 dictionaries (but not sets) are ordered.
- PyPy2 refuses to load lone .pyc files, i.e. .pyc files that are still there after you deleted the .py file. PyPy3 instead behaves like CPython. We could be amenable to fix this difference in PyPy2: the current version reflects our annoyance with this detail of CPython, which bit us too often while developing PyPy. (It is as easy as passing the --lonepycfile flag when translating PyPy, if you really need it.)
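The **kwargs restriction above can be demonstrated portably; the first call works everywhere, while the non-string-key variant is rejected by PyPy (and by CPython 3.x):

```python
# String keys via ** always work, and dict(**d1) copies like dict(d1):
d1 = {"a": 1, "b": 2}
assert dict(**d1) == dict(d1) == {"a": 1, "b": 2}

# Non-string keys through ** raise TypeError on PyPy and CPython 3.x;
# CPython 2.7 happened to allow them for dict() and dict.update() only.
try:
    dict(**{1: 2})
    allowed = True    # CPython 2.7 only
except TypeError:
    allowed = False   # PyPy and CPython 3.x
```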
Extension modules
List of extension modules that we support:
- Supported as built-in modules (in pypy/module/):

      __builtin__ __pypy__ _ast _codecs _collections _continuation
      _ffi _hashlib _io _locale _lsprof _md5 _minimal_curses
      _multiprocessing _random _rawffi _sha _socket _sre _ssl
      _warnings _weakref _winreg array binascii bz2 cStringIO cmath
      cpyext crypt errno exceptions fcntl gc imp itertools marshal
      math mmap operator parser posix pyexpat select signal struct
      symbol sys termios thread time token unicodedata zipimport zlib

  When translated on Windows, a few Unix-only modules are skipped, and the following module is built instead: _winreg
- Supported by being rewritten in pure Python (possibly using cffi): see the lib_pypy/ directory. Examples of modules that we support this way: ctypes, cPickle, cmath, dbm, datetime… Note that some modules are both in there and in the list above; by default, the built-in module is used (but can be disabled at translation time).
The extension modules (i.e. modules written in C, in the standard CPython) that are neither mentioned above nor in lib_pypy/ are not available in PyPy. (You may have a chance to use them anyway with cpyext.)