Why virtualenvwrapper is (Mostly) Not Written In Python
If you look at the source code for virtualenvwrapper you will see that most of the interesting parts are implemented as shell functions in virtualenvwrapper.sh. The hook loader is a Python app, but doesn’t do much to manage the virtualenvs. Some of the most frequently asked questions about virtualenvwrapper are “Why didn’t you write this as a set of Python programs?” or “Have you thought about rewriting it in Python?” For a long time these questions baffled me, because it was always obvious to me that it had to be implemented as it is. But they come up frequently enough that I feel the need to explain.
tl;dr: POSIX Made Me Do It
The choice of implementation language for virtualenvwrapper was made for pragmatic, rather than philosophical, reasons. The wrapper commands need to modify the state and environment of the user’s current shell process, and the only way to do that is to have the commands run inside that shell. That resulted in me writing virtualenvwrapper as a set of shell functions, rather than separate shell scripts or even Python programs.
Where Do POSIX Processes Come From?
New POSIX processes are created when an existing process invokes the fork() system call. The invoking process becomes the “parent” of the new “child” process, and the child is a full clone of the parent. The semantic result of fork() is that an entire new copy of the parent process is created. In practice, optimizations are normally made to avoid copying more memory than is absolutely necessary (frequently via a copy-on-write system). But for the purposes of this explanation it is sufficient to think of the child as a full replica of the parent.
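The "full replica" idea can be seen from the shell itself. This is a minimal sketch, assuming a POSIX-style shell: the command substitution runs in a subshell, which starts as a copy of the current shell. Most shells create that copy with fork(), though the example does not depend on how it is done. The variable name counter is made up for the illustration.

```shell
# counter is an ordinary shell variable; it is never exported.
counter=1

# The commands inside $( ) run in a subshell, a copy of the current shell.
# The copy starts with counter=1, increments its own copy, and prints it.
in_subshell=$( counter=$((counter + 1)); echo "$counter" )

echo "subshell saw: $in_subshell"      # prints "subshell saw: 2"
echo "parent still has: $counter"      # prints "parent still has: 1"
```

The copy sees everything the parent had at the moment it was created, but from then on the two go their separate ways.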
The important parts of the parent process that are copied include dynamic memory (the stack and heap), static stuff (the program code), resources like open file descriptors, and the _environment variables_ exported from the parent process. Inheriting environment variables is a fundamental aspect of the way POSIX programs pass state and configuration information to one another. A parent can establish a series of name=value pairs, which are then given to the child process. The child can access them through functions like getenv() and setenv() (and in Python through os.environ).
The choice of the term inherit to describe the way the variables and their contents are passed from parent to child is significant. Although a child can change its own environment, it cannot directly change the environment settings of its parent because there is no system call to modify the parental environment settings.
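The one-way nature of inheritance is easy to check. The following is a small sketch, assuming a POSIX sh is available as sh; the variable GREETING is a made-up name. The parent exports a value, one child reads it, and another child overwrites its own copy.

```shell
# The parent exports a variable; each child receives a copy of it.
GREETING="hello from parent"
export GREETING

# The child sees the inherited value...
child_saw=$(sh -c 'printf "%s" "$GREETING"')

# ...and can change its own copy, but that change ends with the child.
child_changed=$(sh -c 'GREETING="changed by child"; printf "%s" "$GREETING"')

echo "child saw:     $child_saw"
echo "child changed: $child_changed"
echo "parent has:    $GREETING"
```

The last line still prints the parent's original value, because nothing the child did could reach back into the parent process.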
How the Shell Runs a Program
When a shell receives a command to be executed, either interactively or by parsing a script file, and determines that the command is implemented in a separate program file, it uses fork() to create a new process and then inside that process it uses one of the exec functions to start the specified program. The language that program is written in doesn’t make any difference in the decision about whether or not to fork(), so even if the “program” is a shell script written in the language understood by the current shell, a new process is created.
On the other hand, if the shell decides that the command is a function, then it looks at the definition and invokes it directly. Shell functions are made up of other commands, some of which may result in child processes being created, but the function itself runs in the original shell process and can therefore modify its state, for example by changing the working directory or the values of variables.
It is possible to force the shell to run a script directly, and not in a child process, by sourcing it. The source command causes the shell to read the file and interpret it in the current process. Again, as with functions, the contents of the file may cause child processes to be spawned, but there is not a second shell process interpreting the series of commands.
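The three cases can be compared side by side. This is a sketch under a few assumptions: the names set_state.sh, DEMO_STATE, and go_root are invented for the demonstration, and the portable spelling of source (a single dot) is used so that the example also runs in shells that do not provide source as a builtin.

```shell
# Work in a scratch directory so the effect of "cd /" is visible.
start_dir=$(pwd)
demo_dir=$(mktemp -d)
cd "$demo_dir"

# A tiny script that changes directory and sets a variable.
cat > "$demo_dir/set_state.sh" <<'EOF'
cd /
DEMO_STATE="set by script"
EOF

# 1. Run it as a separate program: the changes happen in a child process.
unset DEMO_STATE
sh "$demo_dir/set_state.sh"
after_run_dir=$(pwd)
after_run_state=${DEMO_STATE:-unset}

# 2. Source it: the current shell interprets the file, so the changes stick.
. "$demo_dir/set_state.sh"
after_source_dir=$(pwd)
after_source_state=${DEMO_STATE:-unset}

# 3. A function also runs in the current shell process.
cd "$demo_dir"
go_root() { cd / && DEMO_STATE="set by function"; }
go_root
after_func_dir=$(pwd)
after_func_state=$DEMO_STATE

echo "after running:  dir=$after_run_dir state=$after_run_state"
echo "after sourcing: dir=$after_source_dir state=$after_source_state"
echo "after function: dir=$after_func_dir state=$after_func_state"

# Clean up the scratch directory.
cd "$start_dir"
rm -rf "$demo_dir"
```

Only the first case leaves the calling shell untouched: the working directory is still the scratch directory and DEMO_STATE is still unset.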
What Does This Mean for virtualenvwrapper?
The original and most important features of virtualenvwrapper are automatically activating a virtualenv when it is created by mkvirtualenv and using workon to deactivate one environment and activate another. Making these features work drove the implementation decisions for the other parts of virtualenvwrapper, too.
Environments are activated interactively by sourcing bin/activate inside the virtualenv. The activate script does a few things, but the important parts are setting the VIRTUAL_ENV variable and modifying the shell’s search path through the PATH variable to put the bin directory for the environment on the front of the path. Changing the path means that the programs installed in the environment, especially the python interpreter there, are found before other programs with the same name.
Simply running bin/activate, without using source, doesn’t work because it sets up the environment of the child process, without affecting the parent. In order to source the activate script in the interactive shell, both mkvirtualenv and workon also need to be run in that shell process.
Why Choose One When You Can Have Both?
The hook loader is one part of virtualenvwrapper that is written in Python. Why? Again, because it was easier. Hooks are discovered using setuptools entry points, because after an entry point is installed the user doesn’t have to take any other action to allow the loader to discover and use it. It’s easy to imagine writing a hook to create new files on the filesystem (by installing a package, instantiating a template, etc.).
How, then, do hooks running in a separate process (the Python interpreter) modify the shell environment to set variables or change the working directory? They cheat, of course.
Each hook point defined by virtualenvwrapper actually represents two hooks. First, the hooks meant to be run in Python are executed. Then the “source” hooks are run, and they print out a series of shell commands. All of those commands are collected, saved to a temporary file, and then the shell is told to source the file.
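The trick can be sketched in a few lines. This is an illustration of the pattern only, not the real hook loader: a child sh process stands in for the Python program, and demo_hook_loader and DEMO_PROJECT are made-up names.

```shell
# Stand-in for the hook loader: a child process that cannot touch the
# caller's environment directly, so it prints the commands it wants run.
demo_hook_loader() {
    sh -c 'echo "DEMO_PROJECT=example"; echo "export DEMO_PROJECT"'
}

hook_script=$(mktemp)
demo_hook_loader > "$hook_script"   # collect the printed commands
. "$hook_script"                    # interpret them in the current shell
rm -f "$hook_script"

echo "DEMO_PROJECT=$DEMO_PROJECT"   # prints "DEMO_PROJECT=example"
```

The child never modifies its parent. It only produces text, and the parent chooses to execute that text itself.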
Starting up the hook loader turns out to be way more expensive than most of the other actions virtualenvwrapper takes, though, so I am considering making its use optional. Most users customize the hooks by using shell scripts (either globally or in the virtualenv). Finding and running those can be handled by the shell quite easily.
Implications for Cross-Shell Compatibility
After requests for a full-Python implementation, the most common request is to support additional shells. fish comes up a lot, as do various Windows-only shells. The officially supported shells all have a common enough syntax that the same implementation works for each. Supporting other shells would require rewriting much, if not all, of the logic using an alternate syntax; those other shells are basically different programming languages. So far I have dealt with the ports by encouraging other developers to handle them, and then trying to link to and otherwise promote the results.
Not As Bad As It Seems
Although there are some special challenges created by the requirement that the commands run in a user’s interactive shell (see the many bugs reported by users who alias common commands like rm and cd), using the shell as a programming language holds up quite well. The shells are designed to make finding and executing other programs easy, and especially to make it easy to combine a series of smaller programs to perform more complicated operations. As that’s what virtualenvwrapper is doing, it’s a natural fit.
See also
- Advanced Programming in the UNIX Environment by W. Richard Stevens & Stephen A. Rago
- Fork (operating system) on Wikipedia
- Environment variable on Wikipedia
- Linux implementation of fork()