Deployment recipes: Infrastructure

There are multiple ways to deploy web2py in a production environment. The details depend on the configuration and the services provided by the host.

In this chapter we consider the following issues:

  • Production deployment (Apache, Nginx, Lighttpd, Cherokee)
  • Security
  • Scalability using Redis and a load balancer
  • Deployment on PythonAnywhere, Heroku, Amazon EC2, and on the Google App Engine platform (GAE) [gae]

web2py comes with an SSL-enabled [ssl] web server, the Rocket wsgiserver [rocket]. While this is a fast web server, it has limited configuration capabilities. For this reason it is best to deploy web2py behind Apache [apache], Nginx [nginx], Lighttpd [lighttpd], or Cherokee [cherokee]. These are free and open-source web servers that are customizable and have proven reliable in high-traffic production environments. They can be configured to serve static files directly, handle HTTPS, and pass control to web2py for dynamic content.

Until a few years ago, the standard interface for communication between web servers and web applications was the Common Gateway Interface (CGI) [cgi]. The main problem with CGI is that it creates a new process for each HTTP request. If the web application is written in an interpreted language, each HTTP request served by the CGI scripts starts a new instance of the interpreter. This is slow, and it should be avoided in a production environment. Moreover, CGI can only handle simple responses. It cannot handle, for example, file streaming.
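
For illustration (this is a generic CGI script, not web2py's cgihandler.py), the CGI contract is simply this: the server starts a fresh process for every request, and the script writes the HTTP headers, a blank line, and the body to standard output:

  #!/usr/bin/env python
  # generic CGI script, for illustration only; the web server starts a new
  # Python process to run this for every single request
  print("Content-Type: text/html")
  print("")  # blank line separates headers from body
  print("<html><body>Hello from CGI</body></html>")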

web2py provides a file cgihandler.py to interface to CGI.

One solution to this problem is to use the mod_python module for Apache. We discuss it here because its use is still very common, though the mod_python project has officially been abandoned by the Apache Software Foundation. mod_python starts one instance of the Python interpreter when Apache starts, and serves each HTTP request in its own thread without having to restart Python each time. This is a better solution than CGI, but it is not an optimal solution, since mod_python uses its own interface for communication between the web server and the web application. In mod_python, all hosted applications run under the same user-id/group-id, which presents security issues.

web2py provides a file modpythonhandler.py to interface to mod_python.

In the last few years, the Python community has come together behind a new standard interface for communication between web servers and web applications written in Python. It is called the Web Server Gateway Interface (WSGI) [wsgi-w] [wsgi-o]. web2py was built on WSGI, and it provides handlers for using other interfaces when WSGI is not available.
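
At its core, WSGI only requires the application to be a callable that takes the request environment and a start_response function and returns an iterable of byte strings. A minimal stand-alone example (not web2py's own callable, which is provided by wsgihandler.py as described below) looks like this:

  # minimal WSGI application, for illustration only
  def application(environ, start_response):
      start_response('200 OK', [('Content-Type', 'text/html')])
      return [b'<html><body>Hello from WSGI</body></html>']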

Apache supports WSGI via the module mod_wsgi [modwsgi], developed by Graham Dumpleton.

web2py provides a file wsgihandler.py to interface to WSGI.
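
As a quick sanity check, and assuming the script below is run from the web2py folder (where wsgihandler.py exposes its WSGI callable under the name application), that callable can be served with the standard-library wsgiref server:

  # sketch: serve web2py's WSGI application with wsgiref, for local testing only;
  # assumes the current working directory is the web2py folder
  from wsgiref.simple_server import make_server
  from wsgihandler import application  # web2py's WSGI entry point

  make_server('127.0.0.1', 8000, application).serve_forever()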

Some web hosting services do not support mod_wsgi. In this case, we must use Apache as a proxy and forward all incoming requests to the web2py built-in web server (running, for example, on localhost:8000).
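
In that setup the built-in server is typically started bound to the loopback interface only, so that it is reachable by the proxy but not directly from the outside; for example (the admin password is whatever you choose):

  python web2py.py -a 'password' -i 127.0.0.1 -p 8000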

In either case, whether through mod_wsgi or mod_proxy, Apache can be configured to serve static files and deal with SSL encryption directly, taking the burden off web2py.

Nginx does not speak WSGI directly. Instead, it communicates with the uWSGI application server over the uwsgi protocol (a similar but different interface), and uWSGI in turn runs the web2py WSGI application through its own Python adapter.
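
A common arrangement (the paths and options below are illustrative and depend on your uWSGI version) is to start a uWSGI worker that loads wsgihandler.py and listens on a local socket, and then point Nginx's uwsgi_pass directive at that socket. The uWSGI side might be started roughly like this:

  uwsgi --socket 127.0.0.1:9090 --chdir /path/to/web2py --module wsgihandler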

The Lighttpd web server does not currently support the WSGI interface, but it does support the FastCGI [fastcgi] interface, which is an improvement over CGI. FastCGI’s main aim is to reduce the overhead associated with interfacing the web server and CGI programs, allowing a server to handle more HTTP requests at once.

According to the Lighttpd web site, “Lighttpd powers several popular Web 2.0 sites such as YouTube and Wikipedia. Its high speed IO-infrastructure allows them to scale several times better with the same hardware than with alternative web-servers”. Lighttpd with FastCGI is, in fact, faster than Apache with mod_wsgi.

web2py provides a file fcgihandler.py to interface to FastCGI.
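
Conceptually, a FastCGI handler wraps web2py's WSGI application in a FastCGI server that listens on a socket for the web server to connect to. As a rough, hedged sketch using the third-party flup package (an assumption here; web2py's own fcgihandler.py is the supported way to do this), it could look like:

  # sketch only: run web2py's WSGI application behind FastCGI using flup;
  # assumes flup is installed and this is run from the web2py folder
  from flup.server.fcgi import WSGIServer
  from wsgihandler import application

  # the web server (e.g. Lighttpd) connects to this Unix socket
  WSGIServer(application, bindAddress='/tmp/web2py.sock').run()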

web2py also includes a gaehandler.py to interface with the Google App Engine (GAE). On GAE, web applications run “in the cloud”. This means that the framework completely abstracts any hardware details. The web application is automatically replicated as many times as necessary to serve all concurrent requests. Replication in this case means more than multiple threads on a single server; it also means multiple processes on different servers. GAE achieves this level of scalability by blocking write access to the file system, and all persistent information must be stored in the Google BigTable datastore or in memcache.

On non-GAE platforms, scalability is an issue that needs to be addressed, and it may require some tweaks in the web2py applications. The most common way to achieve scalability is by using multiple web servers behind a load-balancer (a simple round robin, or something more sophisticated, receiving heartbeat feedback from the servers).

Even if there are multiple web servers, there must be one, and only one, database server. By default, web2py uses the file system for storing sessions, error tickets, uploaded files, and the cache. This means that in the default configuration, the corresponding folders have to be shared folders.


In the rest of the chapter, we consider various recipes that may provide an improvement over this naive approach, including:

  • Store sessions in the database or in cache, or do not store sessions at all.
  • Store tickets on the local filesystem and move them into the database in batches.
  • Use memcache instead of cache.ram and cache.disk.
  • Store uploaded files in the database instead of the shared filesystem.

We recommend following the first three recipes in all cases; the fourth provides an advantage mainly for small files and may be counterproductive for large ones (see the sketch below).
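
As a sketch of what the first, third, and fourth recipes can look like in a model file (the table, field names, and server addresses below are purely illustrative):

  # recipe 1: store sessions in the database instead of on the filesystem
  session.connect(request, response, db=db)

  # recipe 3: replace cache.ram and cache.disk with memcache
  from gluon.contrib.memcache import MemcacheClient
  cache.memcache = MemcacheClient(request, ['127.0.0.1:11211'])
  cache.ram = cache.disk = cache.memcache

  # recipe 4: keep uploaded files in the database instead of a shared folder
  db.define_table('document',
      Field('file', 'upload', uploadfield='file_data'),
      Field('file_data', 'blob'))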

anyserver.py

web2py comes with a file called anyserver.py that implements WSGI interfaces to the following popular servers: bjoern, cgi, cherrypy, diesel, eventlet, fapws, flup, gevent, gunicorn, mongrel2, paste, rocket, tornado, twisted, and wsgiref.

You can use any of these servers, for example Tornado, simply by doing:

  python anyserver.py -s tornado -i 127.0.0.1 -p 8000 -l -P

Here -l is for logging and -P is for the profiler. For information on all the command line options use “-h”:

  python anyserver.py -h