04 The core - Running tasks in the background - 《web2py Reference Manual, 6th Edition (pre-release)》

Running tasks in the background
- Cron
- Homemade task queues

Running tasks in the background

In web2py, every HTTP request is served in its own thread. Threads are recycled for efficiency and managed by the web server. For security, the web server sets a time-out on each request. This means that actions should not run tasks that take too long, should not create new threads, and should not fork processes (it is possible but not recommended).

The proper way to run time-consuming tasks is doing it in the background. There is not a single way of doing it, but here we describe three mechanisms that are built into web2py: cron, homemade task queues, and scheduler.

By cron we refer to a web2py functionality not to the Unix Cron mechanism. The web2py cron works on windows too.

web2py cron is the way to go if you need tasks in the background at scheduled times and these tasks take a relatively short time compared to the time interval between two calls. Each task runs in its own process, and multiple tasks can run concurrently, but you have no control over how many tasks run. If accidentally one task overlaps with itself, it can cause a database lock and a spike in memory usage.

web2py scheduler takes a different approach. The number of running processes is fixed, and they can run on different machines. Each process is called a worker. Each worker picks a task when available and executes it as soon as possible after the time when it is scheduled to run, but not necessarily at that exact time. There cannot be more processes running than the number of scheduled tasks and therefore no memory spikes. Scheduler tasks can be defined in models and are stored in the database. The web2py scheduler does not implement a distributed queue since it assumes that the time to distribute tasks is negligible compared with the time to run the tasks. Workers pick up the task from the database.

Homemade tasks queues can be a simpler alternative to the web2py scheduler in some cases.

Cron

The web2py cron provides the ability for applications to execute tasks at preset times, in a platform-independent manner.

For each application, cron functionality is defined by a crontab file:

app/cron/crontab

It follows the syntax defined in ref. [cron] (with some extensions that are specific to web2py).

Before web2py 2.1.1, cron was enabled by default and could be disabled with the -N command line option. Since 2.1.1, cron is disabled by default and can be enabled by the -Y option. This change was motivated by the desire to push users toward using the new scheduler (which is superior to the cron mechanism) and also because cron may impact on performance.

This means that every application can have a separate cron configuration and that cron config can be changed from within web2py without affecting the host OS itself.

Here is an example:

0-59/1  *  *  *  *  root python /path/to/python/script.py
30      3  *  *  *  root *applications/admin/cron/db_vacuum.py
*/30    *  *  *  *  root **applications/admin/cron/something.py
@reboot root    *mycontroller/myfunction
@hourly root    *applications/admin/cron/expire_sessions.py

The last two lines in this example use extensions to regular cron syntax to provide additional web2py functionality.

The file “applications/admin/cron/expire_sessions.py” actually exists and ships with the admin app. It checks for expired sessions and deletes them. “applications/admin/cron/crontab” runs this task hourly.

If the task/script is prefixed with an asterisk (*) and ends with .py, it will be executed in the web2py environment. This means you will have all the controllers and models at your disposal. If you use two asterisks (**), the models will not be executed. This is the recommended way of calling, as it has less overhead and avoids potential locking problems.

Notice that scripts/functions executed in the web2py environment require a manual db.commit() at the end of the function or the transaction will be reverted.

web2py does not generate tickets or meaningful tracebacks in shell mode, which is how cron is run, so make sure that your web2py code runs without errors before you set it up as a cron task as you will likely not be able to see those errors when run from cron. Moreover, be careful how you use models: while the execution happens in a separate process, database locks have to be taken into account in order to avoid pages waiting for cron tasks that may be blocking the database. Use the ** syntax if you don’t need to use the database in your cron task.

You can also call a controller function, in which case there is no need to specify a path. The controller and function will be that of the invoking application. Take special care about the caveats listed above. Example:

*/30  *  *  *  *  root *mycontroller/myfunction

If you specify @reboot in the first field in the crontab file, the given task will be executed only once, at web2py startup. You can use this feature if you want to pre-cache, check, or initialize data for an application on web2py startup. Note that cron tasks are executed in parallel with the application —- if the application is not ready to serve requests until the cron task is finished, you should implement checks to reflect this. Example:

@reboot  root *mycontroller/myfunction

Depending on how you are invoking web2py, there are four modes of operation for web2py cron.

soft cron: available under all execution modes
hard cron: available if using the built-in web server (either directly or via Apache mod_proxy)
external cron: available if you have access to the system’s own cron service
No cron

The default is hard cron if you are using the built-in web server; in all other cases, the default is soft cron. Soft cron is the default method if you are using CGI, FASTCGI or WSGI (but note that soft cron is not enabled by default in the standard wsgihandler.py file provided with web2py).

With soft cron your tasks will be executed on the first call (page load) to web2py after the time specified in crontab; but only after processing the page, so no delay will be observed by the user. Obviously, there is some uncertainty regarding precisely when the task will be executed, depending on the traffic the site receives. Also, the cron task may get interrupted if the web server has a page load timeout set. If these limitations are not acceptable, see external cron. Soft cron is a reasonable last resort, but if your web server allows other cron methods, they should be preferred over soft cron.

Hard cron is the default if you are using the built-in web server (either directly or via Apache mod_proxy). Hard cron is executed in a parallel thread, so unlike soft cron, there are no limitations with regard to run time or execution time precision.

External cron is not default in any scenario, but requires you to have access to the system cron facilities. It runs in a parallel process, so none of the limitations of soft cron apply. This is the recommended way of using cron under WSGI or FASTCGI.

Example of line to add to the system crontab, (usually /etc/crontab):

0-59/1 * * * * web2py cd /var/www/web2py/ && python web2py.py -J -C -D 1 >> /tmp/cron.output 2>&1

With external cron, make sure to add either -J (or --cronjob, which is the same) as indicated above so that web2py knows that task is executed by cron. Web2py sets this internally with soft and hard cron.

Homemade task queues

While cron is useful to run tasks at regular time intervals, it is not always the best solution to run a background task. For this purpose web2py provides the ability to run any Python script as if it were inside a controller:

python web2py.py -S app -M -R applications/app/private/myscript.py -A a b c

where -S app tells web2py to run “myscript.py” as “app”, -M tells web2py to execute models, and -A a b c passes optional command line arguments:

sys.argv = ['applications/app/private/myscript.py', 'a', 'b', 'c']

to “myscript.py”.

This type of background process should not be executed via cron (except perhaps for cron @reboot) because you need to be sure that no more than one instance is running at the same time. With cron it is possible that a process starts at cron iteration 1 and is not completed by cron iteration 2, so cron starts it again, and again, and again - thus jamming the mail server.

In Chapter 8, we will provide an example of how to use the above method to send emails.