Change history for Celery 2.0
2.0.3
release-date: | 2010-08-27 12:00 P.M CEST |
---|---|
release-by: | Ask Solem |
Fixes
Worker: Properly handle connection errors happening while closing consumers.
Worker: Events are now buffered if the connection is down, then sent when the connection is re-established.
No longer depends on the mailer package.
This package had a name space collision with django-mailer, so its functionality was replaced.
Redis result backend: Documentation typos: Redis doesn’t have database names, but database numbers. The default database is now 0.
inspect: registered_tasks was requesting an invalid command because of a typo.
See issue #170.
CELERY_ROUTES: Values defined in the route should now have precedence over values defined in CELERY_QUEUES when merging the two.
With the follow settings:
CELERY_QUEUES = {"cpubound": {"exchange": "cpubound",
"routing_key": "cpubound"}}
CELERY_ROUTES = {"tasks.add": {"queue": "cpubound",
"routing_key": "tasks.add",
"serializer": "json"}}
The final routing options for tasks.add will become:
{"exchange": "cpubound",
"routing_key": "tasks.add",
"serializer": "json"}
This was not the case before: the values in CELERY_QUEUES would take precedence.
Worker crashed if the value of CELERY_TASK_ERROR_WHITELIST was not an iterable
apply(): Make sure kwargs[“task_id”] is always set.
AsyncResult.traceback: Now returns None, instead of raising KeyError if traceback is missing.
inspect: Replies did not work correctly if no destination was specified.
Can now store result/metadata for custom states.
Worker: A warning is now emitted if the sending of task error emails fails.
celeryev: Curses monitor no longer crashes if the terminal window is resized.
See issue #160.
Worker: On OS X it is not possible to run os.exec* in a process that is threaded.
This breaks the SIGHUP restart handler, and is now disabled on OS X, emitting a warning instead.
See issue #152.
celery.execute.trace: Properly handle raise(str), which is still allowed in Python 2.4.
See issue #175.
Using urllib2 in a periodic task on OS X crashed because of the proxy auto detection used in OS X.
This is now fixed by using a workaround. See issue #143.
Debian init scripts: Commands should not run in a sub shell
See issue #163.
Debian init scripts: Use the absolute path of celeryd program to allow stat
See issue #162.
Documentation
getting-started/broker-installation: Fixed typo
set_permissions “” -> set_permissions ”.*”.
Tasks User Guide: Added section on database transactions.
See issue #169.
Routing User Guide: Fixed typo “feed”: -> {“queue”: “feeds”}.
See issue #169.
Documented the default values for the CELERYD_CONCURRENCY and CELERYD_PREFETCH_MULTIPLIER settings.
Tasks User Guide: Fixed typos in the subtask example
celery.signals: Documented worker_process_init.
Daemonization cookbook: Need to export DJANGO_SETTINGS_MODULE in /etc/default/celeryd.
Added some more FAQs from stack overflow
Daemonization cookbook: Fixed typo CELERYD_LOGFILE/CELERYD_PIDFILE
to CELERYD_LOG_FILE / CELERYD_PID_FILE
Also added troubleshooting section for the init scripts.
2.0.2
release-date: | 2010-07-22 11:31 A.M CEST |
---|---|
release-by: | Ask Solem |
Routes: When using the dict route syntax, the exchange for a task could disappear making the task unroutable.
See issue #158.
Test suite now passing on Python 2.4
No longer have to type PYTHONPATH=. to use celeryconfig in the current directory.
This is accomplished by the default loader ensuring that the current directory is in sys.path when loading the config module. sys.path is reset to its original state after loading.
Adding the current working directory to sys.path without the user knowing may be a security issue, as this means someone can drop a Python module in the users directory that executes arbitrary commands. This was the original reason not to do this, but if done only when loading the config module, this means that the behavior will only apply to the modules imported in the config module, which I think is a good compromise (certainly better than just explicitly setting PYTHONPATH=. anyway)
Experimental Cassandra backend added.
Worker: SIGHUP handler accidentally propagated to worker pool processes.
In combination with 7a7c44e39344789f11b5346e9cc8340f5fe4846c this would make each child process start a new worker instance when the terminal window was closed :/
Worker: Do not install SIGHUP handler if running from a terminal.
This fixes the problem where the worker is launched in the background when closing the terminal.
Worker: Now joins threads at shutdown.
See issue #152.
Test tear down: Don’t use atexit but nose’s teardown() functionality instead.
See issue #154.
Debian worker init script: Stop now works correctly.
Task logger: warn method added (synonym for warning)
Can now define a white list of errors to send error emails for.
Example:
CELERY_TASK_ERROR_WHITELIST = ('myapp.MalformedInputError')
See issue #153.
Worker: Now handles overflow exceptions in time.mktime while parsing the ETA field.
LoggerWrapper: Try to detect loggers logging back to stderr/stdout making an infinite loop.
Added celery.task.control.inspect: Inspects a running worker.
Examples:
# Inspect a single worker
>>> i = inspect("myworker.example.com")
# Inspect several workers
>>> i = inspect(["myworker.example.com", "myworker2.example.com"])
# Inspect all workers consuming on this vhost.
>>> i = inspect()
### Methods
# Get currently executing tasks
>>> i.active()
# Get currently reserved tasks
>>> i.reserved()
# Get the current eta schedule
>>> i.scheduled()
# Worker statistics and info
>>> i.stats()
# List of currently revoked tasks
>>> i.revoked()
# List of registered tasks
>>> i.registered_tasks()
Remote control commands dump_active/dump_reserved/dump_schedule now replies with detailed task requests.
Containing the original arguments and fields of the task requested.
In addition the remote control command set_loglevel has been added, this only changes the log level for the main process.
Worker control command execution now catches errors and returns their string representation in the reply.
Functional test suite added
celery.tests.functional.case contains utilities to start and stop an embedded worker process, for use in functional testing.
2.0.1
release-date: | 2010-07-09 03:02 P.M CEST |
---|---|
release-by: | Ask Solem |
multiprocessing.pool: Now handles encoding errors, so that pickling errors doesn’t crash the worker processes.
The remote control command replies was not working with RabbitMQ 1.8.0’s stricter equivalence checks.
If you’ve already hit this problem you may have to delete the declaration:
$ camqadm exchange.delete celerycrq
or:
$ python manage.py camqadm exchange.delete celerycrq
A bug sneaked in the ETA scheduler that made it only able to execute one task per second(!)
The scheduler sleeps between iterations so it doesn’t consume too much CPU. It keeps a list of the scheduled items sorted by time, at each iteration it sleeps for the remaining time of the item with the nearest deadline. If there are no eta tasks it will sleep for a minimum amount of time, one second by default.
A bug sneaked in here, making it sleep for one second for every task that was scheduled. This has been fixed, so now it should move tasks like hot knife through butter.
In addition a new setting has been added to control the minimum sleep interval; CELERYD_ETA_SCHEDULER_PRECISION. A good value for this would be a float between 0 and 1, depending on the needed precision. A value of 0.8 means that when the ETA of a task is met, it will take at most 0.8 seconds for the task to be moved to the ready queue.
Pool: Supervisor did not release the semaphore.
This would lead to a deadlock if all workers terminated prematurely.
Added Python version trove classifiers: 2.4, 2.5, 2.6 and 2.7
Tests now passing on Python 2.7.
Task.__reduce__: Tasks created using the task decorator can now be pickled.
setup.py: nose added to tests_require.
Pickle should now work with SQLAlchemy 0.5.x
New homepage design by Jan Henrik Helmers: http://celeryproject.org
New Sphinx theme by Armin Ronacher: http://docs.celeryproject.org/
Fixed “pending_xref” errors shown in the HTML rendering of the documentation. Apparently this was caused by new changes in Sphinx 1.0b2.
Router classes in CELERY_ROUTES are now imported lazily.
Importing a router class in a module that also loads the Celery environment would cause a circular dependency. This is solved by importing it when needed after the environment is set up.
CELERY_ROUTES was broken if set to a single dict.
This example in the docs should now work again:
CELERY_ROUTES = {"feed.tasks.import_feed": "feeds"}
CREATE_MISSING_QUEUES was not honored by apply_async.
New remote control command: stats
Dumps information about the worker, like pool process ids, and total number of tasks executed by type.
Example reply:
[{'worker.local':
'total': {'tasks.sleeptask': 6},
'pool': {'timeouts': [None, None],
'processes': [60376, 60377],
'max-concurrency': 2,
'max-tasks-per-child': None,
'put-guarded-by-semaphore': True}}]
New remote control command: dump_active
Gives a list of tasks currently being executed by the worker. By default arguments are passed through repr in case there are arguments that is not JSON encodable. If you know the arguments are JSON safe, you can pass the argument safe=True.
Example reply:
>>> broadcast("dump_active", arguments={"safe": False}, reply=True)
[{'worker.local': [
{'args': '(1,)',
'time_start': 1278580542.6300001,
'name': 'tasks.sleeptask',
'delivery_info': {
'consumer_tag': '30',
'routing_key': 'celery',
'exchange': 'celery'},
'hostname': 'casper.local',
'acknowledged': True,
'kwargs': '{}',
'id': '802e93e9-e470-47ed-b913-06de8510aca2',
}
]}]
Added experimental support for persistent revokes.
Use the -S|–statedb argument to the worker to enable it:
$ celeryd --statedb=/var/run/celeryd
This will use the file: /var/run/celeryd.db, as the shelve module automatically adds the .db suffix.
2.0.0
release-date: | 2010-07-02 02:30 P.M CEST |
---|---|
release-by: | Ask Solem |
Foreword
Celery 2.0 contains backward incompatible changes, the most important being that the Django dependency has been removed so Celery no longer supports Django out of the box, but instead as an add-on package called django-celery.
We’re very sorry for breaking backwards compatibility, but there’s also many new and exciting features to make up for the time you lose upgrading, so be sure to read the News section.
Quite a lot of potential users have been upset about the Django dependency, so maybe this is a chance to get wider adoption by the Python community as well.
Big thanks to all contributors, testers and users!
Upgrading for Django-users
Django integration has been moved to a separate package: django-celery.
To upgrade you need to install the django-celery module and change:
INSTALLED_APPS = "celery"
to:
INSTALLED_APPS = "djcelery"
If you use mod_wsgi you need to add the following line to your .wsgi file:
import os
os.environ["CELERY_LOADER"] = "django"
The following modules has been moved to django-celery:
Module name Replace with celery.models djcelery.models celery.managers djcelery.managers celery.views djcelery.views celery.urls djcelery.urls celery.management djcelery.management celery.loaders.djangoapp djcelery.loaders celery.backends.database djcelery.backends.database celery.backends.cache djcelery.backends.cache
Importing djcelery will automatically setup Celery to use Django loader. loader. It does this by setting the CELERY_LOADER environment variable to “django” (it won’t change it if a loader is already set.)
When the Django loader is used, the “database” and “cache” result backend aliases will point to the djcelery backends instead of the built-in backends, and configuration will be read from the Django settings.
Upgrading for others
Database result backend
The database result backend is now using SQLAlchemy instead of the Django ORM, see Supported Databases for a table of supported databases.
The DATABASE_* settings has been replaced by a single setting: CELERY_RESULT_DBURI. The value here should be an SQLAlchemy Connection String, some examples include:
# sqlite (filename)
CELERY_RESULT_DBURI = "sqlite:///celerydb.sqlite"
# mysql
CELERY_RESULT_DBURI = "mysql://scott:[email protected]/foo"
# postgresql
CELERY_RESULT_DBURI = "postgresql://scott:[email protected]/mydatabase"
# oracle
CELERY_RESULT_DBURI = "oracle://scott:[email protected]:1521/sidname"
See SQLAlchemy Connection Strings for more information about connection strings.
To specify additional SQLAlchemy database engine options you can use the CELERY_RESULT_ENGINE_OPTIONS setting:
# echo enables verbose logging from SQLAlchemy.
CELERY_RESULT_ENGINE_OPTIONS = {"echo": True}
Cache result backend
The cache result backend is no longer using the Django cache framework, but it supports mostly the same configuration syntax:
CELERY_CACHE_BACKEND = "memcached://A.example.com:11211;B.example.com"
To use the cache backend you must either have the pylibmc or python-memcached library installed, of which the former is regarded as the best choice.
The support backend types are memcached:// and memory://, we haven’t felt the need to support any of the other backends provided by Django.
Backward incompatible changes
Default (python) loader now prints warning on missing celeryconfig.py instead of raising ImportError.
The worker raises ImproperlyConfigured if the configuration is not set up. This makes it possible to use –help etc., without having a working configuration.
Also this makes it possible to use the client side of celery without being configured:
>>> from carrot.connection import BrokerConnection
>>> conn = BrokerConnection("localhost", "guest", "guest", "/")
>>> from celery.execute import send_task
>>> r = send_task("celery.ping", args=(), kwargs={}, connection=conn)
>>> from celery.backends.amqp import AMQPBackend
>>> r.backend = AMQPBackend(connection=conn)
>>> r.get()
'pong'
The following deprecated settings has been removed (as scheduled by the Celery Deprecation Timeline):
Setting name Replace with CELERY_AMQP_CONSUMER_QUEUES CELERY_QUEUES CELERY_AMQP_EXCHANGE CELERY_DEFAULT_EXCHANGE CELERY_AMQP_EXCHANGE_TYPE CELERY_DEFAULT_EXCHANGE_TYPE CELERY_AMQP_CONSUMER_ROUTING_KEY CELERY_QUEUES CELERY_AMQP_PUBLISHER_ROUTING_KEY CELERY_DEFAULT_ROUTING_KEY The celery.task.rest module has been removed, use celery.task.http instead (as scheduled by the Celery Deprecation Timeline).
It’s no longer allowed to skip the class name in loader names. (as scheduled by the Celery Deprecation Timeline):
Assuming the implicit Loader class name is no longer supported, if you use e.g.:
CELERY_LOADER = "myapp.loaders"
You need to include the loader class name, like this:
CELERY_LOADER = "myapp.loaders.Loader"
CELERY_TASK_RESULT_EXPIRES now defaults to 1 day.
Previous default setting was to expire in 5 days.
AMQP backend: Don’t use different values for auto_delete.
This bug became visible with RabbitMQ 1.8.0, which no longer allows conflicting declarations for the auto_delete and durable settings.
If you’ve already used celery with this backend chances are you have to delete the previous declaration:
$ camqadm exchange.delete celeryresults
Now uses pickle instead of cPickle on Python versions <= 2.5
cPickle is broken in Python <= 2.5.
It unsafely and incorrectly uses relative instead of absolute imports, so e.g.:
exceptions.KeyError
becomes:
celery.exceptions.KeyError
Your best choice is to upgrade to Python 2.6, as while the pure pickle version has worse performance, it is the only safe option for older Python versions.
News
celeryev: Curses Celery Monitor and Event Viewer.
This is a simple monitor allowing you to see what tasks are executing in real-time and investigate tracebacks and results of ready tasks. It also enables you to set new rate limits and revoke tasks.
Screenshot:
If you run celeryev with the -d switch it will act as an event dumper, simply dumping the events it receives to standard out:
$ celeryev -d
-> celeryev: starting capture...
casper.local [2010-06-04 10:42:07.020000] heartbeat
casper.local [2010-06-04 10:42:14.750000] task received:
tasks.add(61a68756-27f4-4879-b816-3cf815672b0e) args=[2, 2] kwargs={}
eta=2010-06-04T10:42:16.669290, retries=0
casper.local [2010-06-04 10:42:17.230000] task started
tasks.add(61a68756-27f4-4879-b816-3cf815672b0e) args=[2, 2] kwargs={}
casper.local [2010-06-04 10:42:17.960000] task succeeded:
tasks.add(61a68756-27f4-4879-b816-3cf815672b0e)
args=[2, 2] kwargs={} result=4, runtime=0.782663106918
The fields here are, in order: *sender hostname*, *timestamp*, *event type* and
*additional event fields*.
AMQP result backend: Now supports .ready(), .successful(), .result, .status, and even responds to changes in task state
New user guides:
Worker: Standard out/error is now being redirected to the log file.
billiard has been moved back to the celery repository.
Module name celery equivalent billiard.pool celery.concurrency.processes.pool billiard.serialization celery.serialization billiard.utils.functional celery.utils.functional The billiard distribution may be maintained, depending on interest.
now depends on carrot >= 0.10.5
now depends on pyparsing
Worker: Added –purge as an alias to –discard.
Worker: Ctrl+C (SIGINT) once does warm shutdown, hitting Ctrl+C twice forces termination.
Added support for using complex crontab-expressions in periodic tasks. For example, you can now use:
>>> crontab(minute="*/15")
or even:
>>> crontab(minute="*/30", hour="8-17,1-2", day_of_week="thu-fri")
See Periodic Tasks.
Worker: Now waits for available pool processes before applying new tasks to the pool.
This means it doesn’t have to wait for dozens of tasks to finish at shutdown because it has applied prefetched tasks without having any pool processes available to immediately accept them.
See issue #122.
New built-in way to do task callbacks using subtask.
See Canvas: Designing Workflows for more information.
TaskSets can now contain several types of tasks.
TaskSet has been refactored to use a new syntax, please see Canvas: Designing Workflows for more information.
The previous syntax is still supported, but will be deprecated in version 1.4.
TaskSet failed() result was incorrect.
See issue #132.
Now creates different loggers per task class.
See issue #129.
Missing queue definitions are now created automatically.
You can disable this using the CELERY_CREATE_MISSING_QUEUES setting.
The missing queues are created with the following options:
CELERY_QUEUES[name] = {"exchange": name,
"exchange_type": "direct",
"routing_key": "name}
This feature is added for easily setting up routing using the -Q option to the worker:
$ celeryd -Q video, image
See the new routing section of the User Guide for more information: Routing Tasks.
New Task option: Task.queue
If set, message options will be taken from the corresponding entry in CELERY_QUEUES. exchange, exchange_type and routing_key will be ignored
Added support for task soft and hard time limits.
New settings added:
-
Hard time limit. The worker processing the task will be killed and replaced with a new one when this is exceeded.
-
Soft time limit. The SoftTimeLimitExceeded exception will be raised when this is exceeded. The task can catch this to e.g. clean up before the hard time limit comes.
New command-line arguments to celeryd added: –time-limit and –soft-time-limit.
What’s left?
This won’t work on platforms not supporting signals (and specifically the SIGUSR1 signal) yet. So an alternative the ability to disable the feature all together on nonconforming platforms must be implemented.
Also when the hard time limit is exceeded, the task result should be a TimeLimitExceeded exception.
-
Test suite is now passing without a running broker, using the carrot in-memory backend.
Log output is now available in colors.
Log level Color DEBUG Blue WARNING Yellow CRITICAL Magenta ERROR Red This is only enabled when the log output is a tty. You can explicitly enable/disable this feature using the CELERYD_LOG_COLOR setting.
Added support for task router classes (like the django multi-db routers)
- New setting: CELERY_ROUTES
This is a single, or a list of routers to traverse when sending tasks. Dictionaries in this list converts to a celery.routes.MapRoute instance.
Examples:
>>> CELERY_ROUTES = {"celery.ping": "default",
"mytasks.add": "cpu-bound",
"video.encode": {
"queue": "video",
"exchange": "media"
"routing_key": "media.video.encode"}}
>>> CELERY_ROUTES = ("myapp.tasks.Router",
{"celery.ping": "default})
Where myapp.tasks.Router could be:
class Router(object):
def route_for_task(self, task, args=None, kwargs=None):
if task == "celery.ping":
return "default"
route_for_task may return a string or a dict. A string then means it’s a queue name in CELERY_QUEUES, a dict means it’s a custom route.
When sending tasks, the routers are consulted in order. The first router that doesn’t return None is the route to use. The message options is then merged with the found route settings, where the routers settings have priority.
Example if apply_async() has these arguments:
>>> Task.apply_async(immediate=False, exchange="video",
... routing_key="video.compress")
and a router returns:
{"immediate": True,
"exchange": "urgent"}
the final message options will be:
immediate=True, exchange="urgent", routing_key="video.compress"
(and any default message options defined in the Task class)
New Task handler called after the task returns: after_return().
ExceptionInfo now passed to
on_retry()/ on_failure() as einfo keyword argument.
Worker: Added CELERYD_MAX_TASKS_PER_CHILD / --maxtasksperchild
Defines the maximum number of tasks a pool worker can process before the process is terminated and replaced by a new one.
Revoked tasks now marked with state REVOKED, and result.get() will now raise TaskRevokedError.
celery.task.control.ping() now works as expected.
apply(throw=True) / CELERY_EAGER_PROPAGATES_EXCEPTIONS: Makes eager execution re-raise task errors.
New signal: ~celery.signals.worker_process_init: Sent inside the pool worker process at init.
Worker: -Q option: Ability to specify list of queues to use, disabling other configured queues.
For example, if CELERY_QUEUES defines four queues: image, video, data and default, the following command would make the worker only consume from the image and video queues:
$ celeryd -Q image,video
Worker: New return value for the revoke control command:
Now returns:
{"ok": "task $id revoked"}
instead of True.
Worker: Can now enable/disable events using remote control
Example usage:
>>> from celery.task.control import broadcast
>>> broadcast("enable_events")
>>> broadcast("disable_events")
Removed top-level tests directory. Test config now in celery.tests.config
This means running the unit tests doesn’t require any special setup. celery/tests/__init__ now configures the CELERY_CONFIG_MODULE and CELERY_LOADER environment variables, so when nosetests imports that, the unit test environment is all set up.
Before you run the tests you need to install the test requirements:
$ pip install -r requirements/test.txt
Running all tests:
$ nosetests
Specifying the tests to run:
$ nosetests celery.tests.test_task
Producing HTML coverage:
$ nosetests --with-coverage3
The coverage output is then located in celery/tests/cover/index.html.
Worker: New option –version: Dump version info and exit.
celeryd-multi: Tool for shell scripts to start multiple workers.
Some examples:
# Advanced example with 10 workers:
# * Three of the workers processes the images and video queue
# * Two of the workers processes the data queue with loglevel DEBUG
# * the rest processes the default' queue.
$ celeryd-multi start 10 -l INFO -Q:1-3 images,video -Q:4,5:data
-Q default -L:4,5 DEBUG
# get commands to start 10 workers, with 3 processes each
$ celeryd-multi start 3 -c 3
celeryd -n celeryd1.myhost -c 3
celeryd -n celeryd2.myhost -c 3
celeryd -n celeryd3.myhost -c 3
# start 3 named workers
$ celeryd-multi start image video data -c 3
celeryd -n image.myhost -c 3
celeryd -n video.myhost -c 3
celeryd -n data.myhost -c 3
# specify custom hostname
$ celeryd-multi start 2 -n worker.example.com -c 3
celeryd -n celeryd1.worker.example.com -c 3
celeryd -n celeryd2.worker.example.com -c 3
# Additionl options are added to each celeryd',
# but you can also modify the options for ranges of or single workers
# 3 workers: Two with 3 processes, and one with 10 processes.
$ celeryd-multi start 3 -c 3 -c:1 10
celeryd -n celeryd1.myhost -c 10
celeryd -n celeryd2.myhost -c 3
celeryd -n celeryd3.myhost -c 3
# can also specify options for named workers
$ celeryd-multi start image video data -c 3 -c:image 10
celeryd -n image.myhost -c 10
celeryd -n video.myhost -c 3
celeryd -n data.myhost -c 3
# ranges and lists of workers in options is also allowed:
# (-c:1-3 can also be written as -c:1,2,3)
$ celeryd-multi start 5 -c 3 -c:1-3 10
celeryd-multi -n celeryd1.myhost -c 10
celeryd-multi -n celeryd2.myhost -c 10
celeryd-multi -n celeryd3.myhost -c 10
celeryd-multi -n celeryd4.myhost -c 3
celeryd-multi -n celeryd5.myhost -c 3
# lists also works with named workers
$ celeryd-multi start foo bar baz xuzzy -c 3 -c:foo,bar,baz 10
celeryd-multi -n foo.myhost -c 10
celeryd-multi -n bar.myhost -c 10
celeryd-multi -n baz.myhost -c 10
celeryd-multi -n xuzzy.myhost -c 3
The worker now calls the result backends process_cleanup method after task execution instead of before.
AMQP result backend now supports Pika.