Extensions and Bootsteps

Custom Message Consumers

You may want to embed custom Kombu consumers to manually process your messages.

For that purpose a special ConsumerStep bootstep class exists, where you only need to define the get_consumers method, which must return a list of kombu.Consumer objects to start whenever the connection is established:

    from celery import Celery
    from celery import bootsteps
    from kombu import Consumer, Exchange, Queue

    my_queue = Queue('custom', Exchange('custom'), 'routing_key')

    app = Celery(broker='amqp://')


    class MyConsumerStep(bootsteps.ConsumerStep):

        def get_consumers(self, channel):
            return [Consumer(channel,
                             queues=[my_queue],
                             callbacks=[self.handle_message],
                             accept=['json'])]

        def handle_message(self, body, message):
            print('Received message: {0!r}'.format(body))
            message.ack()

    app.steps['consumer'].add(MyConsumerStep)


    def send_me_a_message(who='world!', producer=None):
        # module-level function, so it takes no ``self`` argument
        with app.producer_or_acquire(producer) as producer:
            # kombu producers send messages with ``publish``
            producer.publish(
                {'hello': who},
                serializer='json',
                exchange=my_queue.exchange,
                routing_key='routing_key',
                declare=[my_queue],
                retry=True,
            )

    if __name__ == '__main__':
        send_me_a_message('celery')

Note

Kombu Consumers can make use of two different message callback dispatching mechanisms. The first is the callbacks argument, which accepts a list of callbacks with a (body, message) signature; the second is the on_message argument, which takes a single callback with a (message, ) signature. The latter does not automatically decode and deserialize the payload, which is useful in many cases:

    def get_consumers(self, channel):
        return [Consumer(channel, queues=[my_queue],
                         on_message=self.on_message)]

    def on_message(self, message):
        payload = message.decode()
        print(
            'Received message: {0!r} {props!r} rawlen={s}'.format(
            payload, props=message.properties, s=len(message.body),
        ))
        message.ack()

Blueprints

Bootsteps are a technique for adding functionality to the workers. A bootstep is a custom class that defines hooks to do custom actions at different stages in the worker. Every bootstep belongs to a blueprint, and the worker currently defines two blueprints: Worker and Consumer.


Figure A: Bootsteps in the Worker and Consumer blueprints. Starting from the bottom up, the first step in the worker blueprint is the Timer, and the last step is to start the Consumer blueprint, which then establishes the broker connection and starts consuming messages.

../_images/worker_graph_full.png


Worker

The Worker is the first blueprint to start, and with it starts major components like the event loop, processing pool, the timer, and also optional components like the autoscaler. When the worker is fully started it will continue to the Consumer blueprint.

The WorkController is the core worker implementation, and contains several methods and attributes that you can use in your bootstep.

Attributes

app

The current app instance.

hostname

The worker's node name (e.g. worker1@example.com)

blueprint

This is the worker Blueprint.

hub

Event loop object (Hub). You can use this to register callbacks in the event loop.

This is only supported by async I/O enabled transports (amqp, redis), in which case the worker.use_eventloop attribute should be set.

Your bootstep must require the Hub bootstep to use this.

pool

The current process/eventlet/gevent/thread pool. See celery.concurrency.base.BasePool.

Your bootstep must require the Pool bootstep to use this.

timer

Timer used to schedule functions.

Your bootstep must require the Timer bootstep to use this.

statedb

Database (celery.worker.state.Persistent) to persist state between worker restarts.

This only exists if the statedb argument is enabled. Your bootstep must require the Statedb bootstep to use this.

autoscaler

Autoscaler used to automatically grow and shrink the number of processes in the pool.

This only exists if the autoscale argument is enabled. Your bootstep must require the Autoscaler bootstep to use this.

autoreloader

Autoreloader used to automatically reload user code when the filesystem changes.

This only exists if the autoreload argument is enabled. Your bootstep must require the Autoreloader bootstep to use this.

An example Worker bootstep could be:

    from celery import bootsteps


    class ExampleWorkerStep(bootsteps.StartStopStep):
        requires = ('Pool', )

        def __init__(self, worker, **kwargs):
            print('Called when the WorkController instance is constructed')
            print('Arguments to WorkController: {0!r}'.format(kwargs))

        def create(self, worker):
            # this method can be used to delegate the action methods
            # to another object that implements ``start`` and ``stop``.
            return self

        def start(self, worker):
            print('Called when the worker is started.')

        def stop(self, worker):
            print("Called when the worker shuts down.")

        def terminate(self, worker):
            print("Called when the worker terminates")

Every method is passed the current WorkController instance as the first argument.

Another example could use the timer to wake up at regular intervals:

    from time import time

    from celery import bootsteps


    class DeadlockDetection(bootsteps.StartStopStep):
        requires = ('Timer', )

        def __init__(self, worker, deadlock_timeout=3600):
            self.timeout = deadlock_timeout
            self.requests = []
            self.tref = None

        def start(self, worker):
            # run every 30 seconds.
            self.tref = worker.timer.call_repeatedly(
                30.0, self.detect, (worker, ), priority=10,
            )

        def stop(self, worker):
            if self.tref:
                self.tref.cancel()
                self.tref = None

        def detect(self, worker):
            # update active requests; use the ``worker`` argument, since
            # this step never stores the worker on ``self``
            for req in worker.active_requests:
                if req.time_start and time() - req.time_start > self.timeout:
                    raise SystemExit()

Consumer

The Consumer blueprint establishes a connection to the broker, and is restarted every time this connection is lost. Consumer bootsteps include the worker heartbeat, the remote control command consumer, and importantly, the task consumer.

When you create consumer bootsteps you must take into account that it must be possible to restart your blueprint. An additional ‘shutdown’ method is defined for consumer bootsteps; this method is called when the worker is shut down.
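The restart requirement can be illustrated with a small plain-Python simulation. This is only a sketch: RestartableStep and simulate are hypothetical names, the class does not subclass any Celery bootstep class, and the call order is an assumption based on the behaviour described in this guide (stop on every restart and at shutdown, shutdown once at the end).

```python
class RestartableStep:
    """Plain-Python stand-in; a real step would subclass a bootsteps class."""

    def __init__(self, parent, **kwargs):
        self.resource = None   # nothing connection-bound is created here
        self.events = []

    def start(self, parent):
        # Runs on every (re)start, so per-connection state is built here.
        self.resource = object()
        self.events.append('start')

    def stop(self, parent):
        # Runs on every restart and at shutdown; must tolerate repeats.
        self.resource = None
        self.events.append('stop')

    def shutdown(self, parent):
        # Runs once, when the worker shuts down.
        self.events.append('shutdown')


def simulate(step, parent=None, connection_losses=2):
    """Hypothetical driver: some restarts followed by a final shutdown."""
    step.start(parent)
    for _ in range(connection_losses):
        step.stop(parent)
        step.start(parent)
    step.stop(parent)
    step.shutdown(parent)
    return step.events


events = simulate(RestartableStep(None))
print(events)
```

The design point is that anything tied to the broker connection is created in start and released in stop, so the pair can run any number of times.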

Attributes

app

The current app instance.

controller

The parent WorkController object that created this consumer.

hostname

The worker's node name (e.g. worker1@example.com)

blueprint

This is the worker Blueprint.

hub

Event loop object (Hub). You can use this to register callbacks in the event loop.

This is only supported by async I/O enabled transports (amqp, redis), in which case the worker.use_eventloop attribute should be set.

Your bootstep must require the Hub bootstep to use this.

connection

The current broker connection (kombu.Connection).

Your bootstep must require the ‘Connection’ bootstep to use this.

event_dispatcher

A celery.events.Dispatcher object that can be used to send events.

Your bootstep must require the Events bootstep to use this.

gossip

Worker to worker broadcast communication (celery.worker.consumer.Gossip).

pool

The current process/eventlet/gevent/thread pool. See celery.concurrency.base.BasePool.

timer

Timer (celery.utils.timer2.Schedule) used to schedule functions.

heart

Responsible for sending worker event heartbeats (Heart).

Your bootstep must require the Heart bootstep to use this.

task_consumer

The kombu.Consumer object used to consume task messages.

Your bootstep must require the Tasks bootstep to use this.

strategies

Every registered task type has an entry in this mapping, where the value is used to execute an incoming message of this task type (the task execution strategy). This mapping is generated by the Tasks bootstep when the consumer starts:

    for name, task in app.tasks.items():
        strategies[name] = task.start_strategy(app, consumer)
        task.__trace__ = celery.app.trace.build_tracer(
            name, task, loader, hostname
        )

Your bootstep must require the Tasks bootstep to use this.

task_buckets

A defaultdict used to look up the rate limit for a task by type. Entries in this dict may be None (for no limit) or a TokenBucket instance implementing consume(tokens) and expected_time(tokens).

TokenBucket implements the token bucket algorithm, but any algorithm may be used as long as it conforms to the same interface and defines the two methods above.
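As an illustration of that interface, here is a minimal sketch of a bucket exposing consume(tokens) and expected_time(tokens). SimpleTokenBucket, its constructor arguments and the injectable clock are assumptions made for this example; it is not the TokenBucket class that Celery uses.

```python
import time


class SimpleTokenBucket:
    """Hypothetical bucket exposing the two methods named above."""

    def __init__(self, fill_rate, capacity=1, clock=time.monotonic):
        self.fill_rate = float(fill_rate)   # tokens added per second
        self.capacity = float(capacity)     # maximum burst size
        self._clock = clock
        self._tokens = self.capacity        # the bucket starts full
        self._stamp = clock()

    def _refill(self):
        # Add the tokens earned since the last call, capped at capacity.
        now = self._clock()
        earned = (now - self._stamp) * self.fill_rate
        self._tokens = min(self.capacity, self._tokens + earned)
        self._stamp = now
        return self._tokens

    def consume(self, tokens=1):
        """Take `tokens` if available and return True, else return False."""
        if tokens <= self._refill():
            self._tokens -= tokens
            return True
        return False

    def expected_time(self, tokens=1):
        """Seconds to wait before `tokens` can be consumed (0 if now)."""
        missing = tokens - self._refill()
        return max(0.0, missing / self.fill_rate)


# A fake clock keeps the demonstration deterministic.
now = [0.0]
bucket = SimpleTokenBucket(fill_rate=2, capacity=1, clock=lambda: now[0])
first = bucket.consume(1)        # succeeds: the bucket starts full
second = bucket.consume(1)       # fails: nothing has been refilled yet
wait = bucket.expected_time(1)   # one token at two tokens per second
now[0] += wait
third = bucket.consume(1)        # succeeds after waiting
```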

qos

The QoS object can be used to change the task channel's current prefetch_count value, e.g.:

    # increment at next cycle
    consumer.qos.increment_eventually(1)
    # decrement at next cycle
    consumer.qos.decrement_eventually(1)
    consumer.qos.set(10)

Methods

consumer.reset_rate_limits()

Updates the task_buckets mapping for all registered task types.

consumer.bucket_for_task(type, Bucket=TokenBucket)

Creates rate limit bucket for a task using its task.rate_limit attribute.

consumer.add_task_queue(name, exchange=None, exchange_type=None, routing_key=None, **options)

Adds new queue to consume from. This will persist on connection restart.

consumer.cancel_task_queue(name)

Stop consuming from queue by name. This will persist on connection restart.

apply_eta_task(request)

Schedule eta task to execute based on the request.eta attribute. (Request)

Installing Bootsteps

app.steps['worker'] and app.steps['consumer'] can be modified to add new bootsteps:

    >>> app = Celery()
    >>> app.steps['worker'].add(MyWorkerStep)  # < add class, do not instantiate
    >>> app.steps['consumer'].add(MyConsumerStep)

    >>> app.steps['consumer'].update([StepA, StepB])

    >>> app.steps['consumer']
    {step:proj.StepB{()}, step:proj.MyConsumerStep{()}, step:proj.StepA{()}}

The order of steps is not important here as the order is decided by the resulting dependency graph (Step.requires).
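The idea that the boot order follows from the dependency graph rather than from insertion order can be sketched with the standard library. This is not Celery's own graph code: boot_order is a hypothetical helper, the step names and their requirements are made up, and graphlib (Python 3.9+) stands in for whatever the worker uses internally.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def boot_order(requires):
    """Return step names ordered so every step follows its requirements."""
    sorter = TopologicalSorter()
    for step in sorted(requires):
        sorter.add(step, *requires[step])
    return list(sorter.static_order())


# Hypothetical steps: names and dependencies are for illustration only.
steps = {
    'Timer': (),
    'Pool': ('Timer',),
    'MyStep': ('Pool',),
}
order = boot_order(steps)

# Adding the same steps in the opposite order gives the same result.
reordered = boot_order(dict(reversed(list(steps.items()))))
print(order)
```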

To illustrate how you can install bootsteps and how they work, this is an example step that prints some useless debugging information. It can be added both as a worker and consumer bootstep:

    from celery import Celery
    from celery import bootsteps


    class InfoStep(bootsteps.Step):

        def __init__(self, parent, **kwargs):
            # here we can prepare the Worker/Consumer object
            # in any way we want, set attribute defaults and so on.
            print('{0!r} is in init'.format(parent))

        def start(self, parent):
            # our step is started together with all other Worker/Consumer
            # bootsteps.
            print('{0!r} is starting'.format(parent))

        def stop(self, parent):
            # the Consumer calls stop every time the consumer is restarted
            # (i.e. connection is lost) and also at shutdown. The Worker
            # will call stop at shutdown only.
            print('{0!r} is stopping'.format(parent))

        def shutdown(self, parent):
            # shutdown is called by the Consumer at shutdown, it's not
            # called by Worker.
            print('{0!r} is shutting down'.format(parent))

    app = Celery(broker='amqp://')
    app.steps['worker'].add(InfoStep)
    app.steps['consumer'].add(InfoStep)

Starting the worker with this step installed will give us the following logs:

    <Worker: w@example.com (initializing)> is in init
    <Consumer: w@example.com (initializing)> is in init
    [2013-05-29 16:18:20,544: WARNING/MainProcess]
        <Worker: w@example.com (running)> is starting
    [2013-05-29 16:18:21,577: WARNING/MainProcess]
        <Consumer: w@example.com (running)> is starting
    <Consumer: w@example.com (closing)> is stopping
    <Worker: w@example.com (closing)> is stopping
    <Consumer: w@example.com (terminating)> is shutting down

The print statements are redirected to the logging subsystem after the worker has been initialized, so the “is starting” lines are timestamped. You may notice that this no longer happens at shutdown: the stop and shutdown methods are called inside a signal handler, and it’s not safe to use logging inside such a handler. Logging with the Python logging module is not reentrant, which means that you cannot interrupt the function and call it again later. It’s important that the stop and shutdown methods you write are also reentrant.

Starting the worker with --loglevel=debug will show us more information about the boot process:

    [2013-05-29 16:18:20,509: DEBUG/MainProcess] | Worker: Preparing bootsteps.
    [2013-05-29 16:18:20,511: DEBUG/MainProcess] | Worker: Building graph...
    <celery.apps.worker.Worker object at 0x101ad8410> is in init
    [2013-05-29 16:18:20,511: DEBUG/MainProcess] | Worker: New boot order:
        {Hub, Queues (intra), Pool, Autoreloader, Timer, StateDB,
         Autoscaler, InfoStep, Beat, Consumer}
    [2013-05-29 16:18:20,514: DEBUG/MainProcess] | Consumer: Preparing bootsteps.
    [2013-05-29 16:18:20,514: DEBUG/MainProcess] | Consumer: Building graph...
    <celery.worker.consumer.Consumer object at 0x101c2d8d0> is in init
    [2013-05-29 16:18:20,515: DEBUG/MainProcess] | Consumer: New boot order:
        {Connection, Mingle, Events, Gossip, InfoStep, Agent,
         Heart, Control, Tasks, event loop}
    [2013-05-29 16:18:20,522: DEBUG/MainProcess] | Worker: Starting Hub
    [2013-05-29 16:18:20,522: DEBUG/MainProcess] ^-- substep ok
    [2013-05-29 16:18:20,522: DEBUG/MainProcess] | Worker: Starting Pool
    [2013-05-29 16:18:20,542: DEBUG/MainProcess] ^-- substep ok
    [2013-05-29 16:18:20,543: DEBUG/MainProcess] | Worker: Starting InfoStep
    [2013-05-29 16:18:20,544: WARNING/MainProcess]
        <celery.apps.worker.Worker object at 0x101ad8410> is starting
    [2013-05-29 16:18:20,544: DEBUG/MainProcess] ^-- substep ok
    [2013-05-29 16:18:20,544: DEBUG/MainProcess] | Worker: Starting Consumer
    [2013-05-29 16:18:20,544: DEBUG/MainProcess] | Consumer: Starting Connection
    [2013-05-29 16:18:20,559: INFO/MainProcess] Connected to amqp://guest@127.0.0.1:5672//
    [2013-05-29 16:18:20,560: DEBUG/MainProcess] ^-- substep ok
    [2013-05-29 16:18:20,560: DEBUG/MainProcess] | Consumer: Starting Mingle
    [2013-05-29 16:18:20,560: INFO/MainProcess] mingle: searching for neighbors
    [2013-05-29 16:18:21,570: INFO/MainProcess] mingle: no one here
    [2013-05-29 16:18:21,570: DEBUG/MainProcess] ^-- substep ok
    [2013-05-29 16:18:21,571: DEBUG/MainProcess] | Consumer: Starting Events
    [2013-05-29 16:18:21,572: DEBUG/MainProcess] ^-- substep ok
    [2013-05-29 16:18:21,572: DEBUG/MainProcess] | Consumer: Starting Gossip
    [2013-05-29 16:18:21,577: DEBUG/MainProcess] ^-- substep ok
    [2013-05-29 16:18:21,577: DEBUG/MainProcess] | Consumer: Starting InfoStep
    [2013-05-29 16:18:21,577: WARNING/MainProcess]
        <celery.worker.consumer.Consumer object at 0x101c2d8d0> is starting
    [2013-05-29 16:18:21,578: DEBUG/MainProcess] ^-- substep ok
    [2013-05-29 16:18:21,578: DEBUG/MainProcess] | Consumer: Starting Heart
    [2013-05-29 16:18:21,579: DEBUG/MainProcess] ^-- substep ok
    [2013-05-29 16:18:21,579: DEBUG/MainProcess] | Consumer: Starting Control
    [2013-05-29 16:18:21,583: DEBUG/MainProcess] ^-- substep ok
    [2013-05-29 16:18:21,583: DEBUG/MainProcess] | Consumer: Starting Tasks
    [2013-05-29 16:18:21,606: DEBUG/MainProcess] basic.qos: prefetch_count->80
    [2013-05-29 16:18:21,606: DEBUG/MainProcess] ^-- substep ok
    [2013-05-29 16:18:21,606: DEBUG/MainProcess] | Consumer: Starting event loop
    [2013-05-29 16:18:21,608: WARNING/MainProcess] celery@example.com ready.

Command-line programs

Adding new command-line options

You can add additional command-line options to the worker, beat and events commands by modifying the user_options attribute of the application instance.

Celery commands use the optparse module to parse command-line arguments, so you have to use optparse-specific option instances created using optparse.make_option(). Please see the optparse documentation to read about the fields supported.

Example adding a custom option to the celery worker command:

    from celery import Celery
    from celery.bin import Option

    app = Celery(broker='amqp://')
    app.user_options['worker'].add(
        Option('--enable-my-option', action='store_true', default=False,
               help='Enable custom option.'),
    )
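To see what such an option instance does on its own, the following sketch parses the same flag with the standard optparse module, outside Celery. It only shows optparse behaviour; how the worker command hands the parsed value to your code is not shown and is not assumed here.

```python
from optparse import OptionParser, make_option

# The same kind of option instance as above, built with make_option().
option = make_option('--enable-my-option', action='store_true',
                     default=False, help='Enable custom option.')

parser = OptionParser(option_list=[option])

# optparse derives the attribute name from the long flag:
# '--enable-my-option' becomes 'enable_my_option'.
on, _ = parser.parse_args(['--enable-my-option'])
off, _ = parser.parse_args([])

print(on.enable_my_option, off.enable_my_option)
```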

Adding new celery sub-commands

New commands can be added to the celery umbrella command by using setuptools entry-points.

Entry-points are special metadata that can be added to your package's setup.py program and then, after installation, read from the system using the pkg_resources module.

Celery recognizes celery.commands entry-points to install additional subcommands, where the value of the entry-point must point to a valid subclass of celery.bin.base.Command. Sadly there is limited documentation, but you can find inspiration from the various commands in the celery.bin package.

This is how the Flower monitoring extension adds the celery flower command, by adding an entry-point in setup.py:

    setup(
        name='flower',
        entry_points={
            'celery.commands': [
                'flower = flower.command:FlowerCommand',
            ],
        }
    )

The command definition is in two parts separated by the equal sign: the first part is the name of the subcommand (flower), and the second part is the fully qualified path to the class that implements the command, written as the module path, a colon, and the class name (flower.command:FlowerCommand).
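How such a definition resolves to a class can be sketched in a few lines. This is not how pkg_resources or Celery implement it: load_entry_point is a hypothetical helper, it assumes the module:attribute spelling that setuptools uses for entry-point values, and it loads a standard library class so that the example runs without Flower installed.

```python
from importlib import import_module


def load_entry_point(spec):
    """Split 'name = module:attr' and import the object it points to."""
    name, _, target = spec.partition('=')
    module_name, _, attr = target.strip().partition(':')
    return name.strip(), getattr(import_module(module_name), attr)


# Stand-in definition; a real one would name a Command subclass.
name, obj = load_entry_point('ordered = collections:OrderedDict')
print(name, obj)
```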

In the module flower/command.py, the command class is defined something like this:

    from celery.bin.base import Command, Option


    class FlowerCommand(Command):

        def get_options(self):
            return (
                Option('--port', default=8888, type='int',
                       help='Webserver port',
                ),
                Option('--debug', action='store_true'),
            )

        def run(self, port=None, debug=False, **kwargs):
            print('Running our command')

Worker API

Hub - The worker's async event loop.

supported transports:
 amqp, redis

New in version 3.0.

The worker uses asynchronous I/O when the amqp or redis broker transports are used. The eventual goal is for all transports to use the eventloop, but that will take some time so other transports still use a threading-based solution.

hub.add(fd, callback, flags)

hub.add_reader(fd, callback, *args)

Add callback to be called when fd is readable.

The callback will stay registered until explicitly removed using hub.remove(fd), or the fd is automatically discarded because it’s no longer valid.

Note that only one callback can be registered for any given fd at a time, so calling add a second time will remove any callback that was previously registered for that fd.

A file descriptor is any file-like object that supports the fileno method, or it can be the file descriptor number (int).

hub.add_writer(fd, callback, *args)

Add callback to be called when fd is writable. See also notes for hub.add_reader() above.

hub.remove(fd)

Remove all callbacks for fd from the loop.
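The registration rules described above (an fd may be an int or an object with a fileno method, and only one callback is kept per fd) can be modelled with a toy registry. ToyHub and fire_readable are hypothetical and exist only for this sketch; this is not Celery's Hub and it does no real I/O polling.

```python
import os


class ToyHub:
    """Toy model of the documented semantics; NOT Celery's Hub."""

    def __init__(self):
        self.readers = {}

    @staticmethod
    def _fileno(fd):
        # Accept either an int or any object exposing fileno().
        return fd if isinstance(fd, int) else fd.fileno()

    def add_reader(self, fd, callback, *args):
        # A second registration for the same fd replaces the first.
        self.readers[self._fileno(fd)] = (callback, args)

    def remove(self, fd):
        self.readers.pop(self._fileno(fd), None)

    def fire_readable(self, fd):
        # Hypothetical helper: pretend the loop saw `fd` become readable.
        callback, args = self.readers[self._fileno(fd)]
        return callback(*args)


r, w = os.pipe()
reader = os.fdopen(r, 'rb')          # file-like object wrapping fd `r`

hub = ToyHub()
hub.add_reader(reader, lambda tag: 'first ' + tag, 'x')
hub.add_reader(r, lambda tag: 'second ' + tag, 'x')   # same fd, as an int

result = hub.fire_readable(reader)   # only the latest callback remains
hub.remove(reader)
remaining = len(hub.readers)

reader.close()
os.close(w)
```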

Timer - Scheduling events

timer.call_after(secs, callback, args=(), kwargs=(), priority=0)

timer.call_repeatedly(secs, callback, args=(), kwargs=(), priority=0)

timer.call_at(eta, callback, args=(), kwargs=(), priority=0)