A complete guide to production-ready Celery configuration

Anand Tripathi
Apr 21 · 7 min read
A complete guide to production-ready Celery configuration
A complete guide to production-ready Celery configuration

Whenever we work on some heavy data-intensive application or some long-running tasks, it generally slows down the performance of the application, and users have to wait until the task is completed. It was okay in the legacy systems where we used to wait for long to just load some single-page applications.

Long-running tasks side effects

Modern users expect pages to load instantaneously, to solve this we consider many solutions like multiprocessing, multithreading, asynchronous functions using async/await or the Message Queues.

At a high level, message queuing is pretty simple. A process, called the Producer, publishes Messages to a Queue, where they are stored until a Consumer process is ready to consume them.

Celery decreases performance load by running part of the functionality as postponed tasks either on the same server as other tasks or on a different server. These workers can then make changes in the database, update the UI via webhooks or callbacks, add items to the cache, process files, send emails, queue future tasks, and more! All while our main web server remains free to respond to user requests.

But our life is not as simple as it seems correct!! By introducing celery will not take away all the problems. As celery is an open-source library and there is a hell of a lot of configuration in it. It will work amazingly beautifully in a dev mode where there is not much load, but if we deploy the configuration in production then it is chaos.

Here is the production-ready celery configuration that will keep your production environment stable

Production-Ready Configuration

Celery worker command-line arguments can decrease the message rates substantially. Place these options after the word ‘worker’ in your command line because the order of the celery options is strictly enforced in Celery 5.0. For example,

celery -A my_celery_app worker --without-heartbeat --without-gossip --without-mingle

Source: Without these arguments, Celery will send hundreds of messages per second with different diagnostic and redundant heartbeat messages. Unfortunately, details about these settings have been removed from the current documentation, but the implementation has not changed. Read more about the Celery worker functionality in the documentation.

can also be disabled by the configuration parameter in your settings.py

worker_send_task_event=false

If you want to keep track of the tasks’ states, Celery needs to store or send the states somewhere. For this example, we use the RPC result backend, which sends states back as transient messages. The backend is specified via the backend argument to Celery, (or via the result_backend setting if you choose to use a configuration module):

app = Celery('tasks', backend='rpc://', broker='pyamqp://')

Or if you want to use Redis as the result backend, but still use RabbitMQ as the message broker (a popular combination):

app = Celery('tasks', backend='redis://localhost', broker='pyamqp://')

For every task celery internally store the task state in celery_taskmeta table. So before starting it processes the table data and after processing it updates the data in the table that decreases the performance.

Disable the result backend and ignore the task result, that will improve the performance of the application and you can create your own custom logic for the result backend.

By default, the result backend is disabled in celery, and to disable the result use the below configuration

task_ignore_result = True

in a configuration file or you can also disable per task

@app.task(ignore_result=True)
def task_with_no_result():
# ...code without return..

Generally, celery has no time limit for the tasks. A task can run an indefinite amount of time. That can cause your message queue to be unresponsive. So it is a good practice

Hard Limit

Global variable in config.py

task_time_limit=60  # task will be killed after 60 seconds

Task-specific settings

@app.task(time_limit=60)
def task_with_time_limit():
pass

Task hard time limit in seconds. The worker processing the task will be killed and replaced with a new one when this is exceeded.

Soft Limit

Global variable in config.py

task_soft_time_limit=60  
# task will raise exception SoftTimeLimitExceeded after 60 seconds

Task-specific settings

@app.task(soft_time_limit=50, time_limit=60)
def task_with_time_limit():
pass

The task can catch this to clean up before the hard time limit comes:

from celery.exceptions import SoftTimeLimitExceeded@app.task(soft_time_limit=50, time_limit=60)
def mytask():
try:
return do_work()
except SoftTimeLimitExceeded:
cleanup_in_a_hurry()

By default Celery first marks the task as ran and then executes it, this prevents a task from running twice in case of an unexpected shutdown. This is a sane default because we cannot guarantee that every task that every developer writes can be safely run twice. But if you proactively write Idempotent and atomic tasks turning on the task_acks_late setting will not harm your application and will instead make it more robust. If not all tasks can be configured in that way, you can set acks_late per task:

Global variable in Config.py

task_acks_late = True. # task messages will be acknowledged after the task has been executed, not just before (the default behavior).

Task-specific configurations

@app.task(acks_late=True)
def task_with_acks_late():
pass

By default, preforking Celery workers distribute tasks to their worker processes as soon as they are received, regardless of whether the process is currently busy with other tasks.

If you have 20 tasks and each takes 1 second to finish. You set up 4 workers to run through these 20 tasks:

celery worker -A ... -Q random-tasks --concurrency=4

This will take about 5 seconds to finish. 4 subprocesses, 5 tasks each.

But, if instead of 1 second, the first task (task 1 of 20) takes 10 seconds to complete, the total amount of time this queue will take to execute? It’s not 10 seconds — it’s 14 seconds.

That’s because the tasks get distributed evenly, so each subprocess gets 5 of the 20 tasks.

https://medium.com/@taylorhughes/three-quick-tips-from-two-years-with-celery-c05ff9d7f9eb
  • Ofair disables this behaviour and delegates the task to the worker as soon as they are available instead of preforking them with the tasks.
celery worker -A ... -Ofair -Q random-tasks --concurrency=4

Default: 4.

How many messages to prefetch at a time multiplied by the number of concurrent processes. The default is 4 (four messages for each process). The default setting is usually a good choice, however — if you have very long running tasks waiting in the queue and you have to start the workers, note that the first worker to start will receive four times the number of messages initially. Thus the tasks may not be fairly distributed to the workers.

To disable prefetching, set worker_prefetch_multiplier to 1. Changing that setting to 0 will allow the worker to keep consuming as many messages as it wants.

For tasks that are doing some network operation, it would be best to mark the prefetch limit as 1. And if the task messages are small and didn't interact with the network then it is best to increase the limit to a somewhat big number — 10

Global configuration in config.py

worker_prefetch_multiplier = 10  
# One worker taks 10 tasks from queue at a time and will increase the performance

This is how many process/threads/green-threads that should process tasks at the same time.

If your workload is CPU bound then limit it to the number of cores you got (this is the default), more will only slightly decrease the performance.

celery worker -A ... -Q random-tasks --concurrency=4

But if you’re doing I/O, i.e. doing outgoing HTTP requests, talking to a database or any other external service, then you can increase it a lot, and gain a lot of performance, 200–500 is not unheard of.

Source: The prefork pool can take use of multiple processes, but how many is often limited to a few processes per CPU. With Eventlet you can efficiently spawn hundreds, or thousands of green threads. In an informal test with a feed hub system the Eventlet pool could fetch and process hundreds of feeds every second, while the prefork pool spent 14 seconds processing 100 feeds. Note that this is one of the applications async I/O is especially good at (asynchronous HTTP requests). You may want a mix of both Eventlet and prefork workers, and route tasks according to compatibility or what works best.

celery -A proj worker -P eventlet -c 1000

Summarizing everything in the code

celery_config.py

celery_app.py

Command to run

CPU Bound task

celery -A <task> worker -l info -n <name of task> -c 4 -Ofair -Q <queue name> — without-gossip — without-mingle — without-heartbeat

I/O task

celery -A <task> worker -l info -n <name of task> -Ofair -Q <queue name> -P eventlet -c 1000 — without-gossip — without-mingle — without-heartbeat

Conclusion

So this was my curated list of production-ready configurations for celery workers. I hope, it has helped you or it will help you. If you want to discuss anything or anything related to tech, you can contact me here or on my personal blog Progress Story.

KOKO Networks

Engineering under the hood