Distributed Task Queue with Celery and Monitoring with Prometheus Metrics

Rafet Topcu
Insider Engineering
Nov 21, 2022

In this blog post, we will talk about how to use Celery and integrate it with Prometheus and Grafana. We will dive into the code and implement a sample Celery application to understand its internals in detail. We will also cover why and how we use Celery at Insider, and how to monitor Celery clusters with Prometheus and Grafana.

What is Celery?

Celery is a distributed task queue for Python web applications, designed to execute work asynchronously outside the HTTP request-response cycle.

The architecture of distributed task queue with Celery

In the figure above, you can see how Celery works. There are five parts to discuss. The beat scheduler schedules recurring tasks at specific time intervals. For example, if you want to check the weather every 5 minutes, you can set up a schedule to do that. On the application side, tasks can be scheduled programmatically through an API, automation, etc. Both the beat scheduler and the application send tasks to the broker. Workers then take the tasks in order and start processing them. Each worker that finishes its task pulls the next one from the broker, and this process continues until all tasks are done. Finished tasks can return values, which the workers write to the cache (Celery's result backend); in the end, the application can read and use them.

Implementation of Sample Celery Application

Let’s write a simple application that checks and logs the CPU usage every 5 minutes.

# Folder Structure
main
├─ app.py
└─ tasks.py

We need to decide which broker to use. In this example, I decided to use Redis. First, we run a Redis container with the following command.

docker run -d -p 6379:6379 redis

We will have two tasks: the former checks the CPU usage, while the latter logs it.

We define the configurations of the Celery app and the scheduler in the app.py file.
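A minimal sketch of what app.py could look like, assuming Redis on localhost:6379 serves as both the broker and the result backend:

# app.py
from celery import Celery

# Celery app configured with Redis as broker and result backend (assumption).
app = Celery(
    "main",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/0",
)

# Beat schedule: push the read_cpu_usage task to the broker every 5 minutes.
app.conf.beat_schedule = {
    "read-cpu-usage-every-5-minutes": {
        "task": "tasks.read_cpu_usage",
        "schedule": 300.0,  # seconds
    },
}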

Then, we need to write task functions in the tasks.py file.
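A minimal sketch of tasks.py, assuming the psutil library is used to read the CPU usage:

# tasks.py
import psutil  # assumption: psutil provides the CPU usage reading

from app import app

@app.task
def read_cpu_usage():
    # Measure CPU utilization over a 1-second window, then hand the value
    # off to the logging task.
    usage = psutil.cpu_percent(interval=1)
    log_cpu_usage.delay(usage)
    return usage

@app.task
def log_cpu_usage(usage):
    # Log the measured value; a real application might write it to a file
    # or a metrics store instead.
    print(f"CPU usage: {usage}%")
    return usage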

We need to call the read_cpu_usage task every 5 minutes. For this purpose, we start a beat scheduler and some workers. The following command starts a Celery worker. It will not execute any task on its own; it waits for the scheduler to send tasks through the broker.

celery -A tasks worker --loglevel=INFO

To start the beat scheduler, run this command.

celery -A app beat --loglevel=INFO

After that, the scheduler sends the read_cpu_usage task every 5 minutes and the worker receives and executes it.

Console output of the Celery workers

As you can see in the console logs, the scheduler sends the read_cpu_usage task to the broker every 5 minutes. Then the workers receive the tasks and execute them.

Monitoring Celery with Prometheus & Grafana

Prometheus is an open-source tool for event monitoring and alerting that stores real-time metrics in a time-series database. For monitoring and administering Celery clusters, we can use the Flower tool. It can easily be run using the Docker command below.

docker run -p 5555:5555 mher/flower
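Note that Flower must connect to the same broker as the workers. With the mher/flower image, the broker can be set through the CELERY_BROKER_URL environment variable, e.g. -e CELERY_BROKER_URL=redis://host.docker.internal:6379/0 (the host.docker.internal hostname is a Docker Desktop convention; adjust it to your environment).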

Flower exposes its metrics at the /metrics endpoint. When it is run locally as above, the metrics are available at http://127.0.0.1:5555/metrics. Prometheus then retrieves these metrics from the endpoint and stores them in its time-series database. Thus, we can use these metrics to set alarms, perform monitoring, etc.

To enable Prometheus to scrape metrics from the Flower endpoint, we need to add a scrape configuration to Prometheus’ configuration file, prometheus.yaml.

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: celery-flower
    static_configs:
      - targets: ['localhost:5555']

Prometheus scrapes metrics from the /metrics path by default. If your application exposes metrics on a different path, you should set it with the metrics_path option in the scrape configuration in prometheus.yaml.

After all customizations are completed, Prometheus will automatically start scraping the Flower metrics.
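If you run Prometheus with Docker, you can mount the custom configuration file over the default one. A sketch, assuming the official prom/prometheus image and prometheus.yaml in the current directory:

docker run -p 9090:9090 -v $(pwd)/prometheus.yaml:/etc/prometheus/prometheus.yml prom/prometheus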

Grafana is an open-source, cross-platform web application for analytics and interactive visualization. It allows us to monitor our applications, environments, and databases with visualization options such as charts, graphs, and alerts.

We can add our Prometheus instance as a Grafana data source and query any metric using the Prometheus Query Language (PromQL).
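For example, assuming the flower_events_total counter exported by Flower's Prometheus integration, a query such as rate(flower_events_total{type="task-succeeded"}[5m]) charts the rate of successfully completed tasks.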

Flower provides an official Grafana dashboard template, which can easily be downloaded and imported into Grafana.

Official Celery-Grafana Dashboard Template

The template contains several charts, e.g. worker status, task success and failure ratios, etc. You can define alerts and monitor your Celery clusters.

Why Do We Use Celery at Insider?

At Insider, we use a variety of methods to store our partners’ catalogs. Storing the catalog information of partners allows us to integrate them with our products easily. One of these methods is XML Catalog Integration, in which partners add their XML file URLs and mappings, and our XML parser runs periodically, processing the XML resources at the specified intervals. To ensure that we process partners’ XML files within those intervals, we utilize Celery to manage the XML parser’s tasks and logs. To this end, we set up a beat scheduler that checks the last execution time of the XML resources every minute. When a resource’s scheduled run time has passed, the beat scheduler triggers the run_xml_resource_by_id task to synchronize the partner’s XML file with their product catalog, as sketched below.
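A minimal, hypothetical sketch of this dispatch pattern; get_due_resources and the task bodies are illustrative placeholders, not our actual implementation:

from celery import Celery

app = Celery("xml_parser", broker="redis://localhost:6379/0")  # broker choice is an assumption

def get_due_resources():
    # Hypothetical helper: query resource metadata and return the IDs of
    # XML resources whose scheduled run time has passed.
    return []

@app.task
def check_xml_resources():
    # Runs every minute via the beat scheduler and dispatches due resources.
    for resource_id in get_due_resources():
        run_xml_resource_by_id.delay(resource_id)

@app.task
def run_xml_resource_by_id(resource_id):
    # Fetch the partner's XML file and synchronize it with the product catalog.
    ...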

The architecture of XML Parser

Conclusion

We looked at running asynchronous tasks outside the HTTP request-response cycle with Celery and at monitoring Celery clusters with Prometheus and Grafana. In this article, we built a simple example; however, with Celery and Prometheus, more complex applications can be built, such as an e-mail service or long-running background jobs. If you have work that cannot be completed within an acceptable HTTP response time, Celery is a good fit.

I hope you enjoyed this article. If you have any questions, please feel free to contact me on LinkedIn or comment below. Stay tuned for more articles on the Insider Engineering Blog.
