How to avoid data loss of Prometheus metrics running on ECS Fargate during restarts?

Lalit Prasanth
Published in Fasal Engineering
Oct 3, 2022

Introduction

Prometheus is an open-source monitoring system and time-series database. It collects metrics from configured sources and stores them so they can be queried by other tools, such as Grafana. We recently built a monitoring stack with Prometheus and Grafana for our applications running in ECS clusters on AWS Fargate. However, a significant challenge for us was that whenever the Prometheus task restarted for an update, the metrics it had collected so far were lost, because every update recreated the task from a new task definition.

When you run a Prometheus container on a VM, you can bind-mount a volume from the VM and persist the data. On ECS Fargate there is no host to bind a volume to. You might think of using AWS EFS, but Prometheus does not support NFS file systems (AWS EFS included) for its local time-series storage, as the Prometheus documentation also notes.

So what other options are available for making Prometheus metrics persistent?

Prometheus provides an option for remote storage and lists the supported integrations on its website. Out of these, we preferred InfluxDB since it supports both reads and writes. Below we explore how to set up remote storage with InfluxDB and avoid data loss in Prometheus.

Installing InfluxDB:

Here we install InfluxDB on a Linux EC2 instance. The steps below apply to InfluxDB 1.x.

Create a yum repo file for InfluxDB

cat <<EOF | sudo tee /etc/yum.repos.d/influxdb.repo
[influxdb]
name = InfluxDB Repository - RHEL \$releasever
baseurl = https://repos.influxdata.com/rhel/7/\$basearch/stable
enabled = 1
gpgcheck = 1
gpgkey = https://repos.influxdata.com/influxdb.key
EOF

Install the InfluxDB package

sudo yum install -y influxdb

Configure InfluxDB

Open /etc/influxdb/influxdb.conf

Find the following fields under the [http] block, uncomment them, and update them as shown (set auth-enabled to true if you want authentication enabled on your InfluxDB)

enabled = true
bind-address = "0.0.0.0:8086"
auth-enabled = false

Start InfluxDB

sudo systemctl start influxdb
sudo systemctl enable influxdb
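
Before moving on, it can help to confirm that InfluxDB is actually answering on port 8086. Here is a minimal sketch, assuming InfluxDB 1.x (whose /ping endpoint replies with HTTP 204 when the server is up) and that curl is installed. The function name influx_is_up is a placeholder of ours, not part of InfluxDB.

```shell
# Hypothetical helper: succeeds when the InfluxDB /ping endpoint answers 204.
# $1 = base URL of the InfluxDB server, e.g. http://localhost:8086
influx_is_up() {
  # Print only the HTTP status code and give up after 5 seconds.
  status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1/ping")
  [ "$status" = "204" ]
}
```

On the EC2 instance you could then run `influx_is_up http://localhost:8086 && echo "InfluxDB is up"`.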

Connect to InfluxDB and create a database for Prometheus

influx
CREATE DATABASE prometheus;
USE prometheus;

Configure Prometheus to connect to InfluxDB

  • Open the prometheus.yml file and add the lines below
remote_write:
  - url: "http://<influxdb_ip>:8086/api/v1/prom/write?db=prometheus"
remote_read:
  - url: "http://<influxdb_ip>:8086/api/v1/prom/read?db=prometheus"
  • If you enabled auth in the InfluxDB configuration, use these URLs instead, with your own username and password
remote_write:
  - url: "http://<influxdb_ip>:8086/api/v1/prom/write?db=prometheus&u=username&p=password"
remote_read:
  - url: "http://<influxdb_ip>:8086/api/v1/prom/read?db=prometheus&u=username&p=password"
  • Save and exit the prometheus.yml file, then redeploy your Prometheus ECS task with the updated config file in your task definition.
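
If you generate prometheus.yml in a build or deploy script, the remote storage block can be rendered from a variable so that both URLs always point at the same host. This is a rough sketch for the setup without auth. The function name render_remote_storage and the example address 10.0.0.12 are placeholders of ours, not values from our setup.

```shell
# Hypothetical helper: print the remote storage block for prometheus.yml.
# $1 = InfluxDB host or IP, $2 = database name (defaults to "prometheus")
render_remote_storage() {
  host="$1"
  db="${2:-prometheus}"
  cat <<EOF
remote_write:
  - url: "http://${host}:8086/api/v1/prom/write?db=${db}"
remote_read:
  - url: "http://${host}:8086/api/v1/prom/read?db=${db}"
EOF
}

# Example: print the block for a placeholder address.
# To append it to a config file: render_remote_storage 10.0.0.12 >> prometheus.yml
render_remote_storage 10.0.0.12
```

Keeping the host in one place avoids the write and read URLs drifting apart when the InfluxDB address changes.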

Once the Prometheus task starts, you can go back to the EC2 instance where you installed InfluxDB and connect to the database to check if data is being written into the DB.

influx
USE prometheus;
select * from /.*/ limit 1;
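
The same check can be scripted without opening the influx shell, since InfluxDB 1.x exposes a /query HTTP endpoint. Here is a hedged sketch, assuming curl is available and authentication is disabled. influx_query is a name we made up for illustration.

```shell
# Hypothetical helper: run an InfluxQL statement over the InfluxDB 1.x HTTP API.
# $1 = base URL, $2 = database name, $3 = InfluxQL statement
influx_query() {
  # -G sends the url-encoded parameters as a query string on a GET request.
  curl -sG --max-time 10 "$1/query" \
    --data-urlencode "db=$2" \
    --data-urlencode "q=$3"
}
```

For example, `influx_query http://localhost:8086 prometheus 'SHOW MEASUREMENTS'` returns JSON, and once remote write is working it should list measurements named after your Prometheus metrics.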

Now redeploy your Prometheus task and confirm that the data collected before the restart still shows up on the Grafana dashboard.

Good Luck :)
