The Docker age: Monitoring showdown, Prom vs. TICK vs. Sensu

amit bezalel
Jan 29, 2017


After gathering information about the most popular products on the market, I wanted to get a feel for my three finalists: Prometheus, Sensu and the TICK stack, so I got down to testing.

Fortunately, I already had a Kubernetes system with about 400 pods running in 16 different namespaces to facilitate our development process. This environment also includes several Postgres databases (on AWS RDS), a Redis and a RabbitMQ installation, as well as an ELK stack for logs. Half an hour of searching and some “docker run” magic later, I had a running instance of each of the products via some open source docker-compose scripts.

Diving into the products themselves took more time, since each of them is different in architecture and in perspective.

Prometheus

The Prometheus architecture uses the pull method to collect information, which means that the server connects to open ports on its agents instead of the agents connecting to it. This approach is not common and it does have its pros and cons, but we’ll leave that for later. Here is a chart of the Prometheus architecture in action (monitoring system parts in purple):

As you can see, agents are called Exporters. They can run as a second container inside pods, on the same machine (in a Kubernetes DaemonSet), or on any machine if the service they monitor can be probed for information remotely (like Postgres or RabbitMQ). Each exporter has a specific purpose: monitoring one infra component or a machine’s hardware. Prometheus can also use TICK’s Telegraf as an exporter, as explained in the part about TICK.
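To make the pull model concrete, here is a minimal sketch of what an exporter hands back when the server scrapes it: plain text lines in the Prometheus exposition format. The metric name, help text and label values are made up for illustration, and a real exporter would serve this body over HTTP (conventionally at /metrics) rather than print it.

```python
def escape_label_value(value):
    """Escape backslash, double quote and newline inside a label value."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_exposition(metric, help_text, samples):
    """Render gauge samples as Prometheus text exposition lines.

    samples is a list of (labels_dict, numeric_value) pairs.
    """
    lines = [f"# HELP {metric} {help_text}", f"# TYPE {metric} gauge"]
    for labels, value in samples:
        label_str = ",".join(
            f'{name}="{escape_label_value(val)}"' for name, val in sorted(labels.items())
        )
        lines.append(f"{metric}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric and label names, for illustration only.
body = render_exposition(
    "demo_pod_memory_bytes",
    "Memory used by a pod.",
    [({"namespace": "dev", "pod": "api-1"}, 1048576)],
)
print(body)
```

The exporter never needs to know who is scraping it; it only answers requests, which is what makes the setup so easy to replicate.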

Prometheus is easy to install, using one server as both the central monitoring system and the storage. It has a built-in DB created just for saving monitoring data, which is still considered faster than InfluxDB for this purpose. Blogs about it say it can monitor 40,000 containers on an m4.xlarge machine. In my own experience it handled 100,000 time series from 400 containers, plus some Postgres DBs and RabbitMQ instances, on an m4.large machine without breaking a sweat (just make sure you have enough disk IOPS to save the data in time).

For high availability the instruction is to use two different servers, both monitoring the same exporters. Here is where the pull method comes in handy: since the exporters don’t know the server’s address, any number of servers can connect to them and pull the data. The alerting component is able to de-duplicate alerts when connected to two servers.
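The de-duplication idea can be sketched in a few lines: two servers scraping the same exporters will fire alerts with identical label sets, so the label set can act as the identity of an alert. This is a simplified illustration of the concept, not the alerting component’s actual implementation; the alert names and the "source" field are hypothetical.

```python
def fingerprint(alert):
    """Identify an alert by its label set, ignoring which server sent it."""
    return tuple(sorted(alert["labels"].items()))


def deduplicate(alerts):
    """Keep the first alert seen for each distinct label set."""
    unique = {}
    for alert in alerts:
        unique.setdefault(fingerprint(alert), alert)
    return list(unique.values())


# Hypothetical alerts arriving from two redundant servers.
from_server_a = [
    {"labels": {"alertname": "PodMemoryHigh", "pod": "api-1"}, "source": "prometheus-a"},
]
from_server_b = [
    {"labels": {"alertname": "PodMemoryHigh", "pod": "api-1"}, "source": "prometheus-b"},
    {"labels": {"alertname": "PodMemoryHigh", "pod": "api-2"}, "source": "prometheus-b"},
]

merged = deduplicate(from_server_a + from_server_b)
print(len(merged))  # prints 2: one alert per distinct label set
```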

The alerting component and the DB share a nice expression-based language in which you pick a metric and attach curly brackets to filter by labels. It covered all the bases I checked while creating monitoring for our performance needs, and it was easy to use and well documented on the Prometheus site.

Prometheus expressions (pod memory)
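As a rough idea of what such an expression looks like, the sketch below assembles one as a string: a metric name, a label filter in curly brackets, and an aggregation around it. The metric container_memory_usage_bytes is the one cAdvisor exposes; the label names (namespace, pod_name) are assumptions that depend on your setup and version.

```python
def selector(metric, **labels):
    """Build a selector such as metric{label="value"} from keyword arguments."""
    matchers = ",".join(f'{name}="{value}"' for name, value in sorted(labels.items()))
    return f"{metric}{{{matchers}}}" if matchers else metric


def sum_by(label, expression):
    """Wrap an expression in a sum aggregation grouped by one label."""
    return f"sum by ({label}) ({expression})"


# Label names are assumptions; adjust them to what your exporters expose.
query = sum_by("pod_name", selector("container_memory_usage_bytes", namespace="dev"))
print(query)
# prints: sum by (pod_name) (container_memory_usage_bytes{namespace="dev"})
```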

Prometheus is also 100% free, which makes it the easiest and cheapest HA monitoring solution on the market.

TICK Stack

This project is a very young one, only about one year old (the first commit on GitHub was in Aug 2015). It is, however, backed by the world’s most popular time series database: InfluxDB. When diving in, you can see this is a very energetic group, which has covered a lot of ground in a very short time and is presenting a full solution, which is rather impressive.

TICK uses the more traditional push method, where agents connect to a central monitoring system. Here the agents are not specific to an infra component. Instead, the agent, called “Telegraf”, is a single pluggable piece of software with many different input and output plugins for specific infrastructure monitoring. Its plugins also let it speak different protocols and formats like StatsD and Nagios plugins, and it has a two-way integration with Prometheus: gathering data from exporters as well as posing as one in order to feed data to a Prometheus installation. The latter means that TICK and Prometheus practically share both collection methods (Exporters and Telegraf).
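For contrast with the pull model, here is a minimal sketch of the kind of payload a push agent sends: one point formatted in InfluxDB’s line protocol (measurement, tags, fields, timestamp). The measurement, tag and field names are hypothetical, and the escaping is deliberately simplified; an actual agent would write such lines to the database over the network.

```python
def escape_measurement(name):
    """Escape commas and spaces in a measurement name."""
    return str(name).replace(",", r"\,").replace(" ", r"\ ")


def escape_key(text):
    """Escape commas, spaces and equals signs in tag keys, tag values and field keys."""
    return str(text).replace(",", r"\,").replace(" ", r"\ ").replace("=", r"\=")


def format_field(value):
    """Format a field value: booleans, integers (with an i suffix), floats, strings."""
    if isinstance(value, bool):  # bool must be checked before int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    return '"' + str(value).replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Render one point as: measurement,tag=value field=value timestamp."""
    tag_str = "".join(f",{escape_key(k)}={escape_key(v)}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{escape_key(k)}={format_field(v)}" for k, v in sorted(fields.items()))
    return f"{escape_measurement(measurement)}{tag_str} {field_str} {timestamp_ns}"


# Hypothetical measurement, tags and field, for illustration only.
line = to_line_protocol(
    "mem",
    {"pod": "api-1", "namespace": "dev"},
    {"used_bytes": 1048576},
    1485648000000000000,
)
print(line)
# prints: mem,namespace=dev,pod=api-1 used_bytes=1048576i 1485648000000000000
```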

Like Prometheus, TICK is free, easy to install, and based on one server/DB combo as the monitoring engine. Since it is backed by a company, it also provides enterprise support and database clustering to those who don’t mind shelling out some cash for a complete solution (which can be counted as a pro or a con, depending on your view).

Both the alerting component and the DB server use an SQL-ish query language that covers all your needs. Unlike Prometheus’ query language, Influx queries are modeled in Grafana in a “pick the values from a combo” way (it doesn’t cover all the abilities, but most basic queries can be done without knowing the select syntax).

Modeled combobox pickers (pod memory)
SQL-like InfluxDB syntax (pod memory)
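To give a feel for the SQL-like syntax, the sketch below builds a typical query as a string: the mean of a field per pod, bucketed by time. The measurement, field and tag names are hypothetical, and the plain string interpolation is only suitable for trusted input.

```python
def mean_per_pod(measurement, field, namespace, window="1h", bucket="1m"):
    """Build a query for the mean of one field per pod over a recent time window."""
    return (
        f'SELECT mean("{field}") FROM "{measurement}" '
        f"WHERE \"namespace\" = '{namespace}' AND time > now() - {window} "
        f'GROUP BY time({bucket}), "pod"'
    )


# Hypothetical measurement, field and tag names.
influx_query = mean_per_pod("mem", "used_bytes", "dev")
print(influx_query)
```

Identifiers go in double quotes and string values in single quotes, which is the main thing to remember when coming from regular SQL.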

TICK is both easy to deploy and based on an official DB, and it is free and open source. If you don’t buy the enterprise edition, high availability is a challenge, but as the saying goes, you don’t look a gift horse in the mouth.

Sensu

Sensu is the veteran in this group of younglings. It has been around since 2011 and has a noticeable market share of around 7%. Its architecture is geared towards massive amounts of data, so it uses RabbitMQ to pipe and buffer the monitoring information between its collectors and its main server.

Sensu has its own collection plugins and supports Nagios plugins as well. In my experience its Kubernetes plugin needs some work: it currently only polls the services to see if they are alive and does not fetch any rich statistics like CPU/memory per pod and all the other good stuff needed for performance monitoring.

The Sensu servers are built for high availability out of the box, but they don’t include a DB for storing the data. Sensu uses Redis by default, but to store more than several hours of data you will need to include either InfluxDB or ElasticSearch in your installation, both of which require an enterprise license if you want enterprise features (InfluxDB charges for clustering, ElasticSearch charges for security).

In Sensu, alerts are called checks. Creating one involves creating a cron job and configuring a JSON file which tells it what Ruby script to run, what statistic to check (including some calculations) and whom to notify. It doesn’t seem as straightforward as the other solutions.
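As a rough sketch of what such a JSON file contains, the snippet below generates a check definition. The field names follow the check format as I understand it from the Sensu documentation of that period, so verify them against the docs for your version; the check name, the thresholds, the subscription and the handler are placeholders.

```python
import json


def build_check(name, command, subscribers, interval_seconds, handlers):
    """Assemble a check definition: what to run, where, how often, whom to notify."""
    return {
        "checks": {
            name: {
                "command": command,            # the Ruby script to run, with thresholds
                "subscribers": subscribers,    # which clients should run it
                "interval": interval_seconds,  # how often to run it, in seconds
                "handlers": handlers,          # who gets notified on failure
            }
        }
    }


# Placeholder values, for illustration only.
definition = build_check(
    "pod_memory",
    "check-memory-percent.rb -w 80 -c 90",
    ["kubernetes"],
    60,
    ["slack"],
)
print(json.dumps(definition, indent=2))
```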

For DB queries and Grafana, it really depends on your choice of time series DB, since Sensu supports several but comes with no recommendation out of the box. (Influx in Grafana will look the same as the images above.)

Sensu is also backed by a company, which offers enterprise support for the product, although it seems to me that the other products are easier to install and maintain, and give better value unless you are going for nothing short of epic-sized monitoring.

To summarize it visually, I’m adding a comparison table:

For a free solution I would definitely go with Prometheus; if DB clustering is a specific requirement, TICK is your solution. Both of them are easy to install with a Helm chart and some tinkering. For creating nice dashboards there is no dilemma: Grafana is the absolute best, and while TICK is still developing its own UI (Chronograf), all of them integrate with Grafana.


amit bezalel

After 17 years of developing software, coding is still a passion of mine. I am a hands-on SaaS cloud architect working in TS and Go, and learning something new every day.