Prometheus vs StatsD for metrics collection

Yuvaraj Loganathan
3 min read · Aug 4, 2018

Metrics: your health indicator

StatsD has been well known for metrics collection for almost a decade now. It was initially popularised by Flickr in the days of flat metrics (Graphite etc.). One important evolution in the metrics collection and analysis space is metric tagging. I would say tagging revolutionised the whole metrics space by giving metrics better structure, filtering and group-by. Today, one popular option for using tags with StatsD is the DogStatsD protocol (an evolution of the plain StatsD protocol with tag support).
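To make the tagging idea concrete, here is a minimal sketch of how a DogStatsD-style line is laid out: `name:value|type`, optionally followed by `|@sample_rate` and `|#tag:value,...`. The helper `format_dogstatsd` and the metric and tag names are made up for illustration; real client libraries build these lines for you.

```python
from typing import Mapping, Optional


def format_dogstatsd(
    name: str,
    value: float,
    metric_type: str,
    tags: Optional[Mapping[str, str]] = None,
    sample_rate: Optional[float] = None,
) -> str:
    """Build one DogStatsD-style line (hypothetical helper, not a library API)."""
    line = f"{name}:{value}|{metric_type}"
    if sample_rate is not None:
        line += f"|@{sample_rate}"
    if tags:
        line += "|#" + ",".join(f"{k}:{v}" for k, v in tags.items())
    return line


# Plain StatsD: the dimensions have to be baked into a flat metric name.
flat = format_dogstatsd("checkout.http.200.requests", 1, "c")

# DogStatsD: one metric name, with the dimensions carried as tags.
tagged = format_dogstatsd(
    "http.requests", 1, "c", tags={"service": "checkout", "status": "200"}
)

print(flat)
print(tagged)
```

With the tagged form, filtering by `status` or grouping by `service` becomes a query on tags instead of pattern matching on metric names.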

Prometheus is very new to the metrics market but is growing very fast, with the cncf.io boosters attached to it. Since the idea for Prometheus came out of Google (it was inspired by Google's internal Borgmon monitoring system), Prometheus has supported tags, which it calls labels, from day one.

StatsD/DogStatsD is just a protocol, whereas the Prometheus umbrella project includes the protocol, collection and a time series database. Prometheus also has an Alertmanager. An equivalent StatsD-based setup would include a StatsD server (Telegraf StatsD), a time series database (InfluxDB) and Kapacitor for alerting.

Pros of StatsD:

  • Time to get started is very low: all you need to know is the StatsD server IP and port, and the client library for your language.
  • Percentiles are calculated on the server side, which gives an aggregated view across multiple instances of the same service.
  • Because percentiles and histograms are calculated on the server side, there is relatively little overhead in the client application.
  • All data is transmitted over UDP, so network connectivity issues with the metrics server cannot bring down the whole application.
  • Short-lived processes can easily send metrics, since StatsD is a push-based system.
  • Uses relatively little memory, because metrics are pushed to the server as they are produced.
  • Implementation effort on the developer side is low.
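As a rough illustration of the push-over-UDP points above, here is a minimal fire-and-forget sender using only the Python standard library. The function name `send_statsd` and the metric names are hypothetical; port 8125 is the conventional StatsD port, but your server may listen elsewhere.

```python
import socket


def send_statsd(line: str, host: str = "127.0.0.1", port: int = 8125) -> bool:
    """Fire-and-forget: send one StatsD line over UDP, never raise to the caller."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(line.encode("utf-8"), (host, port))
        return True
    except OSError:
        # A metrics failure should not take the application down.
        return False


# Count one page view and record a 320 ms timing (hypothetical metric names).
send_statsd("page.views:1|c")
send_statsd("page.render_time:320|ms")
```

Nothing is buffered in the application and no reply is awaited, which is why this approach suits short-lived processes. The flip side is that a datagram the server never receives is simply gone.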

Cons of StatsD:

  • If the StatsD server is overloaded we may lose metrics, since they are transported over UDP.
  • The volume of StatsD traffic grows with the volume of your instrumentation (sampling can reduce this).
  • If your StatsD traffic volume cannot fit on a single server, percentile calculations may not be accurate, because percentiles are then calculated separately on multiple servers.
  • You need to stitch together multiple tools (storage, alerting) to make it work.
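The sampling mentioned above can be sketched as follows: the client sends only a fraction of the increments and annotates each one with `@rate`, and the server scales each received increment back up by `1 / rate`. Both function names are hypothetical, and the server side here is a simplified stand-in for what a real StatsD server does.

```python
import random
from typing import Iterable, Optional


def maybe_emit(name: str, rate: float, rng: random.Random) -> Optional[str]:
    """Client side: emit a counter increment only for a `rate` fraction of events."""
    if rng.random() < rate:
        return f"{name}:1|c|@{rate}"
    return None  # dropped by sampling, nothing goes on the wire


def estimated_total(lines: Iterable[str]) -> float:
    """Server side: scale each sampled increment back up by 1 / rate."""
    total = 0.0
    for line in lines:
        value_part, _type, *extras = line.split("|")
        value = float(value_part.rsplit(":", 1)[1])
        rate = 1.0  # no @rate field means the increment was not sampled
        for extra in extras:
            if extra.startswith("@"):
                rate = float(extra[1:])
        total += value / rate
    return total


rng = random.Random(42)
sent = [line for _ in range(10_000) if (line := maybe_emit("api.hits", 0.1, rng))]
# Roughly a tenth of the 10,000 events are sent; the estimate lands near 10,000.
print(len(sent), estimated_total(sent))
```

The traffic drops by about the sampling rate, at the cost of the counter becoming an estimate rather than an exact count.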

Pros of Prometheus:

  • Long-term viability: the project is part of the CNCF, which in turn is part of the Linux Foundation.
  • Metric tags are supported as first-class citizens.
  • Each metric comes with a description, which helps in understanding it.
  • You can view per-server metrics at the application's local endpoint, usually *http://localhost:8888/metrics* (Google calls these zPages).
  • If the Prometheus server is down we won’t lose app metrics data, because all the data is held in the application's memory.
  • Metrics protocol, storage, collection and alerting all come out of the box.
  • Metric collection traffic does not grow in proportion to your application traffic, as data is scraped at a fixed interval, usually 10s.
  • Application availability is easy to infer: if we are not able to scrape the metrics, the application is not available.
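The points above about tags, descriptions and the local metrics endpoint all show up in the text that a scraped application serves. Below is a hand-rolled sketch that keeps a labelled counter in application memory and renders it in the Prometheus text exposition format (`# HELP`, `# TYPE`, then one sample line per label set). The `LabelledCounter` class and the metric name are hypothetical; a real application would normally use an official client library rather than this.

```python
from typing import Dict, Tuple


class LabelledCounter:
    """Toy in-memory counter (illustrative only, not the Prometheus client API)."""

    def __init__(self, name: str, description: str) -> None:
        self.name = name
        self.description = description
        self._values: Dict[Tuple[Tuple[str, str], ...], float] = {}

    def inc(self, amount: float = 1.0, **labels: str) -> None:
        key = tuple(sorted(labels.items()))
        self._values[key] = self._values.get(key, 0.0) + amount

    def render(self) -> str:
        """Render in the text exposition format (no label escaping in this sketch)."""
        lines = [
            f"# HELP {self.name} {self.description}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self._values.items()):
            label_text = ",".join(f'{k}="{v}"' for k, v in key)
            lines.append(f"{self.name}{{{label_text}}} {value:g}")
        return "\n".join(lines) + "\n"


requests_total = LabelledCounter("http_requests_total", "Total HTTP requests handled.")
requests_total.inc(method="GET", status="200")
requests_total.inc(method="GET", status="200")
requests_total.inc(method="POST", status="500")

# This is the kind of text a scrape of the metrics endpoint would return.
print(requests_total.render())
```

Because the counter only holds a running total per label set, the size of this response depends on the number of label combinations, not on how many requests were served between scrapes.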

Cons of Prometheus:

  • Since Prometheus is a pull-based metric collection system, it needs service discovery (Consul or etcd) to find all the application Prometheus endpoints.
  • Percentile computation may not give the overall picture, as percentiles are calculated per instance, not globally.
  • More memory is used, as the metrics are stored locally in application memory.
  • Short-lived processes need the Pushgateway to push metrics.
  • The implementation effort and developer expertise required are high.
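The percentile concern above is easy to see with a small worked example. The sketch below uses a simple nearest-rank percentile and made-up latencies for two instances; the numbers are purely illustrative.

```python
from typing import List


def percentile(values: List[float], p: int) -> float:
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    ordered = sorted(values)
    rank = (p * len(ordered) + 99) // 100  # integer ceil(p * n / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in ms: a busy, healthy instance and a quiet, slow one.
instance_a = [10.0] * 100
instance_b = [500.0] * 10

p90_a = percentile(instance_a, 90)
p90_b = percentile(instance_b, 90)
averaged_p90 = (p90_a + p90_b) / 2
global_p90 = percentile(instance_a + instance_b, 90)

print(p90_a, p90_b, averaged_p90, global_p90)
```

Averaging the two per-instance values gives 255 ms, while the p90 over all requests together is 10 ms, because the average ignores how many requests each instance served. Percentiles computed per instance cannot simply be combined afterwards into a global one.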

Grafana is the de facto metrics visualisation tool in both cases :)

Even though I am tempted to write my opinion here, I am leaving the conclusion to you :)

For more details on pull vs push you can read.

Let me know your thoughts in the comments section. You should follow me on Twitter @SkyRocknRoll

Yuvaraj Loganathan

Passionate about building scalable, antifragile and highly available systems with simplicity. Architect, Father & Husband