Lesson learned: the sampling impact on monitoring

Pier
Published in Geek Culture
3 min read · Nov 15, 2022

Today I would like to show how the metrics you use in your day-to-day monitoring graphs can mislead you when they are sampled.

If you like the story, remember to clap and subscribe. I really appreciate it!

Technologies used in this article:

  • Kubernetes
  • New Relic (similar to Prometheus or Datadog)

My goal is to find the right number of replicas and resources (memory and CPU) required to satisfy a specific workload. I won't discuss how to tune a pod or the HPA/VPA, because there is no magic recipe for fine-tuning your workloads.

Now we focus on three metrics that highlight when you are affected by sampling:

  • Input traffic — constant in my tests
  • Number of replicas
  • Number of transactions

As you can see from this graph there are four phases:

  • 20 replicas with 120k transactions
  • 10 replicas with 60k transactions
  • 5 replicas with 30k transactions
  • cat ears replicas 🐈

This behavior makes no sense, because the input traffic was essentially constant whenever it was present.
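A quick sanity check on the numbers above is to divide the plotted transactions by the number of replicas in each phase. The sketch below uses only the three figures read from the graph. The names `phases` and `per_replica` are illustrative, not part of any New Relic tooling.

```python
# Observed phases from the graph: (replicas, plotted transactions).
phases = [(20, 120_000), (10, 60_000), (5, 30_000)]


def per_replica(phases):
    """Return plotted transactions divided by replica count, per phase."""
    return [transactions / replicas for replicas, transactions in phases]


ratios = per_replica(phases)
print(ratios)  # [6000.0, 6000.0, 6000.0]

# With constant input traffic, the total should stay flat and the share
# handled by each replica should grow as replicas are removed.
# A constant per-replica value is a hint that something caps each pod.
looks_capped = max(ratios) == min(ratios)
print(looks_capped)  # True
```

The per-replica value is the same in every phase, which is suspicious when the input traffic never changed.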

So let's look closer at the query behind the plotted metric:

SELECT rate(count(*), 1 second) FROM Transaction WHERE containerName = 'pod-under-test' FACET containerName TIMESERIES MAX LIMIT MAX

After a few minutes in the documentation, I finally found what was going on:

https://docs.newrelic.com/docs/data-apis/understand-data/event-data/new-relic-event-limits-sampling/#impact

Events that are capped and subject to sampling include:
- Transaction

🎉 🎉 🎉 🎉 🎉
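Given the high traffic, every pod hits the maximum number of Transaction events that the New Relic APM agent sends by default. Each pod therefore reports the same capped number of events, and the plotted total follows the number of replicas instead of the input traffic, which is exactly the halving pattern in the graph.

So this behavior comes from sampling I did not expect, which gives the graph a different meaning from the one I assumed.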
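To make the effect concrete, here is a minimal simulation of per-pod capping. It is only a sketch under stated assumptions: the traffic value and the cap of 6,000 events per pod are hypothetical numbers picked to reproduce the shape of the graph, not the real agent default, and the load is assumed to be spread evenly across pods. `reported_events` is an illustrative helper, not a New Relic API.

```python
def reported_events(total_traffic, replicas, cap_per_pod):
    """Events that reach the backend when each pod sends at most cap_per_pod."""
    per_pod_traffic = total_traffic / replicas  # assumes even load balancing
    return replicas * min(per_pod_traffic, cap_per_pod)


TRAFFIC = 1_000_000  # hypothetical constant input per reporting interval
CAP = 6_000          # hypothetical per-pod cap, chosen to match the graph

for replicas in (20, 10, 5):
    # The input never changes, yet the reported total halves with the replicas.
    print(replicas, reported_events(TRAFFIC, replicas, CAP))
```

As long as every pod is above its cap, the reported total is simply replicas times the cap, so the graph tracks the replica count. Once the per-pod traffic drops below the cap, the function returns the real traffic again.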

Lesson learned:

  • Always validate the metrics that you are using.
  • To avoid wrong conclusions, don't plot sampled metrics as if they were complete counts.

I hope this blog post helps you. To support me, you can clap and subscribe to get more in the upcoming weeks!

Follow Me and Subscribe to get the updates on this and the next series!

Photo by River Fx on Unsplash

Pier
DevOps Engineer @Microsoft | Working with Python, C++, Node.js, Kubernetes, Terraform, Docker and more