Monitoring with OpenTelemetry and Elastic APM in Kubernetes

idinak
Sep 2, 2024


Utilizing OpenTelemetry and Elastic APM in Kubernetes to monitor your distributed applications provides a robust solution for detecting performance issues and debugging. In this guide, we’ll walk through the step-by-step process of deploying OpenTelemetry using the OpenTelemetry Operator and setting up Elasticsearch and Kibana for Elastic APM.

Installing and Configuring OpenTelemetry Operator

To install the operator in an existing cluster, you must first install cert-manager:

  • Install all cert-manager components:
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.3/cert-manager.yaml
  • Install the OpenTelemetry operator:
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
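Before creating any custom resources, it is worth confirming that both deployments rolled out successfully. The namespace and deployment names below are the defaults created by the upstream manifests:

```shell
kubectl wait --for=condition=Available deployment/cert-manager \
  -n cert-manager --timeout=120s
kubectl wait --for=condition=Available deployment/opentelemetry-operator-controller-manager \
  -n opentelemetry-operator-system --timeout=120s
```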

Once the ‘opentelemetry-operator’ deployment is ready, create a Custom Resource (CR) to configure OpenTelemetry agents and collectors. Here’s an example YAML file:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: example-collector
spec:
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:

    processors:
      memory_limiter:
        check_interval: 1s
        limit_percentage: 75
        spike_limit_percentage: 15
      batch:
        send_batch_size: 10000
        timeout: 10s

    exporters:
      otlp/elastic:
        # Elastic APM Server https endpoint without the "https://" prefix
        endpoint: "apm-server-apm-http:8200"
        tls:
          insecure: true
        headers:
          # Elastic APM Server secret token
          Authorization: "############"

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [otlp/elastic]
        metrics:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [otlp/elastic]
        logs:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [otlp/elastic]

Explanation of the key elements used in this YAML file:

  • Receivers: Define how data is collected. The otlp receiver accepts data over the OpenTelemetry Protocol (OTLP), supporting both gRPC and HTTP.
  • Processors: Transform or control data between receipt and export. For instance, the memory_limiter processor keeps the Collector's memory usage in check. Its variables:
  • check_interval: How often the memory limiter checks memory usage. Here it is set to 1s, so the check runs every second.
  • limit_percentage: The hard memory limit as a percentage of the total available memory. Here it is set to 75, so the processor intervenes once usage reaches 75% of available memory.
  • spike_limit_percentage: Headroom reserved for spikes between checks. The soft limit is limit_percentage minus spike_limit_percentage (here 75 - 15 = 60%): above the soft limit the processor starts refusing incoming data, and above the hard limit it additionally forces garbage collection.
  • Exporters: Send the collected data to a destination. In the example, the otlp/elastic exporter forwards the data over OTLP to the Elastic APM Server endpoint.
  • Service: Contains the general configuration of the Collector service, most importantly its pipelines. Each pipeline chains receivers, optional processors, and exporters; here the traces, metrics, and logs pipelines all read from the otlp receiver and export through otlp/elastic.

The above example results in the deployment of a Collector, which you can then use as the endpoint for auto-instrumentation in your pods.
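The operator derives resource names from the CR using a <name>-collector convention, so for example-collector you should see a Deployment and a Service named example-collector-collector exposing the OTLP ports:

```shell
kubectl get deployment,service example-collector-collector
```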

OpenTelemetry Auto-Instrumentation

Now, let’s instrument a sample application with OpenTelemetry in Kubernetes.

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: example-instrumentation
spec:
  exporter:
    endpoint: http://elastic-apm-collector:4317
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: parentbased_traceidratio
    argument: "1"
  java:
    env:
      - name: OTEL_RESOURCE_ATTRIBUTES
        value: deployment.environment=test,namespace=example-test
      - name: OTEL_METRICS_EXPORTER
        value: otlp
      - name: OTEL_TRACES_SAMPLER
        value: traceidratio
      - name: OTEL_TRACES_SAMPLER_ARG
        value: '0.1'
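Defining the Instrumentation resource alone does not instrument anything: each workload opts in through a pod annotation telling the operator which language agent to inject. A minimal sketch for the Java case above (the Deployment name and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sample-app
  template:
    metadata:
      labels:
        app: sample-app
      annotations:
        # Tells the operator to inject the Java auto-instrumentation agent
        # configured by the Instrumentation resource in this namespace.
        instrumentation.opentelemetry.io/inject-java: "true"
    spec:
      containers:
      - name: app
        image: example/sample-app:latest
```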

OpenTelemetry propagators define the rules for how trace context is transmitted and received across process boundaries. The trace context carries information that is part of a tracing operation, and this information must be preserved as it moves from one point in the trace to another.

OpenTelemetry supports several different types of propagators. These propagator types include:

  1. TraceContextPropagator: TraceContext is the fundamental trace context format in OpenTelemetry. This propagator encodes and decodes the trace context according to the W3C Trace Context standard, carrying it across transports such as HTTP headers.
  2. BaggagePropagator: Baggage is user-defined data carried alongside the trace context. This propagator transports that data with the trace.
  3. B3Propagator: B3 (originating in Zipkin) is another standard for carrying trace context. This propagator encodes and decodes the trace context according to the B3 format.
  4. W3CPropagator: Implements the Trace Context standard defined by the World Wide Web Consortium (W3C), whose goal is to ensure compatible trace-context handoffs between different tracing systems.

Propagators are available in the OpenTelemetry libraries for various languages and can be selected through configuration, enabling standardized trace-context handoffs between different tracing systems. The default is tracecontext,baggage (W3C).

We can also define them via environment variables, just like in the config above. Link for details.
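For example, to enable B3 alongside the defaults, the standard OTEL_PROPAGATORS environment variable can be set on the instrumented container (the value shown is illustrative):

```yaml
env:
- name: OTEL_PROPAGATORS
  value: tracecontext,baggage,b3
```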

The endpoint we export to here (http://elastic-apm-collector:4317) is the Collector's OTLP gRPC port; the Collector in turn forwards the data to the Elastic APM Server that we will install below.

Installing and Configuring Elastic Cloud on Kubernetes Operator

To install the ECK CRDs in an existing cluster:

kubectl create -f https://download.elastic.co/downloads/eck/2.10.0/crds.yaml

Install the operator with its RBAC rules:

kubectl apply -f https://download.elastic.co/downloads/eck/2.10.0/operator.yaml
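The ECK operator runs as a StatefulSet named elastic-operator in the elastic-system namespace; you can follow its logs to confirm it started cleanly:

```shell
kubectl -n elastic-system logs -f statefulset.apps/elastic-operator
```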

Install Elasticsearch:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch
spec:
  version: 8.11.3
  nodeSets:
  - name: elasticsearch-apm
    count: 1
    podTemplate:
      spec:
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
            runAsUser: 0
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: -Xms4g -Xmx4g
          resources:
            requests:
              memory: 8Gi
            limits:
              memory: 8Gi
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: gp2
  http:
    tls:
      selfSignedCertificate:
        disabled: true
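ECK generates a password for the built-in elastic superuser and stores it in a Secret named <cluster-name>-es-elastic-user. For the cluster above, you can retrieve it with:

```shell
kubectl get secret elasticsearch-es-elastic-user \
  -o go-template='{{.data.elastic | base64decode}}'
```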

Install Kibana with an Ingress:

apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana-elasticsearch-apm
spec:
  version: 8.11.3
  count: 1
  elasticsearchRef:
    name: elasticsearch
  http:
    tls:
      selfSignedCertificate:
        disabled: true

---

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kibana
  annotations:
    kubernetes.io/ingress.class: nginx-internal
spec:
  rules:
  - host: apm.kibana.com
    http:
      paths:
      - backend:
          service:
            name: kibana-elasticsearch-apm-kb-http
            port:
              number: 5601
        path: /
        pathType: Prefix

Install the APM Server:

apiVersion: apm.k8s.elastic.co/v1
kind: ApmServer
metadata:
  name: apm-server
spec:
  version: 8.11.3
  count: 1
  elasticsearchRef:
    name: elasticsearch
  http:
    tls:
      selfSignedCertificate:
        disabled: true
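The secret token expected in the Collector's Authorization header is also generated by ECK, in a Secret named <apm-server-name>-apm-token. For the APM Server above:

```shell
kubectl get secret/apm-server-apm-token \
  -o go-template='{{index .data "secret-token" | base64decode}}'
```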

After standing the stack up this way, the data flow looks like this: instrumented pods send OTLP data to the OpenTelemetry Collector, the Collector exports it to the APM Server, the APM Server indexes it into Elasticsearch, and Kibana visualizes it.

At the last stage, we can explore the traces and metrics of our services in Kibana's APM UI and measure their performance.
