Deploying OpenTelemetry on EKS and installing the OTel collector in sidecar mode

Chaitanya Solasa
Dec 1, 2022

Here we go with my second article: OpenTelemetry installation on EKS. Before diving into what I did, let's talk a little more about OpenTelemetry and why this buzzword is catching on these days.

The bookish definition says

“OpenTelemetry is a collection of tools, APIs, and SDKs. Use it to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help you analyze your software’s performance and behavior.”

But to hell with definitions; let's understand it in simpler terms.

As an engineer, you have probably worked with different observability tools: Prometheus/Grafana, CloudWatch, Azure Application Insights, ELK, EFK and what not. Notice one thing: they are quite different from one another, and integrating your code with each of them is different because each has its own SDKs, formats and so on. Now imagine you got pretty pissed off with one of these lovable tools that you work with daily and thought of making it your ex :P. Simply put, you can't do that easily. Your code is tightly coupled to the tool's APIs/SDKs, so it is not a decision the Ops team can take alone; the devs have to rework a lot of code, and it takes a collective effort from all the teams.

But let me tell you about an imaginary situation where, with little effort, you can break up with one tool and patch up with a new one with no big fuss. That imaginary situation is pulled into reality by OpenTelemetry. OpenTelemetry is a CNCF project that standardizes how you emit observability data (logs, metrics, traces): you instrument your code once with the OTel SDK and can then integrate with the tool of your choice with very little effort. Let's say tomorrow you want to move from CloudWatch to Azure Application Insights because your app needs to be cloud agnostic; you can do that in a couple of minutes (just exaggerating, maybe a few hours) with very little code change. Looks cool, right? Lifting and shifting your logging/monitoring mechanism in no time. This flexibility is what is driving the OTel craze.

But remember that this may not be available for every observability platform, because the platform needs to support OTel. Right now we can at least say that the major platforms support it or are starting to. One drawback is that it is pretty new tech, and many platforms are still taking baby steps with it.

So in this article, let's walk through how we brought OTel into our EKS cluster and how we integrated it with CloudWatch (you can do it with whatever tool you choose, though).

If you are curious, here are our environment details:

Kubernetes: EKS

Cloud: AWS

Observability: AWS CloudWatch

Let's start the show, then. Let me throw some light on the movie with a teaser: what we are basically trying to achieve is to make our microservices running on EKS send MT (metrics and traces) to CloudWatch. Before you ask "what about logs?": AWS/CNCF are still working on it. I already see it in their contrib GitHub repo as experimental/optional, but I am waiting for a stable version before moving to it.

So what are the steps? Before going there, here is a simple OTel implementation.

Architecture Implementation

So we are going to implement in the following way

  1. Deploy your microservice as a pod with an OpenTelemetry sidecar attached; the SDK in your API sends its telemetry to the sidecar on a certain port.
  2. This sidecar feeds on an OpenTelemetryCollector configuration.
  3. The OpenTelemetryCollector configuration has the details of which tool we are going to export the MLT (metrics, logs, traces) to.
  4. The sidecars send the collected MLT to the provider of your choice (CloudWatch/Jaeger/ELK), in our case CloudWatch. These sidecars are OTel collectors, which receive, process and export telemetry data; the SDK transmits the telemetry to the collector over OTLP (the OpenTelemetry Protocol).
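
To make the first step a bit more concrete: containers in the same pod share a network namespace, so the app can reach its sidecar on localhost. A minimal sketch, assuming the sidecar's OTLP gRPC receiver listens on 4317 (the conventional OTLP gRPC port; the config later in this article leaves the port as a placeholder) and that your SDK honours the standard OTEL_* environment variables. The service name is just an example.

```shell
# Sketch: point an OpenTelemetry SDK at the sidecar collector in the same pod.
# Assumption: the sidecar's OTLP gRPC receiver listens on port 4317.
export OTEL_SERVICE_NAME="myapp"                            # example service name
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"  # sidecar shares the pod network
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"                   # use http/protobuf for the HTTP receiver

echo "Sending OTLP ($OTEL_EXPORTER_OTLP_PROTOCOL) for $OTEL_SERVICE_NAME to $OTEL_EXPORTER_OTLP_ENDPOINT"
```

In a real pod you would set the same variables under the container's env section rather than exporting them in a shell.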

So let's start with the implementation steps.

Step1: Find one of the EKS nodes in the cluster where you want to install OTel, check the IAM role (NodeInstanceRole) attached to it, and make sure the following IAM policies are attached to that role.

arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess
arn:aws:iam::aws:policy/AWSXrayWriteOnlyAccess
arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
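
If you prefer the CLI over the console for this, here is a rough sketch using aws iam attach-role-policy. The role name is a hypothetical placeholder, and the script only prints the commands unless you set DRY_RUN=0, so you can review them first.

```shell
# Sketch: attach the three managed policies from Step1 to the node instance role.
# NODE_ROLE_NAME is a hypothetical placeholder; use your own NodeInstanceRole name.
NODE_ROLE_NAME="${NODE_ROLE_NAME:-my-eks-node-instance-role}"

# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

attach_node_policies() {
  for policy in \
    AmazonPrometheusRemoteWriteAccess \
    AWSXrayWriteOnlyAccess \
    CloudWatchAgentServerPolicy
  do
    run aws iam attach-role-policy \
      --role-name "$NODE_ROLE_NAME" \
      --policy-arn "arn:aws:iam::aws:policy/$policy"
  done
}

attach_node_policies
```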

Remember that Step1 can be skipped entirely: instead of granting permissions at the node level, you can go the service account way (IAM roles for service accounts), as mentioned in this article.

Step2: To get OTel implemented in our cluster we need the ADOT (AWS Distro for OpenTelemetry) add-on installed in our EKS cluster. For that, we follow this otel-article from AWS, which helps you install cert-manager and set up the permissions needed to install the ADOT add-on in your EKS cluster.

Step3: Once the above steps are done, we are good to install the OTel collector Kubernetes operator on the EKS cluster the Helm way.

You can execute the following commands to install it.

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts

helm repo update

helm install otel-operator open-telemetry/opentelemetry-operator


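# To check whether the installation succeeded, run the command below; it should show you all the resources.
# Helm installs into your current namespace unless you pass --namespace, so adjust -n to wherever the chart landed.

kubectl get all -n opentelemetry-operator-system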

Now we have the OTel operator installed on EKS.

Step4: Now we need to install the ADOT add-on in our cluster. It can be done in multiple ways, but we chose to install it from the console, following these steps.

Install the ADOT Amazon EKS add-on to your Amazon EKS cluster using the following steps:

  1. Open the Amazon EKS console at https://console.aws.amazon.com/eks/home#/clusters.
  2. In the left pane, select Clusters, and then select the name of your cluster on the Clusters page.
  3. Choose the Add-ons tab.
  4. Click Add new and select AWS Distro for OpenTelemetry from the drop-down list.
  5. The default version will be selected in the Version drop-down. Click Override existing configuration for this add-on on the cluster if a service account is already created in the cluster without an IAM Role.
  6. Click Add.

After these steps, you should see your add-on in Active status in the console.
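
The console is only one of the ways. As a rough CLI equivalent, the sketch below uses aws eks create-addon and describe-addon with the add-on name adot. The cluster name is a hypothetical placeholder, and the commands are only printed unless you set DRY_RUN=0.

```shell
# Sketch: install the ADOT add-on from the CLI instead of the console.
# CLUSTER_NAME is a hypothetical placeholder; use your own cluster name.
CLUSTER_NAME="${CLUSTER_NAME:-my-cluster}"

# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

install_adot_addon() {
  run aws eks create-addon --cluster-name "$CLUSTER_NAME" --addon-name adot
}

adot_addon_status() {
  run aws eks describe-addon --cluster-name "$CLUSTER_NAME" --addon-name adot \
    --query "addon.status" --output text
}

install_adot_addon
adot_addon_status
```

When run for real, the status will likely read CREATING at first; call adot_addon_status again until it reports ACTIVE.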

Step5: Now comes the important step. We have deployed OTel with all its requirements, and now we have a plethora of choices.

We can collect telemetry data with the OTel collector in multiple modes, as described in this document: DaemonSet, Deployment, StatefulSet and Sidecar.

If you ask me, I chose sidecar mode because I can customize the MLT for each of my APIs as I like: I get a segregated log group for each API, and I can customize each sidecar to send to the tool of my choice with whatever layer of customization I want. That is something you may not get in the other modes, as most of them are a namespace-level or cluster-level setting, and customizing for a single API is not easy there. Yes, your pod gets a little bulky, but in the end I chose it because it suits my case, and you are always free to do it your way.

I approached it in the following way (otelcollector.yaml):

$ kubectl apply -f - <<EOF
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: sidecar-for-my-app
spec:
  image: public.ecr.aws/aws-observability/aws-otel-collector:latest
  mode: sidecar
  config: |
    extensions:
      health_check:
      pprof:
        endpoint: 0.0.0.0:zzzz

    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:xxxx # gRPC endpoint my pod/API sends telemetry to; the sidecar listens on this port and collects from here
          http:
            endpoint: 0.0.0.0:yyyy # HTTP endpoint my pod/API sends telemetry to; the sidecar listens on this port and collects from here

    processors:
      batch:

    exporters:
      logging:
        loglevel: debug
      awsxray:
        region: <region>
      awsemf:
        log_group_name: <log-group-name>
        log_stream_name: <log-stream-name>
        namespace: <namespace>
        region: <region>

    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [awsxray]
        metrics:
          receivers: [otlp]
          exporters: [awsemf]

      extensions: [pprof]
      telemetry:
        logs:
          level: debug
EOF

$ kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: myapp
  annotations:
    sidecar.opentelemetry.io/inject: "true"
spec:
  containers:
  - name: myapp
    image: someimage:latest
    ports:
    - containerPort: 8080
      protocol: TCP
EOF

Now we know how to install it, but how are we going to attach these sidecars to our pods? Each sidecar can have a different configuration, so how does our main API pod get attached to the relevant sidecar?

Let me make it simple: in your pod annotations, just add the following line.

sidecar.opentelemetry.io/inject: “name of the OTel collector you created using the above config”. The "true" value in the pod above is fine while the namespace has a single sidecar-mode collector; once you have several, name the one you want. In our case, like below:

annotations:
  sidecar.opentelemetry.io/inject: "sidecar-for-my-app"
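
Most of us run Deployments rather than bare pods. The operator injects the sidecar when the pod itself is created, so the annotation has to sit on the pod template, not on the Deployment's own metadata. A minimal sketch, where the Deployment name, labels and image are just the example values from this article:

```shell
# Sketch: write a Deployment manifest whose pod template carries the inject annotation.
# Names, labels and image are illustrative; only the annotation key/value matter here.
cat > myapp-deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
      annotations:
        # Name of the OpenTelemetryCollector created earlier
        sidecar.opentelemetry.io/inject: "sidecar-for-my-app"
    spec:
      containers:
      - name: myapp
        image: someimage:latest
        ports:
        - containerPort: 8080
          protocol: TCP
EOF

echo "Wrote myapp-deployment.yaml; apply it with: kubectl apply -f myapp-deployment.yaml"
```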

The moment you deploy the pod along with otelcollector.yaml with the right values, you should see the telemetry getting collected by the OTel sidecar and sent to CloudWatch.
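
If nothing shows up, a quick first check is whether the sidecar was injected at all. A small sketch with a few helper functions; it assumes the operator names the injected container otc-container (verify this against your operator version) and uses the example pod name myapp.

```shell
# Sketch: helpers to check that the collector sidecar was injected into a pod.
# Assumption: the operator names the injected container "otc-container".
POD_NAME="${POD_NAME:-myapp}"

# Print the pod's container names, space-separated (needs kubectl access to the cluster).
list_containers() {
  kubectl get pod "$POD_NAME" -o jsonpath='{.spec.containers[*].name}'
}

# Succeed if the space-separated list in $1 contains the sidecar container.
has_sidecar() {
  case " $1 " in
    *" otc-container "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Tail the sidecar's own logs; exporter errors (permissions, region) usually show up here.
sidecar_logs() {
  kubectl logs "$POD_NAME" -c otc-container --tail=50
}
```

Against a live cluster you could then run: has_sidecar "$(list_containers)" && sidecar_logs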

But then I thought: what if I want to send another API's telemetry to Jaeger/Prometheus/ELK or some other tool? How would I do it? After all, this whole article has been bragging about the flexibility OTel gives for lift and shift, right?

Hmm, it's simple. Ask your dev to change the observability config in the code to suit the right tool, and then in the YAML above just change the receiver/exporter configuration. For example, if I want to take in Jaeger-format traces and write them to the collector's log, I can do it like below.

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: sidecar-for-my-app
spec:
  image: public.ecr.aws/aws-observability/aws-otel-collector:latest
  mode: sidecar
  config: |
    receivers:
      jaeger:
        protocols:
          grpc:
    processors:

    exporters:
      logging:

    service:
      pipelines:
        traces:
          receivers: [jaeger]
          processors: []
          exporters: [logging]
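
Another way to read the "just change the exporters" idea is to leave the app sending OTLP to the sidecar and only repoint the exporter side. A hedged sketch, assuming a Jaeger backend that accepts OTLP over gRPC (newer Jaeger versions can); the endpoint is a hypothetical in-cluster service name, and TLS is switched off only to keep the example short.

```shell
# Sketch: same sidecar, but traces are exported over OTLP to a Jaeger backend.
# The endpoint below is hypothetical; replace it with your own collector address.
cat > otelcollector-jaeger.yaml <<'EOF'
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: sidecar-for-my-app
spec:
  image: public.ecr.aws/aws-observability/aws-otel-collector:latest
  mode: sidecar
  config: |
    receivers:
      otlp:
        protocols:
          grpc:

    exporters:
      otlp:
        # Hypothetical in-cluster Jaeger collector that accepts OTLP over gRPC
        endpoint: jaeger-collector.observability.svc.cluster.local:4317
        tls:
          insecure: true

    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlp]
EOF

echo "Wrote otelcollector-jaeger.yaml; apply it with: kubectl apply -f otelcollector-jaeger.yaml"
```

With this shape the application code does not change at all; only the collector's exporter block does.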

Also, if you ask me about logs: we do it using Fluent Bit right now. As I said, there is an experimental way of doing logging in their contrib repo, which needs the otel collector installed accordingly; I will wait for it to stabilize before adopting it. It shouldn't take much effort, as I said: just a change in config.

Here is the next part of leveraging contrib

Don’t skip the references section: it has a collection of good articles that I referred to while doing this, as there isn’t one place where you can get all the info.

References:

This one is a good article which sheds some good info

To understand how to configure your OTel or OTLP collector

AWS otel setup

Github repo

Github-contrib repo

Contrib installation

OTLP configuration

To understand configuration better
