PART III: A comprehensive guide to setting up the three pillars of observability in a Kubernetes cluster within an AWS private network
Set up traces using Grafana and Tempo in the observability cluster and the Grafana Agent in the tenant cluster
This document is published in three parts: Part I covers metrics, Part II covers logs, and Part III covers traces.
Part III: Setting up traces
Tools explained
Grafana: Used to analyze and visualize traces. Grafana was already installed in Part I.
Tempo: A highly scalable distributed tracing backend with object storage support.
Grafana Agent: Collects traces in the application cluster and ships them to Tempo in the observability cluster.
HotROD application: A sample application that generates traces for testing purposes.
Set up Tempo in the observability cluster using Helm
Create a tempo-values.yaml file. Make sure multitenancy is enabled, and always pass the AWS access key/secret via secret environment values in production; they are added directly here only to keep the tutorial simple.
apiVersion: v1
data:
  overrides.yaml: |
    overrides:
      {}
  tempo.yaml: |
    multitenancy_enabled: true
    usage_report:
      reporting_enabled: true
    compactor:
      compaction:
        block_retention: 24h
    distributor:
      receivers:
        jaeger:
          protocols:
            grpc:
              endpoint: 0.0.0.0:14250
            thrift_binary:
              endpoint: 0.0.0.0:6832
            thrift_compact:
              endpoint: 0.0.0.0:6831
            thrift_http:
              endpoint: 0.0.0.0:14268
        opencensus: null
        otlp:
          protocols:
            grpc:
              endpoint: 0.0.0.0:4317
            http:
              endpoint: 0.0.0.0:4318
    ingester:
      {}
    server:
      http_listen_port: 3100
    storage:
      trace:
        backend: s3
        s3:
          endpoint: s3.us-east-1.amazonaws.com
          bucket: bibin-tempo-data
          forcepathstyle: true
          # set insecure to false if the endpoint uses https
          insecure: true
          access_key: "AXXXXXXXXXXL"
          secret_key: "xxxxxxxxxxxxxxxxxxxxxxxxxxxH"
        wal:
          path: /var/lib/enterprise-traces/wal
    querier:
      {}
    query_frontend:
      {}
    overrides:
      per_tenant_override_config: /conf/overrides.yaml
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: tempo
    meta.helm.sh/release-namespace: tempo
  creationTimestamp: "2023-03-06T03:58:29Z"
  labels:
    app.kubernetes.io/instance: tempo
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: tempo
    app.kubernetes.io/version: 2.0.1
    helm.sh/chart: tempo-1.0.1
  name: tempo
  namespace: tempo
  resourceVersion: "157004"
  uid: 3612c55c-cb0e-4528-9ce4-bfc49829dd8d
Install Grafana Tempo using Helm
kubectl create ns tempo
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm upgrade --install tempo grafana/tempo -n tempo --values tempo-values.yaml
If all is fine, you will see the output below:
NAME: tempo
LAST DEPLOYED: Mon Mar 6 03:58:29 2023
NAMESPACE: tempo
STATUS: deployed
REVISION: 1
TEST SUITE: None
Check that the Tempo pods are running fine.
Expose the Tempo service on port 4317 using the command below (the Grafana Agent will ship traces to this port):
kubectl expose service tempo --type=LoadBalancer --target-port=4317 --name=tempo-nlb -n tempo --dry-run=client -o yaml > tempo-nlb.yaml
Edit tempo-nlb.yaml and add the NLB annotations so that an internal network load balancer is created:
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: null
  labels:
    app.kubernetes.io/instance: tempo
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: tempo
    app.kubernetes.io/version: 2.0.1
    helm.sh/chart: tempo-1.0.1
  name: tempo-nlb
  namespace: tempo
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
spec:
  ports:
  - name: pushport
    port: 4317
    protocol: TCP
    targetPort: 4317
  selector:
    app.kubernetes.io/instance: tempo
    app.kubernetes.io/name: tempo
  type: LoadBalancer
status:
  loadBalancer: {}
Apply it using the command below:
k apply -f tempo-nlb.yaml
Find the NLB address:
[observe@bastion ~]$ k get svc -n tempo | awk '{print $4}' | grep elb
Output: a0f2479duyrtcfsghkjxxxxxac-a8e26a6b24db8ffc.elb.us-east-1.amazonaws.com
Check the NLB above using curl from within the VPC (as it is an internal NLB):
[observe@portx-observability-bastion ~]$ curl a0f2479duyrtcfsghkjxxxxxac-a8e26a6b24db8ffc.elb.us-east-1.amazonaws.com:4317
Output: Received HTTP/0.9 when not allowed
If you receive the message above, the port is reachable and working as expected: curl speaks HTTP/1.x, but 4317 is a gRPC (HTTP/2) endpoint, so this error simply confirms something is listening.
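The manual curl check can also be scripted. Below is a minimal Python sketch (the hostname is a placeholder; substitute your own NLB address) that confirms a TCP port is accepting connections:

```python
import socket

def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example with a hypothetical NLB address:
# tcp_port_open("a0f2479d...elb.us-east-1.amazonaws.com", 4317)
```

This only verifies network reachability (security groups, routing, the NLB target), not that Tempo itself accepts gRPC, which is what the HTTP/0.9 message from curl indirectly tells you.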
Next, create a VPC endpoint service in the observability cluster VPC using the Tempo NLB above.
Then set up an interface endpoint in the app cluster VPC using the vpce-svc endpoint service name from above.
Check that the VPC endpoint works using curl from within the application cluster VPC:
curl vpce-099ad1fxxxxxx75-vpce-svc-0b32c00d6e1cb4511.us-east-1.vpce.amazonaws.com:4317
Output: Received HTTP/0.9 when not allowed
You get a similar message, which means the endpoint is fine.
Set up the Grafana Agent in the application cluster.
kubectl create ns grafana-agent
MANIFEST_URL=https://raw.githubusercontent.com/grafana/agent/v0.27.0/production/kubernetes/agent-traces.yaml NAMESPACE=grafana-agent /bin/sh -c "$(curl -fsSL https://raw.githubusercontent.com/grafana/agent/v0.27.0/production/kubernetes/install-bare.sh)" | kubectl apply -f -
Create a file agent-traces-config.yaml, or simply edit the existing ConfigMap and update the values.
Make sure to use the VPC endpoint DNS name as the Tempo endpoint address (the agent ships traces to this address).
kind: ConfigMap
metadata:
  name: grafana-agent-traces
apiVersion: v1
data:
  agent.yaml: |
    traces:
      configs:
      - batch:
          send_batch_size: 1000
          timeout: 5s
        name: default
        receivers:
          jaeger:
            protocols:
              grpc: null
              thrift_binary: null
              thrift_compact: null
              thrift_http: null
            remote_sampling:
              strategy_file: /etc/agent/strategies.json
              tls:
                insecure: true
          opencensus: null
          otlp:
            protocols:
              grpc: null
              http: null
          zipkin: null
        remote_write:
        - basic_auth:
            password: awsome
            username: bibink
          endpoint: vpce-099ad1fxxxxxx75-vpce-svc-0b32c00d6e1cb4511.us-east-1.vpce.amazonaws.com:4317
          insecure: true
          headers:
            X-Scope-OrgID: bibinNet
          retry_on_failure:
            enabled: false
        scrape_configs:
        - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          job_name: kubernetes-pods
          kubernetes_sd_configs:
          - role: pod
          relabel_configs:
          - action: replace
            source_labels:
            - __meta_kubernetes_namespace
            target_label: namespace
          - action: replace
            source_labels:
            - __meta_kubernetes_pod_name
            target_label: pod
          - action: replace
            source_labels:
            - __meta_kubernetes_pod_container_name
            target_label: container
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            insecure_skip_verify: false
  strategies.json: '{"default_strategy": {"param": 0.001, "type": "probabilistic"}}'
Apply the file above:
kubectl -n grafana-agent apply -f agent-traces-config.yaml
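The strategies.json in the ConfigMap tells the agent's Jaeger remote-sampling endpoint to keep roughly 0.1% of traces (param: 0.001). As an illustration only (this is not the agent's actual implementation), a probabilistic sampler boils down to a coin flip per trace:

```python
import random

def should_sample(rate: float, rng: random.Random) -> bool:
    """Probabilistic strategy: keep a trace with probability `rate`."""
    return rng.random() < rate

rng = random.Random(42)  # fixed seed so the sketch is reproducible
kept = sum(should_sample(0.001, rng) for _ in range(100_000))
# With rate 0.001, roughly 100 out of 100,000 traces are kept.
```

The hotrod deployment later in this guide overrides this with JAEGER_SAMPLER_TYPE=const and param 1, which samples every trace; that is fine for testing but 0.001-style probabilistic sampling is what you want under real load.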
Verify that the Grafana Agent pods are running:
k get pods -n grafana-agent
Add the Tempo datasource in Grafana.
Note: Use X-Scope-OrgID and authentication values matching those in agent-traces-config.yaml.
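To see what the agent's basic_auth and headers settings translate to on the wire, here is a sketch of the equivalent HTTP headers, built in Python with the sample credentials and tenant from the config above (use your own values in practice):

```python
import base64

def tempo_headers(username: str, password: str, tenant: str) -> dict:
    """Build the Authorization and tenant headers attached to trace pushes."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {
        "Authorization": f"Basic {token}",
        # Tempo multitenancy: this header selects which tenant the traces belong to
        "X-Scope-OrgID": tenant,
    }

headers = tempo_headers("bibink", "awsome", "bibinNet")
```

The Grafana datasource must send the same X-Scope-OrgID header, since Tempo has multitenancy_enabled: true and will reject queries without a tenant.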
Now go to Grafana Explore and run a query. You will see traces if you already have an app generating them.
In my case there was no application generating traces in the tenant/application cluster, so I deployed the HotROD application to generate traces for testing.
How to deploy the HotROD application
As the first step, install Kompose on your bastion host. Kompose will be used to convert the docker-compose file to Kubernetes manifests.
For installation see https://kompose.io/installation/
Finally, clone the Jaeger repo containing HotROD and convert it using Kompose:
git clone https://github.com/jaegertracing/jaeger.git
cd jaeger/examples/hotrod
kompose convert
The output will be as below:
INFO Kubernetes file "hotrod-service.yaml" created
INFO Kubernetes file "jaeger-service.yaml" created
INFO Kubernetes file "hotrod-deployment.yaml" created
INFO Kubernetes file "hotrod-jaeger-example-networkpolicy.yaml" created
INFO Kubernetes file "jaeger-deployment.yaml" created
We are interested only in the "hotrod-service.yaml" and "hotrod-deployment.yaml" files.
Edit "hotrod-deployment.yaml" and add the env variables. They should point to the Grafana Agent we set up earlier:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    kompose.cmd: kompose convert
    kompose.version: 1.28.0 (c4137012e)
  creationTimestamp: null
  labels:
    io.kompose.service: hotrod
  name: hotrod
spec:
  replicas: 1
  selector:
    matchLabels:
      io.kompose.service: hotrod
  strategy: {}
  template:
    metadata:
      annotations:
        kompose.cmd: kompose convert
        kompose.version: 1.28.0 (c4137012e)
      creationTimestamp: null
      labels:
        io.kompose.network/hotrod-jaeger-example: "true"
        io.kompose.service: hotrod
    spec:
      containers:
      - args:
        - all
        env:
        - name: JAEGER_AGENT_HOST
          value: grafana-agent-traces.grafana-agent.svc
        - name: JAEGER_AGENT_PORT
          value: "6831"
        - name: JAEGER_SAMPLER_TYPE
          value: const
        - name: JAEGER_SAMPLER_PARAM
          value: "1"
        - name: JAEGER_TAGS
          value: app=hotrod
        - name: OTEL_EXPORTER_JAEGER_ENDPOINT
          value: http://grafana-agent-traces.grafana-agent.svc:14268/api/traces
        image: jaegertracing/example-hotrod:latest
        name: hotrod
        ports:
        - containerPort: 8080
        resources: {}
      restartPolicy: Always
status: {}
Edit "hotrod-service.yaml" and modify it to use a LoadBalancer:
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: null
  labels:
    io.kompose.service: hotrod
  name: hotrod-lb
spec:
  ports:
  - port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    io.kompose.service: hotrod
  type: LoadBalancer
status:
  loadBalancer: {}
Apply the manifests
kubectl apply -f hotrod-deployment.yaml
kubectl apply -f hotrod-service.yaml
Find the ELB URL:
[user@bastion ~]$ kubectl get svc hotrod-lb | awk '{print $4}' | grep elb
a7f2fe4b4exxxxxxxxx0c71334-105225129.us-east-1.elb.amazonaws.com
Access the ELB URL above in a browser on port 8080 and click around to generate traces.
Then tail the HotROD pod logs and check that traces are being generated:
kubectl logs hotrod-655768f69b-lr77p | grep -i trace | tail -5
Now log in to Grafana and check for traces.
You can see traces, which shows the setup is working fine.
Optional steps
Now, to make things easier, add a derived field to the Loki datasource so that you can link logs directly to traces in Tempo (no need to copy and paste trace IDs into the Tempo query field).
Use the values below:
Name: TraceID
Query: ${__value.raw}
Regex: ((\d+|[a-z]+)(\d+|[a-z]+)(\d+|[a-z]+)(\d+|[a-z]+)(\d+|[a-z]+)(\d+|[a-z]+)(\d+|[a-z]+)(\d+|[a-z]+)(\d+|[a-z]+)(\d+|[a-z]+)(\d+|[a-z]+)(\d+|[a-z]+)(\d+|[a-z]+)(\d+|[a-z]+)(\d+|[a-z]+)(\d+|[a-z]+)(\d+|[a-z]+)(\d+|[a-z]+)(\d+|[a-z]+))
Internal Link: Toggle ON and select Tempo DS from dropdown
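It helps to sanity-check a derived-field regex against a sample log line before saving it in Grafana. The sketch below uses a simpler 32-character hex pattern as an alternative to the long alternation above (Jaeger/OTel trace IDs render as 32 lowercase hex characters); the log line is made up for illustration:

```python
import re

# 128-bit trace IDs render as exactly 32 lowercase hex characters.
TRACE_ID_RE = re.compile(r"\b([0-9a-f]{32})\b")

log_line = 'level=info msg="HTTP request" trace_id=0af7651916cd43dd8448eb211c80319c'
match = TRACE_ID_RE.search(log_line)
trace_id = match.group(1) if match else None
```

Whichever pattern you use, the first capture group must contain only the trace ID, because Grafana substitutes it into the Tempo query via ${__value.raw}.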
Now go to Grafana Explore, select Loki as the datasource, and execute a query.
You can now see logs with highlighted trace IDs linked to Tempo.
And that's it. You now have an observability cluster with metrics, logs, and traces, all of which can be visualized and analyzed using Grafana.
Thanks for reading!