Building an Observability Stack with Docker

Daniel Baptista Dias
Kubeshop
11 min read · Apr 9, 2024


When developing an application with observability, one challenge is to set up a minimal local infrastructure to validate if everything is running correctly. Typically, developers code observability features locally but connect them to an external infrastructure, such as a test environment or a quality assurance environment.

This article will showcase how to set up an observability stack locally. You will learn how to:

  1. Configure Grafana, Tempo, Prometheus, and OpenTelemetry Collector with Docker Compose.
  2. Run the observability stack locally using Docker and Docker Compose.
  3. Instrument a simple API to send metrics and traces to the observability stack.
  4. Visualize metrics and traces emitted by APIs.

If you want to see the code example right away, check it out in the Tracetest repository on GitHub.

You can also clone the example and run it right away.

git clone https://github.com/kubeshop/tracetest.git
cd tracetest/examples/observability-stack

To start the example, run these commands.

# run the observability stack 
docker compose up -d
# install dependencies and run API
npm install
npm run with-telemetry
# then open a new terminal window and install Tracetest CLI:
# https://docs.tracetest.io/getting-started/installation#install-the-tracetest-cli
# configure Tracetest CLI
tracetest configure
# export API Key
export TRACETEST_API_KEY={API Key from app.tracetest.io}
# run Tracetest Agent
docker compose -f ./docker-compose.yaml -f docker-compose.tracetest.yaml up -d

Setting up an Observability Stack

First, you will set up a folder called observability-stack containing the docker-compose.yaml file, where all the containers will be defined, as well as additional configuration files for each tool. Create a docker-compose.yaml file in this folder and add the following content:

version: "3.7"

services:
  # ...

After that, you will set up a metrics server container. It will use Prometheus, an open-source monitoring and alerting toolkit designed to collect, store, and query time series data, making it well suited for monitoring your systems’ performance and health through metrics.

You will create a folder called config inside observability-stack and then create a prometheus.config.yaml file inside it with the following contents to configure Prometheus to scrape metrics every 15 seconds from the OpenTelemetry Collector. Prometheus will also send traces to the OpenTelemetry Collector.

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: otel-collector
    static_configs:
      - targets: ['otel-collector:8889']
      - targets: ['otel-collector:8888']

tracing:
  endpoint: otel-collector:4317
  insecure: true

After that, you can define the Prometheus container in our docker-compose.yaml to start with this config file using the following definition:

version: "3.7"

services:
  prometheus:
    image: prom/prometheus:v2.49.1
    command:
      - --config.file=/etc/prometheus.yaml
      - --web.enable-remote-write-receiver
      - --enable-feature=exemplar-storage
    volumes:
      - type: bind
        source: ./config/prometheus.config.yaml
        target: /etc/prometheus.yaml
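
The definition above keeps Prometheus reachable only on the internal Docker network, which is all Grafana and Tempo need. If you also want to browse the Prometheus UI directly at http://localhost:9090 while debugging, one optional tweak (an assumption for local debugging, not part of the original stack) is to publish the port on the host:

```yaml
services:
  prometheus:
    image: prom/prometheus:v2.49.1
    command:
      - --config.file=/etc/prometheus.yaml
      - --web.enable-remote-write-receiver
      - --enable-feature=exemplar-storage
    volumes:
      - type: bind
        source: ./config/prometheus.config.yaml
        target: /etc/prometheus.yaml
    # optional: publish the Prometheus UI on the host for debugging
    ports:
      - 9090:9090
```

This does not change how the other containers reach Prometheus; they keep using the internal hostname prometheus:9090.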

With the metrics server set up, you will now set up your tracing backend. For that, you will use Tempo, a distributed tracing system that allows you to capture and analyze traces to gain insights into the performance and behavior of your applications. You will create a tempo.config.yaml inside the observability-stack/config folder, configuring Tempo to receive OTLP data and emit metrics about its internal state. The content of the file is:

stream_over_http_enabled: true

server:
  http_listen_port: 3200
  log_level: info

query_frontend:
  search:
    duration_slo: 5s
    throughput_bytes_slo: 1.073741824e+09
  trace_by_id:
    duration_slo: 5s

distributor:
  receivers:
    otlp:
      protocols:
        http:
          endpoint: 0.0.0.0:4318
        grpc:
          endpoint: 0.0.0.0:4317

ingester:
  max_block_duration: 5m # cut the headblock when this much time passes. this is being set for demo purposes and should probably be left alone normally

compactor:
  compaction:
    block_retention: 1h # overall Tempo trace retention. set for demo purposes

metrics_generator:
  registry:
    external_labels:
      source: tempo
      cluster: docker-compose
  storage:
    path: /tmp/tempo/generator/wal
    remote_write:
      - url: http://prometheus:9090/api/v1/write
        send_exemplars: true

storage:
  trace:
    backend: local # backend configuration to use
    wal:
      path: /tmp/tempo/wal # where to store the wal locally
    local:
      path: /tmp/tempo/blocks

overrides:
  defaults:
    metrics_generator:
      processors: [service-graphs, span-metrics] # enables metrics generator

As with Prometheus, you will define a Tempo container in docker-compose.yaml:

version: "3.7"

services:
  tempo:
    image: grafana/tempo:2.3.1
    command: ["-config.file=/etc/tempo.yaml"]
    volumes:
      - type: bind
        source: ./config/tempo.config.yaml
        target: /etc/tempo.yaml

  prometheus:
    # ...

With both the metrics and trace servers defined, you will add an OpenTelemetry Collector to receive OTLP data from your application and centralize how it is forwarded: the API sends telemetry to the Collector, which exports traces to Tempo and exposes metrics to Prometheus.

In the observability-stack/config folder, you will add a file called otel-collector.config.yaml with some configuration to receive telemetry data and to forward (export) it to Tempo and Prometheus.

To receive OTLP data, you set up the standard otlp receiver to accept data over HTTP or gRPC. To forward traces and metrics, a batch processor is defined to accumulate data and send it every 100 milliseconds. Then connections are set up to Tempo (the otlp/tempo exporter, a standard otlp exporter) and to Prometheus (the prometheus exporter). A debug exporter is also added to log information to the container's standard output so you can see how the collector is working.

The final config file is structured as follows:

receivers:
  otlp:
    protocols:
      grpc:
      http:
        cors:
          allowed_origins:
            - "http://*"
            - "https://*"

processors:
  batch:
    timeout: 100ms

exporters:
  debug:
    verbosity: detailed
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
  prometheus:
    endpoint: 0.0.0.0:8889

extensions:
  health_check: {}

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug, prometheus]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug, otlp/tempo]

Then, you will add an OpenTelemetry Collector container in docker-compose.yaml to start the service with the infrastructure needed to collect and store telemetry data:

version: "3.7"

services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.92.0
    command:
      - "--config"
      - "/otel-local-config.yaml"
    volumes:
      - ./config/otel-collector.config.yaml:/otel-local-config.yaml
    ports:
      - 4317:4317

  tempo:
    # ...

  prometheus:
    # ...
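
One detail to note: the collector config accepts OTLP over both gRPC (port 4317) and HTTP (port 4318), but the service above publishes only 4317 to the host, which is enough for the gRPC exporters used later in this article. If you ever instrument an app with an HTTP-based OTLP exporter instead, a sketch of the extra mapping you would need (an assumption, not part of the original example) is:

```yaml
services:
  otel-collector:
    ports:
      - 4317:4317 # OTLP over gRPC (used by the Node.js API in this article)
      - 4318:4318 # OTLP over HTTP (only needed for HTTP-based exporters)
```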

With this stack complete, you can use it to collect telemetry. However, it is difficult to visualize the data clearly, since Tempo and Prometheus store traces and metrics but provide only low-level APIs to view them.

So, you will add one last container to help visualize this data: Grafana, an open-source analytics and visualization platform that lets you explore traces and metrics easily. You can configure Grafana to read from both Tempo and Prometheus by defining them as data sources in the following grafana.datasource.yaml config file:

# config file version
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    uid: prometheus
    access: proxy
    orgId: 1
    url: http://prometheus:9090
    basicAuth: false
    isDefault: false
    version: 1
    editable: false
    jsonData:
      httpMethod: GET
  - name: Tempo
    type: tempo
    uid: tempo
    access: proxy
    orgId: 1
    url: http://tempo:3200
    basicAuth: false
    isDefault: true
    version: 1
    editable: false
    jsonData:
      httpMethod: GET
      serviceMap:
        datasourceUid: prometheus

After that, you can define a Grafana container on our docker-compose.yaml:

version: "3.7"

services:
  grafana:
    image: grafana/grafana:10.2.3
    user: "472"
    depends_on:
      - prometheus
      - tempo
      - otel-collector
    ports:
      - 33000:33000
    environment:
      - GF_SERVER_HTTP_PORT=33000
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
      - GF_AUTH_DISABLE_LOGIN_FORM=true
    volumes:
      - type: bind
        source: ./config/grafana.datasource.yaml
        target: /etc/grafana/provisioning/datasources/datasources.yaml

  otel-collector:
    # ...

  tempo:
    # ...

  prometheus:
    # ...

With all pieces configured, you can run an app and submit telemetry to check if everything works.

Running an App Emitting Telemetry Against Our Observability Stack

To test the telemetry, you will create a simple API in Node.js with a single endpoint that returns “Hello World” when called, in an app.js file:

const express = require("express")
const app = express()

app.get("/", (req, res) => {
  setTimeout(() => {
    res.send("Hello World")
  }, 1000)
})

app.listen(8080, () => {
  console.log(`Listening for requests on http://localhost:8080`)
})

Then, you will create a file called app.instrumentation.js that manages all OpenTelemetry instrumentation for this API, adding traces and metrics to its calls and sending them to the OpenTelemetry Collector exposed on localhost:4317:

const opentelemetry = require('@opentelemetry/sdk-node')
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node')
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc')
const { OTLPMetricExporter } = require('@opentelemetry/exporter-metrics-otlp-grpc')
const { PeriodicExportingMetricReader } = require('@opentelemetry/sdk-metrics')
const grpc = require('@grpc/grpc-js')

const exporterConfig = {
  url: 'localhost:4317',
  credentials: grpc.ChannelCredentials.createInsecure()
}

const sdk = new opentelemetry.NodeSDK({
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter(exporterConfig)
  }),
  traceExporter: new OTLPTraceExporter(exporterConfig),
  instrumentations: [getNodeAutoInstrumentations()],
  serviceName: 'test-api',
})

sdk.start()

And finally, create a package.json file with API dependencies:

{
  "name": "test-api",
  "version": "1.0.0",
  "main": "app.js",
  "scripts": {
    "with-telemetry": "node --require ./app.instrumentation.js app.js"
  },
  "dependencies": {
    "@opentelemetry/api": "^1.7.0",
    "@opentelemetry/auto-instrumentations-node": "^0.41.0",
    "@opentelemetry/exporter-metrics-otlp-grpc": "^0.48.0",
    "@opentelemetry/exporter-trace-otlp-grpc": "^0.48.0",
    "express": "^4.18.2"
  }
}

Start the observability stack with Docker Compose and then the API by running the commands below. Note that the dependencies are installed first, and the API then listens on port 8080.

# run our Observability stack 
docker compose up -d

# install dependencies and run API
npm install
npm run with-telemetry
# outputs
> test-api@1.0.0 with-telemetry
> node --require ./app.instrumentation.js app.js
Listening for requests on http://localhost:8080

By running a command to trigger the API in another terminal, you will be able to see its response:

curl http://localhost:8080/

# outputs
Hello World

By checking the OpenTelemetry Collector logs with docker compose logs otel-collector, you should see the API sending metrics and traces to it, with entries like:

# ...
# metrics logs
otel-collector-1 | InstrumentationScope @opentelemetry/instrumentation-http 0.48.0
otel-collector-1 | Metric #0
otel-collector-1 | Descriptor:
otel-collector-1 | -> Name: http.server.duration
otel-collector-1 | -> Description: Measures the duration of inbound HTTP requests.
otel-collector-1 | -> Unit: ms
otel-collector-1 | -> DataType: Histogram
otel-collector-1 | -> AggregationTemporality: Cumulative
# ...
# trace logs
otel-collector-1 | ScopeSpans #1
otel-collector-1 | ScopeSpans SchemaURL:
otel-collector-1 | InstrumentationScope @opentelemetry/instrumentation-express 0.35.0
otel-collector-1 | Span #0
otel-collector-1 | Trace ID : f31338cf98ec9bcb9a194a3fb092926c
otel-collector-1 | Parent ID : 1adc22218e485dc5
otel-collector-1 | ID : ca0be9c187c7b9fa
otel-collector-1 | Name : middleware - query
otel-collector-1 | Kind : Internal
otel-collector-1 | Start time : 2024-01-29 18:37:46.299 +0000 UTC
otel-collector-1 | End time : 2024-01-29 18:37:46.299479982 +0000 UTC
otel-collector-1 | Status code : Unset
otel-collector-1 | Status message :
otel-collector-1 | Attributes:
otel-collector-1 | -> http.route: Str(/)
otel-collector-1 | -> express.name: Str(query)
otel-collector-1 | -> express.type: Str(middleware)

Now, one last step is to open Grafana in your browser at http://localhost:33000 and start visualizing metrics and traces. You can do it by going to the menu on the initial page and choosing the Explore option:

The Explore screen will open with the Prometheus data source enabled. If you open the metrics browser in the query editor, you can see all metrics published by the stack:

One of them is http_server_duration, emitted by the automatic instrumentation, which records how many requests were made against the server and how long each took:

Changing the datastore to Tempo, you can see the traces emitted by the API.

If you add a Trace ID (like f31338cf98ec9bcb9a194a3fb092926c, captured in the logs) and click Run query, you should be able to see the trace:

Done! You have a local API publishing telemetry to a local stack. Now, you can experiment with the API, add more traces and metrics, and evaluate everything locally.

Bonus: Trace-testing Your App to Automate Telemetry Tests

Now that you have a working API, instead of checking the telemetry manually, you can create trace-based tests, trigger HTTP calls to the API, and validate if the API is working as intended and emitting traces.

To do that, you will use Tracetest, which triggers service calls (in this case, HTTP calls) and validates the emitted traces to ensure that the application is working as intended and that the telemetry is properly captured and sent to the observability stack.

First, you will add one more container in a separate docker-compose.tracetest.yaml file, which defines a Tracetest Agent. This is a lightweight, dependency-free agent that runs locally in your environment, connects to the local tracing backend (in this case, Tempo), and executes API calls locally.

version: "3.7"

services:
  tracetest:
    image: kubeshop/tracetest-agent:latest
    platform: linux/amd64
    command:
      - --mode
      - verbose
    depends_on:
      otel-collector:
        condition: service_started
    environment:
      TRACETEST_API_KEY: ${TRACETEST_API_KEY}

Then, you will run the observability stack and the API as you did before:

# run our Observability stack 
docker compose up -d

# install dependencies and run API
npm install
npm run with-telemetry

Now, in a new terminal window, install the Tracetest CLI by following the installation instructions for your operating system, and execute the command:

tracetest configure

This command will guide you to access Tracetest and set up your account. Then, create a new environment by expanding the environment tab and clicking on Create a New Environment:

On the popup window, enter the name of the environment and click on Create:

On the Get Started screen, choose the option “Application is in a private environment”, since you will connect the Tracetest Agent with our local observability stack in Docker.

Now copy the API Key on the screen to set up our Tracetest Agent in Docker:

You will start the agent with Docker Compose using the command below.

export TRACETEST_API_KEY={API Key copied in last step}
docker compose -f ./docker-compose.yaml -f docker-compose.tracetest.yaml up -d

Choose Tempo as the application that will receive traces, and then enter the endpoint used to access it inside our stack, tempo:9095, and click Test Connection.

The connection will be validated. Click on Continue and then Save.

Now, you can proceed in the terminal and create a test file called test-api.yaml that calls the API. It will trigger the API from the Tracetest Agent container and validate that the call emitted a trace with an HTTP span named GET /:

type: Test
spec:
  id: _0N272tIg
  name: Test API call
  trigger:
    type: http
    httpRequest:
      method: GET
      url: http://host.docker.internal:8080/
      headers:
        - key: Content-Type
          value: application/json
  specs:
    - selector: span[tracetest.span.type="http" name="GET /" http.target="/" http.method="GET"]
      name: HTTP call was made correctly
      assertions:
        - attr:http.status_code = 200
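
You can extend the specs list with further checks. For example, a sketch of a second assertion (an assumed addition, not part of the original test) that verifies the request finishes within two seconds, using Tracetest's tracetest.span.duration attribute, with the threshold chosen to fit this demo's 1-second setTimeout:

```yaml
  specs:
    - selector: span[tracetest.span.type="http" name="GET /"]
      name: HTTP call completes in a reasonable time
      assertions:
        - attr:tracetest.span.duration < 2s
```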

Finally, you can run this test with Tracetest CLI and validate the API in the terminal:

tracetest run test -f ./test-api.yaml

# it returns output like this:
✔ Test API call (https://app.tracetest.io/organizations/your-organization/environments/your-environment/test/_0N272tIg/run/2/test) - trace id: 399568f5f202656ab926f1b1452d5dbd
✔ HTTP call was made correctly

With this, you can validate the API with each change to guarantee that the telemetry is valid and returning everything as expected.

Final Remarks

Setting up an observability stack in Docker for local development can greatly enhance your ability to monitor and analyze the performance and behavior of applications. With a local stack, a developer can also iterate quickly when developing an API, making code changes and adding more telemetry with less friction.

Additionally, utilizing trace-based tests with Tracetest can automate the validation of telemetry and ensure that the application is functioning as intended. Overall, having an observability stack during local development can improve the development and testing process, leading to more reliable and efficient applications.

Would you like to learn more about Tracetest and what it brings to the table? Visit the Tracetest docs and try it out by signing up today!

Also, please feel free to join our Slack Community, give Tracetest a star on GitHub, or schedule a time to chat 1:1.


Daniel Baptista Dias
Kubeshop

Software Engineer at Kubeshop and doctorate student at University of São Paulo