Securing your OpenTelemetry Collector

When exposing your OpenTelemetry Collector receivers to a public network, it might be a good idea to restrict who can send data to it, so that unauthorized users won’t be able to write spans to your backend.

The typical approach for securing microservices nowadays is to place an authentication proxy in front of your services, which is then able to inspect the incoming HTTP request and validate the credentials before handing over the control to the target application.

Applying this pattern to the OpenTelemetry Collector works when we deal with receivers that expose a regular HTTP port. However, some receivers, such as OTLP and Jaeger, also provide a gRPC endpoint with the promise of better performance. Given that gRPC connections are typically long-lived, the HTTP authentication could become stale while the connection is still open. A more secure solution is to validate the authentication data on a per-RPC basis. Unfortunately, generic authentication proxies for gRPC aren’t that common or popular yet.

With that in mind, the OpenTelemetry Collector provides settings for the gRPC receivers to enable authentication of incoming RPCs. At the moment, only token-based authentication is supported, which is the most common authentication mechanism for machine-to-machine communication. To validate the token, an OpenID Connect, or rather, an OAuth 2.0 Authorization Server is required.

In this blog post, we’ll configure two OpenTelemetry Collectors: one that acts as an agent, and another that acts as a remote collector.

The agent represents a process that sits close to the instrumented application, on the same trusted network, such as localhost. Because of that, authentication isn’t enabled on this receiver. On Kubernetes, this would typically be a sidecar or a DaemonSet.

The remote collector represents a process that runs in a different node, cluster, data center or even region. It can be scaled individually and is meant to be accessed only by authorized agents, requiring an authentication token to be provided before processing the spans.

Preparing the authentication server

Keycloak is a popular open source authentication server and implements the OpenID Connect protocol, making it suitable for our purposes. Note however that any server compatible with OpenID Connect will do.

To get Keycloak running, we start it in a container:
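
A command along these lines should do; the image tag and the admin-credential environment variables are assumptions based on the WildFly-based Keycloak distribution (the one that serves under /auth), so adjust them to the version you’re using:

    docker run -d --name keycloak \
      -p 8080:8080 \
      -e KEYCLOAK_USER=admin \
      -e KEYCLOAK_PASSWORD=admin \
      quay.io/keycloak/keycloak:12.0.4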

Once Keycloak is up and running, head over to http://localhost:8080/auth/admin/ and log in using “admin” as both the username and password, as specified in the command above.

We’ll need a realm, which we’ll name opentelemetry, like in the following screenshot.

New realm “opentelemetry”

We’ll need to create a new client for the “collector”. When a new token is created for the agent, it should be created with this client as the target “audience” (aud field in the token).

The new “collector” client

We’ll then need a new client, representing our agent:

New client “agent”

We’ll need to change the Access Type to “confidential”, and enable the “Service Accounts” option, like in the following screenshot. We then disable the “Standard Flow”, as we don’t need an end-user authentication via browser.

Our “agent” client configuration

Once our client is saved, the “Credentials” tab should appear, from which we’ll copy the “Secret” value:

The credentials for our client

And finally, we add the collector client as the audience for the tokens issued for our agent client. In the “Mappers” tab, create a new mapper of “Audience” type:

Set the audience for agent tokens

At this point, we are ready to issue an access token, which we can do with the following cURL command. Make sure you replace the client_secret with the secret from the previous step.
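
A sketch of that request, assuming Keycloak is still listening on localhost:8080 and that our client is named “agent” (the path is the standard OpenID Connect token endpoint for a Keycloak realm):

    curl -s -X POST \
      -d "client_id=agent" \
      -d "client_secret=REPLACE_WITH_YOUR_SECRET" \
      -d "grant_type=client_credentials" \
      http://localhost:8080/auth/realms/opentelemetry/protocol/openid-connect/token

The token we need is in the access_token field of the JSON response.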

Save this token; we’ll need it soon! If you are curious about what’s inside, head over to https://jwt.io and paste your token into the “Encoded” field. Hint: check the exp field of the token. By default, Keycloak tokens are valid for only 5 minutes, but you can change the “Access Token Lifespan” under the client’s “Advanced Settings” to get a token that is valid for longer.

TLS certificates

We have our authentication server ready and we have our token. But before we configure our collector, we’ll need to generate some TLS certificates: the OpenTelemetry Collector will refuse to authenticate over plain-text connections.

For our exercise here, we’ll use cfssl, with the following certificate signing requests (CSRs):

/tmp/certs/ca-csr.json
/tmp/certs/cert-csr.json, which coincidentally looks identical to the ca-csr.json
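
As a rough sketch, both files could look like the following; the hosts and names values here are placeholders, so adjust them to the hostname your agents will use to reach the collector:

    {
      "hosts": ["localhost"],
      "key": {
        "algo": "rsa",
        "size": 2048
      },
      "names": [
        {
          "O": "OpenTelemetry Collector Example"
        }
      ]
    }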

With the files in place, we can now generate the certificates:
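
A sketch of the cfssl invocations, assuming the CSR files above are in /tmp/certs and that both cfssl and cfssljson are installed:

    cd /tmp/certs
    cfssl gencert -initca ca-csr.json | cfssljson -bare ca
    cfssl gencert -ca ca.pem -ca-key ca-key.pem cert-csr.json | cfssljson -bare cert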

At the end of this, we should have a ca.pem, cert.pem and cert-key.pem that we’ll need to configure the collector.

Collector configuration

For the agent, we have a pipeline that opens a regular OTLP receiver. As this is deployed alongside the instrumented application, possibly as a sidecar, we won’t require authentication on the receiver side.

On the exporter side, we want to tell the remote collector that we are authorized, so we include the token we obtained from Keycloak.
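
A sketch of what config.agent.yaml could look like. The field names, notably per_rpc_auth and the TLS-related settings, have changed between collector releases, so double-check them against the version you’re running; the remote collector’s endpoint (localhost:55690) and the token value are placeholders:

    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: localhost:55680

    exporters:
      logging:
      otlp/auth:
        endpoint: localhost:55690
        ca_file: /tmp/certs/ca.pem
        per_rpc_auth:
          type: bearer
          bearer_token: PASTE_THE_KEYCLOAK_TOKEN_HERE

    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [logging, otlp/auth]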

The configuration for our agent

On the remote collector side, we expect it to be used only by agents, so we expose only the OTLP receiver and set it to require authentication, using our Keycloak instance as the server that checks whether the token is valid.
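
A sketch of config.collector.yaml under the same caveats: the layout of the tls_settings and oidc blocks (issuer_url, audience) follows collector builds from around the time of writing and should be verified against your version’s documentation; port 55690 is an arbitrary choice that avoids clashing with the agent:

    receivers:
      otlp/auth:
        protocols:
          grpc:
            endpoint: localhost:55690
            tls_settings:
              cert_file: /tmp/certs/cert.pem
              key_file: /tmp/certs/cert-key.pem
            auth:
              oidc:
                issuer_url: http://localhost:8080/auth/realms/opentelemetry
                audience: collector

    exporters:
      logging:
      jaeger:
        endpoint: localhost:14250
        insecure: true

    service:
      pipelines:
        traces:
          receivers: [otlp/auth]
          exporters: [logging, jaeger]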

The configuration for our collector

Before we can start our collectors, we’ll need to start the backing Jaeger instance:
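
The all-in-one image is enough for this exercise; the tag here is just a release that was current at the time of writing:

    docker run -d --name jaeger \
      -p 16686:16686 \
      -p 14250:14250 \
      jaegertracing/all-in-one:1.22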

We are now ready to start our remote OpenTelemetry Collector:
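
Assuming the otelcol binary is in the PATH and the configuration file from the previous section is at /tmp/config.collector.yaml:

    otelcol --config /tmp/config.collector.yaml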

Once we see the message “Everything is ready. Begin running and processing data”, we’ll know that the server is properly configured, with the exception perhaps of the bearer token, which is only validated when we process spans.

We now start our agent, using a different metrics address, as the default one is already in use by the remote collector:
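
Something like the following should work; the --metrics-addr flag is the one used by collector builds from around the time of writing, and the port is an arbitrary free one:

    otelcol --config /tmp/config.agent.yaml --metrics-addr localhost:8889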

Again, once the message “Everything is ready. Begin running and processing data” appears, we know we are ready to test it!

Testing

We need an application that is able to generate traces. If you don’t have an instrumented application yet, you can use the tracegen utility, as follows:
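
Assuming tracegen from the opentelemetry-collector-contrib repository is installed and that the agent is listening on the default OTLP gRPC port:

    tracegen -otlp-insecure -traces 1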

This will generate one trace with two spans and send it to a local OpenTelemetry Collector process (localhost:55680), which is our agent. The agent will export the spans both to the logging exporter and to the remote collector. The following should be seen in the agent’s logs:

Our remote collector is also configured to export spans to the logging exporter, in addition to Jaeger. We should also see the same messages in the logs:

In our Jaeger logs, we should see this:

And finally, on the Jaeger UI, we should be able to find our trace.

Our trace from the ‘tracegen’

Once we change our config.agent.yaml to use an invalid token, we’ll see an error message in the OpenTelemetry Collector logs, and no new traces will make it to Jaeger. The error message will look like this:

Wrapping up

We’ve seen that adding authentication to the gRPC receivers from the OpenTelemetry Collector isn’t difficult: it just requires a properly configured authentication server and the possession of a bearer token.

That said, authentication in the OpenTelemetry Collector is relatively new, with a handful of items on the wish list, including caching of tokens so that a round trip to the authentication server isn’t necessary for every RPC, auto-refresh of tokens on the client side, and perhaps alternative authentication mechanisms. And this is where we need your help: give us your feedback and tell us what’s important for your use case, so that we implement only the features that are needed!


Written by Juraci Paixão Kröhling

Juraci Paixão Kröhling is a software engineer at Grafana Labs, a maintainer on the Jaeger project, and a contributor to the OpenTelemetry project.
