Securing your OpenTelemetry Collector

When exposing your OpenTelemetry Collector receivers to a public network, it might be a good idea to restrict who can send data to it, so that unauthorized users won’t be able to write spans to your backend.

The typical approach for securing microservices nowadays is to place an authentication proxy in front of your services, which is then able to inspect the incoming HTTP request and validate the credentials before handing over the control to the target application.

Applying this pattern to the OpenTelemetry Collector works when we deal with receivers that expose a regular HTTP port. However, some receivers, such as OTLP and Jaeger, also provide a gRPC endpoint with the promise of better performance. Given that gRPC connections are typically long-lived, the HTTP authentication could become stale while the connection is still open. A more secure solution is to validate the authentication data on a per-RPC basis. Unfortunately, generic authentication proxies for gRPC aren’t that common or popular yet.

With that in mind, the OpenTelemetry Collector provides settings for the gRPC receivers to enable authentication of incoming RPCs. At the moment, only token-based authentication is supported, which is the most common authentication mechanism for machine-to-machine communication. To validate the token, an OpenID Connect, or rather, an OAuth 2.0 Authorization Server is required.

In this blog post, we’ll configure two OpenTelemetry Collectors: one that acts as an agent, and another that acts as a remote collector.

The agent represents a process that sits close to the instrumented application, on the same trusted network, such as localhost. Because of that, authentication isn’t enabled on this receiver. On Kubernetes, this would typically be a sidecar or a DaemonSet.

The remote collector represents a process that runs in a different node, cluster, data center or even region. It can be scaled individually and is meant to be accessed only by authorized agents, requiring an authentication token to be provided before processing the spans.

Preparing the authentication server

Keycloak is a popular open source authentication server and implements the OpenID Connect protocol, making it suitable for our purposes. Note however that any server compatible with OpenID Connect will do.

To get Keycloak running, we start it in a container:
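
A command along these lines should do; the image tag and the admin-credential environment variables are assumptions based on the WildFly-based Keycloak distribution (the one that serves under /auth), so adjust them to the version you’re using:

    docker run -d --name keycloak \
      -p 8080:8080 \
      -e KEYCLOAK_USER=admin \
      -e KEYCLOAK_PASSWORD=admin \
      quay.io/keycloak/keycloak:12.0.4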

Once Keycloak is up and running, head over to http://localhost:8080/auth/admin/ and log in using “admin” as both the username and password, as specified in the command above.

We’ll need a realm, which we’ll name opentelemetry, like in the following screenshot.

New realm “opentelemetry”

We’ll need to create a new client for the “collector”. When a new token is created for the agent, it should be created with this client as the target “audience” (aud field in the token).

The new “collector” client

We’ll then need a new client, representing our agent:

New client “agent”

We’ll need to change the Access Type to “confidential”, and enable the “Service Accounts” option, like in the following screenshot. We then disable the “Standard Flow”, as we don’t need an end-user authentication via browser.

Our “agent” client configuration

Once our client is saved, the “Credentials” tab should appear, from which we’ll copy the “Secret” value:

The credentials for our client

And finally, we add the collector client as the audience for the tokens issued for our agent client. In the “Mappers” tab, create a new mapper of “Audience” type:

Set the audience for agent tokens

At this point, we are ready to issue an access token, which we can do with the following cURL command. Make sure you replace the client_secret with the secret from the previous step.
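
A sketch of that request, assuming Keycloak is still listening on localhost:8080 and that our client is named “agent” (the path is the standard OpenID Connect token endpoint for a Keycloak realm):

    curl -s -X POST \
      -d "client_id=agent" \
      -d "client_secret=REPLACE_WITH_YOUR_SECRET" \
      -d "grant_type=client_credentials" \
      http://localhost:8080/auth/realms/opentelemetry/protocol/openid-connect/token

The token we need is in the access_token field of the JSON response.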

Save this token; we’ll need it soon! If you are curious about what’s inside, head over to https://jwt.io and paste your token into the “Encoded” field. Hint: check the exp field of the token. By default, Keycloak tokens are valid for only 5 minutes, but you can change the “Access Token Lifespan” under the client’s “Advanced Settings” to get a token that is valid for longer.

TLS certificates

We have our authentication server ready and we have our token. But before we configure our collector, we’ll need to generate some TLS certificates: the OpenTelemetry Collector will refuse to authenticate over plain-text connections.

For our exercise here, we’ll use cfssl, with the following certificate signing requests (CSRs):

/tmp/certs/ca-csr.json
/tmp/certs/cert-csr.json, which coincidentally looks identical to the ca-csr.json
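
As a rough sketch, both files could look like the following; the hosts and names values here are placeholders, so adjust them to the hostname your agents will use to reach the collector:

    {
      "hosts": ["localhost"],
      "key": {
        "algo": "rsa",
        "size": 2048
      },
      "names": [
        {
          "O": "OpenTelemetry Collector Example"
        }
      ]
    }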

With the files in place, we can now generate the certificates:
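
A sketch of the cfssl invocations, assuming the CSR files above are in /tmp/certs and that both cfssl and cfssljson are installed:

    cd /tmp/certs
    cfssl gencert -initca ca-csr.json | cfssljson -bare ca
    cfssl gencert -ca ca.pem -ca-key ca-key.pem cert-csr.json | cfssljson -bare cert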

At the end of this, we should have a ca.pem, cert.pem and cert-key.pem that we’ll need to configure the collector.

Collector configuration

For the agent, we have a pipeline that opens a regular OTLP receiver. As this is deployed alongside the instrumented application, possibly as a sidecar, we won’t require authentication on the receiver side.

On the exporter side, we want to tell the remote collector that we are authorized, so we include the token we obtained from Keycloak.
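
A sketch of what config.agent.yaml could look like. The field names, notably per_rpc_auth and the TLS-related settings, have changed between collector releases, so double-check them against the version you’re running; the remote collector’s endpoint (localhost:55690) and the token value are placeholders:

    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: localhost:55680

    exporters:
      logging:
      otlp/auth:
        endpoint: localhost:55690
        ca_file: /tmp/certs/ca.pem
        per_rpc_auth:
          type: bearer
          bearer_token: PASTE_THE_KEYCLOAK_TOKEN_HERE

    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [logging, otlp/auth]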

The configuration for our agent

On the remote collector side, we expect it to be used only by agents, so we expose only the OTLP receiver and set it to require authentication, using our Keycloak instance as the server that checks whether the token is valid.
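
A sketch of config.collector.yaml under the same caveats: the layout of the tls_settings and oidc blocks (issuer_url, audience) follows collector builds from around the time of writing and should be verified against your version’s documentation; port 55690 is an arbitrary choice that avoids clashing with the agent:

    receivers:
      otlp/auth:
        protocols:
          grpc:
            endpoint: localhost:55690
            tls_settings:
              cert_file: /tmp/certs/cert.pem
              key_file: /tmp/certs/cert-key.pem
            auth:
              oidc:
                issuer_url: http://localhost:8080/auth/realms/opentelemetry
                audience: collector

    exporters:
      logging:
      jaeger:
        endpoint: localhost:14250
        insecure: true

    service:
      pipelines:
        traces:
          receivers: [otlp/auth]
          exporters: [logging, jaeger]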

The configuration for our collector

Before we can start our collectors, we’ll need to start the backing Jaeger instance:
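
The all-in-one image is enough for this exercise; the tag here is just a release that was current at the time of writing:

    docker run -d --name jaeger \
      -p 16686:16686 \
      -p 14250:14250 \
      jaegertracing/all-in-one:1.22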

We are now ready to start our remote OpenTelemetry Collector:
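
Assuming the otelcol binary is in the PATH and the configuration file from the previous section is at /tmp/config.collector.yaml:

    otelcol --config /tmp/config.collector.yaml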

Once we see the message “Everything is ready. Begin running and processing data”, we’ll know that the server is properly configured, with the exception perhaps of the bearer token, which is only validated when we process spans.

We now start our agent, using a different metrics address, as the default one is already in use by the remote collector:
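
Something like the following should work; the --metrics-addr flag is the one used by collector builds from around the time of writing, and the port is an arbitrary free one:

    otelcol --config /tmp/config.agent.yaml --metrics-addr localhost:8889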

Again, once the message “Everything is ready. Begin running and processing data” appears, we know we are ready to test it!

Testing

We need an application that is able to generate traces. If you don’t have an instrumented application yet, you can use the tracegen utility, as follows:
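
Assuming tracegen from the opentelemetry-collector-contrib repository is installed and that the agent is listening on the default OTLP gRPC port:

    tracegen -otlp-insecure -traces 1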

This will generate one trace with two spans and send it to a local OpenTelemetry Collector process (localhost:55680), which is our agent. The agent will export the spans both to the logging exporter and to the remote collector. The following should be seen in the agent’s logs:

Our remote collector is also configured to export spans to the logging exporter, in addition to Jaeger. We should also see the same messages in the logs:

In our Jaeger logs, we should see this:

And finally, on the Jaeger UI, we should be able to find our trace.

Our trace from the ‘tracegen’

Once we change our config.agent.yaml to use an invalid token, we’ll see an error message in the OpenTelemetry Collector logs, and no new traces will make it to Jaeger. The error message will look like this:

Wrapping up

We’ve seen that adding authentication to the gRPC receivers from the OpenTelemetry Collector isn’t difficult: it just requires a properly configured authentication server and the possession of a bearer token.

That said, authentication in the OpenTelemetry Collector is relatively new, with a handful of items on the wish list, including caching of tokens so that a round trip to the authentication server isn’t necessary for every RPC, auto-refresh of tokens on the client side, and perhaps alternative authentication mechanisms. And this is where we need your help: give us your feedback and tell us what’s important for your use case, so that we implement only the features that are needed!


Written by Juraci Paixão Kröhling

Juraci Paixão Kröhling is a software engineer at Grafana Labs, a maintainer on the Jaeger project, and a contributor to the OpenTelemetry project.
