Simple observability for Cloud Run applications with GCP and OTLP

Adrian Trzeciak 🇳🇴
Google Cloud - Community

If you haven’t come across OpenTelemetry before, let me introduce you to this nifty piece of software, as described on its official website, opentelemetry.io. OpenTelemetry is an open-source project that helps us make sense of what’s happening inside our applications by collecting useful data like traces, metrics, and logs. Here is the official definition:

OpenTelemetry, also known as OTel for short, is a vendor-neutral open-source Observability framework for instrumenting, generating, collecting, and exporting telemetry data such as traces, metrics, logs. As an industry-standard, it is supported by 40+ observability vendors, integrated by many libraries, services and apps and adopted by a number of end-users.

No matter where your apps are running, as they grow, it’s a good idea to keep a closer eye on them. Ideally, you’d want to have all the data they churn out in one easy-to-reach place. Using multiple client libraries to export data into different backends, and then matching up timestamps and details across them, can be a real headache when you’re racing against the clock to fix a problem.

Here’s where OpenTelemetry, or OTel for short, steps in to give us a hand in orchestrating all this. For the purpose of this article, I decided to send the telemetry data into Google Cloud Platform (GCP). Here’s how I did it:

  • For logging, I used the official Google Cloud Logging client library for Go, since log support in OpenTelemetry for Go wasn’t quite there yet at the time of writing.
  • For traces, I sent them to Cloud Trace through the OTel collector.
  • For metrics, I again used the OTel collector, this time to send them to Cloud Monitoring.

This setup makes sure all the important data from our apps is neatly tucked away in GCP, making it a whole lot easier to dig into what’s going on when troubleshooting or looking to tune performance.

I’ve built a rudimentary application in Go (please pardon the simplistic code quality) that starts an HTTP server on port 8080 and generates a UUID whenever a request hits the root (/). Within the UUID handler function, I’ve added some OpenTelemetry instrumentation to split the execution into spans:

  • A parent span generateUUIDHandler encompasses the entire function execution, acting as the overarching monitoring scope.
  • A child span sleepForRandomTime encapsulates the random sleep period, allowing for a focused insight into this specific operation’s duration.
  • Another child span doGenerateUUID is dedicated to the UUID generation process, offering a clear view into the time it takes to generate a UUID.

This instrumentation structure, with a parent span and nested child spans, provides a clean, hierarchical view into the function’s execution flow, ensuring precise monitoring and tracing of the operations.

To funnel telemetry data to GCP, we’ll run an OpenTelemetry collector as a sidecar alongside our Cloud Run service. In this setup, the application sends its data to the collector over localhost using the HTTP protocol.

The collector configuration consists of the following components:

  • A receiver denoted as otlp that is configured to support the HTTP protocol, acting as the ingress point for telemetry data.
  • An exporter tailored for the Google Cloud backend, associated with a specified project ID, facilitating the onward transmission of telemetry data to GCP.
  • A processor set up for batch processing, optimizing the handling of incoming telemetry by aggregating telemetry data before forwarding it to the designated exporter.
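A collector configuration matching the three components above could look roughly like the sketch below. It is not copied from the repository: the project ID is a placeholder, and the endpoint and pipeline layout are assumptions based on the description in this article.

```yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: localhost:4318   # assumed; 4318 is the port the app talks to

processors:
  batch:                           # default batching settings

exporters:
  googlecloud:
    project: my-gcp-project        # placeholder project ID

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [googlecloud]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [googlecloud]
```

Traces and metrics each get their own pipeline, but both share the same receiver, processor and exporter, so there is only one thing to configure per concern.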

For an efficient and enjoyable development workflow, we’ll leverage Skaffold, which as of now supports Cloud Run in beta. Utilizing Skaffold will streamline the build and deployment processes. To orchestrate this, two files are required:

  • A file embodying the definition of the Cloud Run service, encapsulating the necessary configurations and specifications for the service deployment.
  • A file instructing Skaffold on the build and deployment mechanics of the application, defining the pipeline from code compilation to deployment onto Cloud Run.

Let’s start with the Cloud Run service:

Key elements of the YAML configuration include:

  • A secret from Secret Manager, containing the OpenTelemetry collector configuration, is mounted as a file at the path /etc/otelcol-contrib/config.yaml.
  • A sidecar container, instantiated from the otel/opentelemetry-collector-contrib image, exposes an HTTP endpoint to the uuidgenerator container via port 4318.
  • This configuration facilitates the uuidgenerator in transmitting telemetry data to the OpenTelemetry collector by interfacing with localhost:4318.
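As a rough illustration of the points above, a Cloud Run service definition with a collector sidecar might look like this. Treat it as a sketch under assumptions rather than the file from the repository: the service name, image path and secret name (otel-config) are placeholders, and depending on your Cloud Run release you may need extra annotations for multi-container support.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: uuidgenerator
spec:
  template:
    spec:
      containers:
        # Application container; the only one that receives ingress traffic.
        - name: uuidgenerator
          image: europe-north1-docker.pkg.dev/my-gcp-project/my-repo/uuidgenerator  # placeholder
          ports:
            - containerPort: 8080
        # Collector sidecar; reachable from the app on localhost:4318.
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib
          volumeMounts:
            - name: otel-config
              mountPath: /etc/otelcol-contrib
      volumes:
        - name: otel-config
          secret:
            secretName: otel-config   # placeholder Secret Manager secret
            items:
              - key: latest           # secret version
                path: config.yaml     # ends up at /etc/otelcol-contrib/config.yaml
```

Because both containers share the same network namespace, the application needs no service discovery: pointing its exporter at localhost:4318 is enough.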

Let’s now take a look at the config of skaffold.yaml:

The main sections within the skaffold.yaml include:

  • The manifests block, which demarcates the location of the YAML file encapsulating the Cloud Run configuration, thereby instructing Skaffold on the service specifications.
  • The build block, defining how the image is built and which repository the Docker image is pushed to.
  • The deploy block, detailing the deployment strategy for the application. It includes a hook that grants unauthenticated access to the service, so that we can run curl against the application.
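Put together, a skaffold.yaml with these three blocks could be sketched as follows. The schema version, project ID, region, image path and the exact hook command are all assumptions for illustration, so check the keys against the Skaffold version you are running before relying on them.

```yaml
apiVersion: skaffold/v4beta6        # assumed schema version
kind: Config
metadata:
  name: uuidgenerator
manifests:
  rawYaml:
    - service.yaml                  # the Cloud Run service definition
build:
  artifacts:
    # Must match the image name used in service.yaml so Skaffold can swap in the built tag.
    - image: europe-north1-docker.pkg.dev/my-gcp-project/my-repo/uuidgenerator  # placeholder
      docker:
        dockerfile: Dockerfile
deploy:
  cloudrun:
    projectid: my-gcp-project       # placeholder
    region: europe-north1           # placeholder
    hooks:
      after:
        # Allow unauthenticated requests so plain curl works against the service.
        - command:
            - gcloud
            - run
            - services
            - add-iam-policy-binding
            - uuidgenerator
            - --project=my-gcp-project
            - --region=europe-north1
            - --member=allUsers
            - --role=roles/run.invoker
```

With a file like this in place, skaffold run builds the image, pushes it and deploys the service in one go. Opening the service to allUsers is only sensible for a demo like this one.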

When we run a curl request against the Cloud Run URL, the application responds with a UUID while concurrently dispatching telemetry data into Google’s Observability stack. After that, the metric for the operation should be available in the Metrics Explorer, listed under Generic Node as uuid.duration, giving us visibility into how long UUID generation takes.

Within Cloud Trace, a flame chart illustrates the traversal of logic through various spans, providing a visual representation of execution flow. Additionally, logs pertinent to each span are attached, offering a detailed log trail for in-depth examination and correlation of events across the spans.

The entire codebase and configuration details are hosted on GitHub. If this walkthrough sparks questions or if you’re keen on sharing insights, feel free to reach out either through LinkedIn or Google Cloud Community Slack — whether it’s about this article or the broader landscape of observability.