Integrate OpenTelemetry with Google Cloud Ops in a Linux Environment

Xiang Shen
Google Cloud - Community
9 min read · Mar 12, 2024

Overview

The OpenTelemetry documentation website outlines three basic patterns users can employ to transmit telemetry data from their applications to a backend.

  • No Collector

The simplest pattern is to instrument applications with an OpenTelemetry SDK that exports telemetry signals (traces, metrics, logs) directly to a backend.

  • Use an agent

The agent collector deployment pattern consists of applications instrumented with an OpenTelemetry SDK (using the OpenTelemetry Protocol, OTLP), or other collectors using the OTLP exporter, that send telemetry signals to a collector instance running alongside the application or on the same host (such as a sidecar or a DaemonSet).

  • Use a gateway

The gateway collector deployment pattern consists of applications (or other collectors) sending telemetry signals to a single OTLP endpoint provided by one or more collector instances running as a standalone service (for example, a Deployment in Kubernetes), typically per cluster, per data center, or per region.

When possible, you should consider using OpenTelemetry Collectors, since they provide a vendor-agnostic way to receive, process, and export data in a wide range of existing telemetry formats and protocols. OpenTelemetry Collectors also allow you to export telemetry to one or multiple backends, including the Google Cloud Operations Suite (Cloud Ops).

For reliable and scalable data ingestion, the gateway pattern is highly recommended.

Implementation

This tutorial will guide you through the steps of sending telemetry data from a Java Spring Boot application to Cloud Ops, using the open-source OpenTelemetry collector. Once completed, the architecture should resemble the following:

For simplicity, a Debian-based Linux distribution is used as an example. However, you should be able to adapt the pattern to other Linux distributions in any non-Google Cloud environment, such as an on-premises VMware environment or a different cloud.

Create the sample Spring Boot Application

Perform the following steps on the Linux server where you intend to run your application (this can be the same server that will run the collector, or a different one):

  1. To install the JDK and the required tools, execute the following commands, if they are not already installed:
sudo apt update && sudo apt upgrade
sudo apt install openjdk-17-jdk unzip maven

2. Create and download the sample Java Spring Boot application

curl https://start.spring.io/starter.zip \
-d javaVersion=17 -d type=maven-project \
-d language=java -d platformVersion=3.2.1 \
-d packaging=jar \
-d groupId=com.example \
-d artifactId=demo \
-d name=demo \
-d packageName=com.example.demo \
-d dependencies=web \
-o my-project.zip
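
Optionally, you can list the contents of the downloaded archive before unpacking it (unzip was installed in step 1):

unzip -l my-project.zip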

3. Unzip the content into the java-demo directory

mkdir java-demo && cd java-demo
unzip ../my-project.zip

After running the commands, you should have the following files and directories in your current directory:

HELP.md mvnw mvnw.cmd pom.xml src

4. Create the RollController.java file

cat << EOF > src/main/java/com/example/demo/RollController.java
package com.example.demo;

import java.util.Optional;
import java.util.concurrent.ThreadLocalRandom;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class RollController {
  private static final Logger logger = LoggerFactory.getLogger(RollController.class);

  @GetMapping("/rolldice")
  public String index(@RequestParam("player") Optional<String> player) {
    int result = this.getRandomNumber(1, 6);
    if (player.isPresent()) {
      logger.info("{} is rolling the dice: {}", player.get(), result);
    } else {
      logger.info("Anonymous player is rolling the dice: {}", result);
    }
    return Integer.toString(result);
  }

  public int getRandomNumber(int min, int max) {
    return ThreadLocalRandom.current().nextInt(min, max + 1);
  }
}
EOF

5. Update the DemoApplication.java file

cat << EOF > src/main/java/com/example/demo/DemoApplication.java
package com.example.demo;

import org.springframework.boot.Banner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class DemoApplication {

  public static void main(String[] args) {
    SpringApplication app = new SpringApplication(DemoApplication.class);
    app.setBannerMode(Banner.Mode.OFF);
    app.run(args);
  }
}
EOF

6. Run the application using Maven

./mvnw spring-boot:run

After you finish testing, press Control-C to stop the application.

7. Test the application

In a different terminal, send a request using curl:

curl http://localhost:8080/rolldice

In the Maven window, you should see output similar to the following:

...
INFO 325243 --- [nio-8080-exec-5] [65adc697c76d7e3a51343a6fc20274d9-51343a6fc20274d9]
com.example.demo.RollController : Anonymous player is rolling the dice: 5
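
The controller also accepts an optional player query parameter, which appears in the log message in place of "Anonymous player"; for example:

curl "http://localhost:8080/rolldice?player=Alice"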

Install and configure the collector

For the collector, you can use the open-source Google Cloud Exporter and the Google Managed Service for Prometheus Exporter. These exporters enable seamless data transmission from your application to the Cloud Ops backend.

To accomplish this, log in to the Linux server where you want to install the collector and do the following.

  1. Check the releases of the collector and download one. For example, the following command downloads the v0.92.0 collector package to your current directory:
curl -L -O https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.92.0/otelcol-contrib_0.92.0_linux_amd64.deb

2. Install the collector

sudo apt install ./otelcol-contrib_0.92.0_linux_amd64.deb
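
To confirm which version was installed, you can ask the collector binary directly:

otelcol-contrib --version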

3. Check if the service is running

systemctl status otelcol-contrib.service

You should see output similar to the following:

● otelcol-contrib.service - OpenTelemetry Collector Contrib
     Loaded: loaded (/lib/systemd/system/otelcol-contrib.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2023-12-22 03:03:04 UTC; 6min ago
   Main PID: 354900 (otelcol-contrib)
      Tasks: 9 (limit: 19184)
     Memory: 41.4M
        CPU: 1.038s
     CGroup: /system.slice/otelcol-contrib.service
             └─354900 /usr/bin/otelcol-contrib --config=/etc/otelcol-contrib/config.yaml
...

4. To update the configuration file, you can either use an editor or run the following command

sudo bash -c 'cat << EOF > /etc/otelcol-contrib/config.yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:
exporters:
  googlecloud:
  googlemanagedprometheus:
processors:
  # change the values as needed
  resource:
    attributes:
      - key: "project_id"
        value: "YOUR-GOOGLE-CLOUD-PROJECT-ID"
        action: upsert
      - key: "location"
        # it needs to be a GCP or AWS location such as us-east1 or aws:us-east-1
        value: "us-central1"
        action: upsert
      - key: "cluster"
        value: "onprem"
        action: upsert
      - key: "namespace"
        value: ""
        action: upsert
      - key: "job"
        value: "metric_job"
        action: upsert
      - key: "instance"
        value: "test"
        action: upsert
  transform:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          - set(attributes["gcp.log_name"], resource.attributes["service.name"]) where attributes["gcp.log_name"] == nil
  memory_limiter:
    check_interval: 1s
    limit_percentage: 65
    spike_limit_percentage: 20
  batch:
  resourcedetection:
    detectors: [env, system, gcp]
    timeout: 10s
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [googlecloud]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch, resourcedetection, resource]
      exporters: [googlemanagedprometheus]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, transform, batch]
      exporters: [googlecloud]
EOF'

Depending on the platform where the application is deployed, you can use different detectors in the resourcedetection section. For example, you can use [env, ec2] for an application running on an AWS EC2 instance. For more information, you can read this doc.
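
One quick, illustrative way to make that change in the configuration created above (assuming the detectors line matches exactly) is:

# Rewrite the detectors line in the config created above, then restart the service
sudo sed -i 's/detectors: \[env, system, gcp\]/detectors: [env, ec2]/' /etc/otelcol-contrib/config.yaml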

If you use an agent to collect data from the application and send it to a gateway, processors such as resource, resourcedetection, and transform should run in the agent rather than in the gateway collector to produce high-quality telemetry, as sketched below.
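
For illustration, a minimal sketch of what such an agent configuration might look like: it keeps the same receivers and processors as the configuration above, but replaces the Google Cloud exporters with an otlp exporter pointing at the gateway. GATEWAY-HOST below is a placeholder for your gateway collector or load balancer address.

exporters:
  otlp:
    # Placeholder address; point this at your gateway collector or load balancer
    # (for example, port 14317 if you front the collectors with the NGINX setup
    # shown later in this doc)
    endpoint: GATEWAY-HOST:4317
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch, resourcedetection, resource]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, transform, batch]
      exporters: [otlp]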

5. Create a service account for authentication

Ensure you have the proper permissions and perform the following steps in Cloud Shell, opened from the Cloud console. If your platform supports Workload Identity Federation, strongly consider using it instead of service account keys for enhanced security and simplified management.

a) Create the user-managed service account:

export SA_NAME=otel-export-sa
export PROJECT_ID=[Your Project Id]
gcloud iam service-accounts create ${SA_NAME}

Note:

  • SA_NAME: the name of the service account.
  • PROJECT_ID: the Cloud Ops destination project ID.

b) Grant the following roles to the service account:

  • Metrics: roles/monitoring.metricWriter
  • Traces: roles/cloudtrace.agent
  • Logs: roles/logging.logWriter
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
--member=serviceAccount:${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com \
--role=roles/monitoring.metricWriter
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
--member=serviceAccount:${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com \
--role=roles/cloudtrace.agent
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
--member=serviceAccount:${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com \
--role=roles/logging.logWriter
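
If you want to double-check the grants, you can list the roles bound to the service account with a query like the following:

gcloud projects get-iam-policy ${PROJECT_ID} \
--flatten="bindings[].members" \
--filter="bindings.members:serviceAccount:${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \
--format="table(bindings.role)"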

c) Download the service account key file

gcloud iam service-accounts keys create otel-sa-key.json \
--iam-account=${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com

Note: Keep the key file in a secure location; you may need it if you have to configure multiple collectors. If you created the key in Cloud Shell, transfer it to the collector server (for example, with scp) before the next step.

d) Copy the key file to the service directory

sudo cp otel-sa-key.json /etc/otelcol-contrib/otel-sa-key.json
sudo chown otelcol-contrib:otelcol-contrib /etc/otelcol-contrib/otel-sa-key.json
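
Optionally, as an extra hardening step, you can restrict the key file so that only the collector's service user can read it:

sudo chmod 400 /etc/otelcol-contrib/otel-sa-key.json
ls -l /etc/otelcol-contrib/otel-sa-key.json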

6. Update the environment for the service configuration file

The exporter will automatically look for the key file using the GOOGLE_APPLICATION_CREDENTIALS environment variable or, if that is unset, one of the other known locations. You can run the following command to add the environment variable to the configuration file.

sudo bash -c 'cat << EOF >> /etc/otelcol-contrib/otelcol-contrib.conf
GOOGLE_APPLICATION_CREDENTIALS=/etc/otelcol-contrib/otel-sa-key.json
EOF'

7. Restart the service

sudo systemctl restart otelcol-contrib.service

8. Check again to ensure the service is running

systemctl status otelcol-contrib.service
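
If the service does not stay running, or telemetry never appears in Cloud Ops, the collector's journald logs are the first place to look:

sudo journalctl -u otelcol-contrib.service --since "10 minutes ago" --no-pager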

Instrumentation

Instrumentation is the act of adding observability code to an app. There are different ways to instrument a Spring Boot application. In this doc, you will use the OpenTelemetry Java agent with bytecode instrumentation.

  1. Download the opentelemetry-javaagent.jar.

You can find all the available releases in the GitHub opentelemetry-java-instrumentation repository.

curl -L -O https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/download/v1.32.0/opentelemetry-javaagent.jar

2. Set and export the environment variables that attach the Java agent JAR and point the OTLP exporters at the collector. If you deploy the application and the collector on the same Linux server, you can use 127.0.0.1 or localhost as the endpoint URL. Change the IP address or DNS name if they are on different servers.

export JAVA_TOOL_OPTIONS="-javaagent:./opentelemetry-javaagent.jar -Dotel.exporter.otlp.protocol=grpc" \
OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:4317 \
OTEL_TRACES_EXPORTER=otlp \
OTEL_METRICS_EXPORTER=otlp \
OTEL_LOGS_EXPORTER=otlp \
OTEL_SERVICE_NAME=my-springboot-demo

3. Build and run the Spring Boot application

mvn clean package
java -jar ./target/demo-0.0.1-SNAPSHOT.jar
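
If you prefer not to use environment variables, the same settings can be passed as otel.* JVM system properties; a sketch:

# Run this in a shell where JAVA_TOOL_OPTIONS from step 2 is not set,
# otherwise the agent would be attached twice
java -javaagent:./opentelemetry-javaagent.jar \
  -Dotel.exporter.otlp.endpoint=http://127.0.0.1:4317 \
  -Dotel.exporter.otlp.protocol=grpc \
  -Dotel.traces.exporter=otlp \
  -Dotel.metrics.exporter=otlp \
  -Dotel.logs.exporter=otlp \
  -Dotel.service.name=my-springboot-demo \
  -jar ./target/demo-0.0.1-SNAPSHOT.jar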

4. Verify the results. From another terminal, send a request using curl again:

curl localhost:8080/rolldice

You should see the application respond with a random dice number.
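
To generate enough telemetry to look at in the next step, you can send a small batch of requests, for example:

# Send 20 requests, with and without the optional player parameter
for i in $(seq 1 10); do
  curl -s http://localhost:8080/rolldice; echo
  curl -s "http://localhost:8080/rolldice?player=player$i"; echo
  sleep 1
done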

5. Verify the telemetry data in Cloud Ops

From Cloud Trace, you should see trace data like the following:

From the Cloud Logging console, you should see application log entries like the following:

From Cloud Monitoring, you can query and visualize the available metrics. For example, the following MQL query shows the memory committed to the Java virtual machine.

fetch prometheus_target
| metric 'prometheus.googleapis.com/jvm_memory_committed_bytes/gauge'
| group_by 1m,
[value_jvm_memory_committed_bytes_mean:
mean(value.jvm_memory_committed_bytes)]
| every 1m
| group_by [],
[value_jvm_memory_committed_bytes_mean_aggregate:
aggregate(value_jvm_memory_committed_bytes_mean)]

If you prefer PromQL, you can select the PromQL radio button for the query language in the middle of the window. An equivalent PromQL query for the previous example is:

sum(avg_over_time(jvm_memory_committed_bytes[${__interval}]))

Use a Gateway

The gateway collector pattern provides a single OTLP endpoint and lets you scale your deployment with multiple backend collectors. The following example shows a gateway deployment using NGINX as the load balancer in front of the collectors. Although NGINX is used in this doc, alternative load balancer solutions can be employed, provided they offer equivalent capabilities.

  1. To install NGINX, run the following command:

sudo apt install nginx

2. Configure the NGINX server by creating an otel.conf file under /etc/nginx/conf.d. You can use an editor or run the following command.
sudo bash -c 'cat << EOF > /etc/nginx/conf.d/otel.conf
server {
    listen 14317 http2;
    server_name _;

    location / {
        grpc_pass grpc://collector4317;
        grpc_next_upstream error timeout invalid_header http_500;
        grpc_connect_timeout 2;
        grpc_set_header Host \$host;
        grpc_set_header X-Real-IP \$remote_addr;
        grpc_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
    }
}

server {
    listen 14318;
    server_name _;

    location / {
        proxy_pass http://collector4318;
        proxy_redirect off;
        proxy_next_upstream error timeout invalid_header http_500;
        proxy_connect_timeout 2;
        proxy_set_header Host \$host;
        proxy_set_header X-Real-IP \$remote_addr;
        proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
    }
}

upstream collector4317 {
    server collector1:4317;
    # Additional collectors
    # server collector2:4317;
}

upstream collector4318 {
    server collector1:4318;
    # Additional collectors
    # server collector2:4318;
}
EOF'

In the configuration file, the listening ports are set to 14317 and 14318 to avoid a port conflict in case you run the NGINX server on the same Linux box as a collector. You can also add multiple collectors as needed. If your collector server names are different, replace them or use IP addresses. For example, to use the IP address for the server collector1, you can do the following:

# Get the host IP address on the collector1 host
hostname -i|cut -d' ' -f2
# Replace the IP on the NGINX host with the value you got
sudo sed -i -e "s/collector1/IP/" /etc/nginx/conf.d/otel.conf
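
Before reloading, you can ask NGINX to validate the new configuration:

sudo nginx -t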

3. Reload NGINX for the new configuration

sudo systemctl reload nginx

4. After the NGINX server reloads successfully, you can use it as the OTEL_EXPORTER_OTLP_ENDPOINT:

export JAVA_TOOL_OPTIONS="-javaagent:./opentelemetry-javaagent.jar -Dotel.exporter.otlp.protocol=grpc" \
OTEL_EXPORTER_OTLP_ENDPOINT=http://[NGINX SERVER NAME or IP]:14317 \
OTEL_TRACES_EXPORTER=otlp \
OTEL_METRICS_EXPORTER=otlp \
OTEL_LOGS_EXPORTER=otlp

5. Run the Spring Boot application again

java -jar ./target/demo-0.0.1-SNAPSHOT.jar

After sending a few more requests using curl, you should see the corresponding trace, log, and metric data in Cloud Ops.

You can also use your favorite automation tool to install and configure all the elements. If you are interested, you can find an Ansible example here.

In this example, the gateway pattern is used, and a collector is configured as a local agent on each application server. After the agents process the telemetry data, the data is sent to a gateway whose backend collectors are configured to communicate with the Google Cloud Operations Suite.
