Using OpenTelemetry to Trace and Monitor a Dice Rolling Application

Sean Zheng
6 min read · Jun 14, 2024


Introduction

Inspired by OpenTelemetry sample code, we will implement a simple dice rolling application and observe the traces and metrics collected using OpenTelemetry.

Set Up the OpenTelemetry SDK

First, we set up the global OpenTelemetry (OTel) SDK. In this example, traces and metrics are both exported with OTLP over gRPC to an endpoint on localhost:4317 (4317 is the default OTLP/gRPC port).

Import Packages

```go
package otelx

import (
	"context"
	"errors"
	"fmt"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/propagation"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.4.0"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)
```

Shutdown Function

Shutdown is a package-level function variable that shuts down the OpenTelemetry SDK. Until SetupOTelSDK replaces it, it does nothing and returns nil, so it is always safe to call.

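```go
// Shutdown flushes and releases the SDK's resources.
// It is a no-op until SetupOTelSDK replaces it.
var Shutdown = func(context.Context) error {
	return nil
}
```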

SetupOTelSDK Function

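The SetupOTelSDK function sets up the OpenTelemetry SDK. Traces and metrics are both exported with OTLP over a single shared gRPC connection. In this setup the traces are later viewed in Jaeger and the metrics in Prometheus. The function also replaces Shutdown so that every component can be flushed and closed with one call.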

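```go
func SetupOTelSDK(ctx context.Context, target string, name string) (err error) {
	var shutdownFuncs []func(context.Context) error

	// Shutdown runs the registered cleanups in reverse order (like defer),
	// so the gRPC connection is closed only after both providers have flushed.
	Shutdown = func(ctx context.Context) error {
		var errs error // local: do not accumulate into SetupOTelSDK's named result
		for i := len(shutdownFuncs) - 1; i >= 0; i-- {
			errs = errors.Join(errs, shutdownFuncs[i](ctx))
		}
		shutdownFuncs = nil
		return errs
	}

	// If any step below fails, release whatever was already created.
	defer func() {
		if err != nil {
			err = errors.Join(err, Shutdown(ctx))
		}
	}()

	res, err := resource.New(ctx, resource.WithAttributes(semconv.ServiceNameKey.String(name)))
	if err != nil {
		return fmt.Errorf("failed to create resource: %w", err)
	}

	conn, err := initConn(target)
	if err != nil {
		return err
	}
	shutdownFuncs = append(shutdownFuncs, func(context.Context) error { return conn.Close() })

	tracerProvider, err := newTracer(ctx, res, conn)
	if err != nil {
		return err
	}
	shutdownFuncs = append(shutdownFuncs, tracerProvider.Shutdown)

	meterProvider, err := newMeter(ctx, res, conn)
	if err != nil {
		return err
	}
	shutdownFuncs = append(shutdownFuncs, meterProvider.Shutdown)

	return nil
}
```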

Initialize gRPC Connection

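The initConn function creates the gRPC client connection that both exporters share. It uses plaintext (insecure) credentials, which is acceptable only for local development.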

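```go
// initConn creates a plaintext gRPC client connection to target.
// Recent grpc-go releases deprecate grpc.Dial in favor of grpc.NewClient.
func initConn(target string) (*grpc.ClientConn, error) {
	conn, err := grpc.Dial(target, grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		return nil, fmt.Errorf("failed to create gRPC client: %w", err)
	}
	return conn, nil
}
```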

Create Tracer

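The newTracer function creates the tracer provider. It builds an OTLP trace exporter on the shared connection and wraps it in a batch span processor. It samples every span, then registers the provider and the W3C Trace Context and Baggage propagators globally.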

```go
func newTracer(
	ctx context.Context,
	res *resource.Resource,
	conn *grpc.ClientConn,
) (*sdktrace.TracerProvider, error) {
	// OTLP trace exporter that reuses the shared gRPC connection.
	exporter, err := otlptracegrpc.New(ctx, otlptracegrpc.WithGRPCConn(conn))
	if err != nil {
		return nil, fmt.Errorf("failed to create the OTLP trace exporter: %w", err)
	}
	// Buffer finished spans and export them asynchronously in batches.
	processor := sdktrace.NewBatchSpanProcessor(exporter)
	provider := sdktrace.NewTracerProvider(
		sdktrace.WithSampler(sdktrace.AlwaysSample()), // record every span (fine for a demo)
		sdktrace.WithResource(res),
		sdktrace.WithSpanProcessor(processor),
	)
	otel.SetTracerProvider(provider)
	// Propagate span context and baggage across process boundaries.
	otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(
		propagation.TraceContext{},
		propagation.Baggage{},
	))
	return provider, nil
}
```
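The propagation.TraceContext{} propagator carries span context between services in the W3C traceparent header. The header has the form version-traceid-parentid-flags, written in lowercase hex. The stdlib-only sketch below builds and parses a version-00 header to show what crosses the wire. It illustrates the format and is not the SDK's implementation; the type and function names are hypothetical. In a real service, otel.GetTextMapPropagator().Inject and Extract do this with a carrier such as propagation.HeaderCarrier.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// spanContext holds the fields carried by a W3C traceparent header.
type spanContext struct {
	TraceID [16]byte
	SpanID  [8]byte
	Sampled bool
}

// formatTraceparent renders a version-00 traceparent header value.
func formatTraceparent(sc spanContext) string {
	flags := "00"
	if sc.Sampled {
		flags = "01"
	}
	return fmt.Sprintf("00-%s-%s-%s",
		hex.EncodeToString(sc.TraceID[:]), hex.EncodeToString(sc.SpanID[:]), flags)
}

// parseTraceparent parses a version-00 traceparent header value.
func parseTraceparent(h string) (spanContext, error) {
	var sc spanContext
	parts := strings.Split(h, "-")
	if len(parts) != 4 || parts[0] != "00" || len(parts[1]) != 32 || len(parts[2]) != 16 || len(parts[3]) != 2 {
		return sc, errors.New("malformed traceparent")
	}
	if strings.ToLower(h) != h {
		return sc, errors.New("traceparent must use lowercase hex")
	}
	if _, err := hex.Decode(sc.TraceID[:], []byte(parts[1])); err != nil {
		return sc, err
	}
	if _, err := hex.Decode(sc.SpanID[:], []byte(parts[2])); err != nil {
		return sc, err
	}
	flags, err := hex.DecodeString(parts[3])
	if err != nil {
		return sc, err
	}
	// All-zero trace and span IDs are invalid.
	if sc.TraceID == ([16]byte{}) || sc.SpanID == ([8]byte{}) {
		return sc, errors.New("all-zero trace or span ID")
	}
	sc.Sampled = flags[0]&0x01 == 1 // lowest bit is the "sampled" flag
	return sc, nil
}
```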

Create Meter

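The newMeter function creates the meter provider. It builds an OTLP metric exporter on the shared connection, adds a periodic reader that exports every 3 seconds, and registers the provider globally.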

```go
func newMeter(
	ctx context.Context,
	res *resource.Resource,
	conn *grpc.ClientConn,
) (*sdkmetric.MeterProvider, error) {
	// OTLP metric exporter that reuses the shared gRPC connection.
	exporter, err := otlpmetricgrpc.New(ctx, otlpmetricgrpc.WithGRPCConn(conn))
	if err != nil {
		return nil, fmt.Errorf("failed to create the OTLP metric exporter: %w", err)
	}
	provider := sdkmetric.NewMeterProvider(
		// Collect and export the current metric state every 3 seconds.
		sdkmetric.WithReader(sdkmetric.NewPeriodicReader(exporter, sdkmetric.WithInterval(3*time.Second))),
		sdkmetric.WithResource(res),
	)
	otel.SetMeterProvider(provider)
	return provider, nil
}
```

Summary

This code sets up and uses the OpenTelemetry SDK to collect tracing and metrics data in a Go application. Key steps include initializing resources, setting up tracer and meter providers, and configuring the gRPC connection.

Implement Dice Rolling

This code demonstrates how to use the Gin framework and OpenTelemetry to trace and monitor a simple dice rolling operation.

Import Packages

```go
package main

import (
	"context"
	"crypto/rand"
	"log"
	"math/big"

	"github.com/blackhorseya/golang-101/pkg/otelx"
	"github.com/gin-gonic/gin"
	"go.opentelemetry.io/contrib/instrumentation/github.com/gin-gonic/gin/otelgin"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
	"go.opentelemetry.io/otel/trace"
)
```

Global Variables

```go
const (
	name       = "rolldice"
	otelTarget = "localhost:4317" // OTLP/gRPC endpoint
)

var (
	tracer      trace.Tracer
	meter       metric.Meter
	rollCounter metric.Int64Counter
)
```

Main Function

```go
func main() {
	// Initialize OpenTelemetry.
	err := otelx.SetupOTelSDK(context.Background(), otelTarget, name)
	if err != nil {
		log.Printf("Failed to initialize OpenTelemetry: %v", err)
		return
	}
	// Flush buffered telemetry when main returns.
	defer func() {
		if err := otelx.Shutdown(context.Background()); err != nil {
			log.Printf("Failed to shut down OpenTelemetry: %v", err)
		}
	}()

	// Create a tracer and a meter.
	tracer = otel.Tracer(name)
	meter = otel.Meter(name)
	rollCounter, err = meter.Int64Counter(
		"dice.rolls",
		metric.WithDescription("The number of dice rolls"),
		metric.WithUnit("{roll}"),
	)
	if err != nil {
		log.Printf("Failed to create the counter: %v", err)
		return
	}

	// Create a Gin router; otelgin starts a server span for each request.
	router := gin.Default()
	router.Use(otelgin.Middleware(name))
	router.GET("/rolldice", rolldice)

	// Start the server (blocks until it fails).
	if err := router.Run(":8080"); err != nil {
		log.Printf("Failed to start the server: %v", err)
	}
}
```

Roll Dice Handler

```go
func rolldice(c *gin.Context) {
	// Start a child of the server span that otelgin stored in the request context.
	ctx, span := tracer.Start(c.Request.Context(), "roll")
	defer span.End()

	// Define the number of sides on the die.
	sides := 6
	// Generate a uniformly random number in the range [0, sides).
	n, err := rand.Int(rand.Reader, big.NewInt(int64(sides)))
	if err != nil {
		span.RecordError(err) // attach the failure to the span
		c.JSON(500, gin.H{"error": "failed to generate random number"})
		return
	}
	// Add 1 to get a number in the range [1, sides].
	roll := n.Int64() + 1
	rollValueAttr := attribute.Int("roll.value", int(roll))
	span.SetAttributes(rollValueAttr)
	// Count the roll, keyed by its value.
	rollCounter.Add(ctx, 1, metric.WithAttributes(rollValueAttr))
	c.JSON(200, gin.H{"roll": roll})
}
```
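The random part of the handler can be pulled into a small function and tested on its own. This is a hypothetical refactoring (rollDie is not in the original). It also guards against sides <= 0, because crypto/rand.Int panics when max is not positive.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// rollDie returns a uniformly distributed value in [1, sides] using crypto/rand.
func rollDie(sides int) (int64, error) {
	if sides <= 0 {
		return 0, fmt.Errorf("sides must be positive, got %d", sides)
	}
	n, err := rand.Int(rand.Reader, big.NewInt(int64(sides))) // n is in [0, sides)
	if err != nil {
		return 0, err
	}
	return n.Int64() + 1, nil
}
```

With it, the body of rolldice would start with roll, err := rollDie(6).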

Summary

This code demonstrates how to integrate OpenTelemetry with the Gin framework to trace and monitor a simple dice rolling service. Key steps include initializing OpenTelemetry, setting up a tracer and meter, and using OpenTelemetry middleware in Gin routes.

Viewing Traces in Jaeger

This screenshot shows a trace of the rolldice operation in the Jaeger UI. The trace contains two spans:

  1. /rolldice (main span)
    • Tags: HTTP request information, including the method (GET), path (/rolldice), and status code (200).
    • Process: Information about the OpenTelemetry library.
    • Duration: 57 microseconds.
    • Represents the overall handling of the /rolldice HTTP request.
  2. roll (child span)
    • Tags: The generated random number (roll.value = 5) and the span format (internal.span.format = otlp).
    • Process: Information about the rolldice service.
    • Duration: 36 microseconds.
    • Represents the handler's own work. Because span.End is deferred, this covers generating the random number, recording the metric, and writing the JSON response.

Trace Details

  • Trace Start: June 14, 2024, 12:08:40.266.
  • Duration: 57 microseconds.
  • Services: One service (rolldice).
  • Depth: 2, indicating two nested spans.
  • Total Spans: Two spans.

Key Information

  • The main span (/rolldice) shows the overall HTTP request handling time and related tags.
  • The sub-span (roll) shows the time taken for the specific dice roll operation and related tags.
  • Because roll is nested inside /rolldice, comparing the two durations separates the handler's own work from the time /rolldice spends outside it. When a request is slow, this breakdown shows which part to investigate first.

Viewing Metrics in Prometheus

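This screenshot shows the query results for the dice_rolls_total metric in the Prometheus UI, which is how the dice.rolls counter appears in Prometheus. The query returns one series per roll value, each holding the number of rolls with that value.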
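The name Prometheus shows differs from the one passed to meter.Int64Counter. In rough terms, the translation does three things:

- Characters that are invalid in Prometheus metric names, such as the dot, become underscores.
- Curly-brace annotation units such as {roll} add no suffix.
- Monotonic counters get _total appended.

The function below is a simplified, hypothetical sketch of those rules for this case only. It is not the collector's translation code, which handles many more units and edge cases.

```go
package main

import "strings"

// promCounterName sketches the Prometheus name of a monotonic OTel counter
// whose unit is a curly-brace annotation such as "{roll}" (no unit suffix).
// Characters outside [a-zA-Z0-9_:] become '_' and "_total" is appended once.
func promCounterName(otelName string) string {
	sanitized := strings.Map(func(r rune) rune {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9', r == '_', r == ':':
			return r
		default:
			return '_'
		}
	}, otelName)
	if strings.HasSuffix(sanitized, "_total") {
		return sanitized
	}
	return sanitized + "_total"
}
```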

Query Results

  • exported_job: "rolldice"
    • The service name of the application that produced the data.
  • instance: "otel-collector:9090"
    • The address Prometheus scrapes, here the collector's metrics endpoint.
  • job: "otel-collector"
    • The name of the Prometheus scrape job.
  • roll_value: various values (e.g., "2", "5", "1", "4")
    • The roll.value attribute recorded with each roll (the dot becomes an underscore).
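The exported_job label comes from a label clash. The collector's Prometheus exporter derives a job label from the service name (rolldice). When Prometheus scrapes the collector, it attaches its own job and instance target labels. With honor_labels left at its default of false, Prometheus keeps its own value and renames the scraped one with an exported_ prefix. The stdlib sketch below models only that merge step; mergeLabels is a hypothetical name, not a Prometheus API.

```go
package main

// mergeLabels models Prometheus' honor_labels: false behavior in simplified
// form: target labels win, and a scraped label that collides with one is
// kept as "exported_<name>".
func mergeLabels(scraped, target map[string]string) map[string]string {
	out := make(map[string]string, len(scraped)+len(target))
	for k, v := range scraped {
		if _, clash := target[k]; clash {
			out["exported_"+k] = v
		} else {
			out[k] = v
		}
	}
	for k, v := range target {
		out[k] = v
	}
	return out
}
```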

Detailed Entries

  • Each row is a separate time series: the shared labels plus one roll_value.
  • The value at the end of the row (e.g., 2 or 1) is the cumulative number of rolls that produced that value.

Information Interpretation

  1. dice_rolls_total{exported_job="rolldice", instance="otel-collector:9090", job="otel-collector", roll_value="2"} 2
    • A roll of 2 occurred twice.
  2. dice_rolls_total{exported_job="rolldice", instance="otel-collector:9090", job="otel-collector", roll_value="5"} 2
    • A roll of 5 occurred twice.
  3. dice_rolls_total{exported_job="rolldice", instance="otel-collector:9090", job="otel-collector", roll_value="1"} 1
    • A roll of 1 occurred once.
  4. dice_rolls_total{exported_job="rolldice", instance="otel-collector:9090", job="otel-collector", roll_value="4"} 1
    • A roll of 4 occurred once.
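Each series is the running total of rollCounter.Add calls that carried a given roll.value attribute. The SDK aggregates per distinct attribute set and, with the default cumulative temporality, exports running totals. The sketch below models that aggregation with a map keyed by roll value; rollTally is hypothetical. The test's input sequence is illustrative only, chosen so the totals match the four series above. Summing those series gives the overall count, 2 + 2 + 1 + 1 = 6 rolls, which PromQL sum(dice_rolls_total) would return for this data.

```go
package main

import "sync"

// rollTally is a simplified, hypothetical model of a cumulative Int64Counter
// whose data points are keyed by the roll.value attribute.
type rollTally struct {
	mu     sync.Mutex
	counts map[int64]int64
}

func newRollTally() *rollTally { return &rollTally{counts: make(map[int64]int64)} }

// Add records one roll; like Counter.Add with 1, the total only grows.
func (t *rollTally) Add(rollValue int64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.counts[rollValue]++
}

// Snapshot returns a copy of the running totals, as a cumulative export would.
func (t *rollTally) Snapshot() map[int64]int64 {
	t.mu.Lock()
	defer t.mu.Unlock()
	out := make(map[int64]int64, len(t.counts))
	for k, v := range t.counts {
		out[k] = v
	}
	return out
}
```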

Summary

This screenshot shows the query results for the dice_rolls_total metric in Prometheus, giving the frequency of each dice value. Each series carries the exporting service (exported_job), the scrape target (instance), the scrape job (job), and the dice value (roll_value), followed by its count. Together they show how the rolls are distributed across values.

By following these steps, you can set up, trace, and monitor a simple dice rolling application using OpenTelemetry, Jaeger, and Prometheus. Traces show where time goes inside a single request. Metrics show aggregate behavior across many requests, such as the distribution of roll values.
