View latency of app requests using Cloud Trace.

Narayan
9 min read · Sep 28, 2024


Understanding the performance of your application is crucial, and one key metric is latency: how long it takes your application to respond to user requests. Cloud Trace, Google Cloud’s powerful tracing service, offers a way to track these requests and visualize their performance, helping you pinpoint potential bottlenecks and optimize your app’s responsiveness.

This blog guides you through a practical hands-on example to understand how to collect and view latency data for your application using Cloud Trace. We’ll be building on a sample application running within a Google Kubernetes Engine (GKE) cluster. Let’s get started!

What is Cloud Trace?

Cloud Trace is a Google Cloud service that provides a powerful way to monitor your application’s performance and troubleshoot issues. It automatically generates traces that show the flow of requests through your application, including details about the time spent in each function or service call.

Let’s learn how to collect and view latency data from your applications

We will use a sample application deployed to a GKE cluster to demonstrate the power of Cloud Trace. This example uses a Python application based on the Flask framework, instrumented with OpenTelemetry to generate traces. Following these steps will demonstrate how to leverage Cloud Trace for latency analysis:

1. Create a Google Kubernetes Engine (GKE) cluster: Open Cloud Shell and execute the following command:

gcloud container clusters create cloud-trace-demo --zone us-central1-c

This command creates a standard GKE cluster named cloud-trace-demo in the us-central1-c zone. This process may take several minutes.

2. Configure kubectl: Set up kubectl to automatically refresh its credentials and use the same identity as the Google Cloud CLI:

gcloud container clusters get-credentials cloud-trace-demo --zone us-central1-c

Verify the connection with:

kubectl get nodes

This should list the nodes in your cluster.

3. Download and deploy a sample application: The sample is a Python application that uses the Flask framework and the OpenTelemetry package.

Clone a Python app from GitHub:

git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git

Run the following command to deploy the sample application:

cd python-docs-samples/trace/cloud-trace-demo-app-opentelemetry && ./setup.sh

This script deploys three interconnected services (cloud-trace-demo-a, cloud-trace-demo-b, and cloud-trace-demo-c) and sets up a load balancer. It will take a few minutes to complete.

About the application

The application is a simple three-service system (a, b, c) where each service sends requests to the next. When you send an HTTP request to service “a” with the curl command, it initiates a chain of calls that ends with service “c”. Each service adds its own text to the response before passing it back, so the final output looks like this:

Hello, I am service A
And I am service B
Hello, I am service C

The file app.py in the GitHub repository contains the instrumentation necessary to capture and send trace data to your Google Cloud project.

The application imports several OpenTelemetry packages:

from opentelemetry import trace
from opentelemetry.exporter.cloud_trace import CloudTraceSpanExporter
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.instrumentation.requests import RequestsInstrumentor
from opentelemetry.propagate import set_global_textmap
from opentelemetry.propagators.cloud_trace_propagator import CloudTraceFormatPropagator
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

The setup automatically instruments web requests, capturing trace context and monitoring both Flask handlers and requests to external services:

app = flask.Flask(__name__)
FlaskInstrumentor().instrument_app(app)
RequestsInstrumentor().instrument()

To configure the Cloud Trace exporter for trace propagation in the Cloud Trace format, the following function is defined:

def configure_exporter(exporter):
    """Configures OpenTelemetry context propagation to use Cloud Trace context.

    Args:
        exporter: exporter instance to be configured in the OpenTelemetry tracer provider
    """
    set_global_textmap(CloudTraceFormatPropagator())
    tracer_provider = TracerProvider()
    tracer_provider.add_span_processor(BatchSpanProcessor(exporter))
    trace.set_tracer_provider(tracer_provider)


configure_exporter(CloudTraceSpanExporter())
tracer = trace.get_tracer(__name__)

Lastly, when sending requests in Python, OpenTelemetry automatically propagates the trace context with your outgoing calls:

if endpoint is not None and endpoint != "":
    data = {"body": keyword}
    response = requests.get(
        endpoint,
        params=data,
    )
    return keyword + "\n" + response.text
else:
    return keyword, 200
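
To see what that propagation actually attaches to an outgoing call, you can inject the current context into an empty carrier. The snippet below is a minimal sketch, not part of the sample app, and assumes the configure_exporter() setup shown above has already run:

from opentelemetry import trace
from opentelemetry.propagate import inject

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("propagation-demo"):
    carrier = {}
    inject(carrier)  # the globally configured propagator writes the trace context
    # Expect a single entry in the Cloud Trace header format, roughly:
    # {'x-cloud-trace-context': '<32-character trace id>/<span id>;o=1'}
    print(carrier)

The instrumented requests calls between services a, b, and c carry the same header, which is why the three services show up as spans of a single trace.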

This setup ensures that your application effectively tracks and sends trace data to Google Cloud, helping you monitor performance and troubleshoot issues with ease.

Key Concepts:

Trace: A trace is a collection of spans that describe a complete request across your application.
Span: A span represents a specific operation within your application.
Latency: This is the time it takes for an entire trace or a specific span to complete.
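
To make these concepts concrete, here is a minimal sketch (not part of the sample app) that wraps a placeholder operation in a custom span; the wall-clock time of the with block becomes that span's latency:

import time

from opentelemetry import trace

tracer = trace.get_tracer(__name__)


def process_order():
    # Placeholder work standing in for real application logic.
    time.sleep(0.1)


with tracer.start_as_current_span("process-order") as span:
    span.set_attribute("order.items", 3)  # custom attribute recorded on the span
    process_order()  # the duration of this call is reported as the span's latency

If this ran inside one of the instrumented Flask handlers, the new span would appear as a child of the request span in the trace.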

4. Create trace data: Once the application is deployed, generate trace data by sending an HTTP request:

curl $(kubectl get svc -o=jsonpath='{.items[?(@.metadata.name=="cloud-trace-demo-a")].status.loadBalancer.ingress[0].ip}')

This command retrieves the external IP address of the cloud-trace-demo-a service and sends a request to it. The response will display greetings from each service. You can execute this command multiple times to generate more traces.

The curl command works as follows:

  • kubectl fetches the IP address of the service named cloud-trace-demo-a.
  • The curl command then sends the HTTP request to service a.
  • Service a receives the HTTP request and sends a request to service b.
  • Service b receives the HTTP request and sends a request to service c.
  • Service c receives the HTTP request from service b and returns the string Hello, I am service C to service b.
  • Service b receives the response from service c, appends it to the string And I am service B, and returns the result to service a.
  • Service a receives the response from service b and appends it to the string Hello, I am service A.
  • The response from service a is printed in the Cloud Shell.
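
If you want more traffic than a few manual curl calls, a small script can do the same thing in a loop. The sketch below is not part of the tutorial; the IP address is a placeholder for the external IP that the kubectl command above returns:

import requests

EXTERNAL_IP = "203.0.113.10"  # placeholder; use your cloud-trace-demo-a external IP

for i in range(10):
    # Each request triggers the a -> b -> c chain and produces one trace.
    response = requests.get(f"http://{EXTERNAL_IP}/", timeout=10)
    print(i, response.status_code, response.text.strip())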

5. View latency data in Cloud Trace: Navigate to the Trace explorer in the Google Cloud Console.

You should see traces represented as dots on a graph and rows in a table. Each trace represents a single request to your application. More details on finding a trace are available in the Cloud Trace documentation.

Select a trace to view its details. This will display a Gantt chart breaking down the trace into individual spans, showing the latency of each operation within the request. You can click on individual spans to see more detailed information.
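
If you prefer to confirm from code that traces are arriving, the Cloud Trace API client library can list recent traces. Treat the sketch below as an assumption rather than part of the tutorial; it uses the google-cloud-trace package and a placeholder project ID:

from google.cloud import trace_v1

client = trace_v1.TraceServiceClient()
project_id = "my-project-id"  # placeholder; use your own project ID

# List recent traces in the project and print their identifiers.
for t in client.list_traces(request={"project_id": project_id}):
    print(t.trace_id)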

Understanding the Scatter Plot for Request Monitoring

The scatter plot visually represents each request made within a selected time interval, with each dot on the plot corresponding to a specific request.

  • Coordinates: The (x, y) coordinates of each dot reflect the request’s time and its latency.
  • Error Indication: The color of the dot conveys the outcome of the request: blue signifies success, while red indicates failure.

Additionally, hovering over a dot activates a tooltip that provides detailed information, including the date, time, URI, and latency of the request. This functionality allows for quick insights into request performance and troubleshooting.

Exploring Traces with the Scatter Plot

To explore a trace, click a dot in the scatter plot. The Trace Explorer page then changes as follows:

  • Scatter Plot Update: The plot refreshes, highlighting your selected dot with a circle, while all other dots are dimmed for clarity.
  • Trace Details Pane: This section now displays important information:

Trace Identifier: A globally unique 128-bit identifier represented as a 32-character hexadecimal string.

Summary Line: This includes the start time, duration, and total number of spans associated with the trace.

Logs & Events Menu: This menu controls the visibility of logs and events related to the trace. By default, if logs or events exist, a circle is added to the trace span. Overlapping circles indicate multiple logs or events. To view each log or event as a separate row in a table, click the drop-down arrow next to “Logs & events” and select “Show expanded.”

Trace Table: The first row corresponds to the trace itself, with additional rows for each span within the trace. Each span is listed with its name and the associated service. The service name is extracted from the OpenTelemetry attribute service.name. If that attribute isn’t set, the App Engine service name will be displayed if applicable; otherwise, the service will remain unspecified.

Latency Column: This column visually represents latency, status, and any event annotations. A blue latency bar indicates successful completion, while a red latency bar signifies an error. Event annotations are shown as circles on the latency bar, providing a clear overview of performance metrics.

Viewing Span Details

Expanding the Trace Details Pane: To maximize your view of the Trace details pane, click on the Expand trace details icon in the toolbar. This action will enlarge the pane to fill the entire Trace Explorer page.

Accessing Detailed Information: To view detailed information about a trace or a specific span, simply click on the latency bar for the relevant entry in the Trace details pane. Once selected, the pane will refresh to display a tabbed table containing additional insights.

For the Trace ID Row: If you select the latency bar for the first row, labeled Trace ID, the table will present two tabs: Summary and Logs. The Summary tab provides essential information about the trace, including the HTTP command type, service details, and latency data for each span.

For Span Rows: If you click on the latency bar for any other span, the table will expand to include four tabs: Attributes, Logs & Events, Stacktraces, and Metadata & Links.

Exploring Attributes: To discover the labels associated with a span, navigate to the Attributes tab.

To locate a specific label or a group of labels, add a filter. For example, if you add the filter Key: g.co, then the table lists all labels where the label key contains g.co.

Accessing Logs & Events

To see information about related log entries and events, navigate to the Logs & Events tab, which will show relevant data when available. For more details on event annotations, check out the section on Annotating Trace Spans.
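
Event annotations of this kind are usually attached in code with the OpenTelemetry add_event API; the following minimal sketch (not part of the sample app) adds one to a span:

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("checkout") as span:
    # The event appears as an annotation (a circle) on the span's latency bar.
    span.add_event("cache.miss", attributes={"cache.key": "user:42"})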

Exploring Stack Traces and Metadata

To gather information about the available stack traces and detailed insights about a specific captured stack trace, head to the Stacktraces tab.

Accessing Metadata & Links

For general details about a span and to view a table of links to related spans, check out the Metadata & Links tab. This section provides crucial information, including:

  • Span ID: A unique 64-bit integer that identifies the span (excluding 0). For more details, refer to the TraceSpan documentation.
  • Parent Span ID: The ID of the parent span, if applicable.
  • Project ID
  • Start and End Time: The timestamps for the span’s duration.
  • Links Table: This table lists connections between the current span and other spans (a short sketch of how such links are created in code follows this list). Each row in the Links table includes:
    Attributes: Key-value pairs for the linked span.
    Trace Field: Links to the trace associated with the linked span. If this field displays “Current trace,” the linked span is part of the same trace; otherwise, it shows a trace ID.
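
Span links such as the ones in this table are created when a span is started with the OpenTelemetry links parameter. A minimal sketch, not taken from the sample app:

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

# A first span whose context the second span will link to.
with tracer.start_as_current_span("batch-producer") as producer:
    producer_context = producer.get_span_context()

# A second span that records a link back to the first one.
with tracer.start_as_current_span(
    "batch-consumer",
    links=[trace.Link(producer_context, attributes={"link.reason": "batch item"})],
):
    pass  # the Metadata & Links tab for this span would list the producer span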

6. Clean Up:

To avoid unnecessary charges, delete the GKE cluster:

gcloud container clusters delete cloud-trace-demo --zone us-central1-c

If you created a new project for this tutorial, consider deleting it as well.

Conclusion: By utilizing Cloud Trace, you gain a powerful tool to understand and troubleshoot latency in your applications. This process allows you to:

  • Identify performance bottlenecks in your code or services.
  • Improve application responsiveness by identifying slow areas for optimization.
  • Debug application behavior and isolate issues through the visual flow of requests.

Cloud Trace empowers you to build more efficient and performant applications that deliver a better user experience. So dive in and explore the insights it provides to unlock the true potential of your cloud applications!
