API Bites — Distributed Tracing, OpenTelemetry & W3C Trace Context

TRGoodwill
Published in API Central
4 min read · Oct 11, 2022

Distributed tracing

API management platforms provide centralized but narrow views of API health and usage. Every large and growing organization should have an API/integration logging strategy that incorporates Security Information and Event Management (SIEM) and distributed tracing.

Distributed tracing can provide:

  • Traceable end-to-end visibility of the service context
  • Self-service access to detailed incident data, reducing mean time to resolution and ticket misdirection.
  • Insights into trends, data flows and dependencies.
  • The ability for internal API consumers to troubleshoot without requiring verbose and potentially insecure error response messages, allowing secure error response policies to be enforced by API gateways.

OpenTelemetry (OTEL)

Distributed tracing can provide detailed incident data and valuable insights into interconnected systems; however, successful implementation requires enterprise-wide commitment and compliance.

Native cloud tracing is easy to adopt but difficult to stitch together across platforms. Non-proprietary, cross-platform standards such as the OpenTelemetry protocol and the W3C Trace Context specification are better suited to hybrid and multi-cloud environments.

When OpenTelemetry and W3C Trace Context are employed, every system that implements or mediates APIs must either participate in a trace by updating the trace headers or, at a minimum, propagate them unchanged, so that traces are not broken.

Transaction and Audit Logging

A standard format for enterprise logs in a JSON/REST environment is a JSON structure of key-value pairs based on OpenTelemetry (or OpenTracing) semantic conventions. Telemetry specifications and libraries for a variety of languages are available from the OpenTelemetry and OpenTracing GitHub repositories, and from platform vendors.

An example OpenTelemetry payload might look like this:

{
  "trace_id": "7bba9f33312b3dbb8b2c2c62bb7abe2d",
  "parent_id": "",
  "span_id": "086e83747d0e381e",
  "name": "/v1/sys/health",
  "start_time": "2021-10-22 16:04:01.209458162 +0000 UTC",
  "end_time": "2021-10-22 16:04:01.209514132 +0000 UTC",
  "status_code": "200",
  "status_message": "OK",
  "attributes": {
    "net.transport": "IP.TCP",
    "net.peer.ip": "172.17.0.1",
    "net.host.ip": "10.177.2.152",
    "http.host": "10.177.2.152:26040",
    "http.scheme": "https",
    "http.method": "GET",
    "http.target": "/sys/v1/health",
    "http.server_name": "api-gateway",
    "http.user_agent": "System Health Check",
    "http.header.requestid": "987654",
    "http.header.correlationid": "234567",
    "oauth2.subject": "123456",
    "oauth2.client_id": "567890"
  },
  "events": [
    {
      "name": "outbound.response",
      "message": "200 OK",
      "timestamp": "2021-10-22 16:04:01.209512872 +0000 UTC"
    }
  ]
}

Logged API request and response metadata include HTTP trace-headers that allow analytics platforms to stitch together a consolidated view of distributed transactions, providing detailed incident data and valuable insights into interconnected systems.
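As a rough illustration of logging trace headers alongside request metadata, the following Python sketch builds a JSON log record from an incoming request. The function name build_log_record and the attribute keys "http.header.traceparent" and "http.header.tracestate" are assumptions made for this example (they mirror the "http.header.*" pattern in the payload above); they are not taken from the OpenTelemetry semantic conventions.

```python
import json
from datetime import datetime, timezone
from typing import Dict


def build_log_record(name: str, method: str, target: str, status_code: int,
                     headers: Dict[str, str], span_id: str) -> str:
    """Return a JSON log line that carries the trace identifiers of a request.

    Hypothetical helper: field names loosely follow the example payload above.
    """
    # Header names may arrive in any case, so normalise before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    traceparent = lowered.get("traceparent", "")

    # traceparent = version-traceid-parentid-flags (ASCII hyphens).
    parts = traceparent.split("-")
    trace_id, parent_id = (parts[1], parts[2]) if len(parts) == 4 else ("", "")

    record = {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "span_id": span_id,
        "name": name,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "status_code": str(status_code),
        "attributes": {
            "http.method": method,
            "http.target": target,
            # Assumed attribute keys: log the raw trace headers as received.
            "http.header.traceparent": traceparent,
            "http.header.tracestate": lowered.get("tracestate", ""),
        },
    }
    return json.dumps(record)
```

Because every record carries the same trace_id, an analytics platform can group the log lines of one distributed transaction and order them by parent_id and span_id.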

W3C Trace Context

Ideally, all API providers should participate in a distributed trace. The following is an overview of the W3C Trace Context specification and trace participation requirements.

The presence of a traceparent header indicates an active trace. The Trace Context ‘traceparent’ and ‘tracestate’ headers are optional headers that enable tracing tools to follow, analyze and debug a uniquely identifiable transaction across distributed systems and multiple tracing tools.

Trace context is split into two individual propagation fields. The traceparent header uniquely identifies the request in a tracing system, and the tracestate header extends traceparent with platform-specific span or trace IDs represented by a set of name/value pairs.

Tracing tools can provide two levels of compliant behavior interacting with trace context:

  • Participating in a trace. Systems and mediating platforms should, wherever possible, participate in a trace by modifying the traceparent header and relevant parts of the tracestate header containing their proprietary information.
  • Forwarding a trace. At a minimum the traceparent and tracestate headers must be propagated to guarantee traces are not broken. Where enterprise logging is performed, trace headers should also be logged.
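The two levels of behavior can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names and the system_key parameter are hypothetical, only version 00 headers are handled, and the choice to start a new sampled trace ("01") when no usable traceparent arrives is an assumption of this sketch.

```python
import secrets
from typing import Dict

TRACE_HEADERS = ("traceparent", "tracestate")


def _lowercase_names(headers: Dict[str, str]) -> Dict[str, str]:
    # Incoming header names may use any case.
    return {name.lower(): value for name, value in headers.items()}


def forward_trace(incoming: Dict[str, str]) -> Dict[str, str]:
    """Forwarding: pass both trace headers through unchanged."""
    lowered = _lowercase_names(incoming)
    return {name: lowered[name] for name in TRACE_HEADERS if name in lowered}


def participate_in_trace(incoming: Dict[str, str], system_key: str) -> Dict[str, str]:
    """Participating: keep the trace-id, replace parent-id with our own span ID."""
    lowered = _lowercase_names(incoming)
    span_id = secrets.token_hex(8)  # 8 random bytes as 16 lowercase hex digits

    parts = lowered.get("traceparent", "").strip().split("-")
    if len(parts) == 4 and parts[0] == "00":
        trace_id, flags = parts[1], parts[3]
        state = lowered.get("tracestate", "")
    else:
        # Assumption: no usable traceparent, so start a new sampled trace
        # and discard any incoming tracestate.
        trace_id, flags, state = secrets.token_hex(16), "01", ""

    # Drop our own previous entry, then add the new one at the left-most position.
    members = [m.strip() for m in state.split(",") if m.strip()]
    members = [m for m in members if m.split("=", 1)[0] != system_key]
    members.insert(0, f"{system_key}={span_id}")

    return {
        "traceparent": f"00-{trace_id}-{span_id}-{flags}",
        "tracestate": ",".join(members[:32]),  # at most 32 list-members
    }
```

A gateway that cannot create spans of its own would call forward_trace; a service that records spans would call participate_in_trace with its own key, for example participate_in_trace(request_headers, "gateway").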

The traceparent header

The traceparent header identifies the request in a tracing system, describing the position of the incoming request in its trace graph in a portable, fixed-length format.

format = version “-” trace-id “-” parent-id “-” trace-flags

e.g. “00-0af7651916cd43dd8448eb211c80319c-b9c7c989f97918e1-01”

Where:

  • version is the Trace Context specification version
    = 2HEXDIGLC ; this document assumes version 00

  • trace-id is the globally unique ID of the whole trace forest
    = 32HEXDIGLC ; 16-byte array identifier. All zeroes forbidden

  • parent-id is the ‘span-ID’ of this call/trace as known by the caller
    = 16HEXDIGLC ; 8-byte array identifier. All zeroes forbidden

  • trace-flags is an 8-bit field that controls tracing flags such as sampling and trace level
    = 2HEXDIGLC ; 8-bit flags. Currently only one bit is supported...   [01 “sampled”: the caller may have recorded trace data]
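The field layout above can be checked with a small parser. The Python sketch below assumes version 00 only and header values written with plain ASCII hyphens; the names TraceParent and parse_traceparent are illustrative and do not come from any particular library.

```python
import re
from typing import NamedTuple, Optional

# version "-" trace-id "-" parent-id "-" trace-flags, lowercase hex only
_TRACEPARENT_RE = re.compile(
    r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    trace_flags: int

    @property
    def sampled(self) -> bool:
        # Least significant bit of trace-flags.
        return bool(self.trace_flags & 0x01)


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is not a valid version 00 value."""
    match = _TRACEPARENT_RE.fullmatch(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    if version != "00":
        return None  # this sketch only understands version 00
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero identifiers are forbidden
    return TraceParent(version, trace_id, parent_id, int(flags, 16))
```

Returning None rather than raising lets a caller treat an invalid header the same way as a missing one, for instance by starting a new trace.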

The tracestate header

The tracestate header carries a concatenation of trace graph key-value pairs. It conveys information about the position of the request in multiple distributed tracing graphs, and is a companion to the traceparent header.

Multiple tracestate headers are allowed. Values from multiple headers in an incoming request should be combined into a single comma-separated header, in the order defined by RFC 7230 (Field Order), and sent as a single header in the outgoing request. New or modified keys are added at the left-most position, letting the next server know which tracing system corresponds to the traceparent parent-id.

The tracestate field value is a comma-separated list of list-members, each a key/value pair joined by an equals sign (‘=’). Spaces and horizontal tabs surrounding list-members are ignored. A list can contain a maximum of 32 list-members.

format = list-member 0*31(“,” list-member)

list-member = key “=” value

  • key is the unique ID of the trace participant
  • value is the span-ID or trace-ID

e.g. vendrname1=b9c7c989f97918e1,vendrname2=b7ad6b7169203331

In a default, open implementation of the specification, list members consist of a unique system identifier followed by a value corresponding to the current or a previously logged parent-id.

e.g.
traceparent: 00-0af7651916cd43dd8448eb211c80319c-b9c7c989f97918e1-01
tracestate: vendrname1=b9c7c989f97918e1,vendrname2=b7ad6b7169203331

Tracestate values may, however, be vendor-specific, opaque identifiers such as a trace-id. Only one entry per key is allowed, as the entry represents the last position in the trace.
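The tracestate rules described in this section (combining multiple headers, ignoring surrounding whitespace, one entry per key, at most 32 list-members) can be sketched in Python as below. The function names are hypothetical, and the sketch deliberately does not validate the characters allowed in keys and values; it simply skips empty list-members and rejects malformed or duplicate ones.

```python
from typing import Iterable, List, Tuple

MAX_MEMBERS = 32


def combine_tracestate(header_values: Iterable[str]) -> str:
    """Join the values of several tracestate headers, in the order received."""
    return ",".join(v.strip() for v in header_values if v.strip())


def parse_tracestate(value: str) -> List[Tuple[str, str]]:
    """Split a tracestate value into ordered (key, value) pairs.

    Raises ValueError for a malformed member, a duplicate key,
    or more than 32 list-members.
    """
    members: List[Tuple[str, str]] = []
    seen = set()
    for raw in value.split(","):
        member = raw.strip(" \t")  # spaces and tabs around members are ignored
        if not member:
            continue  # skip empty list-members
        key, sep, val = member.partition("=")
        if not sep or not key or not val:
            raise ValueError(f"malformed list-member: {member!r}")
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen.add(key)
        members.append((key, val))
    if len(members) > MAX_MEMBERS:
        raise ValueError("more than 32 list-members")
    return members
```

Keeping the result as an ordered list rather than a dictionary preserves the left-to-right order, which matters because the left-most entry identifies the most recent participant.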

Platforms and libraries should send header names in lowercase and must accept header names in any case.

The Trace Context specification can be found at http://www.w3.org/TR/trace-context/. The trace-context GitHub page https://github.com/w3c/trace-context contains further detail, walkthroughs and implementation examples.

OpenTelemetry framework resources can be found at https://opentelemetry.io/.

Tim has several years' experience in the delivery and evolution of interoperability frameworks and platforms, and currently works out of Berlin for Accenture ASG.