Peek Inside Coherence With OpenTracing

Ryan Lubke
Published in Oracle Coherence
Nov 17, 2020

OpenTracing is a set of APIs that enable instrumentation and tracing in a distributed environment. Given the distributed nature of Coherence, OpenTracing is both a natural fit and a means to provide insight into the breakdown of a distributed request pipeline. Beginning with the 14.1.1.0 release, Oracle Coherence provides an integration with the OpenTracing API.

PREREQUISITES

To use the Coherence integration with OpenTracing, the following dependencies (in Maven coordinate format) need to be satisfied:

  • io.opentracing:opentracing-api:0.32.0 or 0.33.0
  • io.opentracing:opentracing-util:0.32.0 or 0.33.0
  • io.opentracing:opentracing-noop:0.32.0 or 0.33.0
  • io.opentracing.contrib:opentracing-tracerresolver:0.18

The API artifact requirement is straightforward; however, the tracerresolver dependency warrants some discussion. The tracerresolver library allows the discovery of OpenTracing-compliant implementations on the class path using Java’s ServiceLoader facility. This decouples Coherence from the instantiation and configuration of a particular runtime and allows developers to use any compliant runtime they wish. At the time of writing, Jaeger is the only OpenTracing-compliant runtime that includes a TracerResolver implementation; however, the TracerResolver API is simple enough that a developer can write an implementation to bootstrap the tracing runtime of their choice for use with Coherence.

That said, if your shop is using or plans to use Jaeger, you can save some time by including io.jaegertracing:jaeger-client:1.1.0 [1] as a project dependency, since its transitive dependencies already include the required OpenTracing API artifacts mentioned above.

[1] Version 1.1.0 is the latest version of Jaeger we’ve tested with. Any version should be acceptable as long as it supports the minimum required OpenTracing API versions.

WHAT IS TRACED

The following operations are traced:

  • NamedCache operations against partitioned caches
  • Events generated for MapListener and/or EventInterceptor implementations
  • CacheStore operations
  • Persistence operations

All tracing spans generated by Coherence will include common metadata, such as:

  • The member ID
  • The originating member ID, if the operation was dispatched to another member for processing
  • The thread name
  • The component name, such as transport or the service name

When reviewing the spans generated by Coherence, you may notice that some operations appear with different suffixes (e.g., Invoke.request, Invoke.dispatch, and/or Invoke.process). These suffixes represent the different stages an operation goes through in order to be serviced.

  • request — a NamedCache operation has been submitted to other member(s) of the cluster.
  • dispatch — the request has been received by a recipient and has been added to a thread pool (if present) for execution. This suffix will be omitted if there is no thread pool.
  • process — the receiver of the request executed the NamedCache operation.

CONFIGURATION

Like most features in Coherence, there are multiple ways to configure the OpenTracing integration. All methods revolve around configuring the tracing sampling ratio. The sampling ratio can be expressed as one of three valid configuration states:

  • -1: Tracing is disabled (this is the default).
  • 0: Coherence will not initiate tracing spans unless a span is already active, so developers must start a tracing span prior to invoking the Coherence operation of interest.
  • 0.01 to 1.0: This range is the fraction of spans that will be captured. For instance, a value of 0.1 will cause 10% of the tracing spans to be sampled, while a value of 1.0 will result in all spans being collected.
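The three states above can be summarized in a small Java sketch. The class, enum, and method names here are purely illustrative and are not part of the Coherence API. How Coherence itself treats values outside the three documented states is not covered here, so this sketch simply rejects them.

```java
public class SamplingRatio {
    /** Illustrative names for the three documented configuration states. */
    public enum Mode { DISABLED, EXPLICIT_ONLY, SAMPLED }

    /** Maps a sampling ratio onto one of the documented states. */
    public static Mode classify(float ratio) {
        if (ratio == -1f) {
            return Mode.DISABLED;        // tracing is off (the default)
        }
        if (ratio == 0f) {
            return Mode.EXPLICIT_ONLY;   // only traces when a span is already active
        }
        if (ratio >= 0.01f && ratio <= 1f) {
            return Mode.SAMPLED;         // this fraction of spans is captured
        }
        // Assumption of this sketch: anything else is treated as invalid.
        throw new IllegalArgumentException("Unsupported sampling ratio: " + ratio);
    }

    public static void main(String[] args) {
        for (float r : new float[] {-1f, 0f, 0.1f, 1f}) {
            System.out.println(r + " -> " + classify(r));
        }
    }
}
```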

COHERENCE OPERATIONAL OVERRIDE

We’ve added a new element called tracing-config (please see the schema for details on element ordering) that allows the configuration of the tracing sampling ratio:

SYSTEM PROPERTY

If using an override file isn’t desirable, it is possible to configure the OpenTracing integration using a system property named coherence.tracing.ratio. For example:
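As a minimal sketch, the property can be passed on the command line as a -D argument or set programmatically. The helper class below is hypothetical; it also assumes that, when set in-process, the property must be in place before Coherence is started in that JVM.

```java
public class TracingProperty {
    /** The system property named in the text above. */
    public static final String NAME = "coherence.tracing.ratio";

    /** Formats the property as a JVM command-line argument. */
    public static String jvmArg(float ratio) {
        return "-D" + NAME + "=" + ratio;
    }

    /** Sets the property in-process (assumed to be required before Coherence starts). */
    public static void apply(float ratio) {
        System.setProperty(NAME, Float.toString(ratio));
    }

    public static void main(String[] args) {
        apply(1.0f);
        System.out.println(jvmArg(1.0f));
        // ... start the Coherence member here ...
    }
}
```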

JMX

The tracing sampling ratio can be configured at the cluster or member level with JMX.

To change the sampling ratio at a cluster level, use the configureTracing(String, Float) operation exposed on the ClusterMBean. The String argument specifies the role of the members to be updated. If it is null or a zero-length string, the configuration change is applied to all cluster members. The Float argument represents the desired tracing ratio.

To view or change the tracing ratio on a per-member basis, see the mutable property, TracingSamplingRatio, on the ClusterNodeMBean.
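The JMX operations described above might be driven from Java along the following lines. This is a sketch under several assumptions: the ObjectName patterns (Coherence:type=Cluster and Coherence:type=Node,nodeId=N), the use of java.lang.Float in the operation signature and attribute value, and the helper class itself are not taken from this article, so verify them against your cluster with a JMX browser. The caller supplies the MBeanServerConnection (local or remote).

```java
import javax.management.Attribute;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class JmxTracing {
    private static ObjectName name(String pattern) {
        try {
            return new ObjectName(pattern);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Assumed ObjectName of the ClusterMBean. */
    public static ObjectName clusterName() {
        return name("Coherence:type=Cluster");
    }

    /** Assumed ObjectName of the ClusterNodeMBean for a given member ID. */
    public static ObjectName nodeName(int memberId) {
        return name("Coherence:type=Node,nodeId=" + memberId);
    }

    /** Invokes configureTracing(String, Float); a null or empty role targets all members. */
    public static void configureCluster(MBeanServerConnection mbs, String role, float ratio)
            throws Exception {
        mbs.invoke(clusterName(), "configureTracing",
                new Object[] {role, Float.valueOf(ratio)},
                new String[] {String.class.getName(), Float.class.getName()});
    }

    /** Reads the TracingSamplingRatio attribute of one member. */
    public static float memberRatio(MBeanServerConnection mbs, int memberId) throws Exception {
        Object value = mbs.getAttribute(nodeName(memberId), "TracingSamplingRatio");
        return ((Number) value).floatValue();
    }

    /** Updates the TracingSamplingRatio attribute of one member. */
    public static void setMemberRatio(MBeanServerConnection mbs, int memberId, float ratio)
            throws Exception {
        mbs.setAttribute(nodeName(memberId),
                new Attribute("TracingSamplingRatio", Float.valueOf(ratio)));
    }

    public static void main(String[] args) {
        System.out.println(clusterName() + " / " + nodeName(1));
    }
}
```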

MANAGEMENT OVER REST

If a REST style of cluster management is required, it is also possible to configure the tracing ratio using Management over REST.

To change the sampling ratio at a cluster level, issue a POST to the /management/coherence/cluster/configureTracing resource. This resource is functionally equivalent to the configureTracing(String, Float) operation on the ClusterMBean. The body should be a JSON object with the keys role and tracingRatio. Here’s an example using curl:

where, because role is null (or a zero-length string), the configuration change will be applied to all members. If, for example, the following was sent:

then the tracing configuration will be applied to only those members with the matching role of storage.
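For readers who prefer Java to curl, the cluster-level request described above could be built with the JDK's HttpClient API (Java 11 or later). This is a sketch rather than the article's original example: the base URL http://localhost:30000 and the helper class are assumptions, and the role is assumed to be a simple string that needs no JSON escaping.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class ClusterTracingRest {
    /** Builds the JSON body; a null role is sent as an empty string (all members). */
    public static String body(String role, float ratio) {
        String r = (role == null) ? "" : role;
        return "{\"role\":\"" + r + "\",\"tracingRatio\":" + ratio + "}";
    }

    /** Builds (but does not send) the POST to the configureTracing resource. */
    public static HttpRequest request(String baseUrl, String role, float ratio) {
        URI uri = URI.create(baseUrl + "/management/coherence/cluster/configureTracing");
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(role, ratio)))
                .build();
    }

    public static void main(String[] args) {
        // Assumed base URL; adjust to wherever Management over REST is listening.
        HttpRequest req = request("http://localhost:30000", "storage", 1.0f);
        System.out.println(req.method() + " " + req.uri());
        // To send it:
        // HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
    }
}
```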

To make per-member changes, use the /management/coherence/cluster/members/{memberIdentifier} resource, where {memberIdentifier} is the member ID. How this resource behaves depends on the HTTP verb. For example, sending a GET, like:

will result in all of the readable properties and their values being returned for that specific member.

Sending a POST to this resource indicates the intent to update a property with a new value. The body should be a JSON object with the key being the name of the property and its value representing the updated configuration value. Here’s an example updating the sampling ratio for member 1:
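The per-member GET and POST described above can also be sketched in Java with the JDK's HttpClient API (Java 11 or later). The base URL, the helper class, and in particular the JSON key tracingSamplingRatio are assumptions of this sketch; confirm the exact property name by issuing the GET first and inspecting the response.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class MemberTracingRest {
    private static URI memberUri(String baseUrl, String memberId) {
        return URI.create(baseUrl + "/management/coherence/cluster/members/" + memberId);
    }

    /** GET: returns the readable properties of the given member. */
    public static HttpRequest read(String baseUrl, String memberId) {
        return HttpRequest.newBuilder(memberUri(baseUrl, memberId)).GET().build();
    }

    /** JSON body for the update; the key name is an assumption of this sketch. */
    public static String body(float ratio) {
        return "{\"tracingSamplingRatio\":" + ratio + "}";
    }

    /** POST: updates the sampling ratio of the given member. */
    public static HttpRequest update(String baseUrl, String memberId, float ratio) {
        return HttpRequest.newBuilder(memberUri(baseUrl, memberId))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(ratio)))
                .build();
    }

    public static void main(String[] args) {
        // Assumed base URL; adjust to wherever Management over REST is listening.
        HttpRequest req = update("http://localhost:30000", "1", 0.5f);
        System.out.println(req.method() + " " + req.uri() + " " + body(0.5f));
        // To send it:
        // HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
    }
}
```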

DEMO

Let’s prove that the feature works. We’ll do this by starting a Jaeger instance to capture and visualize the spans generated by Coherence. We’ll then start two storage-enabled cache servers and one console-based member (to interact with the cache), all with tracing enabled via the command line. This is the minimum effort required to demonstrate this feature.

JAEGER

We’re going to use a Docker container to run the Jaeger instance. Fortunately, the Jaeger folks have made this easy; we can simply run:

We can verify Jaeger is working by accessing http://localhost:16686 with a browser:

COHERENCE

With Jaeger available, let’s stand up a Coherence cluster containing two storage-enabled members and one member as the console.

Here is the command used to start the storage-enabled members:

The command for the console member isn’t much different, apart from the omission of coherence.distributed.localstorage and a different main class:

In all cases, note that the tracing ratio is configured using the coherence.tracing.ratio system property with a value of 1, meaning all tracing spans will be captured. We’ve also defined JAEGER_SERVICE_NAME and pointed JAEGER_ENDPOINT at the default tracing endpoint of the Jaeger all-in-one development container.

If we start the cluster and console now using these commands, then once the cluster is stable, Coherence should appear in the list of Services shown by the Jaeger UI (you may need to refresh the browser for the change to be visible):

From the console we previously started, let’s see if we can generate some spans:

Map (?): cache test

Map (test): put 1 1
null
Map (test): get 1
1

Switch back to the Jaeger UI, and let’s look at what was captured:

Here’s our PUT! The operation itself generated four additional spans. Let’s take a closer look:

A BETTER DEMO

While it’s possible to start a local Jaeger instance followed by starting a Coherence cluster and showing what tracing spans are generated, that doesn’t make for a very interesting demo. So we decided to take the existing Coherence Demo application, add database support, and then ensure all JAX-RS and database operations are traced, in addition to Coherence operations.

DESIRED DEMO ARCHITECTURE

The following represents a high-level view of the changes we plan to make to the Coherence Demo application in order to demonstrate how Coherence OpenTracing fits into a larger, more complex application. We’ll demonstrate how we were able to generate tracing spans for both the JAX-RS and database layers of the application. Lastly, we’ll show off the dynamic configuration of both cluster-wide and per-member tracing using Coherence Management over REST.

As there are many changes being made to demonstrate this feature as thoroughly as possible, we decided it would be best to break this discussion up over a series of blog posts, starting with the JAX-RS modifications. From there, we’ll follow up roughly weekly with additional entries until we have the demo in the desired state.

So, check back next week as we start on a short but interesting journey to “tracify” the coherence-demo application. We hope that reviewing it will help you add tracing to your own Coherence applications and take advantage of the insights this new feature can offer.
