Distributed Tracing for Ruby on Rails Microservices with OpenCensus/OpenTelemetry (part 2)

Published in Wantedly Engineering, May 17, 2019

This series of articles is based on a talk I gave at RailsConf 2019, titled “Troubleshoot Your RoR Microservices with Distributed Tracing.” In part 2 of this series, I will introduce OpenCensus, which provides not only distributed tracing but also other features that improve observability, and show how to use it in your Rails applications. See part 1 for an introduction to distributed tracing.

Introducing OpenCensus

As I explained in part 1, there are many options for distributed tracing backends, both SaaS and open source. Paid SaaS solutions include Stackdriver, Datadog, and AWS X-Ray. If you have the capacity to run an open source solution yourself, you can also use Zipkin or Jaeger. The supported programming languages also vary depending on the tracing backend.

This is where OpenCensus fits in. It is a set of vendor-neutral libraries for collecting and exporting traces and metrics. “Vendor-neutral” means that you can send traces to the backend of your choice. It also supports the major programming languages. Java is, I believe, the most mature implementation. Go, Node, and Python are also mature and implement the full OpenCensus specification. C++, C#, PHP, Erlang/Elixir, and Ruby are a bit behind, but at least support tracing.

OpenCensus was originally created by Google, based on their internal library called Census, but it is now developed by a broad community of service developers and cloud vendors.

I won’t cover the metrics part of OpenCensus in this article, but it certainly does more than distributed tracing to improve the observability of your system.

Data model

In OpenCensus, the data model is defined using Protocol Buffers, so you can read the corresponding proto file in the GitHub repository to understand the exact definitions.

Here are the descriptions of some of the important fields of a span:

trace_id: a 16-byte unique identifier for a trace.
span_id: an 8-byte unique identifier for a span.
parent_span_id: the span_id of this span’s parent span.
name: a description of the span, such as a method name, or a file name and line number.
kind: UNSPECIFIED, SERVER, or CLIENT.
start_time, end_time: when the span starts and ends.
attributes: a set of key-value pairs. Values can be strings, integers, doubles, or booleans.
stack_trace: the stack trace captured when the span starts.
time_events: time-stamped annotations or sent/received message events within the span.
links: pointers from the current span to other spans in the same trace or in a different trace.
status: the final status of the span.

See https://github.com/census-instrumentation/opencensus-proto/blob/master/src/opencensus/proto/trace/v1/trace.proto for the full proto file. See also https://github.com/census-instrumentation/opencensus-specs for the specifications.
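To make the relationship between trace_id, span_id, and parent_span_id concrete, here is a minimal plain-Ruby sketch. It does not use the opencensus gem's classes: SpanRecord, new_root_span, and new_child_span are hypothetical names, only a subset of the fields above is modeled, and IDs are kept as hex strings for readability rather than raw bytes. The key point is that a child span reuses its parent's trace_id and records the parent's span_id.

```ruby
require 'securerandom'

# Hypothetical stand-in for a subset of the OpenCensus Span message.
# Field names follow trace.proto; IDs are hex-encoded here for readability.
SpanRecord = Struct.new(
  :trace_id, :span_id, :parent_span_id, :name, :kind,
  :start_time, :end_time, :attributes,
  keyword_init: true
)

# A root span starts a new trace: fresh trace_id, fresh span_id, no parent.
def new_root_span(name, kind: :SERVER)
  SpanRecord.new(
    trace_id: SecureRandom.hex(16), # 16 bytes -> 32 hex characters
    span_id: SecureRandom.hex(8),   # 8 bytes  -> 16 hex characters
    parent_span_id: nil,
    name: name,
    kind: kind,
    start_time: Time.now,
    attributes: {}
  )
end

# A child span shares its parent's trace_id and points back to the parent.
def new_child_span(parent, name, kind: :CLIENT)
  SpanRecord.new(
    trace_id: parent.trace_id,
    span_id: SecureRandom.hex(8),
    parent_span_id: parent.span_id,
    name: name,
    kind: kind,
    start_time: Time.now,
    attributes: {}
  )
end

root = new_root_span("GET /users")
child = new_child_span(root, "GET /api/profiles")
child.attributes["http.status_code"] = 200
child.end_time = Time.now
root.end_time = Time.now
```

A tracing backend can rebuild the whole request tree from nothing more than these three ID fields.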

Architecture

In rough terms, the OpenCensus architecture looks as follows. There are three services that make cascading HTTP API requests, and two tracing backends. Each service usually has a collector module and an exporter module, which are provided by the OpenCensus libraries and plugged into the main application code.

When a service makes an HTTP request, the context of the current trace, such as trace_id and span_id, is propagated from the caller to the callee service by storing it in HTTP headers. Note that an HTTP request usually produces a span on both the caller and the callee side: the kind field of the span is set to CLIENT on the caller side and to SERVER on the callee side.
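Which header carries the context depends on the formatter; the Rack middleware pseudo code later in this article uses Formatters::TraceContext, named after the W3C Trace Context format, whose traceparent header is laid out as version-trace_id-span_id-flags. The following is a standalone, simplified sketch of serializing and parsing that header, not the gem's implementation; TraceContextData, serialize_traceparent, and deserialize_traceparent are hypothetical names, and edge cases of the spec (such as rejecting all-zero IDs) are skipped.

```ruby
# Hypothetical helpers for a W3C-style "traceparent" header:
# "00-<32 hex trace_id>-<16 hex span_id>-<2 hex trace flags>".
TRACEPARENT_RE = /\A00-(\h{32})-(\h{16})-(\h{2})\z/

TraceContextData = Struct.new(:trace_id, :span_id, :sampled)

# Caller side: turn the current context into a header value.
def serialize_traceparent(ctx)
  flags = ctx.sampled ? "01" : "00"
  "00-#{ctx.trace_id}-#{ctx.span_id}-#{flags}"
end

# Callee side: returns nil when the header is missing or malformed, so the
# callee can start a brand-new trace instead of failing the request.
def deserialize_traceparent(header)
  return nil if header.nil?
  m = TRACEPARENT_RE.match(header.strip.downcase)
  return nil unless m
  TraceContextData.new(m[1], m[2], (m[3].to_i(16) & 0x01) == 1)
end

caller_ctx = TraceContextData.new("4bf92f3577b34da6a3ce929d0e0e4736",
                                  "00f067aa0ba902b7", true)
outgoing_headers = { "traceparent" => serialize_traceparent(caller_ctx) }

# The callee's SERVER span would use received.span_id as its parent_span_id.
received = deserialize_traceparent(outgoing_headers["traceparent"])
```

Returning nil on bad input, rather than raising, keeps a malformed header from breaking the callee; it simply becomes the root of a new trace.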

Within a service’s application process, the context is usually propagated through a thread-local variable, and new spans are created from that context.
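Here is a minimal sketch of that idea that assumes nothing from the opencensus gem: MiniTracer is hypothetical and keeps the current span id in Thread.current storage. Each nested span takes the current id as its parent, and the previous id is restored when the block exits, so sibling spans end up with the same parent.

```ruby
require 'securerandom'

# Hypothetical minimal tracer (not the opencensus gem). It stores the id of
# the current span in Thread.current[] (fiber-local in Ruby, which behaves
# per-thread as long as you do not switch fibers).
module MiniTracer
  FinishedSpan = Struct.new(:name, :span_id, :parent_span_id)

  # Spans recorded as their blocks exit (not thread-safe; sketch only).
  def self.finished
    @finished ||= []
  end

  def self.current_span_id
    Thread.current[:mini_tracer_span_id]
  end

  # Start a child of the current span, make it current for the block,
  # then restore the previous span even if the block raises.
  def self.in_span(name)
    parent_id = current_span_id
    span_id = SecureRandom.hex(8)
    Thread.current[:mini_tracer_span_id] = span_id
    begin
      yield span_id
    ensure
      Thread.current[:mini_tracer_span_id] = parent_id
      finished << FinishedSpan.new(name, span_id, parent_id)
    end
  end
end

seen_in_thread = :unset
MiniTracer.in_span("GET /users") do
  MiniTracer.in_span("sql.active_record") { }
  MiniTracer.in_span("render_template.action_view") { }
  # A new thread starts with empty storage: it does not inherit the context.
  Thread.new { seen_in_thread = MiniTracer.current_span_id }.join
end
```

Because a new thread starts without the context, work handed off to another thread (a background job, for example) needs its context passed along explicitly.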

Ruby/Rails integration

OpenCensus provides out-of-the-box integrations for many web application frameworks, and Ruby on Rails is one of them.

OpenCensus Ruby comes with a Rack middleware that handles inbound HTTP requests to your service. The middleware extracts the current trace context from the HTTP headers and creates a top-level span for each request.

For outbound HTTP requests, it provides a Faraday middleware, which sets HTTP headers to propagate the trace context and creates a caller-side span.

By default, it also captures common events such as ActiveRecord database queries and ActionView template rendering.

The spans created during a request are queued within the exporter module. After the HTTP response is generated, the exporter module sends the spans to the distributed tracing backend(s). Note that the Ruby implementation has no collector module.

Configurations

Using OpenCensus in Ruby is rather straightforward. First, add the opencensus gem to your Gemfile:

# Gemfile
gem 'opencensus'

Then configure it when your process starts:

# When a process starts
OpenCensus.configure do |c|
  c.trace.middleware_placement = :begin
  c.trace.exporter = exporter
  c.trace.default_sampler = \
    OpenCensus::Trace::Samplers::Probability.new(0.01)
  c.trace.default_max_attributes = 64
end

You can specify where the OpenCensus Rack middleware is placed in the Rack middleware stack: at the beginning, at the end, or right after a specified middleware. You can also specify which exporters to use, which I will explain in the next section. A sampler determines whether a given trace should be reported. You might report every trace in a staging environment, and sample with a probability in a high-traffic production environment, as in this example, which keeps about 1% of traces. OpenCensus also limits the data size of a trace, so that tracing does not bloat the memory footprint or network bandwidth of the application process. For instance, the number of attributes per span and the number of spans per trace are both limited, and those limits are configurable.
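To illustrate what a probability sampler and an attribute cap do, here is a hedged plain-Ruby sketch. ProbabilitySamplerSketch and BoundedAttributes are hypothetical classes, not the gem's OpenCensus::Trace::Samplers::Probability or its default_max_attributes handling. Honoring a sampling decision that was already propagated from the caller is a common convention that this sketch assumes, so that a trace is not recorded in one service and dropped in the next.

```ruby
# Hypothetical probability sampler, in the spirit of
# OpenCensus::Trace::Samplers::Probability.new(0.01). The RNG is injectable
# so the behavior can be reproduced with a seed.
class ProbabilitySamplerSketch
  def initialize(rate, rng: Random.new)
    raise ArgumentError, "rate must be in [0, 1]" unless (0.0..1.0).cover?(rate)
    @rate = rate
    @rng = rng
  end

  # Assumed convention: if the caller already decided (via the propagated
  # trace context), follow that decision; otherwise flip a biased coin.
  def call(parent_sampled: nil)
    return parent_sampled unless parent_sampled.nil?
    @rng.rand < @rate
  end
end

# Hypothetical attribute store that silently drops new keys beyond a cap,
# similar in purpose to default_max_attributes.
class BoundedAttributes
  attr_reader :dropped_count

  def initialize(max)
    @max = max
    @attrs = {}
    @dropped_count = 0
  end

  def put(key, value)
    if @attrs.key?(key) || @attrs.size < @max
      @attrs[key] = value
    else
      @dropped_count += 1
    end
  end

  def to_h
    @attrs.dup
  end
end

sampler = ProbabilitySamplerSketch.new(0.01, rng: Random.new(42))
kept = 100_000.times.count { sampler.call }   # roughly 1,000 of 100,000

attrs = BoundedAttributes.new(64)
100.times { |i| attrs.put("key#{i}", i) }     # only the first 64 are kept
```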

Exporter

OpenCensus has the concept of an exporter, which you configure to specify which backends your traces or metrics are sent to. You can even export to multiple backends at once. This is handy because you can try out a new backend while still using an existing one, or let different teams use different tools.

Here are some examples of exporter configurations:

# DataDog exporter
uri = URI.parse(ENV['DATADOG_APM_AGENT_URL'])
c.exporter = OpenCensus::Trace::Exporters::Datadog.new \
  service: app_name,
  agent_hostname: uri.host,
  agent_port: uri.port

# StackDriver exporter
keyfile = Base64.strict_decode64(ENV['STACKDRIVER_JSON_KEY_BASE64'])
c.exporter = OpenCensus::Trace::Exporters::Stackdriver.new \
  project_id: gcp_project_id,
  credentials: JSON.parse(keyfile)

Each backend has its own exporter class, provided as a separate gem, so you create an instance of it and set it on the config object. You can even write your own exporter if necessary; the Datadog exporter above was written by one of our team members.

Furthermore, if you want to use multiple exporters, you can simply wrap them in the Multi exporter.

# multiple exporters
exporters = []
exporters << OpenCensus::Trace::Exporters::Datadog.new(...)
exporters << OpenCensus::Trace::Exporters::Stackdriver.new
c.exporter = OpenCensus::Trace::Exporters::Multi.new(*exporters)
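Writing your own exporter essentially means providing an object with an export(spans) method, which is what the Rack middleware pseudo code later in this article calls at the end of a request. Below is a hedged sketch of that duck-typed interface and of a fan-out wrapper in the spirit of the Multi exporter. InMemoryExporter, FanOutExporter, and FailingExporter are hypothetical, spans are plain hashes, and isolating one backend's failure from the others is a design choice of this sketch, not a documented behavior of the gem's Multi exporter.

```ruby
# Hypothetical in-memory exporter: in this sketch, anything that responds to
# export(spans) can act as an exporter.
class InMemoryExporter
  attr_reader :batches

  def initialize
    @batches = []
  end

  def export(spans)
    @batches << spans.dup
    nil
  end
end

# Hypothetical fan-out exporter: forwards every batch to each delegate and
# keeps going if one of them raises, so one broken backend does not starve
# the others.
class FanOutExporter
  def initialize(*delegates)
    @delegates = delegates
  end

  def export(spans)
    errors = []
    @delegates.each do |delegate|
      begin
        delegate.export(spans)
      rescue StandardError => e
        errors << e
      end
    end
    warn "#{errors.size} exporter(s) failed" unless errors.empty?
    nil
  end
end

# Simulates a backend that is down.
class FailingExporter
  def export(_spans)
    raise "backend unavailable"
  end
end

primary = InMemoryExporter.new
secondary = InMemoryExporter.new
multi = FanOutExporter.new(primary, FailingExporter.new, secondary)
multi.export([{ name: "GET /users" }, { name: "sql.active_record" }])
```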

Railtie

For Rails specifically, OpenCensus Ruby provides a Railtie. You just need to require it in config/application.rb, and the top-level configuration object is then exposed in the Rails config, as shown below:

# config/application.rb
require 'opencensus/trace/integrations/rails'  # <- Rails::Railtie

module MyApp
  class Application < Rails::Application
    # the top-level config object is exposed as `config.opencensus`
    config.opencensus.trace.default_max_attributes = 64
    # ...
  end
end

Rack Middleware

Let’s look at what the Rack middleware does.

class OpenCensus::Trace::Integrations::RackMiddleware
  def call env
    formatter = Formatters::TraceContext.new
    context = formatter.deserialize env[formatter.rack_header_name]

    Trace.start_request_trace \
      trace_context: context,
      same_process_as_parent: false do |span_context|
      begin
        Trace.in_span get_path(env) do |span|
          start_request span, env
          @app.call(env).tap do |response|
            finish_request span, response
          end
        end
      ensure
        @exporter.export span_context.build_contained_spans
      end
    end
  end
end

This is pseudo code: it is simplified and incomplete, but it should give you a sense of how it works. Every Rack middleware has a #call method that takes the env hash and returns a three-element array of the status code, the headers, and the body. It calls @app.call, where @app is either a Rack app or another Rack middleware, so it can inject anything before and after the actual HTTP response is produced for a request. In this case, the first two lines extract the trace context from the HTTP header. The middleware then creates a span populated with data from the HTTP request, such as host, path, method, and user agent, and once the response is returned, it uses tap to add more data from the response to the span. Finally, it exports all the spans created during the request.
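To see the same steps run end to end without Rails or the opencensus gem, here is a hypothetical Rack-style middleware in plain Ruby (a Rack app is just an object whose call(env) returns [status, headers, body]). It mirrors the pseudo code: extract the context, create a SERVER span, call the app, enrich the span from the response, and export in ensure. TracingMiddlewareSketch, the hash-based spans, and the attribute keys are assumptions made for illustration, and the sketch assumes a W3C-style traceparent header, which Rack exposes in env as HTTP_TRACEPARENT.

```ruby
require 'securerandom'

# Hypothetical, self-contained tracing middleware (no gems required).
class TracingMiddlewareSketch
  TRACEPARENT = /\A00-(\h{32})-(\h{16})-\h{2}\z/

  def initialize(app, exporter:)
    @app = app
    @exporter = exporter
  end

  def call(env)
    # 1. Extract the caller's context; start a new trace if there is none.
    m = TRACEPARENT.match(env["HTTP_TRACEPARENT"].to_s)
    span = {
      trace_id: m ? m[1] : SecureRandom.hex(16),
      parent_span_id: m ? m[2] : nil,
      span_id: SecureRandom.hex(8),
      kind: :SERVER,
      name: env["PATH_INFO"],
      attributes: { "http.method" => env["REQUEST_METHOD"] },
      start_time: Time.now
    }
    begin
      # 2. Call the wrapped app, then 3. enrich the span from the response.
      @app.call(env).tap do |status, _headers, _body|
        span[:attributes]["http.status_code"] = status
      end
    ensure
      # 4. Export even if the app raised.
      span[:end_time] = Time.now
      @exporter.export([span])
    end
  end
end

collected = []
exporter = Object.new
exporter.define_singleton_method(:export) { |spans| collected.concat(spans) }

app = ->(_env) { [200, { "content-type" => "text/plain" }, ["hello"]] }
middleware = TracingMiddlewareSketch.new(app, exporter: exporter)
status, _headers, body = middleware.call({
  "REQUEST_METHOD" => "GET",
  "PATH_INFO" => "/users",
  "HTTP_TRACEPARENT" => "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
})
```

Exporting in ensure rather than after @app.call means a request that raises still produces a span, which is often exactly the request you want to look at.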

Rails Events

DEFAULT_NOTIFICATION_EVENTS = [
  "sql.active_record",
  "render_template.action_view",
  "send_file.action_controller",
  "send_data.action_controller",
  "deliver.action_mailer"
].freeze

def setup_notifications
  OpenCensus::Trace.configure.notifications.events.each do |type|
    ActiveSupport::Notifications.subscribe(type) do |*args|
      event = ActiveSupport::Notifications::Event.new(*args)
      handle_notification_event event
    end
  end
end

As I mentioned earlier, OpenCensus Ruby instruments some common events in a Rails app by default, such as SQL execution, view template rendering, sending files, and email delivery. This is done using a Rails API called ActiveSupport::Notifications, configured within the Railtie as above. With ActiveSupport::Notifications, you can subscribe to specific events and register handlers for them. Here, the handle_notification_event method is called after each event occurs.

def handle_notification_event event
  span_context = OpenCensus::Trace.span_context
  if span_context
    ns = OpenCensus::Trace.configure.notifications.attribute_namespace
    span = span_context.start_span event.name, skip_frames: 2
    span.start_time = event.time
    span.end_time = event.end
    event.payload.each do |k, v|
      span.put_attribute "#{ns}#{k}", v.to_s
    end
  end
end

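The handler method creates a span from the current span context and sets its attributes from the notification event’s payload. If there is no current span context (for example, outside a traced request), no span is created.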
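The subscribe-then-convert flow can be exercised without Rails using a tiny stand-in. In this hedged sketch, MiniNotifications and FakeEvent are hypothetical replacements for ActiveSupport::Notifications and its Event class (FakeEvent only has the fields the handler above reads: name, time, end, and payload), the "rails/" prefix is an assumed attribute namespace, and spans are plain hashes. Only the conversion logic, namespaced and stringified payload attributes plus start and end times, follows the handler above.

```ruby
# Hypothetical stand-in for ActiveSupport::Notifications::Event.
FakeEvent = Struct.new(:name, :time, :end, :payload)

# Hypothetical minimal pub/sub mimicking subscribe/instrument; this is not
# the Rails implementation.
module MiniNotifications
  @subscribers = Hash.new { |h, k| h[k] = [] }

  def self.subscribe(name, &handler)
    @subscribers[name] << handler
  end

  # Time the block, then hand a finished event to every subscriber.
  def self.instrument(name, payload = {})
    started = Time.now
    result = yield
    event = FakeEvent.new(name, started, Time.now, payload)
    @subscribers[name].each { |handler| handler.call(event) }
    result
  end
end

ATTRIBUTE_NAMESPACE = "rails/" # assumed prefix, for illustration only
recorded_spans = []

# Convert each event into a span-like hash, stringifying payload values the
# way the handler above does with put_attribute.
MiniNotifications.subscribe("sql.active_record") do |event|
  recorded_spans << {
    name: event.name,
    start_time: event.time,
    end_time: event.end,
    attributes: event.payload.map { |k, v| ["#{ATTRIBUTE_NAMESPACE}#{k}", v.to_s] }.to_h
  }
end

rows = MiniNotifications.instrument("sql.active_record",
                                    { sql: "SELECT 1", name: "User Load" }) { [[1]] }
# Not subscribed, so no span is recorded for this one.
MiniNotifications.instrument("render_template.action_view",
                             { identifier: "users/index" }) { "html" }
```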

Faraday Middleware

class OpenCensus::Trace::Integrations::FaradayMiddleware < ::Faraday::Middleware
  def call request_env
    span_context = request_env[:span_context]
    span_name = extract_span_name(request_env)
    span = span_context.start_span span_name, sampler: @sampler
    start_request span, request_env
    begin
      @app.call(request_env).on_complete do |response_env|
        finish_request span, response_env
      end
    rescue StandardError => e
      span.set_status 2, e.message
      raise
    ensure
      span_context.end_span span
    end
  end
end

This is essentially what the Faraday middleware does. It is very similar to the Rack middleware, but for outbound requests: it wraps an HTTP request in a span.

conn = Faraday.new(url: api_base_url) do |c|
  c.use OpenCensus::Trace::Integrations::FaradayMiddleware,
    span_name: ->(env) { env[:url].path }
  c.adapter Faraday.default_adapter
end

Using the Faraday middleware is easy: when you initialize a Faraday connection, just tell it to use the middleware, as above.

I highly recommend doing this in a shared library, or in a private gem if you run multiple Rails apps, so that you don’t need to repeat it in every Rails app you have.

If you use an HTTP client other than Faraday for outbound requests, you can still make distributed tracing work. Essentially, you need to do two things: 1) instrument your HTTP request using the custom span API, and 2) propagate the trace context in the HTTP request headers.

OpenCensus::Trace.in_span "long task" do
  t = rand * 10
  sleep t
end

# Simplified implementation of in_span: start_span and end_span paired via ensure
def in_span name, kind: nil, skip_frames: 0, sampler: nil
  span = start_span name, kind: kind, skip_frames: skip_frames + 1,
                          sampler: sampler
  begin
    yield span
  ensure
    end_span span
  end
end

Here is an example of the custom span API. To instrument a block of code, pass it to the OpenCensus::Trace.in_span method. Alternatively, you can call the #start_span method directly, but make sure you also call the #end_span method.
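Putting the two steps for a non-Faraday client together, here is a hedged sketch using Net::HTTP from the standard library. It does not use the opencensus gem: in a real app you would create the span through the OpenCensus API described above rather than by hand. CurrentContext, traced_get, the transport: parameter, and the host name are hypothetical, and the header format assumes the W3C traceparent layout. The transport is injected so that the example runs without network access.

```ruby
require 'net/http'
require 'securerandom'
require 'uri'

# Hypothetical context object: the trace the caller is currently in.
CurrentContext = Struct.new(:trace_id, :span_id, :sampled)

# Hypothetical helper for Net::HTTP. It
# 1) records a CLIENT span around the request, closing it in `ensure`, and
# 2) injects the context into a W3C-style "traceparent" request header.
def traced_get(uri, context, spans, transport:)
  request = Net::HTTP::Get.new(uri)
  client_span_id = SecureRandom.hex(8)
  flags = context.sampled ? "01" : "00"
  # The callee will use client_span_id as its parent_span_id.
  request["traceparent"] = "00-#{context.trace_id}-#{client_span_id}-#{flags}"
  started = Time.now
  begin
    transport.call(request)
  ensure
    # Like end_span: runs whether the request succeeds or raises.
    spans << { name: "#{request.method} #{uri.path}", kind: :CLIENT,
               trace_id: context.trace_id, span_id: client_span_id,
               parent_span_id: context.span_id,
               start_time: started, end_time: Time.now }
  end
end

ctx = CurrentContext.new("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", true)
spans = []
sent = nil
# Fake transport for illustration; a real one could be:
#   ->(req) { Net::HTTP.start(req.uri.host, req.uri.port) { |http| http.request(req) } }
fake_transport = ->(req) { sent = req; :fake_response }
response = traced_get(URI("http://profiles.example/api/profiles/1"), ctx, spans,
                      transport: fake_transport)
```

Closing the span in ensure is the same reason the custom span API asks you to pair every start_span with end_span: a raised error must not leave a span open.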

Summary

To recap, you now know what distributed tracing is, how it can help you solve problems in a microservices architecture, and how it gives you insight into a complex system.

Adopting distributed tracing in Rails with OpenCensus is really easy, and you can start today.

OpenCensus will soon be merged into OpenTelemetry. According to the roadmap, the new project officially launches on May 20th, 2019, at KubeCon + CloudNativeCon Europe 2019. However, I don’t see any reason to wait for OpenTelemetry, since backward compatibility will be provided.

That’s it! I hope this helps, and that you now love distributed tracing. Please let me know your feedback.

Here is the slide deck from my talk, in case you want to take a look!

Special thanks to @munisystem for his major contributions to the distributed tracing project at Wantedly.