Take OpenTracing for a HotROD ride

Yuri Shkuro
May 6, 2017 · 20 min read
Image for post
Image for post
Photo by Trekphiler

Prerequisites

Agenda

Start Jaeger

docker run -d -p6831:6831/udp -p16686:16686 jaegertracing/all-in-one:latest
Image for post
Image for post
Jaeger UI front page, the Search screen.

Meet the HotROD

mkdir -p $GOPATH/src/github.com/jaegertracing
cd $GOPATH/src/github.com/jaegertracing
git clone git@github.com:jaegertracing/jaeger.git jaeger
cd jaeger
make install
cd examples/hotrod
go run ./main.go all
2017–05–03T23:53:19.693–0400 INFO cmd/all.go:31 Starting all services
2017–05–03T23:53:19.696–0400 INFO log/logger.go:43 Starting {“service”: “route”, “address”: “http://127.0.0.1:8083"}
2017–05–03T23:53:19.696–0400 INFO log/logger.go:43 Starting {“service”: “frontend”, “address”: “http://127.0.0.1:8080"}
2017–05–03T23:53:19.697–0400 INFO log/logger.go:43 Starting {“service”: “customer”, “address”: “http://127.0.0.1:8081"}
2017–05–03T23:53:19.697–0400 INFO log/logger.go:43 TChannel listening {“service”: “driver”, “hostPort”: “127.0.0.1:8082”}
Image for post
Image for post
HotROD single-page web app
Image for post
Image for post
We ordered a car that will arrive in 2 minutes.

The Architecture

Image for post
Image for post
Empirically constructed service dependency diagram

The Data Flow

Image for post
Image for post
The first trace
Image for post
Image for post
Trace timeline view
Image for post
Image for post
Expanded span details

Contextualized logging

Image for post
Image for post
Typical logging output
Image for post
Image for post
Logs recorded by the tracing system in the root span.

Span Tags vs. Logs

Image for post
Image for post
Tags and Logs

Identifying sources of latency

Image for post
Image for post
Executing many requests simultaneously shows increasing latency.
Image for post
Image for post
Multiple traces returned in the search results.
Image for post
Image for post
Driver ID recorded in the log field.
Image for post
Image for post
Searching for trace by tags
Image for post
Image for post
High latency trace.
Image for post
Image for post
Inspecting very long mysql span.
Image for post
Image for post
Customer 392 is the only info passed by the caller.
// simulate misconfigured connection pool that only gives
// one connection at a time
d.lock.Lock(ctx)
defer d.lock.Unlock()
// simulate db query delay
delay.Sleep(config.MySQLGetDelay, config.MySQLGetDelayStdDev)
Image for post
Image for post
Improved latency after fixing mysql bottleneck.
Image for post
Image for post
Trace after removing mysql bottleneck
// getRoutes calls Route service for each (customer, driver) pair
func (eta *bestETA) getRoutes(
ctx context.Context,
customer *customer.Customer,
drivers []driver.Driver,
) []routeResult {
results := make([]routeResult, 0, len(drivers))
wg := sync.WaitGroup{}
routesLock := sync.Mutex{}
for _, dd := range drivers {
wg.Add(1)
driver := dd // capture loop var
// Use worker pool to (potentially) execute
// requests in parallel
eta.pool.Execute(func() {
route, err := eta.route.FindRoute(ctx, driver.Location, customer.Location)
routesLock.Lock()
results = append(results, routeResult{
driver: driver.DriverID,
route: route,
err: err,
})
routesLock.Unlock()
wg.Done()
})
}
wg.Wait()
return results
}
RouteWorkerPoolSize = 3
Image for post
Image for post
Latency results after fixing worker pool bottleneck
Image for post
Image for post
Trace after fixing worker pool bottleneck

Resource Usage Attribution via Baggage

func computeRoute(
ctx context.Context,
pickup, dropoff string,
) *Route {
start := time.Now()
defer func() {
updateCalcStats(ctx, time.Since(start))
}()
// actual calculation
}
var routeCalcByCustomer = expvar.NewMap(
"route.calc.by.customer.sec",
)
var routeCalcBySession = expvar.NewMap(
"route.calc.by.session.sec",
)
var stats = []struct {
expvar *expvar.Map
baggage string
}{
{routeCalcByCustomer, "customer"},
{routeCalcBySession, "session"},
}
func updateCalcStats(ctx context.Context, delay time.Duration) {
span := opentracing.SpanFromContext(ctx)
if span == nil {
return
}
delaySec := float64(delay/time.Millisecond) / 1000.0
for _, s := range stats {
key := span.BaggageItem(s.baggage)
if key != "" {
s.expvar.AddFloat(key, delaySec)
}
}
}
route.calc.by.customer.sec: {
Amazing Coffee Roasters: 1.479,
Japanese Deserts: 2.019,
Rachel's Floral Designs: 5.938,
Trom Chocolatier: 2.542
},
route.calc.by.session.sec: {
0861: 9.448,
6108: 2.530
},

How Do We Instrument with OpenTracing

// NewServer creates a new driver.Server
func NewServer(
hostPort string,
tracer opentracing.Tracer,
metricsFactory metrics.Factory,
logger log.Factory,
) *Server {
channelOpts := &tchannel.ChannelOptions{
Tracer: tracer,
}
ch, err := tchannel.NewChannel("driver", channelOpts)
if err != nil {
logger.Bg().Fatal("Cannot create TChannel", zap.Error(err))
}
server := thrift.NewServer(ch)
// FindNearest implements Thrift interface TChanDriver
func (s *Server) FindNearest(
ctx thrift.Context,
location string,
) ([]*driver.DriverLocation, error) {
s.logger.For(ctx).Info("Searching for nearby drivers",
zap.String("location", location))
driverIDs := s.redis.FindDriverIDs(ctx, location)

What About Vendor Neutrality

// Init creates a new instance of Jaeger tracer.
func Init(
serviceName string,
metricsFactory metrics.Factory,
logger log.Factory,
) opentracing.Tracer {
cfg := config.Configuration{
Sampler: &config.SamplerConfig{
Type: "const",
Param: 1,
},
Reporter: &config.ReporterConfig{
LogSpans: false,
BufferFlushInterval: 1 * time.Second,
},
}
tracer, _, err := cfg.New(
serviceName,
)
if err != nil {
logger.Bg().Fatal(
"cannot initialize Jaeger Tracer",
zap.Error(err),
)
}
return tracer
}

What About Logging

s.logger.For(ctx).Info(
"Searching for nearby drivers",
zap.String("location", location),
)
func (sl spanLogger) Info(msg string, fields ...zapcore.Field) {
sl.logToSpan("info", msg, fields...)
sl.logger.Info(msg, fields...)
}

OpenTracing Is Not Just for Tracing

Image for post
Image for post
RPC metrics for /customer endpoint
Image for post
Image for post
Example of charting metrics emitted by Jaeger tracer

Conclusion

OpenTracing

A blog discussing distributed tracing and context…

Yuri Shkuro

Written by

Software engineer @ Uber. Creator of tracing platform Jaeger. Author of “Mastering Distributed Tracing”. https://shkuro.com/

OpenTracing

A blog discussing distributed tracing and context propagation

Yuri Shkuro

Written by

Software engineer @ Uber. Creator of tracing platform Jaeger. Author of “Mastering Distributed Tracing”. https://shkuro.com/

OpenTracing

A blog discussing distributed tracing and context propagation