DevOps Observability: Instrumenting Local Deployments

dm03514
ValueStream by Operational Analytics, Inc.
Dec 1, 2019

Have you ever deployed to production from your local machine? At ValueStream, we believe the first step to making sustainable changes to any system is understanding the elements of the system and their connections. This is especially important for human centric systems like DevOps, where it’s difficult to inventory and monitor all the different teams, processes and performance across them.

This post shows how ValueStream can be used to start monitoring local deploys with minimal effort. This allows organizations to treat these important local actions as the first class citizens they are, monitor their performance and build an inventory of locally executed operations. This post will instrument ValueStream’s actual production deployment script in order to capture detailed build and deploy metrics.

Problem

Local, CLI-based deploys are common at many startups. Processes that start on small teams which don't want to invest in cloud deployments are inherited as companies grow. Executing actions locally means that audit information is lost: who deployed, when, and why; and, in the case of DevOps metrics, how often deploys happen, how long they take, what the error rate is, and how much time the deploy adds to a work item's overall lead time. It's difficult to improve what isn't even known about. ValueStream aims to provide zero-friction observability tools that enable organizations to inventory and measure their DevOps and deployment processes.

Example

Here at ValueStream we deploy to production locally using the Google Cloud SDK. The rest of this post shows how it takes only a couple of minutes with ValueStream to begin tracking any local script. ValueStream focuses on providing metrics at two separate levels:

Events represent individual actions at a point in time:

They are often expressed as aggregates over time and are used to gain a coarse understanding of a system's performance. They are important signals for monitoring any system and usually take the form of throughput, latency, error rate, and saturation. These are so common that Google refers to them as the “Four Golden Signals”.

Traces connect events together by establishing causality between events.

ValueStream uses a standard called OpenTracing to model the relationships between events. Traces are ValueStream's special sauce: they allow ValueStream to model complex DevOps and development processes with minimal effort. All that's required is to pass an identifier that links an instrumented action to one or more parent actions. Tracing is becoming extremely popular in distributed systems for the detailed insight it can provide about processes. ValueStream makes this detailed analysis accessible to managers, directors, DevOps engineers, and anyone else responsible for organizational performance.
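The parent/child linkage can be sketched with ValueStream's vscli tool, which appears in the deploy examples later in this post. This is a minimal sketch, not a complete instrumentation; the stubbed `vscli` function is an assumption for illustration, standing in for the real CLI when it isn't installed:

```shell
# Minimal sketch of linking a child event to a parent event.
# Stub vscli when it isn't installed so the sketch runs anywhere;
# the real CLI returns a trace identifier on `start`.
command -v vscli >/dev/null 2>&1 || vscli() { echo "evt-$RANDOM"; }

PARENT_ID="$(vscli event -type=pipeline start)"

# The child passes the parent's identifier at start time,
# which is what establishes causality between the two events.
CHILD_ID="$(vscli event -type=build start \
  -parent-event-id=vstrace-customhttp-pipeline-default-${PARENT_ID})"
# ... the instrumented action runs here ...
vscli event -type=build end -event-id=${CHILD_ID}

vscli event -type=pipeline end -event-id=${PARENT_ID}
```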

Tracking Events

As background, ValueStream Cloud is hosted on Google Cloud and uses Google App Engine for its production infrastructure. The deployment is executed daily and consists of building a Docker image, pushing it to the registry, and deploying it to App Engine. A simple bash script is used to make the deploy:

# deploy-api.sh

# Build the API image.
docker build -f Dockerfile.api -t valuestream-api .

# Tag the image and push it to Google Container Registry.
docker tag valuestream-api us.gcr.io/value-stream/valuestream-api
docker push us.gcr.io/value-stream/valuestream-api

# Deploy the pushed image to Google App Engine.
gcloud app deploy \
  --image-url us.gcr.io/value-stream/valuestream-api \
  devops/gae/app.api.yaml \
  --version=v1 \
  --quiet

While this is easy, there's no visibility, audit log, or history outside of the Google console. We deploy 1–2 times a day; is this a good candidate for improvement? How long are we spending in the deploy? What's the success rate? As it stands, there are no answers to these common DevOps and delivery questions. ValueStream ships with a “Custom HTTP” event source which supports user-submitted (ad-hoc) events. Below, the deploy-api.sh script is instrumented to capture deploy durations:

TRACEID="$(vscli event -tag='source|gcloud' -tag='service|api' -type=pipeline start)"

...

vscli event -type=pipeline end -event-id=${TRACEID}

(The images below show traces in LightStep. ValueStream OSS can output to Jaeger and LightStep, while ValueStream Cloud Beta will ship with LightStep support only as a metric store, requiring a free LightStep account.)

The trace above shows its duration and all associated tags. LightStep enables grouping traces by tag, comparing durations to past intervals, and viewing aggregate event rates, latency distributions, and error rates. In two lines of code we've started to track something that only a single engineer could see, and surfaced it to a centralized location where it can be inventoried and benchmarked.

Pipeline Traces

While extremely useful for debugging DevOps, ValueStream's real power comes from modeling processes through traces. The deployment script has three logical steps:

  • Build
  • Push
  • Deploy

ValueStream connects these stages by providing a reference to the parent event for each child event. Using the vscli tool, this looks like:

#!/bin/bash

TRACEID="$(vscli event -tag='source|gcloud' -tag='service|api' -type=pipeline start)"

BUILD_TRACEID="$(vscli event -type=build -tag='type|docker' start -parent-event-id=vstrace-customhttp-pipeline-default-${TRACEID})"
docker build -f Dockerfile.api -t valuestream-api .
vscli event -type=build end -event-id=${BUILD_TRACEID}

PUSH_TRACEID="$(vscli event -type=push -tag='type|docker' start -parent-event-id=vstrace-customhttp-pipeline-default-${TRACEID})"
docker tag valuestream-api us.gcr.io/value-stream/valuestream-api
docker push us.gcr.io/value-stream/valuestream-api
vscli event -type=push end -event-id=${PUSH_TRACEID}

DEPLOY_TRACEID="$(vscli event -type=deploy -tag='type|gae' start -parent-event-id=vstrace-customhttp-pipeline-default-${TRACEID})"
gcloud app deploy \
  --image-url us.gcr.io/value-stream/valuestream-api \
  devops/gae/app.api.yaml \
  --version=v1 \
  --quiet
vscli event -type=deploy end -event-id=${DEPLOY_TRACEID}

vscli event -type=pipeline end -event-id=${TRACEID}

Many processes are naturally expressed as a pipeline:

  • A JIRA story moving across swim lanes
  • Build pipelines: Jenkins, GitLab, local scripts, etc.
  • A team working on multiple issues or repos toward completing an epic
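For instance, a story moving across swim lanes could be modeled the same way as the deploy pipeline above. This is a hypothetical sketch: the `jira` and `lane` tags, the story identifier, and the reuse of the `build` event type for a lane are assumptions for illustration, not a documented JIRA integration. The `vscli` stub stands in for the real CLI when it isn't installed:

```shell
# Hypothetical sketch: each swim lane becomes a child event of the
# story's pipeline, reusing the vscli flags from the deploy script.
command -v vscli >/dev/null 2>&1 || vscli() { echo "evt-$RANDOM"; }

STORY_ID="$(vscli event -tag='source|jira' -tag='story|VS-123' -type=pipeline start)"

LANE_ID="$(vscli event -type=build -tag='lane|in-progress' start \
  -parent-event-id=vstrace-customhttp-pipeline-default-${STORY_ID})"
# ... work happens while the story sits in this lane ...
vscli event -type=build end -event-id=${LANE_ID}

vscli event -type=pipeline end -event-id=${STORY_ID}
```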

Executing the script again shows the updated instrumentation:

The image above captures how much time is spent in each stage of the deploy. Modeling this as a trace captures each stage and its parent; in other words, it maintains causality. ValueStream can track causality all the way up to the epic level, giving organizations a complete view into all the work required to ship software. Modeling events as a graph also enables a number of advanced analyses which, combined with LightStep, can be used to intelligently debug delivery across multiple tools, teams, or processes.

The image below shows a run where a cached image is used. The build time is halved (2 min -> 1 min) and the push time drops from 26 sec -> 2.7 sec since no layers are pushed:

This example only captures the duration of each event and establishes causality. A good future improvement would be to capture and report the error status:

$ vscli event -type=pipeline end -event-id=${ID} -error=true|false

This would let us see how often deploys fail, and whether failures are associated with an increase in overall latency (debugging approach described in detail here).
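One way to sketch this is a small wrapper that runs a command, captures its exit status, and reports it on the `end` event. This is a hypothetical helper, assuming the `-error` flag behaves as in the example above; the `run_step` name and the `vscli` stub are illustrations, not part of the real tool:

```shell
# Hypothetical helper: wrap a command between vscli start/end events
# and report whether the command failed via the -error flag.
command -v vscli >/dev/null 2>&1 || vscli() { echo "evt-$RANDOM"; }

run_step() {
  local type="$1"; shift
  local parent="$1"; shift

  local id
  id="$(vscli event -type="${type}" start -parent-event-id="${parent}")"

  # Run the wrapped command and capture its exit status.
  "$@"
  local status=$?

  if [ "${status}" -ne 0 ]; then
    vscli event -type="${type}" end -event-id="${id}" -error=true
  else
    vscli event -type="${type}" end -event-id="${id}" -error=false
  fi
  return "${status}"
}

# Usage, with the pipeline TRACEID from the script above:
# run_step deploy "vstrace-customhttp-pipeline-default-${TRACEID}" \
#   gcloud app deploy --image-url us.gcr.io/value-stream/valuestream-api \
#     devops/gae/app.api.yaml --version=v1 --quiet
```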

Conclusion

ValueStream offers zero-friction instrumentation for local scripts and processes. If you would like to gain visibility into your DevOps processes, ValueStream Open Source is currently available on GitHub, and ValueStream Cloud will be offering a free tier for its Beta release later this month. We would love to hear your comments and feedback! Thank you.
