DevOps KPIs with Elixir and Geckoboard

tl;dr: Check out our ex_gecko library on GitHub.

If you haven’t heard, Geckoboard is a fantastic live dashboard company that makes it very easy to get key metrics onto a beautifully designed board. For one of our previous milestones, we made a big push to get “production ready” and ran a two-week sprint to get us there. One big component was building up a key set of metrics for us to monitor.

Scoping

First, we came up with a list of the key metrics we wanted to gather. Initially, the list looked like this:

  • API Status and Response Time
  • UI Status and Response Time
  • Average API Response Time
  • Slow Requests
  • DB Backup status
  • Erlang process stats
  • InfluxDB Metrics
  • Companies, facilities, devices provisioned on BrighterLink

Implementation

First, I wanted to connect our Papertrail logs to Geckoboard. I couldn’t find any pre-built plugins that fit, and the existing Papertrail one wasn’t sufficient for what we needed. I also briefly explored using Zapier to connect the two, but that wasn’t sufficient either. Luckily, I found a Geckoboard service, in beta at the time, called Datasets. They had Ruby code examples, so I decided to adapt them to Elixir, and thus created the ex_gecko library.

For the following examples, I am going to focus on our Papertrail integration, but you’ll see additional integration points with Heroku and Runscope in the code library. I’m going to follow Geckoboard’s documentation flow of “Authenticate, Create a Dataset, Add data, Add a Widget”.

Authenticate

We used HTTPoison to make our API calls, and all we have to do to authenticate is add an “Authorization” header (note that some of this code is not laid out exactly as it is in the library; it’s only grouped here for your viewing convenience).
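As a rough sketch of what such a header looks like (not the library’s actual code): Geckoboard’s Datasets API uses HTTP Basic auth, with the API key as the username and an empty password. The module name `GeckoAuthSketch` and its function names are hypothetical, chosen only for illustration.

```elixir
defmodule GeckoAuthSketch do
  # Hypothetical module for illustration; not part of ex_gecko.
  @base_url "https://api.geckoboard.com"

  # HTTP Basic auth: the API key is the username, the password is empty,
  # so the encoded credential is "<api_key>:".
  def auth_headers(api_key) when is_binary(api_key) do
    token = Base.encode64(api_key <> ":")

    [
      {"Authorization", "Basic " <> token},
      {"Content-Type", "application/json"}
    ]
  end

  # Builds a full URL from an API path such as "/" or "/datasets/pt.reqs".
  def url(path), do: @base_url <> path
end

# With HTTPoison available, a ping to check the key could then look like:
#   HTTPoison.get(GeckoAuthSketch.url("/"), GeckoAuthSketch.auth_headers(api_key))
```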

Create the Papertrail dataset

Now that we can authenticate, we want to call the “find or create” method, which is effectively a PUT request.
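A minimal sketch of that request, assuming the Datasets API shape of `PUT /datasets/:id` with a `fields` object in the body. The module and function names here are hypothetical, and the single field is only a placeholder.

```elixir
defmodule GeckoDatasetSketch do
  # Hypothetical module for illustration; not part of ex_gecko.
  @base_url "https://api.geckoboard.com"

  # Returns {method, url, body} describing a find-or-create call.
  # The body is a plain map and still needs JSON encoding before sending.
  def find_or_create_request(dataset_id, fields) when is_map(fields) do
    {:put, "#{@base_url}/datasets/#{URI.encode(dataset_id)}", %{"fields" => fields}}
  end
end

request =
  GeckoDatasetSketch.find_or_create_request("pt.reqs", %{
    "path" => %{"type" => "string", "name" => "Request Path"}
  })

IO.inspect(request)

# With HTTPoison and a JSON library such as Poison or Jason, sending it could look like:
#   {:put, url, body} = request
#   HTTPoison.put(url, Jason.encode!(body), headers)
```

Because the call is a PUT on a fixed dataset id, repeating it with the same schema is safe, which is what makes “find or create” a reasonable name for it.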

Executing the command

To run this, I created a mix task that runs the create_dataset command, like this:

mix gecko.load -d pt.reqs -r papertrail.reqs

This will create the “pt.reqs” dataset, using the “datasets/papertrail.reqs.json” schema. Note that if you run this on an existing dataset, the mix task will delete it and “reset” the schema.
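To illustrate how a task like this might interpret its flags, here is a small sketch using Elixir’s built-in `OptionParser`. Only the short flags `-d`, `-r` and `-t` appear in this post; the long option names, the module name and the returned tuples are assumptions made for the example.

```elixir
defmodule GeckoLoadArgsSketch do
  # Hypothetical module for illustration; not the real mix task.
  # The long names (:dataset, :reset, :type) are assumed.
  @switches [dataset: :string, reset: :string, type: :string]
  @aliases [d: :dataset, r: :reset, t: :type]

  # Turns the raw argument list into a description of what to do.
  def parse(argv) do
    {opts, _rest, _invalid} =
      OptionParser.parse(argv, strict: @switches, aliases: @aliases)

    plan(Map.new(opts))
  end

  # -d with -r: (re)create the dataset from a schema file.
  defp plan(%{dataset: dataset, reset: schema}),
    do: {:reset_dataset, dataset, "datasets/#{schema}.json"}

  # -d with -t: load data into the dataset through an adapter.
  defp plan(%{dataset: dataset, type: type}),
    do: {:load_data, dataset, type}

  defp plan(_), do: {:error, :missing_options}
end

IO.inspect(GeckoLoadArgsSketch.parse(["-d", "pt.reqs", "-r", "papertrail.reqs"]))
```

In a real task, this parsing would live inside the `run/1` callback of a module that uses `Mix.Task`.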

The schema of the Papertrail dataset is fairly simple. We have:

  • Path (string)
  • Speed (number)
  • Size (number)
  • Status (string)
  • Timestamp (datetime)

These will all be used to create the right graphs in our Geckoboard widgets.
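Expressed as an Elixir map, the fields above might look like the sketch below. The display names “Request Path” and “Request Speed” are the ones used in the widget later in this post; the other display names, the lowercase field ids, and treating Size as a number are assumptions. The list of allowed types reflects the Datasets field types I am aware of and may not be exhaustive.

```elixir
defmodule PapertrailSchemaSketch do
  # Hypothetical module for illustration; the real schema lives in
  # datasets/papertrail.reqs.json.
  @allowed_types ~w(string number datetime date money percentage)

  def fields do
    %{
      "path" => %{"type" => "string", "name" => "Request Path"},
      "speed" => %{"type" => "number", "name" => "Request Speed"},
      # Assumed to be numeric (response size in bytes).
      "size" => %{"type" => "number", "name" => "Request Size"},
      "status" => %{"type" => "string", "name" => "Status"},
      "timestamp" => %{"type" => "datetime", "name" => "Timestamp"}
    }
  end

  # Checks that every field declares one of the known types.
  def valid?(fields) do
    Enum.all?(fields, fn {_id, field} -> field["type"] in @allowed_types end)
  end
end

IO.inspect(PapertrailSchemaSketch.valid?(PapertrailSchemaSketch.fields()))
```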

Adding Papertrail data

To make it a little more flexible, I created “adapters”, which are integration points with various services. In the case of Papertrail, the adapter loads the events using Papertrail’s CLI tool.

In this particular case, I used Porcelain to execute the command-line operation. The arguments passed to it are fairly straightforward; it’s as if you ran this command:

papertrail -j -S "API Requests" --min-time "72 hours ago"

Note that “API Requests” is a saved search I created in Papertrail to only look at API requests. You’ll need to use your own. The “-j” flag is important, as it returns the output as JSON, which is easier to parse. Now, to put it together, run the mix task:
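As an illustration of what an adapter has to do, the sketch below builds the CLI arguments and turns one log message into a dataset record. The message format is an assumption: it mimics a Heroku router line (`path=`, `service=`, `status=`, `bytes=`), and your own logs may look different. The module and function names are hypothetical, and the timestamp is left out because it would come from the JSON event itself rather than from the message text.

```elixir
defmodule PapertrailAdapterSketch do
  # Hypothetical module for illustration; not the ex_gecko adapter.

  # Arguments for the papertrail CLI, matching the command shown above.
  def cli_args(saved_search, min_time) do
    ["-j", "-S", saved_search, "--min-time", min_time]
  end

  # Parses key=value pairs out of a log message (assumed Heroku-router style)
  # and maps them onto the dataset fields.
  def parse_message(message) do
    pairs =
      ~r/(\w+)=("[^"]*"|\S+)/
      |> Regex.scan(message, capture: :all_but_first)
      |> Map.new(fn [key, value] -> {key, String.trim(value, "\"")} end)

    with %{"path" => path, "service" => service, "bytes" => bytes, "status" => status} <- pairs,
         {speed, "ms"} <- Integer.parse(service),
         {size, ""} <- Integer.parse(bytes) do
      {:ok, %{"path" => path, "speed" => speed, "size" => size, "status" => status}}
    else
      _ -> :error
    end
  end
end

line = ~s(at=info method=GET path="/api/v1/devices" service=231ms status=200 bytes=1024)
IO.inspect(PapertrailAdapterSketch.parse_message(line))

# Running the CLI itself could use Porcelain, or System.cmd from the standard library:
#   {output, 0} = System.cmd("papertrail", PapertrailAdapterSketch.cli_args("API Requests", "72 hours ago"))
```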

mix gecko.load -t papertrail -d pt.reqs

This will load the Papertrail events into the “pt.reqs” dataset. Note that this is not cumulative: every time you run it, your dataset is replaced by whatever events were loaded at that time. Also, the plugin only supports 500 events, but Geckoboard now allows for up to 5000 events.
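The replace-everything behaviour can be sketched as follows, assuming the Datasets API endpoint `PUT /datasets/:id/data` with a `data` array in the body. Keeping only the newest 500 records mirrors the limit mentioned above. The module name is hypothetical, and the sort assumes timestamps are ISO 8601 strings in a single consistent format so that they order correctly as text.

```elixir
defmodule GeckoReplaceSketch do
  # Hypothetical module for illustration; not part of ex_gecko.
  @base_url "https://api.geckoboard.com"
  @max_records 500

  # Returns {method, url, body} for replacing the dataset contents with
  # the most recent events, newest first.
  def replace_request(dataset_id, events) when is_list(events) do
    latest =
      events
      |> Enum.sort_by(& &1["timestamp"], :desc)
      |> Enum.take(@max_records)

    {:put, "#{@base_url}/datasets/#{dataset_id}/data", %{"data" => latest}}
  end
end

events = [
  %{"timestamp" => "2017-01-01T00:00:01Z", "path" => "/api/a", "speed" => 120},
  %{"timestamp" => "2017-01-01T00:00:02Z", "path" => "/api/b", "speed" => 340}
]

IO.inspect(GeckoReplaceSketch.replace_request("pt.reqs", events))
```

Because the request replaces the data rather than appending to it, running the load twice does not duplicate events.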

Adding the Widget

Log in to your Geckoboard account and choose “Add a widget”. Select the “Dataset” integration, and a slider window will pop up to let you select the dataset. In this case, we want the “pt.reqs” dataset.

Finally, I get to customize the widget. Depending on what you have in your schema, you can pick the chart that fits. In my case, I picked the “Leaderboard” widget (on the upper right), with “Request Path” as the label and “Request Speed” as the field, aggregated using “Average”, which gives me the average request speed. You can also select “Fine-tune” to add “ms” as the suffix so it displays properly.

Conclusion

This was fun and easy to set up, and we quickly added Heroku and Runscope integrations, as well as integration with some built-in widgets such as “Up/Down” monitoring. While writing this post, I also noticed some new features, such as filtering by path and changing the time span you want to filter by. This goes to show that fast-moving SaaS services are racing to deliver ever better value, and sometimes it’s hard to keep up!

Finally, yes, I did read the article about how real-time dashboards can be harmful, and I tend to agree that you shouldn’t just have a bunch of metrics you can’t act on. However, an operational dashboard that highlights key operating metrics is useful, especially when you are starting out, as it sets the tone for your team to care about the metrics and gives you visibility into your services.

PS — Right after implementing our dashboards, we saw consistently slow requests for a specific API, and we refactored our code to make it much faster. Our “average response time” for requests went from 2–3 seconds down to 200–300ms.
