Visualizing Instrumental Data from the Command Line

Feb 4, 2020

By David Ronk

Nathan recently pulled some shenanigans to get weird data into Instrumental. I thought it would be fun, instead, to take Instrumental data and put it in a weird place. Enter Sampler:

Sampler is a tool for shell command execution, visualization, and alerting, configured with a simple YAML file.

The main page has loads of examples and instructions for getting started.

I first installed it via brew.

brew cask install sampler

Then, after playing with some of the examples, I started digging into the Instrumental API. Sampler expects a single numerical data point from the sample script in the config file. Instrumental’s metric data API endpoint returns something like this:

{
  "version": 2,
  "flags": 0,
  "response": {
    "metrics": [
      {
        "id": "test.gauge_metric",          // The name of the metric
        "project_id": 1,                    // The integer id associated with your project
        "type": "gauge",                    // The type you've reported as the metric
        "created_at": 1326493549,           // When you first reported the metric
        "updated_at": 1326729988,           // When you last updated the metric
        "expression": "test.gauge_metric",  // Expression used to retrieve metric
        "name": "test.gauge_metric",        // Name derived from expression of metric
        "values": {                         // Datapoints
          "start": 1326728400,              // The beginning of the time series
          "stop": 1326730200,               // The end of the time series
          "resolution": 60,                 // The requested resolution of the time series
          "duration": 1800,                 // The requested duration of the time series
          "data": [
            {
              "s": 4675616.0,               // The sum of datapoints at this moment
              "c": 94742,                   // The count of datapoints at this moment
              "a": 49.351037554622025       // Average (sum/count)
            }
            // ... Above hash repeats over duration
          ]
        }
      }
    ]
  }
}

By default, the response payload spans 30 minutes and contains a data point object for each minute in that span. The last data point covers the current minute and is updated throughout that minute, so, to keep the graph accurate, I left it off and focused on the point at index 29 instead. Depending on the data, I’ll want s, c, or a from that point. To parse the JSON response I used jq:

brew install jq

After some reading and testing, I came up with the command to pull out the relevant data point:

jq '.response.metrics[0].values.data[29].a // 0'

The jq filter above pulls out the a (average) value from the data point at index 29, and if it happens to be null it returns 0 instead (which is the default behavior for Instrumental).
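
To see that fallback in isolation, you can run the filter against a stub payload (the values below are invented, and the index is 0 because the stub only has one data point):

echo '{"response":{"metrics":[{"values":{"data":[{"s":null,"c":null,"a":null}]}}]}}' | jq '.response.metrics[0].values.data[0].a // 0'
# prints 0, because "a" is null in the stub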

So, putting it all together, we get something like this to pull the average CPU across all instances. This data is provided by the Instrumentald agent.

curl -s -H 'X-Instrumental-Token: <token>' 'https://instrumentalapp.com/api/2/metrics/ts_average(system.*.cpu.usage_user)' | jq '.response.metrics[0].values.data[29].a // 0'
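
Before dropping that into Sampler, it can help to confirm the command prints a single number and watch it update on its own. Here's one way to do that (the script name and the INSTRUMENTAL_TOKEN environment variable are placeholders I've made up, not part of the original setup):

#!/bin/sh
# cpu_check.sh: throwaway helper that prints the latest average CPU value
curl -s -H "X-Instrumental-Token: $INSTRUMENTAL_TOKEN" \
  'https://instrumentalapp.com/api/2/metrics/ts_average(system.*.cpu.usage_user)' \
  | jq '.response.metrics[0].values.data[29].a // 0'

chmod +x cpu_check.sh
watch -n 60 ./cpu_check.sh   # re-run once a minute (watch needs a separate install on macOS)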

Next, I put that curl request into a Sampler config file as the sample directive, and added some variables to make it a little easier to write and to let the token be provided as a command-line parameter:

#sampler.yml
variables:
  token: <token>
  url: https://instrumentalapp.com/api/2/metrics
runcharts:
  - title: Avg CPU
    position: [[0, 0], [79, 20]]
    rate-ms: 20000
    legend:
      enabled: true
      details: true
    scale: 0
    items:
      - label: user%
        sample: 'curl -s -H ''X-Instrumental-Token: ''$token $url/ts_average\(*.cpu.usage_user\) | jq ''.response.metrics[0].values.data[29].a // 0'''
  - title: Avg Memory
    position: [[0, 20], [79, 19]]
    rate-ms: 20000
    legend:
      enabled: true
      details: false
    scale: 0
    items:
      - label: used%
        sample: 'curl -s -H ''X-Instrumental-Token: ''$token $url/ts_average\(*.memory.used_percent\) | jq ''.response.metrics[0].values.data[29].a // 0'''
      - label: available%
        sample: 'curl -s -H ''X-Instrumental-Token: ''$token $url/ts_average\(*.memory.available_percent\) | jq ''.response.metrics[0].values.data[29].a // 0'''

To run this, you’ll need to specify the config file and pass in your project token on the command line, like this:

sampler -c sampler.yml -e token=your_token_here
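
If you keep the token in an environment variable (INSTRUMENTAL_TOKEN is just a placeholder name here), you can avoid pasting it on every run:

export INSTRUMENTAL_TOKEN=your_token_here
sampler -c sampler.yml -e token="$INSTRUMENTAL_TOKEN"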

And then, you’ll get cool graphs like these!

Some notes on the config file above:

  • Ideally, you’ll want an Instrumental query that resolves down to a single time series, since that’s what Sampler needs. For example, *.cpu.usage_user would not be a good choice, as it will return a data set for every instance reporting .cpu.usage_user. Either reduce it to a single series with an average, as above (ts_average(*.cpu.usage_user)), or request a specific machine (system.app01.cpu.usage_user). A quick check is shown right after these notes.
  • I set the rate-ms value to 20 seconds even though the data for the graph will only change once a minute. It could be set to 60 seconds, but I like seeing it update even if the value doesn’t change.
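
To check whether an expression really does resolve to a single series, you can count the entries in response.metrics; assuming extra matches show up as extra entries there (which the schema above suggests), a result of 1 is what you want. The token and expression here are placeholders:

curl -s -H 'X-Instrumental-Token: <token>' 'https://instrumentalapp.com/api/2/metrics/ts_average(*.cpu.usage_user)' | jq '.response.metrics | length'
# 1 means the expression resolved to a single time series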
Next, I set about trying some of the other graph types for variety/interest/color and some more complex graph expressions. Here’s what I came up with:
#sampler.yml
variables:
  cache_cpu: https://instrumentalapp.com/api/2/metrics/ts_average(aws_elasticache.*.engine_cpu_utilization)
  cache_hits: https://instrumentalapp.com/api/2/metrics/ts_average(aws_elasticache.*.cache_hits)
  cache_lag: https://instrumentalapp.com/api/2/metrics/ts_average(aws_elasticache.*.replication_lag)
  cache_miss: https://instrumentalapp.com/api/2/metrics/ts_average(aws_elasticache.*.cache_misses)
  cpu_s: https://instrumentalapp.com/api/2/metrics/series_clamp(0%2C%20100%2C%20constant(100)%20-%20ts_average(*.cpu.usage_idle))
  load_1: https://instrumentalapp.com/api/2/metrics/ts_average(*.load.load1)
  load_5: https://instrumentalapp.com/api/2/metrics/ts_average(*.load.load5)
  load_15: https://instrumentalapp.com/api/2/metrics/ts_average(*.load.load15)
  mem_used: https://instrumentalapp.com/api/2/metrics/ts_average(*.memory.used_percent)
  rds_connections: https://instrumentalapp.com/api/2/metrics/ts_average(aws_rds.*.database_connections)
  rds_cpu: https://instrumentalapp.com/api/2/metrics/ts_average(aws_rds.*.cpu_utilization)
  rds_r_latency: https://instrumentalapp.com/api/2/metrics/ts_average(aws_rds.*.read_latency)
  rds_w_latency: https://instrumentalapp.com/api/2/metrics/ts_average(aws_rds.*.write_latency)
  token: <token>
  url: https://instrumentalapp.com/api/2/metrics
runcharts:
  - title: ElastiCache CPU
    position: [[40, 10], [39, 10]]
    rate-ms: 20000
    legend:
      enabled: true
      details: false
    scale: 3
    items:
      - label: engine cpu %
        sample: 'curl -s -H ''X-Instrumental-Token: ''$token $cache_cpu | jq ''.response.metrics[0].values.data[26].a // 0'''
  - title: ElastiCache Replication Lag
    position: [[40, 20], [39, 10]]
    rate-ms: 20000
    legend:
      enabled: true
      details: false
    scale: 1
    items:
      - label: seconds
        sample: 'curl -s -H ''X-Instrumental-Token: ''$token $cache_lag | jq ''.response.metrics[0].values.data[26].a // 0'''
  - title: ElastiCache Cache Hit/Miss
    position: [[40, 30], [39, 10]]
    rate-ms: 20000
    legend:
      enabled: true
      details: false
    scale: 0
    items:
      - label: hits
        sample: 'curl -s -H ''X-Instrumental-Token: ''$token $cache_hits | jq ''.response.metrics[0].values.data[26].a // 0'''
      - label: misses
        sample: 'curl -s -H ''X-Instrumental-Token: ''$token $cache_miss | jq ''.response.metrics[0].values.data[26].a // 0'''
  - title: RDS CPU
    position: [[0, 10], [39, 10]]
    rate-ms: 20000
    legend:
      enabled: true
      details: false
    scale: 3
    items:
      - label: cpu %
        sample: 'curl -s -H ''X-Instrumental-Token: ''$token $rds_cpu | jq ''.response.metrics[0].values.data[26].a // 0'''
  - title: RDS Latency
    position: [[0, 20], [39, 10]]
    rate-ms: 20000
    legend:
      enabled: true
      details: false
    scale: 5
    items:
      - label: read
        sample: 'curl -s -H ''X-Instrumental-Token: ''$token $rds_r_latency | jq ''.response.metrics[0].values.data[26].a // 0'''
      - label: write
        sample: 'curl -s -H ''X-Instrumental-Token: ''$token $rds_w_latency | jq ''.response.metrics[0].values.data[26].a // 0'''
  - title: RDS Connections
    position: [[0, 30], [39, 10]]
    rate-ms: 20000
    legend:
      enabled: true
      details: false
    scale: 0
    items:
      - label: connections
        sample: 'curl -s -H ''X-Instrumental-Token: ''$token $rds_connections | jq ''.response.metrics[0].values.data[26].a // 0'''
barcharts:
  - title: Load
    position: [[54, 0], [25, 8]]
    rate-ms: 20000
    scale: 2
    items:
      - label: Load 1
        sample: 'curl -s -H ''X-Instrumental-Token: ''$token $load_1 | jq ''.response.metrics[0].values.data[29].a // 0'''
      - label: Load 5
        sample: 'curl -s -H ''X-Instrumental-Token: ''$token $load_5 | jq ''.response.metrics[0].values.data[29].a // 0'''
      - label: Load 15
        sample: 'curl -s -H ''X-Instrumental-Token: ''$token $load_15 | jq ''.response.metrics[0].values.data[29].a // 0'''
gauges:
  - title: Average Memory Used
    position: [[54, 8], [25, 1]]
    rate-ms: 20000
    scale: 0
    cur:
      sample: 'curl -s -H ''X-Instrumental-Token: ''$token $mem_used | jq ''.response.metrics[0].values.data[29].a // 0'''
    max:
      sample: echo 100
    min:
      sample: echo 0
sparklines:
  - title: Average CPU
    position: [[0, 0], [53, 10]]
    rate-ms: 20000
    scale: 0
    sample: 'curl -s -H ''X-Instrumental-Token: ''$token $cpu_s | jq ''.response.metrics[0].values.data[29].a // 0'''

Some notes:

  • If you want to iterate quickly on your expressions without waiting for Sampler to accumulate data, it’s easy to build them in the Instrumental web interface and then copy the API URLs out of the Network tab of your browser’s developer tools.
  • URLs with escaped characters are much easier to deal with as variables; that way, you don’t need a second level of escaping in the sample directives.
  • The position values are set by Sampler. If they aren’t provided, it will auto-layout all the graphs; after you run Sampler and change the layout, it will create/set the position values for you.
  • I changed the CPU expression from just showing usage_user to a better representation of total CPU usage: series_clamp(0, 100, constant(100) - ts_average(*.cpu.usage_idle))
  • AWS data comes from the AWS CloudWatch integration.
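
One more thought on the hard-coded indexes in the sample commands (data[29] for the agent metrics, data[26] for the AWS ones): if you’d rather not pin a specific position, a small variation on the jq filter (my own tweak, not something from the configs above) grabs the most recent non-null average instead:

jq '[.response.metrics[0].values.data[].a | select(. != null)] | last // 0'

That way, a chart won’t drop to 0 just because the newest points haven’t arrived yet.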

In the end, this was a fun exercise. It’s cool to be able to watch Instrumental data from the command line, and Sampler is an awesome tool.
