Visualizing Instrumental Data from the Command Line
By David Ronk
Nathan recently pulled some shenanigans to get weird data into Instrumental. I thought it would be fun, instead, to take Instrumental data and put it in a weird place. Enter Sampler:
Sampler is a tool for executing, visualizing, and alerting on shell commands, configured with a simple YAML file.
The main page has loads of examples and instructions for getting started.
I first installed it via Homebrew:
brew cask install sampler
Then, after playing with some of the examples, I started digging into the Instrumental API. Sampler expects a single numerical data point from the sample script in the config file. Instrumental’s metric data API endpoint returns something like this:
{
  "version": 2,
  "flags": 0,
  "response": {
    "metrics": [
      {
        "id": "test.gauge_metric",          // The name of the metric
        "project_id": 1,                    // The integer id associated with your project
        "type": "gauge",                    // The type you've reported as the metric
        "created_at": 1326493549,           // When you first reported the metric
        "updated_at": 1326729988,           // When you last updated the metric
        "expression": "test.gauge_metric",  // Expression used to retrieve metric
        "name": "test.gauge_metric",        // Name derived from expression of metric
        "values": {                         // Datapoints
          "start": 1326728400,              // The beginning of the time series
          "stop": 1326730200,               // The end of the time series
          "resolution": 60,                 // The requested resolution of the time series
          "duration": 1800,                 // The requested duration of the time series
          "data": [
            {
              "s": 4675616.0,               // The sum of datapoints at this moment
              "c": 94742,                   // The count of datapoints at this moment
              "a": 49.351037554622025       // Average (sum/count)
            }
            // ... Above hash repeats over duration
          ]
        }
      }
    ]
  }
}
By default, the response payload spans 30 minutes and contains a data point object for each minute in that time span. The last data point is for the current minute and is updated throughout the minute, so, to have an accurate graph, I left that point off and focused on the data point at index 29 instead. Depending on the data, I'll want s, c, or a from that data point. To parse the JSON response I used jq:
brew install jq
After some reading and testing, I came up with the command to pull out the relevant data point:
jq '.response.metrics[0].values.data[29].a // 0'
The jq filter above pulls out the a (average) value from the data point at index 29, and if that value happens to be null it returns 0 instead (which matches Instrumental's default behavior).
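If you'd rather prototype that extraction outside of jq, the same logic can be sketched in Python. The payload below is a trimmed stand-in for the API response shown earlier, and latest_average is a hypothetical helper name, not part of any Instrumental library:

```python
import json

# A trimmed stand-in for the API response shown above.
payload = json.loads("""
{
  "response": {
    "metrics": [
      {"values": {"data": [
        {"s": 4675616.0, "c": 94742, "a": 49.351037554622025}
      ]}}
    ]
  }
}
""")

def latest_average(doc, index=29):
    """Rough mirror of jq's '.response.metrics[0].values.data[29].a // 0':
    return the 'a' value at the given index, or 0 if it's missing or null."""
    try:
        value = doc["response"]["metrics"][0]["values"]["data"][index]["a"]
    except (KeyError, IndexError):
        return 0
    return value if value is not None else 0

print(latest_average(payload, index=0))  # the one point in the trimmed payload
print(latest_average(payload))           # index 29 doesn't exist, so 0
```

The try/except plays the role of jq's // alternative operator: any missing key, short data array, or null value falls back to 0.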
Putting it all together, we get something like this to pull the average CPU across all instances (this data is provided by the Instrumentald agent):
curl -s -H 'X-Instrumental-Token: <token>' 'https://instrumentalapp.com/api/2/metrics/ts_average(system.*.cpu.usage_user)' | jq '.response.metrics[0].values.data[29].a // 0'
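If you want to poke at the payload outside the shell, the same request can be sketched in Python. The header name and URL come straight from the curl command above; build_request is just an illustrative helper name:

```python
import urllib.request

API = "https://instrumentalapp.com/api/2/metrics"

def build_request(expression, token):
    """Build the authenticated GET request used by the curl one-liner above."""
    return urllib.request.Request(
        API + "/" + expression,
        headers={"X-Instrumental-Token": token},
    )

req = build_request("ts_average(system.*.cpu.usage_user)", "your_token_here")
# urllib.request.urlopen(req) would then fetch the JSON payload shown earlier.
```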
Next, I put that curl request into a Sampler config file as the sample directive, and added some variables to make the commands a little easier to write and to let the token be provided as a command-line parameter:
#sampler.yml
variables:
  token: <token>
  url: https://instrumentalapp.com/api/2/metrics
runcharts:
  - title: Avg CPU
    position: [[0, 0], [79, 20]]
    rate-ms: 20000
    legend:
      enabled: true
      details: true
    scale: 0
    items:
      - label: user%
        sample: 'curl -s -H ''X-Instrumental-Token: ''$token $url/ts_average\(*.cpu.usage_user\) | jq ''.response.metrics[0].values.data[29].a // 0'''
  - title: Avg Memory
    position: [[0, 20], [79, 19]]
    rate-ms: 20000
    legend:
      enabled: true
      details: false
    scale: 0
    items:
      - label: used%
        sample: 'curl -s -H ''X-Instrumental-Token: ''$token $url/ts_average\(*.memory.used_percent\) | jq ''.response.metrics[0].values.data[29].a // 0'''
      - label: available%
        sample: 'curl -s -H ''X-Instrumental-Token: ''$token $url/ts_average\(*.memory.available_percent\) | jq ''.response.metrics[0].values.data[29].a // 0'''
To run this, you’ll need to specify the config file and pass in your project token on the command line, like this:
sampler -c sampler.yml -e token=your_token_here
And then, you’ll get cool graphs like these!
Some notes on the config file above:
- Ideally, you’ll want an Instrumental query that resolves down to a single time series, since that’s what Sampler needs. For example, *.cpu.usage_user would not be a good choice, as it returns a data set for every instance of *.cpu.usage_user that exists. Either reduce it down with an average, like ts_average(*.cpu.usage_user) above, or request a specific machine, like system.app01.cpu.usage_user.
- I set the rate-ms value to 20 seconds even though the data for the graph only changes once a minute. It could be set to 60 seconds, but I like seeing the chart update even if the value doesn’t change.

Next, I set about trying some of the other graph types for variety, interest, and color, along with some more complex graph expressions. Here’s what I came up with:
#sampler.yml
variables:
  cache_cpu: https://instrumentalapp.com/api/2/metrics/ts_average(aws_elasticache.*.engine_cpu_utilization)
  cache_hits: https://instrumentalapp.com/api/2/metrics/ts_average(aws_elasticache.*.cache_hits)
  cache_lag: https://instrumentalapp.com/api/2/metrics/ts_average(aws_elasticache.*.replication_lag)
  cache_miss: https://instrumentalapp.com/api/2/metrics/ts_average(aws_elasticache.*.cache_misses)
  cpu_s: https://instrumentalapp.com/api/2/metrics/series_clamp(0%2C%20100%2C%20constant(100)%20-%20ts_average(*.cpu.usage_idle))
  load_1: https://instrumentalapp.com/api/2/metrics/ts_average(*.load.load1)
  load_5: https://instrumentalapp.com/api/2/metrics/ts_average(*.load.load5)
  load_15: https://instrumentalapp.com/api/2/metrics/ts_average(*.load.load15)
  mem_used: https://instrumentalapp.com/api/2/metrics/ts_average(*.memory.used_percent)
  rds_connections: https://instrumentalapp.com/api/2/metrics/ts_average(aws_rds.*.database_connections)
  rds_cpu: https://instrumentalapp.com/api/2/metrics/ts_average(aws_rds.*.cpu_utilization)
  rds_r_latency: https://instrumentalapp.com/api/2/metrics/ts_average(aws_rds.*.read_latency)
  rds_w_latency: https://instrumentalapp.com/api/2/metrics/ts_average(aws_rds.*.write_latency)
  token: <token>
  url: https://instrumentalapp.com/api/2/metrics
runcharts:
  - title: ElastiCache CPU
    position: [[40, 10], [39, 10]]
    rate-ms: 20000
    legend:
      enabled: true
      details: false
    scale: 3
    items:
      - label: engine cpu %
        sample: 'curl -s -H ''X-Instrumental-Token: ''$token $cache_cpu | jq ''.response.metrics[0].values.data[26].a // 0'''
  - title: ElastiCache Replication Lag
    position: [[40, 20], [39, 10]]
    rate-ms: 20000
    legend:
      enabled: true
      details: false
    scale: 1
    items:
      - label: seconds
        sample: 'curl -s -H ''X-Instrumental-Token: ''$token $cache_lag | jq ''.response.metrics[0].values.data[26].a // 0'''
  - title: ElastiCache Cache Hit/Miss
    position: [[40, 30], [39, 10]]
    rate-ms: 20000
    legend:
      enabled: true
      details: false
    scale: 0
    items:
      - label: hits
        sample: 'curl -s -H ''X-Instrumental-Token: ''$token $cache_hits | jq ''.response.metrics[0].values.data[26].a // 0'''
      - label: misses
        sample: 'curl -s -H ''X-Instrumental-Token: ''$token $cache_miss | jq ''.response.metrics[0].values.data[26].a // 0'''
  - title: RDS CPU
    position: [[0, 10], [39, 10]]
    rate-ms: 20000
    legend:
      enabled: true
      details: false
    scale: 3
    items:
      - label: cpu %
        sample: 'curl -s -H ''X-Instrumental-Token: ''$token $rds_cpu | jq ''.response.metrics[0].values.data[26].a // 0'''
  - title: RDS Latency
    position: [[0, 20], [39, 10]]
    rate-ms: 20000
    legend:
      enabled: true
      details: false
    scale: 5
    items:
      - label: read
        sample: 'curl -s -H ''X-Instrumental-Token: ''$token $rds_r_latency | jq ''.response.metrics[0].values.data[26].a // 0'''
      - label: write
        sample: 'curl -s -H ''X-Instrumental-Token: ''$token $rds_w_latency | jq ''.response.metrics[0].values.data[26].a // 0'''
  - title: RDS Connections
    position: [[0, 30], [39, 10]]
    rate-ms: 20000
    legend:
      enabled: true
      details: false
    scale: 0
    items:
      - label: connections
        sample: 'curl -s -H ''X-Instrumental-Token: ''$token $rds_connections | jq ''.response.metrics[0].values.data[26].a // 0'''
barcharts:
  - title: Load
    position: [[54, 0], [25, 8]]
    rate-ms: 20000
    scale: 2
    items:
      - label: Load 1
        sample: 'curl -s -H ''X-Instrumental-Token: ''$token $load_1 | jq ''.response.metrics[0].values.data[29].a // 0'''
      - label: Load 5
        sample: 'curl -s -H ''X-Instrumental-Token: ''$token $load_5 | jq ''.response.metrics[0].values.data[29].a // 0'''
      - label: Load 15
        sample: 'curl -s -H ''X-Instrumental-Token: ''$token $load_15 | jq ''.response.metrics[0].values.data[29].a // 0'''
gauges:
  - title: Average Memory Used
    position: [[54, 8], [25, 1]]
    rate-ms: 20000
    scale: 0
    cur:
      sample: 'curl -s -H ''X-Instrumental-Token: ''$token $mem_used | jq ''.response.metrics[0].values.data[29].a // 0'''
    max:
      sample: echo 100
    min:
      sample: echo 0
sparklines:
  - title: Average CPU
    position: [[0, 0], [53, 10]]
    rate-ms: 20000
    scale: 0
    sample: 'curl -s -H ''X-Instrumental-Token: ''$token $cpu_s | jq ''.response.metrics[0].values.data[29].a // 0'''
Some notes:
- If you want to iterate quickly on your expressions without having to wait for Sampler to accumulate data, it’s easy to build your expressions in the Instrumental web interface and then copy the API URLs out of the Developer Tools network console.
- URLs with escaped characters are much easier to deal with as variables. That way, you don’t need a second level of escaping in the sample directives.
- The position values are set by Sampler. If they aren’t provided, Sampler will auto-layout all the graphs; after you run Sampler and change the layout, it will create and set the position values for you.
- I changed the CPU expression from showing just usage_user to a better representation of total CPU usage: series_clamp(0, 100, constant(100) - ts_average(*.cpu.usage_idle))
- AWS data comes from the AWS CloudWatch integration.
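On the escaping note above: Python's urllib.parse.quote can reproduce the percent-encoded expression used in the cpu_s variable. Treating (), *, and - as safe characters is an assumption based on the URL in the config; check it against what your browser's network console produces:

```python
from urllib.parse import quote

expression = "series_clamp(0, 100, constant(100) - ts_average(*.cpu.usage_idle))"

# Commas and spaces must be percent-encoded; parentheses and '*' are left
# as-is (assumed safe) to match the cpu_s URL in the config above.
encoded = quote(expression, safe="()*")

print("https://instrumentalapp.com/api/2/metrics/" + encoded)
```

Encoding the expression once, into a variable, keeps the sample directives free of a second layer of shell escaping.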
In the end, this was a fun exercise. It’s cool to be able to watch Instrumental data from the command line, and Sampler is an awesome tool.