SpartaGrafana — Serverless Monitoring

Nov 8, 2016

One of the most contentious and pedantically debated points around serverless is the term itself.

Luckily, I’m not going to talk about that. Let’s just call it Jeff.

What seems less controversial is that serverless represents an opportunity to quickly develop truly cloud-native applications. Applications that begin to intrinsically incorporate broader operational aspects into their core codebase. If serverless is about super-advanced cloud powers, how might the avengers of features, operations, security, capacity, and other apparently disjoint domains come together in a single repository, with a bias towards a shared language and a common goal?

Graph All The Things

At the first Seattle Serverless Meetup, Rob Gruhl from Nordstrom presented a preview of their serverless-artillery load-testing tool. He gave an excellent talk, and one of the things I took away from it was the incredible level of observability their team had attained. The ability to visualize the increasing load against their system, with minimal latency, was very powerful. Rob and the team had done this by instrumenting their code and publishing the results to Grafana.

Not to mention it looked really cool.

While watching the presentation, I made a mental note to add support for spinning up a Grafana instance as part of a Sparta application. This, of course, meant extending Sparta beyond serverless, into that vast and well-traveled territory of…servers.

Servers?

Serverless is a good fit for many workloads and stages of product development, but running Grafana (a SaaS version is coming!) isn’t one of them. However, the recent addition of CloudFormation cross-stack references made it conceivable to both decouple and bridge the two worlds.

The SpartaGrafana application demonstrates how to use the new Sparta WorkflowHooks feature to provision a completely independent CloudFormation stack. That stack hosts a single EC2 instance running Grafana, to which AWS Lambda instances publish metrics.

Grafana Stack Definition

The Grafana stack definition is fully specified by grafana/grafana.go, which:

Defines the CloudFormation stack, including a CloudFormation export that publishes the EC2 PublicDnsName.
Includes a CloudInit shell script that downloads Grafana, InfluxDB, and the scripts that bootstrap the InfluxDB datasource and the Grafana Sparta Hello World dashboard.

ConvergeStackState brings the stack to its desired state and does most of the heavy AWS lifting (marshaling, uploading to S3, waiting for the stack to complete, etc.).

Behind the Serverless Curtain

Once the Grafana stack has been created or updated in the PostBuildHook, the normal Sparta provisioning workflow continues. This deploys a HelloWorld lambda function together with a single-resource API Gateway stage.

The most interesting aspect of the HelloWorld lambda function is that it integrates with the go-metrics library. Each lambda invocation increments a local counter via helloWorldCounterMetric.Inc(1). The lambda instance discovers the Grafana EC2 DNS name by first importing the exported value with Fn::ImportValue.

The imported value is then looked up at initialization time to create the InfluxDB publisher.

Using this information, each lambda container instance sets up a publishing loop whose metrics carry a randomly generated tag value that identifies that container.

The benefit of the Fn::ImportValue reference is that it prevents “fat-finger” deletions via the AWS Console: CloudFormation won’t delete the Grafana stack while the SpartaGrafanaPublisher stack still imports its exported value.

Results

Provisioning SpartaGrafana creates two CloudFormation stacks. The CloudFormation Outputs for the Grafana stack include the URL to log in to the Grafana console (admin/admin is the default username and password).

After logging in to Grafana, navigate to the pre-created Sparta Hello World dashboard. This dashboard tracks a single metric (helloWorldCounterMetric) that graphs how frequently the lambda function was invoked.

We’ll use https://goad.io to generate some load against our lambda function. First, find the API Gateway URL for the lambda function in the AWS Console (API Gateway ➤ Sparta Grafana ➤ Stages ➤ dev ➤ GET /hello/grafana). Copy that value and paste it into the Goad web form.

After a few seconds, you can see AWS Lambda starting to spin up containers to handle the traffic.

Compared to the previous GIF, which showed only 2 containers running, the image above shows that AWS Lambda has created 4 containers of varying lifespans to handle this load test.

And thanks to go-metrics, our simple service automatically publishes a host of other metrics that are available for visualization.

Seamless Serverless

Serverless offers an incredibly easy and potentially transformative way to create highly scalable and resilient cloud-native applications. However, there are times when the standard (I’m reluctant to say “legacy”) instance-based deployments make sense. Occasionally, a single service may even need to deploy multiple isolated topologies that work together to support a business goal.

And other times they need to work together to produce cool graphs that help you visualize what your serverless application is actually doing.