Custom metrics in Node.js with OpenTelemetry (and Prometheus)

Yuri Grinshteyn
Google Cloud - Community
Jan 27, 2020

In my last post, I tackled my first project with OpenTelemetry and built a basic demo to show how to use distributed tracing and the Stackdriver exporter. I chose Go for that exercise because, at the time, it was the only language with a Stackdriver trace exporter available. This time, I wanted to attempt using OpenTelemetry for metric instrumentation, and noticed that opentelemetry-go does not yet have a Stackdriver exporter for metrics. I attempted to use the Prometheus exporter instead, but could not figure out how to make it play nice with the Mux router, so I switched to Node.js.

So — here’s how I built a basic Hello World Node.js app and instrumented it with OpenTelemetry to expose basic “golden signals” metrics (request count, error count, and latency) to Prometheus.

As always, I’ve made my code available in a GitHub repo. Let’s review the interesting parts.

Imports and setup

I’m only using three external packages: Express to handle requests, plus two OpenTelemetry packages, one to write the metrics and another to export them to Prometheus.
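The setup looks roughly like this. This is a minimal sketch assuming the @opentelemetry/metrics and @opentelemetry/exporter-prometheus packages as they existed in early 2020 (around v0.3); the metrics API has changed substantially in later releases:

const express = require('express');
const { MeterRegistry } = require('@opentelemetry/metrics');
const { PrometheusExporter } = require('@opentelemetry/exporter-prometheus');

// A meter is the entry point for creating metric instruments.
const meter = new MeterRegistry().getMeter('example-prometheus');

// Serve the Prometheus scrape endpoint on port 8081 rather than the
// exporter's default port.
const exporter = new PrometheusExporter(
  { startServer: true, port: 8081 },
  () => {
    console.log('Prometheus scrape endpoint: http://localhost:8081/metrics');
  }
);
meter.addExporter(exporter);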

Note that I’ve overridden the default port that’s set in the Prometheus exporter code.

Metric definition

In the next part, I define the three metrics I want to track. Request count and error count are both monotonic counters, meaning that they can only be increased. Response latency is a non-monotonic gauge, since I want to record the latency of every request. I am also using a single label, because I could not figure out how to get metrics to work properly with no labels.
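A sketch of those definitions, using the same early API; the metric names and the metricOrigin label key are illustrative:

const requestCount = meter.createCounter('request_count', {
  monotonic: true, // counter can only go up
  labelKeys: ['metricOrigin'],
  description: 'Counts total number of requests'
});

const errorCount = meter.createCounter('error_count', {
  monotonic: true,
  labelKeys: ['metricOrigin'],
  description: 'Counts total number of errors'
});

const responseLatency = meter.createGauge('response_latency', {
  monotonic: false, // gauge is set per request and can go up or down
  labelKeys: ['metricOrigin'],
  description: 'Records latency of each response'
});

// A single label set, shared by all three metrics.
const labels = meter.labels({ metricOrigin: process.env.ENV });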

Request handling

At this point, the app is ready to accept incoming requests. Requests to :8081/metrics are handled by the Prometheus exporter, and I'm using Express to handle requests to :8080/. For every request, the app creates a timer (measuredLatency) and keeps it running until it's ready to respond. It generates a random number between 0 and 100 and uses it to return an error roughly half the time, in which case latency is not reported. If there's no error to return, the app sleeps between 1 and 10 seconds, takes another time snapshot, calculates the elapsed time, and reports that value as the latency.
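A sketch of the handler; note that getHandle is the early metrics API (later releases renamed it bind):

const app = express();

app.get('/', (req, res) => {
  // start the latency timer for this request
  const requestReceived = new Date().getTime();
  requestCount.getHandle(labels).add(1);

  // return an error roughly half the time; latency is not reported for errors
  if (Math.floor(Math.random() * 100) > 50) {
    errorCount.getHandle(labels).add(1);
    res.status(500).send('error!');
  } else {
    // sleep between 1 and 10 seconds before responding
    const sleepMs = Math.floor(Math.random() * 9000) + 1000;
    setTimeout(() => {
      // record the elapsed time as the latency gauge value
      const measuredLatency = new Date().getTime() - requestReceived;
      responseLatency.getHandle(labels).set(measuredLatency);
      res.status(200).send(`success in ${measuredLatency} ms`);
    }, sleepMs);
  }
});

app.listen(8080, () => console.log('listening on :8080'));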

Metrics endpoint

When I run the app locally, I can then hit the /metrics endpoint and see the metrics I've registered there:
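The output is standard Prometheus exposition format, roughly like this (the values and label are illustrative):

# HELP request_count Counts total number of requests
# TYPE request_count counter
request_count{metricOrigin="dev"} 10
# HELP error_count Counts total number of errors
# TYPE error_count counter
error_count{metricOrigin="dev"} 6
# HELP response_latency Records latency of each response
# TYPE response_latency gauge
response_latency{metricOrigin="dev"} 3448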

At this point, I was ready to run the app and generate actual traffic to it. That required:

  • creating a GKE cluster
  • installing Prometheus on it
  • deploying the app and configuring the /metrics endpoint to be scraped

Let’s dive in!

Creating the cluster

I took a very straightforward approach and used the Cloud Console to create a single-node cluster (that turned out to be the wrong choice) using an n1-standard node. I then used gcloud container clusters get-credentials <cluster> --zone=<zone> to get credentials so that I could access the cluster from my local workstation.
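For reference, the equivalent gcloud commands look something like this (the cluster name and zone are placeholders):

$ gcloud container clusters create small-cluster --num-nodes=1 --zone=us-central1-a
$ gcloud container clusters get-credentials small-cluster --zone=us-central1-a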

$ kubectl get nodes
NAME                                     STATUS   ROLES    AGE     VERSION
gke-small-cluster-pool-1-b40da891-fc1p   Ready    <none>   7h27m   v1.15.7-gke.23

Installing Prometheus

To install Prometheus on the cluster, I took the simple option and used Helm:

$ brew install helm # install helm, the package manager for Kubernetes
$ helm repo add stable https://kubernetes-charts.storage.googleapis.com/
$ helm repo update
$ helm install prometheus stable/prometheus

That was enough to get Prometheus up and running, but I had to increase the size of my cluster to get enough CPU to run everything.

$ kubectl get pods
NAME                                            READY   STATUS    RESTARTS   AGE
prometheus-alertmanager-77995478d8-lh4c5        2/2     Running   0          6m34s
prometheus-kube-state-metrics-f96cbbf97-vc56g   1/1     Running   0          6m34s
prometheus-node-exporter-8xlwp                  1/1     Running   0          114s
prometheus-node-exporter-s2pnt                  1/1     Running   0          6m34s
prometheus-node-exporter-xbtrs                  1/1     Running   0          112s
prometheus-pushgateway-747dc5fb57-rw72p         1/1     Running   0          6m34s
prometheus-server-6f7467bfdd-5vfgx              2/2     Running   0          6m34s

I verified that Prometheus was working by using port-forward:
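Something like the following forwards local port 9090 to the Prometheus server, after which the UI is reachable at http://localhost:9090 (the exact resource name depends on the Helm release):

$ kubectl port-forward deploy/prometheus-server 9090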

Deploying the app

I created a simple Dockerfile:
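A minimal sketch of such a Dockerfile; the entry point file name app.js is an assumption:

# build on a slim Node.js base image
FROM node:12-slim
WORKDIR /usr/src/app
# install dependencies first to take advantage of layer caching
COPY package*.json ./
RUN npm install
COPY . .
# 8080 serves requests; 8081 serves the Prometheus /metrics endpoint
EXPOSE 8080 8081
CMD ["node", "app.js"]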

I built the image using gcloud builds submit and created a deployment file:
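A sketch of the deployment manifest follows; the app name and image path are placeholders. The prometheus.io annotations are what tell the stable/prometheus chart's default scrape config to collect from port 8081:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-otel
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello-otel
  template:
    metadata:
      labels:
        app: hello-otel
      annotations:
        # picked up by the Helm chart's default pod scrape config
        prometheus.io/scrape: "true"
        prometheus.io/port: "8081"
    spec:
      containers:
      - name: hello-otel
        image: gcr.io/PROJECT_ID/hello-otel:latest
        ports:
        - containerPort: 8080
        - containerPort: 8081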

Next, I used kubectl apply -f to create the app and kubectl expose deployment --type=LoadBalancer to make it externally available. Once the external IP address was available, I configured a Stackdriver Uptime Check against it to generate a steady stream of traffic.
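With the hypothetical names from the manifest above, that boils down to:

$ kubectl apply -f deployment.yaml
$ kubectl expose deployment hello-otel --type=LoadBalancer --port=80 --target-port=8080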

At this point, all of the setup was done. I accessed Prometheus again to confirm that the metrics I created using OpenTelemetry were available:
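For example, a PromQL query along these lines (illustrative) graphs the request rate from the counter the app exports:

rate(request_count{metricOrigin="dev"}[5m])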

What have I learned?

In my OpenCensus exercise, I learned about using primitives like Views, Measurements, and Measures. I also learned how to create histograms and control aggregation. My very limited experience with OpenTelemetry suggests that its model is somewhat simpler: in this exercise, I simply defined metrics and set their values when necessary, and the exporter took care of the rest. While I probably missed out on a lot of detail, this initial experience makes OpenTelemetry seem quite a bit easier to get started with, which is great for folks new to this kind of instrumentation.

I’ve also realized that the project is still in relatively early stages. For example, there are no Stackdriver metric exporters available as I’m writing this. Trace exporters have been written for (at least) Go and Node.js, but metric exporters are not yet available. As such, using this on GCP would require me to set up the Prometheus/Stackdriver integration using these instructions.

I greatly enjoyed learning how to use OpenTelemetry to create custom metric telemetry — using it to instrument my app to expose the “golden signals” was quite straightforward, thanks largely to the excellent examples provided in the OpenTelemetry GitHub repo. Thanks for reading — come back next time, when I attempt to use OpenTelemetry to integrate tracing and logs in Stackdriver!

Originally published at http://github.com.
