Statistics support for OpenWhisk Serverless Platform using Prometheus and Grafana
Before moving deeper into the topic, we expect you to have a basic understanding of Kubernetes, Docker, and serverless computing concepts. If not, you may refer to the links mentioned in this article or any other valuable resource to sharpen your knowledge. We hope you enjoy the article :D
What is Serverless Computing
“Serverless architectures are application designs that incorporate third-party “Backend as a Service” (BaaS) services, and/or that include custom code run in managed, ephemeral containers on a “Functions as a Service” (FaaS) platform.”
In our discussion, the serverless computing concept is implemented using OpenWhisk, which takes its infrastructure support from Kubernetes.
What is OpenWhisk
Apache OpenWhisk is a Serverless/Functions-as-a-Service (FaaS) platform that can be deployed in a cloud environment (compatible with Docker) or locally if needed. OpenWhisk is a robust, scalable platform designed to support thousands of concurrent triggers and invocations.
Almost all the components of OpenWhisk are packaged and deployed as containers. From Nginx to Kafka, everything in the platform runs as a container.
Its high-level architecture is given in a nutshell here. (Please follow the additional readings to gain a sound knowledge of it.)
As shown in the above design diagram, NGINX, Controller, CouchDB, Kafka, and Invokers are the main components of OpenWhisk. Each component fulfills some specific, defined tasks.
The diagram below visualizes the flow of the system.
Nginx is used as an HTTP server and reverse proxy which exposes the public-facing HTTP(S) endpoint to the clients. Every request, including those originating from the CLI (wsk CLI), goes through this server.
The Controller can be considered the main component and the gatekeeper of the system. After a request passes through Nginx, it hits the Controller. The Controller performs authentication and authorization of every request (verifying the credentials against the ones stored in CouchDB) before handing over control to the next component. It decides the path that the request will eventually take.
Apache CouchDB is a NoSQL solution which acts as the JSON data store in this system. The state of the system is maintained and managed in CouchDB. The user credentials, action metadata, namespaces, and the definitions of actions, triggers, and rules are stored in CouchDB.
Apache Kafka is used for building real-time data pipelines and streaming applications. OpenWhisk takes advantage of Kafka to manage the connection of Controller with Invokers.
Kafka buffers the messages sent by the Controller before delivering them to the Invoker. When Kafka confirms that a message has been delivered, the Controller immediately responds with an Activation ID. The Activation ID can later be used to retrieve the result and metadata of that invocation. The Kafka publisher publishes messages to several topics to be consumed by subscribers.
Apache ZooKeeper maintains and manages the Kafka cluster. ZooKeeper's primary job is to track the status of the nodes in the Kafka cluster and to keep track of the topics, messages, etc.
The Invoker decides whether to reuse an existing "hot" container, resume a paused "warm" container, or launch a new "cold" container for the invocation.
Based on the above cases, it obtains a Docker container that acts as the unit of execution for the chosen Action. The Invoker fetches the action's source code from CouchDB and injects it into the Docker container. Once the execution is completed, it stores the outcome of the Activation in CouchDB for future retrieval.
What is Prometheus
Prometheus is an open-source systems monitoring and alerting toolkit. It provides key features such as:
- It supports time series data with a multi-dimensional data model (identified by metric name and key/value pairs)
- Provides PromQL (which will be discussed in the next paragraphs): a flexible query language to query and build metrics
- No reliance on distributed storage
- Supports data “Pushing” and “Pulling”:
Pull model: time series collection happens via a pull model over HTTP. The "scrape interval", which decides the scheduled time at which Prometheus pulls data, can be configured.
Push model: pushing time series is supported via an intermediary gateway (ex: Prometheus Pushgateway)
- Targets are discovered via service discovery or static configuration
- Multiple modes of graphing and dashboarding support
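As a concrete illustration of the pull model and the scrape interval described above, a minimal Prometheus scrape configuration might look like the following (the job name and Pushgateway address here are placeholders, not taken from our deployment):

```yaml
# prometheus.yml -- scrape the Pushgateway every 15 seconds
scrape_configs:
  - job_name: 'openwhisk-pushgateway'   # arbitrary job name
    scrape_interval: 15s                # how often Prometheus pulls ("scrapes") data
    honor_labels: true                  # keep the labels attached by the pusher
    static_configs:
      - targets: ['pushgateway:9091']   # 9091 is the Pushgateway's default port
```

Setting honor_labels is important when scraping a Pushgateway, so that the labels pushed with the metrics are not overwritten by the scrape target's own labels.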
The high-level architecture of Prometheus is given below. Please refer to their official page for more information.
Prometheus provides a query language to process time series data in real time. The result of an expression can either be shown as a graph, viewed as tabular data in Prometheus’s expression browser, or consumed by external systems via the HTTP API.
In PromQL, there are built-in functions for many mathematical operations. You can find all of the built-in functions that are exposed here. In this application we have mainly used the sum() aggregation operator with the irate() function. Examples will be discussed in the implementation section.
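For instance, a query along these lines combines sum() and irate() to compute a per-second invocation rate broken down by action (the metric and label names below are illustrative, not the exact names used in the codebase):

```
sum(irate(openwhisk_invocations_total[5m])) by (action)
```

irate() turns a monotonically increasing counter into an instantaneous per-second rate over the last two samples in the 5-minute window, and sum() ... by (action) aggregates that rate across all other label dimensions.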
Grafana is an open-source, feature-rich metrics dashboard and graph editor for Graphite, Elasticsearch, OpenTSDB, Prometheus, and InfluxDB. Since it has default support for Prometheus, it is easy to process and visualize data using this combination.
High Level Flow of the Implementation
In OpenWhisk, when an action is invoked, invocation metadata is sent to Kafka. Kafka consumers subscribed to the topic "events" can then receive the activation metadata.
This is the basic format of the activation metadata. It emits two event types, named "Metric" and "Activation".
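To make the dispatching concrete, here is a minimal Python sketch of how a consumer might route a message from the "events" topic by its event type. The exact field names ("eventType", "body") and the sample payload are assumptions for illustration; check the real messages on the topic for the authoritative schema.

```python
import json

def handle_event(raw):
    """Parse one Kafka message and route it by its event type.

    The field names used here ("eventType", "body") are assumed for
    illustration and may differ from the real payload schema.
    """
    event = json.loads(raw)
    kind = event.get("eventType")
    if kind == "Activation":
        return ("Activation", event.get("body", {}))
    elif kind == "Metric":
        return ("Metric", event.get("body", {}))
    return ("Unknown", {})

# A hypothetical "Activation" message as it might arrive from Kafka.
sample = json.dumps({
    "eventType": "Activation",
    "body": {"name": "hello", "kind": "nodejs:10", "duration": 12},
})
print(handle_event(sample)[0])  # -> Activation
```

In the actual implementation this routing happens in Java inside the event collector, but the shape of the logic is the same: parse the JSON, inspect the event type, and hand the body to the appropriate handler.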
Code-level listeners then receive the activation metadata and process it using Prometheus's predefined functions.
The aggregated data is then sent to the Prometheus Pushgateway. The Prometheus server scrapes that data from the Pushgateway periodically, and it is visualized using the Grafana charts we designed.
Deeper into Codebase
You can refer to the sample code here, which we have implemented to run this scenario.
OpenwhiskStatsExporter has the main function. Inside OpenwhiskStatsExporter, "counters" are registered with labels. These are used in the onEvent method when processing activation data and performing aggregations.
When the main function is called, it starts collecting events from the Kafka topic through OpenwhiskEventCollector. OpenwhiskEventCollector processes the Kafka events and sends them to listeners. Inside the onEvent function, the counters are updated based on the events received, and the metrics are pushed to the Pushgateway. The Prometheus scrape interval for pulling data from the Pushgateway is defined here.
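As a rough sketch of what the onEvent logic amounts to, the snippet below keeps labelled counters in memory and renders them in the Prometheus text exposition format that the Pushgateway accepts. The metric name, label names, and Pushgateway URL are invented for illustration; the real exporter does this in Java via the Prometheus client library.

```python
from collections import Counter

# Labelled counters kept in memory; key is the (language, action) label pair.
counters = Counter()

def on_event(language, action):
    """Increment the invocation counter for one activation event."""
    counters[(language, action)] += 1

def render_metrics():
    """Render the counters in the Prometheus text exposition format."""
    lines = ["# TYPE openwhisk_invocations_total counter"]
    for (language, action), value in sorted(counters.items()):
        lines.append(
            f'openwhisk_invocations_total{{language="{language}",action="{action}"}} {value}'
        )
    return "\n".join(lines) + "\n"

# Simulate three activation events arriving from the Kafka topic.
on_event("nodejs", "hello")
on_event("nodejs", "hello")
on_event("python", "square")
print(render_metrics())
# The real exporter would push a body like this to the Pushgateway
# (e.g. at http://pushgateway:9091/metrics/job/openwhisk -- an assumed
# address), from which the Prometheus server scrapes it on its interval.
```

The design point this illustrates is that the exporter only aggregates in memory and pushes; rate computations such as irate() happen later, on the Prometheus server, over the scraped counter samples.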
Those metric values are used in the Grafana charts.
A test class to invoke actions, test the whole implementation, and visualize the data in Grafana charts is implemented here. You can add more actions (in various languages) to the Action.java class to test the functionality.
You can export your Grafana dashboard as JSON and include it in your Helm charts. That is what we have done here. This codebase is deployed in Docker containers.
Dashboards with Grafana
Following are some of the dashboards we designed to visualize action invocation statistics. You can filter stats based on action language, action name and error type.
Hope you enjoyed this article and found something useful for your knowledge and implementation! :D