FDN-Monitor: Unifying the Monitoring of Multiple Serverless Compute Platforms (multi-cloud)

Nov 18, 2022

Introduction:

There is a myriad of serverless compute platforms, such as AWS Lambda, Google Cloud Functions, OpenWhisk, and OpenFaaS. Each platform is implemented differently and has its own monitoring solution, so the platforms expose different metrics under different names, which makes monitoring across a multi-platform setup challenging. OpenWhisk and OpenFaaS are based on Kubernetes, so Prometheus can be used to extract metrics from them, but the two platforms calculate the same metrics in different ways and publish them under different names. The image below shows the metric names used by the different serverless platforms.

Different metric names for serverless platforms, with the unified names shown under FDN-Monitor Name.
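
To make the idea of unification concrete, here is a minimal sketch of a name mapping. The unified names ("invocations", "instances", "cold_starts") are hypothetical and not taken from FDN-Monitor, and the platform metric names are only illustrative examples of what each platform's Prometheus endpoint may expose.

```python
from typing import Dict

# Hypothetical mapping: platform-specific metric name -> unified name.
# The unified names are placeholders, not the names FDN-Monitor uses.
UNIFIED_NAMES: Dict[str, Dict[str, str]] = {
    "OPENFAAS": {
        "gateway_function_invocation_total": "invocations",
        "gateway_service_count": "instances",
    },
    "OPENWHISK": {
        "openwhisk_action_activations_total": "invocations",
        "openwhisk_action_coldStarts_total": "cold_starts",
    },
}


def unify(cluster_type: str, raw: Dict[str, float]) -> Dict[str, float]:
    """Rename known metrics to their unified names; drop unmapped metrics."""
    mapping = UNIFIED_NAMES.get(cluster_type.upper(), {})
    return {mapping[name]: value for name, value in raw.items() if name in mapping}
```

With a mapping like this, a dashboard only needs to know the unified name, regardless of which platform produced the value.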

Solution:

To collect metrics from multiple serverless platforms and unify them, we have built a client-based tool called FDN-Monitor.

FDN stands for Function Delivery Network (a network of multiple serverless compute clusters).

FDN-Monitor gathers monitoring metrics in the following three categories:

User-Centric metrics: This category covers metrics that a user can observe. It includes the 90th-percentile (P90) response time of requests and the number of successful and failed invocations served per unit of time.
Platform-Centric metrics: Here, the metrics of the serverless compute platform itself are collected. These are the number of function invocations resulting from the received requests, the number of function instances, the maximum number of concurrent instances allowed for processing events, the number of cold starts, the execution time of the function (excluding the startup latency), and the memory allocated to each function instance. These metrics differ from platform to platform in both name and number. For platforms hosted on Kubernetes, we can also collect the function's resource consumption metrics, such as CPU, memory, disk I/O, and network usage.
Infrastructure-Centric metrics: Here, the metrics from the host machines in the cluster are collected, so this category only exists for clusters hosted on the edge or on-premises. It covers the amount of static resources and their usage over time, such as the number of cores, memory usage, disk I/O, and network usage of the individual nodes within a cluster.

The table above summarizes the Platform-Centric and Infrastructure-Centric monitoring metrics from all four serverless platforms considered, along with the names used by FDN-Monitor.
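
As an illustration of the User-Centric metrics described above, the following sketch computes a P90 response time and per-second success and failure rates from a window of requests. It uses the nearest-rank percentile method and treats HTTP 2xx as success; both are assumptions, since the method FDN-Monitor uses is not described here.

```python
from typing import List, Tuple


def p90(response_times_ms: List[float]) -> float:
    """90th percentile using the nearest-rank method (an assumed choice)."""
    if not response_times_ms:
        raise ValueError("at least one sample is required")
    ordered = sorted(response_times_ms)
    n = len(ordered)
    rank = (9 * n + 9) // 10  # integer form of ceil(0.9 * n), 1-based
    return ordered[rank - 1]


def invocation_rates(status_codes: List[int], window_s: float) -> Tuple[float, float]:
    """Return (successful, failed) invocations per second over the window.

    Assumes that any HTTP 2xx status counts as a successful invocation.
    """
    ok = sum(1 for code in status_codes if 200 <= code < 300)
    failed = len(status_codes) - ok
    return ok / window_s, failed / window_s
```

For ten samples of 10, 20, ..., 100 ms, the nearest-rank P90 is the ninth smallest value, 90 ms.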

Overall Deployment Architecture:

FDN-Monitor runs as a sidecar alongside every virtual-kubelet node in FDN, each of which represents a serverless compute cluster, and collects metrics from that cluster. However, it can also be used independently, without deploying it as part of FDN.

Each FDN-Monitor agent is attached as a sidecar container to the virtual-kubelet node of a cluster. The agent pulls the metrics from the underlying cluster every 30 seconds (DEFAULT_LOGGING_PERIOD) and aggregates them into InfluxDB. Grafana queries the data from InfluxDB and displays it in various dashboards.
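
The pull-and-store cycle can be sketched as a simple loop. This is not the actual agent code: `collect` and `write` are hypothetical callbacks standing in for the platform query and the InfluxDB write, the "cluster" tag is an assumed tag name, and the line-protocol formatter skips escaping for brevity.

```python
import time
from typing import Callable, Dict, Optional


def to_line_protocol(measurement: str, tags: Dict[str, str],
                     fields: Dict[str, float], ts_ns: int) -> str:
    """Format one point in InfluxDB line protocol.

    Simplified: assumes names and tag values contain no spaces, commas,
    or equals signs, so no escaping is performed.
    """
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement}{tag_part} {field_part} {ts_ns}"


def run_agent(collect: Callable[[], Dict[str, float]],
              write: Callable[[str], None],
              cluster_name: str,
              period_s: float = 30.0,
              iterations: Optional[int] = None,
              sleep: Callable[[float], None] = time.sleep,
              clock: Callable[[], int] = time.time_ns) -> None:
    """Pull metrics every period_s seconds and hand each point to write().

    iterations=None loops forever; a number limits the loop (useful in tests).
    """
    done = 0
    while iterations is None or done < iterations:
        fields = collect()
        if fields:  # skip empty scrapes instead of writing an invalid line
            point = to_line_protocol("functions_usage_table",
                                     {"cluster": cluster_name}, fields, clock())
            write(point)
        done += 1
        sleep(period_s)
```

Passing `sleep` and `clock` as parameters keeps the loop easy to test without waiting 30 seconds per iteration.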

We deploy InfluxDB and Grafana as Docker containers.
For private cluster deployments, FDN-Monitor uses two Prometheus deployments (one for the serverless platform and the other for the K8s cluster) to get metrics.
All the dashboards for FDN are preconfigured and available in grafana-provisioning.

Getting started:

All the code related to FDN-Monitor is in the Monitor directory.

1. Modifying InfluxDB and Grafana credentials:

Update the .env file with your own values for the following parameters:

INFLUXDB_USERNAME=admin
INFLUXDB_PASSWORD=secure_influx_admin_password_fdn
INFLUXDB_ORG=fdn_org
INFLUXDB_BUCKET=fdn_monitoring_bucket
INFLUXDB_ADMIN_TOKEN=a_secure_admin_token_for_admin_fdn
V1_DB_NAME=fdn_monitoring_db
V1_RP_NAME=v1-rp
V1_AUTH_USERNAME=admin
V1_AUTH_PASSWORD=secure_influx_admin_password_fdn

GRAFANA_USERNAME=admin
GRAFANA_PASSWORD=secure_influx_admin_password_fdn

2. Building FDN-Monitor (optional):

The `functiondeliverynetwork/fdn-monitor` image is used when deploying; rebuild it only if you modify the code.

Clone the repository and go to the root directory:

git clone https://github.com/Function-Delivery-Network/FDN-Monitor.git
cd FDN-Monitor

Make your changes in the app folder.
Once the changes are made, build a Docker image and push it to Docker Hub.

cd FDN-Monitor/app
sudo docker build -t functiondeliverynetwork/fdn-monitor .
sudo docker push functiondeliverynetwork/fdn-monitor

3. Serverless compute platform/cluster parameters configuration:

Rename the config_*.env file that matches the cluster you want to monitor to config.env. Example files are provided for each serverless platform. For example, for OpenFaaS:

CLUSTER_TYPE=OPENFAAS
CLUSTER_NAME=edge_cluster
CLUSTER_AUTH=
CLUSTER_HOST=
CLUSTER_USERNAME=admin
CLUSTER_API_GW_ACCESS_TOKEN=hello
CLUSTER_GATEWAY_PORT=31112
CLUSTER_SERVERLESS_PLATFROM_PROMETHEUS_PORT=30008
CLUSTER_KUBERNETES_PROMETHEUS_PORT=30009
DEFAULT_LOGGING_PERIOD=30
POWER_COLLECTION="True"
INFLUXDB_HOST=influxdb
INFLUXDB_PORT=8086
INFLUXDB_ORG=fdn_org
INFLUXDB_BUCKET=fdn_monitoring_bucket
INFLUXDB_ADMIN_TOKEN=a_secure_admin_token_for_admin_fdn
INFLUXDB_TABLE_INFRA=infra_usage_table
INFLUXDB_TABLE_FUNCTIONS=functions_usage_table
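
To show how these parameters fit together, here is a small sketch that parses a config.env-style file and derives the two Prometheus base URLs. It assumes CLUSTER_HOST holds a bare hostname or IP address and that both Prometheus instances are reachable over plain HTTP; the real agent may build its URLs differently. The helper names are hypothetical.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    config: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip().strip('"')
    return config


def prometheus_urls(config: Dict[str, str]) -> Dict[str, str]:
    """Build base URLs for the platform and Kubernetes Prometheus instances.

    Assumes plain HTTP and a bare host in CLUSTER_HOST. The key spelling
    (PLATFROM) follows the example config file as given.
    """
    host = config["CLUSTER_HOST"]
    platform_port = config["CLUSTER_SERVERLESS_PLATFROM_PROMETHEUS_PORT"]
    k8s_port = config["CLUSTER_KUBERNETES_PROMETHEUS_PORT"]
    return {
        "platform": f"http://{host}:{platform_port}",
        "kubernetes": f"http://{host}:{k8s_port}",
    }
```

For instance, with a hypothetical CLUSTER_HOST of 10.0.0.5 and the ports from the example above, the platform Prometheus would be addressed at http://10.0.0.5:30008 and the Kubernetes one at http://10.0.0.5:30009.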

4. Deploying:

sudo docker-compose -f docker-compose-fdn.yml up -d
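
Once the containers are up, a quick health check can confirm that InfluxDB and Grafana are reachable. The sketch below assumes both run on localhost, that InfluxDB listens on port 8086 (as in INFLUXDB_PORT), and that Grafana is published on its default port 3000; adjust these to match your compose file. `check_stack` is a hypothetical helper, not part of FDN-Monitor.

```python
import json
import urllib.request
from typing import Any, Callable, Dict


def fetch_json(url: str, timeout: float = 5.0) -> Dict[str, Any]:
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def check_stack(fetch: Callable[[str], Dict[str, Any]] = fetch_json,
                host: str = "localhost",
                influx_port: int = 8086,
                grafana_port: int = 3000) -> Dict[str, bool]:
    """Report whether InfluxDB and Grafana answer their health endpoints.

    InfluxDB 2.x reports "status": "pass" on /health, and Grafana reports
    "database": "ok" on /api/health when healthy.
    """
    influx = fetch(f"http://{host}:{influx_port}/health")
    grafana = fetch(f"http://{host}:{grafana_port}/api/health")
    return {
        "influxdb": influx.get("status") == "pass",
        "grafana": grafana.get("database") == "ok",
    }
```

Calling `check_stack()` with no arguments performs real HTTP requests and raises an exception if a service is not reachable yet, so it may need a retry shortly after startup.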

Grafana Dashboards

Overall System-Infra metrics dashboard

Multiple Platforms Functions combined dashboard.

Important Links:

Please share your thoughts, and let me know if something needs to be changed or added.

If you would like to know more about how virtual-kubelet is used to manage multiple serverless compute platforms or how FDN is implemented, please let me know, and I will write separate blog posts.

You can reach out to me on LinkedIn.

Check out my website for more details.

I look forward to your feedback!