Announcing Prometheus Metrics from Fn

The Fn server has support for Prometheus metrics built in. This allows you to obtain information about its performance and resource consumption and make it available to other systems to display and analyse.

NB: this post was updated on 11 Jan 2018 to reflect changes to a number of Prometheus metric names.

In this article we’ll find out about Prometheus and what kinds of information the Fn server makes available to it. We’ll briefly look at how it works before seeing how you can use Prometheus in conjunction with Grafana to display dynamic graphs that show how your Fn server is performing.

Grafana dashboard showing some Fn server metrics

Getting set up

  • Docker 17.05 or later should be installed and running
  • Install the fn command-line tool. There are instructions in the README.
  • Clone the GitHub repo fnproject/fn to get the examples you need: install Go from golang.org, set $GOPATH to the directory where you want to place the clone, and then type go get github.com/fnproject/fn. This will create a directory $GOPATH/src/github.com/fnproject/fn. We’ll assume this location later, so if you clone using git instead you’ll need to adjust the instructions accordingly.

What is Prometheus?

Prometheus is an open-source systems monitoring and alerting toolkit. Essentially it’s a server process which scrapes metrics data from one or more targets (here, one or more Fn servers) and makes it available via a powerful query API.

Whilst the Prometheus server has its own web interface, which can be used to display tables and graphs of metrics data, it is usually used in conjunction with a separate tool such as Grafana that displays multiple metrics and can be used to generate alerts.

What metrics are available?

Currently Fn provides three sets of Prometheus metrics. We’ll see all three sets of metrics in action later.

  • Function counts: the number of functions that are currently queued or are running, and the number of functions that have completed successfully or failed since the server was last started.
  • Operation durations: these represent the time taken to perform various operations. In addition to obtaining the time taken to execute a function, you can also obtain finer-grained data such as the time taken to start the Docker container in which it runs.
  • Docker metrics: when Fn executes a function in a Docker container it obtains various statistics from that container, such as CPU and memory usage, and makes them available as Prometheus metrics.

How does Prometheus work?

When we say that Fn provides Prometheus metrics, what this actually means is that it makes metrics available to the Prometheus server using a special API endpoint.

The Fn metrics endpoint for Prometheus is http://localhost:8080/metrics by default. Although this endpoint is intended for use only by the Prometheus server, you can invoke it yourself to see how it works.

Try out the Fn metrics endpoint

All you need at this stage is a running Fn server. Start an Fn server:

fn start

Now create and deploy a function. Here we deploy a simple sync function, but you can use any function you like.

First use the fn CLI tool to create a simple boilerplate function:

mkdir myfunc
cd myfunc
fn init --name myfunc --runtime go

This creates a file func.go which simply returns “Hello World!”. Modify it to add a short sleep before the response is returned (in Go, import the time package and call time.Sleep), to make it a bit more realistic.

Now build and deploy the function:

fn deploy --local --app myapp

Invoke the function a few times:

curl localhost:8080/r/myapp/myfunc
curl localhost:8080/r/myapp/myfunc
curl localhost:8080/r/myapp/myfunc
curl localhost:8080/r/myapp/myfunc

Now type the following to see what is returned by the Prometheus metrics endpoint.

curl --silent http://localhost:8080/metrics | grep fn_completed

This will return something like

# HELP fn_completed Metric fn_completed
# TYPE fn_completed counter
fn_completed{fn_appname="myapp",fn_path="/myfunc"} 4

The endpoint returns a long list of metric names and their current values, so we used grep to display only the values of the metric fn_completed.

This particular metric gives you the number of times a function has run successfully since the server was last started. fn_appname and fn_path are labels which provide more information about a particular metric value. Almost all the metrics provided by Fn are given labels to specify the application and the route (path) of the function.
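Because the endpoint returns plain text with one sample per line, ordinary text tools can aggregate it. The sketch below defines a helper, sum_metric (a hypothetical name, not part of Fn), that adds up every sample of one metric across all of its label sets. It assumes the standard Prometheus text format with no trailing timestamps, which matches the sample output above.

```shell
# Sum all samples of one metric across its label sets.
# Reads Prometheus text-format output on stdin; the metric name is $1.
# Assumes no trailing timestamp, so the last field is the sample value.
sum_metric() {
  awk -v name="$1" '
    /^#/    { next }                   # skip the # HELP and # TYPE lines
    NF == 0 { next }                   # skip blank lines
    {
      metric = $1
      sub(/[{].*/, "", metric)         # strip the {label="value",...} part
      if (metric == name) total += $NF
    }
    END { print total + 0 }            # prints 0 when nothing matched
  '
}

# Example use against a running Fn server:
#   curl --silent http://localhost:8080/metrics | sum_metric fn_completed
```

With a single deployed function the result is the same as the one value shown above; the helper only becomes interesting once several apps or routes report the same metric.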

Introducing Prometheus

So far we’ve just looked at the metrics generated by the Fn server. What does the Prometheus server itself do?

It scrapes (polls) the Fn metrics endpoint at periodic intervals and saves the values of each metric in a time-series database. If you’re running multiple Fn servers you can configure Prometheus to scrape them all in turn and combine the data together.

Prometheus has a powerful API and query syntax which can be used to obtain values of these metrics. You can obtain historical values of a metric, suitable for displaying on a graph, or you can perform statistical operations such as summing metric values across multiple labels, calculating rates and performing quantile functions.

The Prometheus server has a web interface which allows you to see what metrics are available, and try out ad-hoc queries.

However, for day-to-day use the power of Prometheus is best realised when it is used in conjunction with another tool that fetches data from Prometheus and displays the metrics you need in a graphical dashboard. One such tool is Grafana. We’ll look at that in a moment.

Try out Prometheus

Let’s first start a Prometheus server. You can do this very easily in Docker:

docker run --name=prometheus -d -p 9090:9090 -v ${GOPATH}/src/github.com/fnproject/fn/examples/grafana/prometheus.yml:/etc/prometheus/prometheus.yml --link fnserver prom/prometheus 

  1. The -v parameter is used to specify a Prometheus configuration file provided with one of the Fn examples. If you look at ${GOPATH}/src/github.com/fnproject/fn/examples/grafana/prometheus.yml you will see that it tells Prometheus to scrape metrics from fnserver:8080.
  2. The hostname fnserver is an alias, configured using the parameter --link fnserver, that refers to the running Fn server. This requires the Fn server to be running in Docker.
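For orientation, a scrape configuration with the effect described in the first note might look like the sketch below. The file shipped in the Fn repository is the authoritative version; the job name and scrape interval used here are assumptions, and write_prometheus_config is a hypothetical helper that exists only to keep the example self-contained.

```shell
# Write a minimal Prometheus config that scrapes a single Fn server.
# $1 is the output path. job_name and scrape_interval are illustrative values.
write_prometheus_config() {
  cat > "$1" <<'EOF'
global:
  scrape_interval: 15s        # how often Prometheus polls each target

scrape_configs:
  - job_name: 'fn'
    # metrics_path defaults to /metrics, the path the Fn server exposes
    static_configs:
      - targets: ['fnserver:8080']
EOF
}

# Example use:
#   write_prometheus_config "$PWD/prometheus.yml"
```

To scrape several Fn servers, more host:port entries could be added to the targets list. If you use a file written this way, pass its absolute path to the -v parameter in place of the path inside the Fn repository.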

Grafana

Now let’s have a look at Grafana. This is another server process, also open source. It has a web interface that you can use to create customised dashboards that pull data from Prometheus (using the full power of the Prometheus query API to perform statistical transformations) and display it graphically to the user.

Fn comes with a number of pre-defined dashboards which showcase all the various metrics provided by the Fn server.

Dashboard 1: Function Count Metrics

This dashboard demonstrates the first type of Prometheus metric provided by the Fn server: function counts.

Start Grafana on port 3000:

docker run --name=grafana -d -p 3000:3000 --link prometheus grafana/grafana

Open a browser on Grafana at http://localhost:3000.

Log in using the default user admin and default password admin.

Create a datasource to obtain metrics from Prometheus:

  • Click on Add data source. In the form that opens:
  • Set Name to PromDS (or whatever name you choose)
  • Set Type to Prometheus
  • Set URL to http://prometheus:9090
  • Set Access to proxy
  • Click Add and then Save and test
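If you would rather script this step, the same data source can be created through Grafana’s HTTP API. This is a sketch under the assumptions used above: Grafana on localhost:3000, the default admin/admin login, and the name PromDS. The helper name create_datasource is hypothetical.

```shell
# Create the Prometheus data source through Grafana's HTTP API.
# Assumes the default admin/admin credentials; GRAFANA_URL may override the address.
create_datasource() {
  # --data makes curl send a POST request with this JSON body
  curl --silent --user admin:admin \
    --header 'Content-Type: application/json' \
    --data '{"name":"PromDS","type":"prometheus","url":"http://prometheus:9090","access":"proxy"}' \
    "${GRAFANA_URL:-http://localhost:3000}/api/datasources"
}

# Example use:
#   create_datasource
```

The JSON fields mirror the form values listed above, so the resulting data source should be interchangeable with one created by hand.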

Now import the example dashboard that displays metrics from the Fn server:

  • Click on the main menu at the top left and choose Dashboards and then Import
  • In the dialog that opens, click Upload .json file, navigate to $GOPATH/src/github.com/fnproject/fn/examples/grafana and select fn_grafana_dashboard.json.
  • Specify the Prometheus data source that you just created
  • Click Import

You should then see the following dashboard. Now invoke your function a few more times and see the graphs update.

Function counts: queued, running, completed, failed

Try invoking your function repeatedly using the following simple script. Paste it into an empty file run.bash and then type bash run.bash. Let it run whilst you watch the graphs update.
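A minimal sketch of such a script is shown here, assuming the myapp/myfunc route deployed earlier. The function name invoke_repeatedly and the count and delay arguments are illustrative choices, not part of Fn. This version expects the number of calls as an argument, so run it as, for example, bash run.bash 100.

```shell
# run.bash: invoke a function repeatedly so the graphs have data to show.
# Usage: bash run.bash COUNT [DELAY_SECONDS]
# The URL assumes the myapp/myfunc route deployed earlier.
URL="${URL:-http://localhost:8080/r/myapp/myfunc}"

invoke_repeatedly() {
  # $1 = number of calls, $2 = pause between calls in seconds
  i=1
  while [ "$i" -le "$1" ]; do
    # Discard the response body and report only the HTTP status of each call
    status=$(curl --silent --output /dev/null --write-out '%{http_code}' "$URL") \
      || status="unreachable"
    echo "call $i: HTTP $status"
    sleep "$2"
    i=$((i + 1))
  done
}

if [ "$#" -ge 1 ]; then
  invoke_repeatedly "$1" "${2:-1}"
else
  echo "usage: bash run.bash COUNT [DELAY_SECONDS]"
fi
```

A delay of 0 sends the calls back to back, which makes the running and completed graphs move more visibly.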

This dashboard shows the four “function count” metrics. From left to right, these are:

  • fn_queued: number of async function calls that are queued
  • fn_running: number of function calls that are currently running
  • fn_completed: number of function calls that have completed successfully
  • fn_failed: number of function calls that have failed

For each type of metric, the dashboard displays two separate graphs, one above the other.

The graph at the bottom shows the raw data, which by default is subdivided by its app and route (path) labels. So you will see one line for each function.

The graph at the top shows the result of performing a Prometheus sum expression that adds together the metric values for each function. So the graph shows just one line, the grand total for all functions.

This demonstrates how there is more to using Prometheus than simply displaying a metric. It allows you to perform various statistical transformations on the raw data, performing a sum in this case.
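The same two views can be fetched outside Grafana through the Prometheus HTTP API. The sketch below assumes Prometheus is reachable at localhost:9090, as started earlier; prom_query is a hypothetical helper name, and the queries in the bundled dashboard may differ in detail.

```shell
# Run an instant PromQL query against the Prometheus HTTP API.
# $1 is the PromQL expression; PROM_URL may override the server address.
prom_query() {
  # --get sends the URL-encoded data as a query string on a GET request
  curl --silent --get \
    --data-urlencode "query=$1" \
    "${PROM_URL:-http://localhost:9090}/api/v1/query"
}

# Example use:
#   prom_query 'fn_completed'         # raw data: one series per app and path
#   prom_query 'sum(fn_completed)'    # a single grand total across all functions
```

Both calls return JSON. The first contains one result per label combination, while the second contains a single result with no labels.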

Dashboard 2: Operation Duration Metrics

This dashboard demonstrates the second type of Prometheus metric provided by the Fn server: operation durations. These tell you how much time was taken to perform various internal operations such as executing a function.

These operations are actually tracing spans. Fn uses the OpenTracing API to create tracing spans for various important operations.

Each span represents a timed internal operation such as a function call, and has two main attributes: a name that describes the operation being performed (for example docker_wait_container), and its duration in seconds. Each span name is represented by a separate histogram metric, which has a name of the form fn_span_<span-name>_duration_seconds.

If you enable tracing in your Fn server (and we’re not going to discuss this further here) you can analyse these tracing spans using Zipkin. All you need to know is that these same tracing spans are also used to generate duration metrics for Prometheus.

If the span is associated with a specific function invocation, the corresponding metric is given the labels fn_appname and fn_path, which are set to the application name and function path respectively.

There are several ways in which these metrics can be used. In this dashboard you can display either a rolling mean value of the operation duration or the operation rate in operations per second.
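Each duration metric is a Prometheus histogram, so it is exposed as several series, including <name>_sum (total seconds observed) and <name>_count (number of observations). A rolling mean and an operation rate can both be derived from these two series. The sketch below builds such PromQL expressions; the helper names and the 1m window are assumptions, and the bundled dashboard may use different queries.

```shell
# Build PromQL for a span's rolling mean duration and its operation rate.
# $1 is the span name (for example agent_submit); $2 is the window (default 1m).
mean_duration_query() {
  # mean = growth in total seconds / growth in number of observations
  printf 'rate(fn_span_%s_duration_seconds_sum[%s]) / rate(fn_span_%s_duration_seconds_count[%s])\n' \
    "$1" "${2:-1m}" "$1" "${2:-1m}"
}

rate_query() {
  # per-second rate of completed operations over the window
  printf 'rate(fn_span_%s_duration_seconds_count[%s])\n' "$1" "${2:-1m}"
}

# Example use:
#   mean_duration_query agent_submit
#   rate_query agent_submit 5m
```

The printed expressions can be pasted into the Prometheus web interface or used as the query of a Grafana panel.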

To try out this dashboard:

  • Click on the main menu at the top left and choose Dashboards and then Import
  • In the dialog that opens, click Upload .json file, navigate to $GOPATH/src/github.com/fnproject/fn/examples/grafana and select fn_grafana_dashboard2.json.
  • Specify the Prometheus data source that you created earlier
  • Click Import

Operation durations

When you first display the dashboard you will see just two graphs. This is the initial, default setting. There are two pull-down lists at the top. Each allows you to choose which operations you wish to display as graphs. The one on the left allows you to choose which operations you wish to display as rates, whilst the one on the right allows you to choose which operations you wish to display as durations. Choose as many as you like.

The three most interesting metrics are:

  • fn_span_agent_submit_duration_seconds
  • fn_span_agent_submit_app_duration_seconds
  • fn_span_agent_submit_global_duration_seconds

These operations all correspond to the time taken to execute a function. They represent the same operation, but the first has labels fn_appname and fn_path set to the application and route (path), the second has just the fn_appname label set, and the third has no labels. This makes it easier to construct Prometheus queries by removing the need to sum values across label values.

The following metrics relate to more internal operations. These all have the labels fn_appname and fn_path set to the application and route (path).

  • fn_span_agent_call_end_duration_seconds
  • fn_span_agent_call_start_duration_seconds
  • fn_span_agent_cold_exec_duration_seconds
  • fn_span_agent_get_slot_duration_seconds
  • fn_span_agent_attach_container_duration_seconds
  • fn_span_agent_collect_stats_duration_seconds
  • fn_span_agent_create_container_duration_seconds
  • fn_span_agent_inspect_container_duration_seconds
  • fn_span_agent_inspect_image_duration_seconds
  • fn_span_agent_remove_container_duration_seconds
  • fn_span_agent_start_container_duration_seconds
  • fn_span_agent_wait_container_duration_seconds
  • fn_span_ds_get_app_duration_seconds
  • fn_span_ds_get_route_duration_seconds
  • fn_span_ds_insert_app_duration_seconds
  • fn_span_ds_insert_call_duration_seconds
  • fn_span_ds_insert_log_duration_seconds
  • fn_span_ds_route_duration_seconds
  • fn_span_ds_update_route_duration_seconds
  • fn_span_mq_reserve_duration_seconds
  • fn_span_serve_http_duration_seconds

Dashboard 3: Docker Statistics Metrics

This last dashboard demonstrates the third type of Prometheus metric provided by the Fn server: Docker statistics.

To try out this dashboard:

  • Click on the main menu at the top left and choose Dashboards and then Import
  • In the dialog that opens, click Upload .json file, navigate to $GOPATH/src/github.com/fnproject/fn/examples/grafana and select fn_grafana_dashboard3.json.
  • Specify the Prometheus data source that you created earlier
  • Click Import

Docker statistics

Once again, when you first display this dashboard it will show just one graph. Use the pull-down list at the top to select the metrics you want to see.

Available metrics are:

  • fn_docker_stats_cpu_kernel
  • fn_docker_stats_cpu_total
  • fn_docker_stats_cpu_user
  • fn_docker_stats_disk_read
  • fn_docker_stats_disk_write
  • fn_docker_stats_mem_limit
  • fn_docker_stats_mem_usage
  • fn_docker_stats_net_rx
  • fn_docker_stats_net_tx

As with the duration metrics, these all have the labels fn_appname and fn_path set to the application and route (path).

If you try this with your own functions it is possible that you won’t see any metrics. This can happen when your function runs so quickly that there is insufficient time for Docker to obtain the statistics before the container in which the function was running is shut down. The Fn team has contemplated delaying the termination of the container until at least one set of statistics has been obtained, but at present has decided not to do this.

Create your own dashboard

The three dashboards provided with Fn are intended to showcase the wide variety of Prometheus metrics currently available from the Fn server. Using Grafana it’s easy to create your own dashboards which display the data that is important to you.

Fn, Prometheus and Grafana

What’s Next

  • More metrics! We’re adding additional metrics to Fn all the time. For example, we’ll soon be adding metrics that allow you to see how many running, stopped and paused Docker containers you have.
  • API extensions: Work is in progress on a new extension API which will provide the most useful statistics directly from the Fn server without the need to access Prometheus directly. See GitHub issue 514 for details.
  • If you build your own dashboard, share it! Add it alongside the existing examples and submit a pull request.

Questions?

  • Join the Fn community on Slack: http://slack.fnproject.io. You can contact the author there as well as the rest of the team.
  • Check out Fn Project on GitHub to see the other cool things we’re building. It’s open source, so we love contributions! We might even send you an Fn shirt!

Author: Nigel Deakin