FastAPI Microservice Patterns: Application Monitoring

Application metric monitoring with Prometheus and Grafana

Florian Kromer
Dec 27, 2020 · 6 min read

What’s the problem?

During development and maintenance of microservices things go wrong. In many situations the problem’s root cause resides in the application code. It’s essential to have insight into the internals of the application. But how can one enable observability of numerical metric values to make applications easier to understand and problems easier to troubleshoot?

Solution alternatives

There are several layers of service monitoring: monitoring the container orchestration platform, the service logic, and so on. There are also different types of data to gather for observability (numerical values, e.g. the time spent executing an endpoint, log strings, etc.). Monitoring the different levels and data types complements each other rather than being alternative approaches. To enable observability via logs, the log aggregation pattern can be used. To enable observability of timing across several services, one uses the distributed tracing pattern. This post is only about gathering numerical values at the service logic level.

One solution: Application monitoring

When applying the application metrics pattern, the application code is instrumented to gather metric values, which are collected in a central place either by pushing them from the microservice to a metrics service or by letting the metrics service pull them from the microservice.
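
Both variants are supported by the Prometheus Python client used later in this post. The following minimal sketch contrasts the two; the gateway address, job name and metric name are illustrative assumptions, not part of the example implementation.

from prometheus_client import CollectorRegistry, Counter, push_to_gateway, start_http_server

registry = CollectorRegistry()
orders = Counter("orders_processed_total", "Number of processed orders", registry=registry)
orders.inc()

# Pull model: expose an HTTP endpoint which the metrics service scrapes periodically.
start_http_server(8000, registry=registry)

# Push model: actively push the metrics to a Pushgateway (address is an assumption).
push_to_gateway("pushgateway:9091", job="service-a", registry=registry)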

Pattern implementation

This example implements the pattern by using Prometheus as the metrics service. Prometheus can collect different metric types (a short code sketch follows the list below):

  • Counter: “A counter is a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart.”
  • Gauge: “A gauge is a metric that represents a single numerical value that can arbitrarily go up and down.”
  • Histogram: “A histogram samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values.”
  • Summary: “Similar to a histogram, a summary samples observations (usually things like request durations and response sizes). While it also provides a total count of observations and a sum of all observed values, it calculates configurable quantiles over a sliding time window.”
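
As a minimal sketch of these four types with the Prometheus Python client (the metric names and observed values are illustrative, not taken from the example service):

from prometheus_client import Counter, Gauge, Histogram, Summary

REQUESTS = Counter("app_requests_total", "Total number of handled requests")
IN_PROGRESS = Gauge("app_requests_in_progress", "Requests currently being processed")
LATENCY_HIST = Histogram("app_request_latency_seconds", "Request latency", buckets=(0.1, 0.5, 1.0, 5.0))
LATENCY_SUM = Summary("app_request_latency_summary_seconds", "Request latency summary")

REQUESTS.inc()                         # counter: only goes up (or resets to zero on restart)
IN_PROGRESS.inc(); IN_PROGRESS.dec()   # gauge: can go up and down arbitrarily
LATENCY_HIST.observe(0.42)             # histogram: counts the observation into configurable buckets
LATENCY_SUM.observe(0.42)              # summary: note the Python client exposes count and sum, without quantiles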

Instead of using the Prometheus Python client directly to instrument the service code, one can use one of several higher level libraries. At the time of writing these are prometheus-fastapi-instrumentator, starlette-prometheus and starlette_exporter. The different libraries export different metrics. Starlette is the ASGI framework FastAPI is built on top of, which implies that starlette-prometheus and starlette_exporter collect only ASGI specific metrics (e.g. information about HTTP requests). If one needs other communication related information, e.g. messaging meta-data (how often have messages on a specific topic been processed?), one has to add it explicitly. This example implementation uses starlette-prometheus. Time will show which Starlette-focused library becomes the de-facto standard.

The example provides metrics about the number of times a decorated function was called and the total amount of time spent in a decorated function. Gathered information without some way to read it is of no value. The “dashboard framework for observability” Grafana is used to explore the metrics. Grafana also allows creating dashboards to visualize the metrics. That topic is not part of this post; there are a lot of great resources about how to create dashboards available online.
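
Such function-level metrics can be gathered with the Prometheus Python client’s decorators; a minimal sketch, assuming a hypothetical handler function (not code from the example repository):

from prometheus_client import Counter, Summary

CALLS = Counter("handler_calls_total", "Number of times the handler was called")
TIME_SPENT = Summary("handler_processing_seconds", "Time spent in the handler")

@TIME_SPENT.time()       # records the time spent in every call of the function
def handle_request():
    CALLS.inc()          # counts every invocation
    ...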

The post FastAPI Microservice Patterns: Local Development Environment — Skaffold, docker, kubectl and minikube in a nutshell describes how to set up the local development environment and how to get the source code of the pattern implementations. The corresponding subdirectory of the source code repository contains the example implementation of the pattern.

Run the microservice, Prometheus and Grafana with followed by .

Skaffold deploys Prometheus, Grafana and a minimalistic “hello world” microservice. Prometheus is accessible via localhost:9090, Grafana via localhost:3000 and the microservice via localhost:9000.

The microservice is instrumented using starlette-prometheus to provide its metrics via a /metrics endpoint:

...
from fastapi import FastAPI
from starlette_prometheus import metrics, PrometheusMiddleware

app = FastAPI()
# Collect the starlette_* metrics for every handled request.
app.add_middleware(PrometheusMiddleware)
# Expose all gathered metrics in the Prometheus text format.
app.add_route("/metrics", metrics)
...
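
For non-HTTP information like the messaging meta-data mentioned above, the middleware does not help; one would add explicit instrumentation with the Prometheus Python client. A minimal sketch, assuming a hypothetical message consumer callback (the metric name, label and function are illustrative):

from prometheus_client import Counter

MESSAGES_PROCESSED = Counter(
    "messages_processed_total",
    "Number of processed messages per topic",
    ["topic"],
)

def on_message(topic: str, payload: bytes) -> None:
    # Hypothetical consumer callback: count each processed message per topic.
    MESSAGES_PROCESSED.labels(topic=topic).inc()

Metrics registered this way land in the client’s default registry and should therefore show up on the same /metrics endpoint.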

The metrics are observable via localhost:9000/metrics:

[Screenshot: localhost:9000/metrics]

At the time of writing starlette-prometheus supports the following metrics:

  • starlette_requests_total: Total count of requests by method and path.
  • starlette_responses_total: Total count of responses by method, path and status codes.
  • starlette_requests_processing_time_seconds: Histogram of requests processing time by path (in seconds).
  • starlette_exceptions_total: Total count of exceptions raised by path and exception type.
  • starlette_requests_in_progress: Gauge of requests by method and path currently being processed.

In addition, the built-in metrics of the Prometheus Python client are provided.
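
These built-in metrics come from collectors that the Prometheus Python client registers in its default registry on import. A quick way to inspect what is currently registered (a sketch, run inside the service’s Python environment):

from prometheus_client import REGISTRY

# Print the name of every metric family the default registry currently exposes,
# e.g. python_gc_objects_collected, python_info, process_cpu_seconds, ...
for metric_family in REGISTRY.collect():
    print(metric_family.name)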

Garbage collection related metrics:

  • python_gc_objects_collected: Objects collected during GC.
  • python_gc_objects_uncollectable: Uncollectable objects found during GC.
  • python_gc_collections: Number of times this generation was collected.

Platform related metrics:

  • python_info: Python platform information.

Process related metrics:

  • process_virtual_memory_bytes: Virtual memory size in bytes.
  • process_resident_memory_bytes: Resident memory size in bytes.
  • process_start_time_seconds: Start time of the process since unix epoch in seconds.
  • process_cpu_seconds_total: Total user and system CPU time spent in seconds.
  • process_max_fds: Maximum number of open file descriptors.
  • process_open_fds: Number of open file descriptors.

Prometheus is configured via a config map to periodically scrape the metrics provided by service-a:

...
prometheus.yml: |-
  global:
    scrape_interval: 15s
    evaluation_interval: 15s
  rule_files:
    # - "first.rules"
    # - "second.rules"
  scrape_configs:
    - job_name: service-a
      static_configs:
        - targets: ['service-a:80']
...

Prometheus should show service-a in the monitored endpoint list with the state “UP”:

[Screenshot: localhost:9090/targets]
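
The same check can be done against the Prometheus HTTP API instead of the web UI; a minimal sketch using the standard query endpoint (the requests dependency is an assumption, not part of the example service):

import requests

# Ask Prometheus whether the service-a scrape target is up (1 = up, 0 = down).
response = requests.get(
    "http://localhost:9090/api/v1/query",
    params={"query": 'up{job="service-a"}'},
)
for result in response.json()["data"]["result"]:
    print(result["metric"]["instance"], result["value"][1])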

Instead of using Prometheus for visualization of the metrics we will use Grafana. Visit the Grafana landing page (you will be redirected to localhost:3000/login), enter “admin” as username and password, and press the “Log in” button.

[Screenshot: localhost:3000/login]

Change the password. In the current state of the example code the password needs to be changed again whenever Skaffold re-deploys the setup.

[Screenshot: localhost:3000/login (set new password)]

In Grafana add Prometheus as a data source. Press the “Data Sources” button.

[Screenshot: localhost:3000]

Press the “Add data source” button:

[Screenshot: localhost:3000/datasources]

Enter “prometheus” into the search field, hover over the Prometheus entry and press the “Select” button.

[Screenshot: localhost:3000/datasources/new (search)]

Configure the Prometheus data source by inserting “prometheus:9090” into the URL field.

[Screenshot: localhost:3000 (data source configuration 1/2)]

Press the “Save &amp; Test” button.

[Screenshot: localhost:3000 (data source configuration 2/2)]
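
Clicking through the UI can also be replaced by a call against Grafana’s HTTP API; a minimal sketch, assuming the admin credentials set above and the requests library (both assumptions, not part of the example code):

import requests

# Create the Prometheus data source programmatically via Grafana's HTTP API.
response = requests.post(
    "http://localhost:3000/api/datasources",
    auth=("admin", "<new admin password>"),
    json={
        "name": "Prometheus",
        "type": "prometheus",
        "url": "http://prometheus:9090",
        "access": "proxy",
        "isDefault": True,
    },
)
response.raise_for_status()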

Go to the “Explore” view.

The metrics related to starlette-prometheus and client_python are contained in the namespaces “process”, “python” (garbage collection and Python platform metrics) and “starlette”.

[Screenshot: localhost:3000 (process metrics)]
[Screenshot: localhost:3000 (python metrics)]
[Screenshot: localhost:3000 (starlette metrics)]

If you select one of these metrics the corresponding data is shown in an x-y plot over time (here for starlette_requests_created):

[Screenshot: localhost:3000 (metric: starlette_requests_created)]
