FastAPI Microservice Patterns: Application Monitoring
What’s the problem?
During development and maintenance of microservices things go wrong, and in many situations the problem's root cause resides in the application code. It is essential to have insight into the internals of the application. But how do you enable observability of numerical metric values to ease understanding the application and troubleshooting problems?
Service monitoring spans several layers: the container orchestration platform, the service logic, and so on. There are also different types of observability data to gather, e.g. numerical values (such as the time spent executing an endpoint) or log strings. Monitoring the different layers and data types complements rather than replaces each other. Observability via logs is enabled by the log aggregation pattern; observability of timing across several services by the distributed tracing pattern. This post is only about gathering numerical values at the service logic level.
One solution: Application monitoring
When applying the application metrics pattern, the application code is instrumented to gather metric values, which are collected in a central place: either the microservice pushes them to a metrics service, or a metrics service pulls them from the microservice. The Prometheus documentation defines four metric types:
- Counter: “A counter is a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart.”
- Gauge: “A gauge is a metric that represents a single numerical value that can arbitrarily go up and down.”
- Histogram: “A histogram samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values.”
- Summary: “Similar to a histogram, a summary samples observations (usually things like request durations and response sizes). While it also provides a total count of observations and a sum of all observed values, it calculates configurable quantiles over a sliding time window.”
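The four types can be sketched with the Prometheus Python Client; the metric names below are made up for illustration and are not part of the example repository:

```python
from prometheus_client import Counter, Gauge, Histogram, Summary, REGISTRY

# Hypothetical metrics, one per Prometheus metric type
REQUESTS = Counter("app_requests_total", "Total requests handled")
IN_PROGRESS = Gauge("app_requests_in_progress", "Requests currently in flight")
LATENCY = Histogram("app_request_latency_seconds", "Request latency",
                    buckets=(0.1, 0.5, 1.0))
PAYLOAD = Summary("app_payload_bytes", "Observed payload sizes")

REQUESTS.inc()                        # counters only go up
IN_PROGRESS.inc(); IN_PROGRESS.dec()  # gauges go both ways
LATENCY.observe(0.42)                 # sampled into the configured buckets
PAYLOAD.observe(512)                  # contributes to _count and _sum
```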
Instead of using the Prometheus Python Client to instrument the service code, one can use one of several higher-level libraries. At the time of writing there are prometheus-fastapi-instrumentator, starlette-prometheus and starlette_exporter. The different libraries export different metrics. Starlette is the ASGI framework FastAPI is built on top of, which implies that starlette-prometheus and starlette_exporter collect only ASGI-specific metrics (e.g. information about HTTP requests). If you need other communication-related information, e.g. messaging metadata (how often have messages on a specific topic been processed?), you have to add it explicitly. This example implementation uses starlette-prometheus. Time will show which Starlette-focused library becomes the de-facto standard.
The example provides metrics about the number of times a decorated function was called and the total amount of time spent in it. Gathered information is of no value without some way to read it, so the “dashboard framework for observability” Grafana is used to explore the metrics. Grafana also allows creating dashboards to visualize the metrics; that topic is not part of this post, and there are a lot of great resources available online about how to create dashboards.
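Such instrumentation can be sketched with the Prometheus Python Client as a decorator; the names `monitor` and the two metric names are assumptions for illustration, not the repository's actual code:

```python
import time
from functools import wraps
from prometheus_client import Counter, REGISTRY

# Hypothetical metrics: call count and accumulated time per function
CALLS = Counter("function_calls_total", "Calls per function", ["function"])
TIME_SPENT = Counter("function_time_seconds_total", "Time per function", ["function"])

def monitor(func):
    """Count calls to func and accumulate the time spent in it."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            CALLS.labels(function=func.__name__).inc()
            TIME_SPENT.labels(function=func.__name__).inc(
                time.perf_counter() - start)
    return wrapper

@monitor
def hello():
    return "hello world"

hello()
```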
The post FastAPI Microservice Patterns: Local Development Environment — Skaffold, docker, kubectl and minikube in a nutshell describes how to set up the local development environment and how to get the source code of the pattern implementations. The sub-directory application-monitoring of the source code repository contains the example implementation of the pattern.
Run the microservice, Prometheus and Grafana with minikube start followed by skaffold dev --port-forward.
Skaffold deploys Prometheus (via prometheus/k8s/deployment.yaml), Grafana (via grafana/k8s/deployment.yaml) and a minimalistic “hello world” microservice. Prometheus is accessible via localhost:9090, Grafana via localhost:3000 and the microservice via its forwarded port.
The microservice is instrumented using starlette_prometheus to provide its metrics via the /metrics endpoint:

```python
from fastapi import FastAPI
from starlette_prometheus import metrics, PrometheusMiddleware

app = FastAPI()
app.add_middleware(PrometheusMiddleware)
app.add_route("/metrics", metrics)
```
The metrics are observable via the /metrics endpoint:
- starlette_requests_total: Total count of requests by method and path.
- starlette_responses_total: Total count of responses by method, path and status codes.
- starlette_requests_processing_time_seconds: Histogram of requests processing time by path (in seconds).
- starlette_exceptions_total: Total count of exceptions raised by path and exception type.
- starlette_requests_in_progress: Gauge of requests by method and path currently being processed.
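In Prometheus or Grafana these metrics can be queried with PromQL, for example (the path label value "/" is an assumption about the example service):

```
# requests per second, averaged over the last 5 minutes
rate(starlette_requests_total[5m])

# 95th percentile of the request processing time for a path
histogram_quantile(0.95,
  rate(starlette_requests_processing_time_seconds_bucket{path="/"}[5m]))
```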
In addition, the built-in metrics of the Prometheus Python client are provided.
Garbage collection related metrics:
- python_gc_objects_collected: Objects collected during GC.
- python_gc_objects_uncollectable: Uncollectable object found during GC.
- python_gc_collections: Number of times this generation was collected.
Platform related metrics:
- python_info: Python platform information.
Process related metrics:
- process_virtual_memory_bytes: Virtual memory size in bytes.
- process_resident_memory_bytes: Resident memory size in bytes.
- process_start_time_seconds: Start time of the process since unix epoch in seconds.
- process_cpu_seconds_total: Total user and system CPU time spent in seconds.
- process_max_fds: Maximum number of open file descriptors.
- process_open_fds: Number of open file descriptors.
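These built-in collectors are registered with the client's default registry automatically; a minimal sketch to inspect them locally (the process_* metrics are only collected on Linux, and exact sample names may vary across client versions):

```python
from prometheus_client import REGISTRY, generate_latest

# Render everything in the default registry in the Prometheus text format.
text = generate_latest(REGISTRY).decode()

# Print the platform information metric contributed by the built-in collector.
for line in text.splitlines():
    if line.startswith("python_info"):
        print(line)
```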
Prometheus is configured via a config map (prometheus/k8s/config-map.yaml) to scrape the metrics provided by the microservice:

```yaml
global:
  evaluation_interval: 15s
rule_files:
  # - "first.rules"
  # - "second.rules"
scrape_configs:
  - job_name: service-a
    static_configs:
      - targets: ['service-a:80']
```
Prometheus should show service-a in the monitored endpoint list with state “UP”:
Instead of using Prometheus for visualization of the metrics we will use Grafana. Visit the Grafana landing page at localhost:3000 (you will be redirected to the login page), enter admin as username and password, and press the “Log in” button.
Change the password. In the current state of the example code the password needs to be changed whenever skaffold re-deploys the setup.
In Grafana, add Prometheus as a data source: press the “Data Sources” button, then press “Add data source”. Enter “prometheus” into the search field, hover over the Prometheus entry and press “Select”. Configure the Prometheus data source by inserting “prometheus:9090” into the URL field and press “Save & Test”.
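Instead of clicking through the UI, the same data source could also be provisioned declaratively; a sketch of a Grafana provisioning file (the file path is hypothetical, not part of the example repository):

```yaml
# e.g. grafana/provisioning/datasources/prometheus.yaml (hypothetical path)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
```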
Go to the “Explore” view.
The metrics related to starlette-prometheus and client_python live in the namespaces “process”, “python” (garbage collector and platform metrics) and “starlette”.
If you select one of these metrics, the corresponding data is shown in an x-y plot over time. You are now ready to create dashboards with these metrics.
The post series content
- FastAPI Microservice Patterns
- FastAPI Microservice Patterns: Local Development Environment
- FastAPI Microservice Patterns: Service discovery in container orchestration platforms
- FastAPI Microservice Patterns: Asynchronous communication
- FastAPI Microservice Patterns: Application monitoring
- FastAPI Microservice Patterns: Serverless Deployment
- FastAPI Microservice Patterns: Externalized configuration