Powerful SaaS solution for detection metrics

Richard Holly
Feb 28, 2020

If you are building a SaaS solution that provides your customers with “time series” reports, and at the same time you want the solution to scale well and be easily manageable at an affordable price, this article introduces a possible solution based on the following technology stack: Auth0, Grafana and VictoriaMetrics.

Why …?

Collection and presentation of time-based metrics is an important topic in areas such as IoT or machine learning. When building a solution that presents data simply and effectively, and that is also offered as a SaaS service, there are many pitfalls. One of them is finding the right balance between performance, security and price.

For cloud-based solutions there are many services, such as Google Stackdriver, InfluxDB, …, that offer “start free” packages. These packages work for simple cases, but for metrics from automated detectors they did not suit my needs, for the following reasons:

  • In a detection system, evaluation takes place several times per second, and if we evaluate several input streams, the number of values/points grows significantly. We cannot afford to send metrics only, for example, once every 5 minutes just to fit into the subscription.
  • Custom metrics (series): the problem here is the subscription model of some services. Stackdriver, for example, charges for larger volumes of custom metrics based on the data rate.
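To make the volume argument concrete, here is a small back-of-the-envelope calculation in Python. The numbers (10 input streams, 3 metrics per stream, 5 evaluations per second) are hypothetical, chosen only to show the order of magnitude between sending every evaluation and throttling to one sample every 5 minutes.

```python
HOURS_PER_DAY = 24


def points_per_day(series: int, samples_per_hour: int) -> int:
    """Number of data points written per day for `series` time series."""
    return series * samples_per_hour * HOURS_PER_DAY


# Hypothetical detector: 10 input streams x 3 metrics each = 30 series.
SERIES = 10 * 3
DETECTION_RATE = 5 * 3600   # 5 evaluations per second -> 18,000 samples per hour
THROTTLED_RATE = 60 // 5    # one sample every 5 minutes -> 12 samples per hour

dense = points_per_day(SERIES, DETECTION_RATE)
sparse = points_per_day(SERIES, THROTTLED_RATE)

# prints: 12,960,000 vs 8,640 points/day (1500x more)
print(f"{dense:,} vs {sparse:,} points/day ({dense // sparse}x more)")
```

Even with these modest assumptions, full-rate detection produces three orders of magnitude more points than a throttled setup, which is why per-point or per-series pricing quickly becomes the limiting factor.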

I’m not going to go through all the other product combinations here; that is a topic for a separate article (TBD). An important factor for me (though not the most important one) was the possibility of local deployment. After a long search and comparison of different products, I settled on a pragmatic combination for my solution:

  • Auth0: identity management and OAuth provider
  • Grafana: data presentation tool
  • VictoriaMetrics: Prometheus-compatible, powerful storage for metrics

This combination of software products fulfilled all my expectations of “start small — think big”.

Prometheus OAuth Rewrite Proxy: “VictoriaMetrics SaaS Enabler”

The chosen solution enables the following flow:

  1. Through Auth0 you can manage tenants (users)
  2. Tenants log in to Grafana through the OAuth integration
  3. They can select and view data from a data source
  4. Grafana can forward the logged-in user’s OAuth token to the data source

Point 4 is where it all comes together: in a SaaS solution we want every logged-in user to see only their own tenant’s data, but VictoriaMetrics does not support OAuth. One option is to create a separate data source for each tenant, which I do not like, especially since there could be thousands of tenants.

Let me introduce the Prometheus OAuth rewrite proxy, which we deploy between the data source and Grafana. It ensures that every query sent from Grafana to the data source is enriched with a label carrying the tenant identifier. The proxy determines this identifier dynamically from the OAuth provider, using the token forwarded by Grafana. As a result, the returned data are automatically filtered so that each user sees only their own data, even when viewing common metrics.
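The tenant lookup described above can be sketched in Python as follows. This is not the proxy’s code: the function names are hypothetical, and the sketch assumes that Grafana forwards the token in a standard “Authorization: Bearer <token>” header and that the provider’s userinfo endpoint (https://<OAUTH_AUTH0_DOMAIN>/userinfo, as described in the docker-compose section) returns JSON containing the tenant claim.

```python
import json
import urllib.request


def bearer_token(authorization_header: str) -> str:
    """Extract the token from an 'Authorization: Bearer <token>' header value."""
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        raise PermissionError("request carries no bearer token")
    return token.strip()


def fetch_userinfo(oauth_domain: str, token: str, timeout: float = 5.0) -> dict:
    """Ask the OAuth provider who the token belongs to (performs a network call)."""
    request = urllib.request.Request(
        f"https://{oauth_domain}/userinfo",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def tenant_from_userinfo(userinfo: dict, claim: str) -> str:
    """Return the tenant identifier stored under `claim` (cf. OAUTH_CUSTOMER_ID_CLAIM)."""
    value = userinfo.get(claim)
    if value is None or value == "":
        raise PermissionError(f"claim {claim!r} is missing from userinfo")
    return str(value)
```

Chained together, tenant_from_userinfo(fetch_userinfo(domain, bearer_token(header)), claim) yields the value that goes into the tenant label. Raising an error instead of falling back to an unfiltered query keeps the proxy failing closed; a production proxy would likely also cache userinfo per token rather than calling the provider on every query.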

My solution is under the MIT license available in the repository: https://gitlab.com/optima_public/prometheus_oauth_proxy

The proxy supports the following handlers from the Prometheus querying API:

For example, take this query sent by the tenant with ID 1999:

http_requests_total{job="prometheus",code="200"}

It is rewritten to

http_requests_total{job="prometheus",code="200",client_id="1999"}

The label name (“client_id”) is, of course, configurable.

Roughly 99.9% of the full PromQL syntax is supported, implemented with the Python Lark parser.

Note: building a Python solution was quite a challenge, as the original PromQL parser is written in Go and I didn’t find any other solution suitable for my needs.

Complex query formats are also supported. For example, take this query sent by the tenant with ID 1999:

max_over_time(deriv(rate(distance_covered_total[5s])[30s:5s])[10m:])and http_requests_total{status_code=~"2.*"}[50h]

It is rewritten to

max_over_time(deriv(rate(distance_covered_total{client_id="1999"}[5s])[30s:5s])[10m:])and http_requests_total{status_code=~"2.*",client_id="1999"}[50h]

For a better idea of the supported (and also disabled) query formats, see /tests/test_parser/test_parser.py.
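To illustrate the rewriting step, here is a minimal Python sketch. It is not the proxy’s implementation: instead of the Lark-based PromQL grammar it uses a hypothetical hand-written scanner (inject_label and its helpers are illustrative names) that skips strings, range/subquery brackets, keywords, aggregation operators and function calls, and adds the tenant matcher to every remaining vector selector. It assumes a syntactically valid query and covers far less than the full PromQL syntax.

```python
import re

# Illustrative token patterns for this sketch (not the proxy's grammar).
IDENT = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
NUMBER = re.compile(r"[0-9.][0-9a-zA-Z_.]*")  # numbers and durations such as 5m
GROUPING = {"by", "without", "on", "ignoring", "group_left", "group_right"}
KEYWORDS = GROUPING | {"and", "or", "unless", "offset", "bool", "inf", "nan",
                       "sum", "min", "max", "avg", "group", "stddev", "stdvar",
                       "count", "count_values", "bottomk", "topk", "quantile"}


def _skip_string(q: str, i: int) -> int:
    """Return the index just past the quoted string that starts at q[i]."""
    quote = q[i]
    i += 1
    while q[i] != quote:
        if q[i] == "\\" and quote != "`":
            i += 1  # skip the escaped character
        i += 1
    return i + 1


def _skip_group(q: str, i: int, open_ch: str, close_ch: str) -> int:
    """Return the index just past the bracket group opened at q[i]."""
    depth = 0
    while True:
        if q[i] in "\"'`":
            i = _skip_string(q, i)
            continue
        if q[i] == open_ch:
            depth += 1
        elif q[i] == close_ch:
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1


def _add_matcher(block: str, matcher: str) -> str:
    """Append `matcher` to a '{...}' label-matcher block."""
    inner = block[1:-1].rstrip()
    if not inner:
        return "{" + matcher + "}"
    sep = "" if inner.endswith(",") else ","
    return "{" + inner + sep + matcher + "}"


def inject_label(query: str, label: str, value: str) -> str:
    """Add label="value" to every vector selector in a PromQL query."""
    matcher = f'{label}="{value}"'
    out, i, n = [], 0, len(query)
    while i < n:
        ch = query[i]
        if ch in "\"'`":                      # string literal: copy verbatim
            end = _skip_string(query, i)
            out.append(query[i:end])
        elif ch == "[":                       # range or subquery: copy verbatim
            end = _skip_group(query, i, "[", "]")
            out.append(query[i:end])
        elif ch == "{":                       # selector without a metric name
            end = _skip_group(query, i, "{", "}")
            out.append(_add_matcher(query[i:end], matcher))
        elif "0" <= ch <= "9" or (ch == "." and query[i + 1:i + 2].isdigit()):
            end = NUMBER.match(query, i).end()
            out.append(query[i:end])
        elif (m := IDENT.match(query, i)):
            word, end = m.group(0), m.end()
            j = end
            while j < n and query[j].isspace():
                j += 1
            nxt = query[j] if j < n else ""
            if word.lower() in GROUPING and nxt == "(":   # by (...), on (...)
                end = _skip_group(query, j, "(", ")")
                out.append(query[i:end])
            elif word.lower() in KEYWORDS or nxt == "(":  # operator or function call
                out.append(word)
            elif nxt == "{":                              # metric{...}
                end = _skip_group(query, j, "{", "}")
                out.append(query[i:j] + _add_matcher(query[j:end], matcher))
            else:                                         # bare metric name
                out.append(word + "{" + matcher + "}")
        else:
            end = i + 1
            out.append(ch)
        i = end
    return "".join(out)
```

Applied to the two queries above with label “client_id” and value “1999”, the sketch produces the same rewritten strings as the proxy. Because the tenant matcher is appended rather than substituted, a query that already asks for a different client_id ends up with two conflicting matchers and simply returns no data.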

Benefits

Benefits of applying this solution include:

  • A common data source that we can scale, upgrade, and back up more easily
  • Common tenant metrics (we don’t need to create tenant-specific metrics), which makes collecting and writing them easier
  • Common dashboards (since queries are rewritten dynamically), which we can easily prepare once for all tenants

All of the benefits above result in simpler tenant (user) setup and easier maintenance.

Pre-flight checklist

Make sure that the proxy:
  1. is connected to the same OAuth domain as Grafana
  2. has the same OAUTH_CUSTOMER_ID_CLAIM as we set in Auth0
  3. has the same PROMETHEUS_CUSTOMER_ID_TAG as we use to write metrics
  • Our writer, which writes the metrics, must label each metric with PROMETHEUS_CUSTOMER_ID_TAG, using the same value that the proxy populates automatically (e.g. see /tests/data_simulation.py)
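Parts of this checklist can be automated. Below is a minimal, hypothetical pre-flight check in Python; ProxyConfig, load_proxy_config and preflight are illustrative names, not part of the proxy. It reads the proxy ENV parameters described in the docker-compose section and compares them with Grafana’s settings, assuming Grafana’s userinfo endpoint is configured through the GF_AUTH_GENERIC_OAUTH_API_URL environment variable (the environment form of api_url in the [auth.generic_oauth] section).

```python
import os
from dataclasses import dataclass
from urllib.parse import urlparse

REQUIRED = ("PROMETHEUS_DATASOURCE", "PROMETHEUS_CUSTOMER_ID_TAG",
            "OAUTH_AUTH0_DOMAIN", "OAUTH_CUSTOMER_ID_CLAIM")


@dataclass(frozen=True)
class ProxyConfig:
    datasource: str
    customer_id_tag: str
    auth0_domain: str
    customer_id_claim: str
    debug: bool = False


def load_proxy_config(env=None) -> ProxyConfig:
    """Read the proxy ENV parameters (defaults to the process environment)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError(f"missing proxy settings: {', '.join(missing)}")
    return ProxyConfig(
        datasource=env["PROMETHEUS_DATASOURCE"],
        customer_id_tag=env["PROMETHEUS_CUSTOMER_ID_TAG"],
        auth0_domain=env["OAUTH_AUTH0_DOMAIN"],
        customer_id_claim=env["OAUTH_CUSTOMER_ID_CLAIM"],
        debug=env.get("DEBUG", "") == "True",
    )


def preflight(cfg: ProxyConfig, grafana_env: dict, writer_tag: str,
              sample_userinfo: dict = None) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    api_url = grafana_env.get("GF_AUTH_GENERIC_OAUTH_API_URL", "")
    api_host = urlparse(api_url).hostname
    if api_host != cfg.auth0_domain:  # checklist item 1
        problems.append(f"Grafana uses OAuth host {api_host!r}, "
                        f"proxy uses {cfg.auth0_domain!r}")
    if sample_userinfo is not None and cfg.customer_id_claim not in sample_userinfo:
        problems.append(f"claim {cfg.customer_id_claim!r} missing from userinfo")  # item 2
    if writer_tag != cfg.customer_id_tag:  # item 3 and the writer bullet
        problems.append(f"writer labels metrics with {writer_tag!r}, "
                        f"proxy expects {cfg.customer_id_tag!r}")
    return problems
```

The claim check needs a sample userinfo response (for example, one captured while logged in as a test tenant), since the claim configuration itself lives on the Auth0 side.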

Docker compose example

For a better understanding, and for local debugging of the configuration, a complete deployment scenario based on docker-compose is prepared for you.

It is located in the directory

https://gitlab.com/optima_public/prometheus_oauth_proxy/-/tree/master/tests/docker_compose

The shell script runner.sh refers to the official Docker builds, whose tags are listed at

https://gitlab.com/optima_public/prometheus_oauth_proxy/-/tags

Alternatively, we can make local changes and then use the runner.local-build.sh script.

The basic configuration is as follows:

  1. We create local directories for VictoriaMetrics data and Grafana data. These directories provide data persistence, and their names are passed to the runner.sh file as ENV variables.
  2. Modify docker-compose.yaml if a specific version of Grafana is needed, or to change ports, …
  3. The other Grafana ENV parameters are located in the config file. These are mainly the admin user password and the OAuth configuration parameters. See also https://grafana.com/docs/grafana/latest/auth/generic-oauth/ (Set up OAuth2 with Auth0).
  4. GF_SERVER_ROOT_URL is the address where we want our deployment to be available (e.g. localhost). The same address must also be registered on the Auth0 side because of the OAuth redirect: after login, the user is redirected to this domain (Grafana’s generic OAuth callback path is /login/generic_oauth).
  5. Configure the Auth0 provider according to the Grafana configuration procedure (https://grafana.com/docs/grafana/latest/auth/generic-oauth/, Set up OAuth2 with Auth0), or use a different provider.
  6. The provisioning directory shows how to automatically deploy a custom data source (in our case pointing to our proxy) and pre-built dashboards.
  7. The ENV parameters for the proxy are listed in the config.prometheus_oauth_proxy file. Their meaning is as follows:
      • PROMETHEUS_DATASOURCE: http(s):// address of the original (VictoriaMetrics) data source; in our example, see the victoriametrics service in docker-compose
      • PROMETHEUS_CUSTOMER_ID_TAG: the label that is automatically added to each series (vector) in the query
      • OAUTH_AUTH0_DOMAIN: the OAuth domain from which userinfo is obtained (https://<OAUTH_AUTH0_DOMAIN>/userinfo)
      • OAUTH_CUSTOMER_ID_CLAIM: the userinfo claim that uniquely identifies the tenant; its value is used as the value of the PROMETHEUS_CUSTOMER_ID_TAG label
      • DEBUG: “True” enables more detailed output to the terminal
      • PYTHONUNBUFFERED: set to 0 for immediate output to the terminal in docker compose (Python treats any non-empty value as unbuffered mode, so 0 works as well)

  8. Run runner.sh or runner.local-build.sh.
  9. Open Grafana in a browser at GF_SERVER_ROOT_URL and log in (locally as admin, or via “Sign in with Auth0”).
  10. Take a look at the pre-defined dashboard in the sample_dashboards folder.
  11. To test with simulated data, we can use the data_simulation.py script (https://gitlab.com/optima_public/prometheus_oauth_proxy/-/blob/master/tests/data_simulation.py): set METRICS_SERVER_URL in it and fill VictoriaMetrics with sample data. The script is also an example of how to send multiple metrics in one call. See also https://github.com/VictoriaMetrics/VictoriaMetrics#how-to-send-data-from-influxdb-compatible-agents-such-as-telegraf
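For illustration, here is a hedged Python sketch of a writer in the spirit of data_simulation.py (not its actual code). It formats samples as InfluxDB line protocol, tags every sample with the tenant label, and prepares the whole batch as a single POST to VictoriaMetrics’ InfluxDB-compatible /write endpoint. The function names and sample data are assumptions.

```python
import urllib.request


def _escape(text: str) -> str:
    """Escape commas, equals signs and spaces in tag keys/values and field keys."""
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def to_line(measurement: str, tags: dict, fields: dict) -> str:
    """Format one line-protocol record without a timestamp (server time is used)."""
    name = measurement.replace(",", r"\,").replace(" ", r"\ ")
    tag_part = "".join(f",{_escape(k)}={_escape(str(v))}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{_escape(k)}={float(v)}" for k, v in fields.items())
    return f"{name}{tag_part} {field_part}"


def build_batch(tenant_tag: str, tenant_id: str, samples: list) -> bytes:
    """Label every (measurement, tags, fields) sample with the tenant and join them."""
    lines = [to_line(m, {**tags, tenant_tag: tenant_id}, fields)
             for m, tags, fields in samples]
    return "\n".join(lines).encode()


def write_request(metrics_server_url: str, body: bytes) -> urllib.request.Request:
    """Prepare one POST carrying the whole batch to the /write endpoint."""
    return urllib.request.Request(metrics_server_url.rstrip("/") + "/write",
                                  data=body, method="POST")
```

For example, build_batch("client_id", "1999", [("distance", {"sensor": "a"}, {"covered_total": 12.5})]) followed by urllib.request.urlopen(write_request("http://localhost:8428", body)) would send the batch to a local single-node VictoriaMetrics (8428 is its default HTTP port). Since VictoriaMetrics joins the measurement and field names with an underscore, the sample should be stored as distance_covered_total{client_id="1999",sensor="a"}, which is exactly the kind of series the proxy filters by client_id.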

Helm chart — prometheus_oauth_proxy

One of the project’s outputs is a Helm chart for easier Kubernetes deployment, published in a dedicated Helm repository.

Add the Helm chart repository

helm repo add optima_public https://optima_public.gitlab.io/helm_repo/

Test the Helm repository (this is Helm 2 syntax; with Helm 3, the equivalent is helm search repo prometheus_oauth_proxy)

helm search prometheus_oauth_proxy

The command should list the chart, e.g.:

NAME                                  CHART VERSION  APP VERSION  DESCRIPTION
optima_public/prometheus_oauth_proxy  0.1.2          0.1.2        A Helm chart for Kubernetes

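Installing the chart (Helm 2 syntax, where -n sets the release name; with Helm 3, -n sets the namespace and the release name is the first argument, e.g. helm install prometheus-oauth-proxy optima_public/prometheus_oauth_proxy)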

helm install -n prometheus_oauth_proxy optima_public/prometheus_oauth_proxy
