Cypress + Prometheus: increasing observability on e2e tests

Guilherme Siani
Blog Técnico QuintoAndar
5 min read · May 12, 2022

Automated tests using Cypress generate a JSON file that includes test status, duration, warnings, and screenshots of the failure path. Here at QuintoAndar we upload these reports to an S3 bucket and map them into a database, where we can run queries and monitor the health of our test code. Although the result contains relevant data, the analysis has to be done actively and it is not easy to compare with previous executions. With this architecture we cannot create real-time alarms, since the mapping from S3 to the database is done on a D-1 basis. Real-time alerts would help application owners identify errors in their end-to-end tests immediately.

Problem

We want real-time metrics from our end-to-end tests that run in the homologation or production environments. These metrics allow us to create Grafana dashboards and alarms that notify whoever wants to receive test results in real time, for example about a failing test that could turn into a bug for clients.

Study

To address this need, we considered some ways to generate metrics and alarms:

  • Add a notification layer to the Cypress service to generate alarms.
  • Interpret the JSON test results and persist the metrics on each environment's Prometheus.
  • Configure a Prometheus client on the Cypress service so that custom metrics are generated by each test execution.

Considerations

We adopted the Prometheus configuration directly on the Cypress automated test service.

Pros

  • Easy to generate custom metrics
  • Metrics are generated by each test
  • Allows creating custom dashboards on Grafana, since our environments already have access to Prometheus data
  • Prometheus and Grafana are already integrated in our stack for configuring alarms

Cons

  • The solution does not work without a Pushgateway receiving the metrics and exposing them to Prometheus.
    The automated e2e tests run as a short-lived Kubernetes cronjob, so Prometheus cannot scrape them directly.
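For context, the cronjob mentioned above can be pictured roughly as follows. This is only a sketch: the image, schedule, and resource details are placeholders, while the PUSHGATEWAY_HOST variable and the Pushgateway service address match the ones used later in this post.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: automated-e2e-hello-world
  namespace: homolog
spec:
  schedule: "0 * * * *"          # illustrative schedule
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: cypress-e2e
              image: our-registry/cypress-e2e:latest   # placeholder image
              env:
                # Host used by the test code to push metrics
                - name: PUSHGATEWAY_HOST
                  value: prometheus-pushgateway.monitoring.svc.cluster.local
```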

Difficulties

  • At first, we could not expose the metrics on a server to be scraped by Prometheus in the Homologation and Production environments. We needed the Pushgateway tool to deliver the metrics to Prometheus.

Test

On the Automated E2E Test service (Cypress)

  • We installed the whole Prometheus + Pushgateway stack on the project with Docker containers, automated with docker-compose for the local test environment.
    Prometheus stack repository: https://github.com/vegasbrianc/prometheus
    This allowed the tests to run in the local environment just by pointing the Pushgateway host at the container.
  • To allow communication with the Pushgateway in the Homologation and Production environments, we added an environment variable called PUSHGATEWAY_HOST to the Kubernetes pods, which is then resolved per environment.
  • We used a Cypress middleware to generate a metric for each test and then validated the data persisted on Prometheus through the environment's Pushgateway.

Infrastructure

  • Here are all the steps to get the solution up and running with the Pushgateway, along with the tests performed by sending metrics from the Automated Test (Cypress) service.

How Prometheus works on QuintoAndar:

The Prometheus Operator was deployed with the kube-prometheus-stack Helm chart from prometheus-community.

Configuring the PushGateway on QuintoAndar:

We again used the prometheus-community Helm charts, this time for the Pushgateway.

The chart integration is the serviceMonitor's responsibility. When it is enabled, the Pushgateway can be discovered and scraped by the Prometheus Operator.

# Enable this if you're using https://github.com/coreos/prometheus-operator
serviceMonitor:
  enabled: true
  namespace: monitoring

With serviceMonitor enabled on the chart, the Pushgateway appears as a target on Prometheus.
Resource view from ArgoCD.

IMPORTANT: local validation of the chart:

The local test was executed while building the solution. It showed how the YAML manifests would be generated, which let us define a strategy for implementing/integrating it with our Prometheus.

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm search repo prometheus-community
helm template prometheus-community/prometheus-pushgateway -f values.yaml

This way, we iterated on the chart configuration until it had no structural errors and fit our needs.

Sending test metrics and validation:

  • Using port-forward and curl

After the Pushgateway was deployed, we followed the official documentation and sent test metrics to validate the stability of the Pushgateway and the scraping performed by Prometheus.

P.S.: to run the test we used port-forward to get access to the pod.

kubectl port-forward --address 192.168.0.1 \
  $(kubectl get pods -o=name -n monitoring | grep pushgateway) \
  9091:9091 -n monitoring
echo "some_metric 3.14" | curl --data-binary @- http://192.168.0.1:9091/metrics/job/some_job

Removing the test metric:

curl -X DELETE http://192.168.0.1:9091/metrics/job/some_job
  • Using the Nginx pod on the homologation cluster

We launched an Nginx pod in the cluster and namespace of homologation, where the applications that use the solution run. An important point is that both Prometheus and the Pushgateway run in the same monitoring namespace.

kubectl create deployment nginx-test --image=nginx -n homolog
kubectl get pods -n homolog | grep nginx-test

Accessing the pod and pushing a test metric.

kubectl exec -it pod/nginx-test-84fffd4997-xdqxk -n homolog -- sh
echo "some_metric 3.14" | \
  curl --data-binary @- http://prometheus-pushgateway.monitoring.svc.cluster.local:9091/metrics/job/some_job

Removing all the test structure:

curl -X DELETE http://prometheus-pushgateway.monitoring.svc.cluster.local:9091/metrics/job/some_job
kubectl delete deploy/nginx-test -n homolog

Result ✅

Manual execution of the cronjobs that were configured (automated-e2e-hello-world):

kubectl create job teste-2 --from=cronjob/automated-e2e-hello-world -n homolog

All the metrics in the local and Homologation environments were generated correctly by the test execution. See the evidence below:

Pushgateway

Prometheus
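With the metrics available in Prometheus, the real-time alarms that motivated this work can be expressed as ordinary Prometheus alerting rules. A sketch, assuming the hypothetical gauge e2e_test_result from the earlier discussion, where 0 means a failed test (metric name, duration, and labels are illustrative):

```yaml
groups:
  - name: e2e-tests
    rules:
      - alert: E2ETestFailing
        # Fires when any e2e test pushed a failure result (metric name illustrative)
        expr: e2e_test_result == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "e2e test {{ $labels.spec }} is failing"
```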

Thanks

I cannot forget to mention that none of this would have been possible without the help of our amazing people! Thanks, Daniel Mariotti, Caroline Vieira, André Francisco and Natália Alarcon.
