Selenium: Clear as a Bell

Having years of experience with big Selenium clusters, we in the Aerokube team are trying to change the Selenium testing world with really efficient tools, free to use for everyone. You may have heard about our lightweight Selenium server replacement called Selenoid and an extremely efficient load balancer for big clusters called Ggr. If you already have a Selenium cluster or are planning to deploy one, an important question to think about is: “How would I maintain such a cluster?” or “Is it clear how to work with such a cluster from a systems administrator’s point of view?”. Today we are going to make your systems administrator happy with a Selenium cluster that is clear as a bell.

Clear Documentation

Short but sufficiently detailed documentation is an important part of any self-respecting software project. Selenium traditionally had rather poor, scrappy and very often outdated documentation stored on wiki pages. This is very annoying: the official documentation mainly covers usage aspects such as creating tests and does not provide a strong foundation for setting up reliable infrastructure. Numerous articles about Selenium Grid continue to copy-paste the same commands. Tens of thousands of StackOverflow questions about correctly using CSS selectors distance your systems administrator even further from making your development team happy. We have made an effort to deliver a single point of truth about our tools. The following links lead to the genuine documentation:

  • Selenoid — useful for software developers desiring to debug browser tests on their computers, and equally for systems administrators willing to dive into cluster maintenance aspects.
  • Selenoid UI — a short demonstration of the Selenoid UI, our standalone user interface for Selenoid.
  • Configuration Manager — more details about cm, a magic wand doing all the routine Selenoid installation and configuration work for you.
  • Ggr — everything you wish to know about Ggr.

Responsive Support

It is impossible to describe all software usage aspects, even if you are an 80th-level documentation expert. You should always be able to quickly find answers to tricky or platform-specific questions. This is why a good open-source project should always have at least one responsive support channel. We provide at least three official support channels:

  1. First of all, the traditional email support for personal questions: support@aerokube.com.
  2. Then, a faster Telegram support chat. We are proud to have almost 300 permanent members in this chat, so most newbie questions are answered almost immediately.
  3. Finally, we have a StackOverflow tag and from time to time ask our users to post tricky questions there.

You are welcome to use the channel that fits best if you have any questions. Let’s now move to the most interesting part — the practice.

Measuring Software Efficiency

Health API

Large-scale software is often installed behind load balancers proxying requests only to healthy application instances. Every application should provide a way to determine whether it is alive. One possible approach for HTTP-based applications is to return health status via HTTP, i.e. by providing a health API. Our tools, Ggr and Selenoid, follow this pattern and return their health status on /ping. A healthy instance always returns 200 and some additional information in JSON format:

$ curl -s -D- http://my-ggr-host.example.com:4444/ping
HTTP/1.1 200 OK
Date: Mon, 11 Dec 2017 03:36:39 GMT
Content-Length: 125
Content-Type: text/plain; charset=utf-8
{"uptime":"60h9m36.257828483s","lastReloadTime":"2017-12-08 18:27:03.632220529 +0300 MSK m=+0.009773971","numRequests":4082}

The JSON contains the overall instance uptime, the last quota reload time and the overall number of HTTP requests processed since start. You can do just the same request against Selenoid.
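For example, a load balancer in front of several instances can poll this endpoint to automatically exclude unhealthy ones. A minimal HAProxy backend sketch using /ping as the health check (the hostnames and backend name below are placeholders, not taken from a real installation):

```
# Hypothetical HAProxy backend: host names are placeholders
backend selenoid
    option httpchk GET /ping
    http-check expect status 200
    server selenoid-1 selenoid-1.example.com:4444 check
    server selenoid-2 selenoid-2.example.com:4444 check
```

With this configuration HAProxy periodically requests /ping on each server and only routes new sessions to servers answering with HTTP 200.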

Dealing with Logs

Now that you know how to easily find sick Selenoid and Ggr instances, let’s move on to logging. Application logs are used by systems administrators every day, so to be efficient it is very important to have a handy logging architecture. Modern application deployments tend to use centralized log storage. This gives a single entry point to all logs as well as fast log searching capabilities.

With Selenoid it becomes a trivial task to use such modern approaches to upload browser session logs. Let’s, for example, set up Selenoid to upload such logs to the well-known Elastic (ELK) Stack consisting of:

  • ElasticSearch — log storage and search engine
  • Logstash — a daemon to process your logs
  • Kibana — a cool user interface to view the logs

These daemons are usually installed together on one or several hosts and form a cluster. To start sending various data to the cluster, Elastic provides a collection of lightweight daemons called Beats. You can send not only log files but also system metrics, network packet information, operating system events and so on.

For sending log data Elastic added Filebeat to the overall Beats collection. This daemon simply waits for new lines in specified log files and sends them either to ElasticSearch or to Logstash. Filebeat is also able to work with Docker container logs, i.e. watch for containers with a specific name or image and send their logs to the specified destination.

Let’s, for example, configure a simple ELK installation on a remote machine and send logs to it. The simplest way to do this is to use Docker Compose. To ease network communication we will be using a custom Docker network, which allows us to use human-readable hostnames instead of IP addresses. For demonstration purposes it is convenient to use a Docker volume as ElasticSearch storage: it is created on stack startup and can be completely removed when the stack is shut down. An example docker-compose.yml file can look like this:

version: '3'
networks:
  elk:
volumes:
  elasticsearch:
    driver: local
services:
  elasticsearch:
    environment:
      http.host: 0.0.0.0
      transport.host: 127.0.0.1
    image: docker.elastic.co/elasticsearch/elasticsearch:6.2.1
    networks:
      elk: null
    ports:
      - 9200:9200
    restart: unless-stopped
    volumes:
      - elasticsearch:/usr/share/elasticsearch/data:rw
  logstash:
    image: docker.elastic.co/logstash/logstash-oss:6.2.1
    depends_on:
      - elasticsearch
    networks:
      elk: null
    ports:
      - 5044:5044
    restart: unless-stopped
    volumes:
      - ./etc/logstash/pipeline:/usr/share/logstash/pipeline:ro
  kibana:
    depends_on:
      - elasticsearch
    environment:
      ELASTICSEARCH_PASSWORD: changeme
      ELASTICSEARCH_URL: http://elasticsearch:9200
      ELASTICSEARCH_USERNAME: elastic
    image: docker.elastic.co/kibana/kibana-oss:6.2.1
    networks:
      elk: null
    ports:
      - 5601:5601
    restart: unless-stopped

Note that every service has an open port. ElasticSearch port 9200 is an HTTP API to fetch and manipulate log data. Kibana’s 5601 is the port where the user interface is served: just open http://elk-host.example.com:5601/. Logstash waits for input data on port 5044. Both Kibana and Logstash use the ElasticSearch API to read and write data.

The last important thing here is the Logstash configuration file. This file describes where logs come from, how they are processed and where they go. An example file can look like this:

input {
  beats {
    port => "5044"
  }
}
filter {
  if [docker][container][name] == "ggr" {
    grok {
      match => {
        "message" => "%{YEAR:year}\/%{MONTHNUM:month}\/%{MONTHDAY:day} %{TIME:time} \[(-|%{NONNEGINT:request_id})\] \[(-|%{NUMBER:duration}s)\] \[%{NOTSPACE:status}\] \[(-|%{NOTSPACE:user})\] \[(-|%{IPORHOST:user_host})\] \[(-|%{NOTSPACE:browser})\] \[(-|%{NOTSPACE:browser_host})\] \[(-|%{NOTSPACE:session_id})\] \[(-|%{POSINT:counter})\] \[(-|%{DATA:msg})\]"
      }
    }
    mutate {
      remove_field => [ "message" ]
    }
  } else if [docker][container][name] == "selenoid" {
    grok {
      match => {
        "message" => "%{YEAR:year}\/%{MONTHNUM:month}\/%{MONTHDAY:day} %{TIME:time} \[(-|%{NONNEGINT:request_id})\] \[%{NOTSPACE:status}\] \[%{DATA:data}\]( \[%{DATA:optional_data}\])?( \[%{NUMBER:duration}s\])?"
      }
    }
    mutate {
      remove_field => [ "message" ]
    }
  }
  mutate {
    remove_field => [ "beat", "source", "prospector", "tags", "stream" ]
    convert => {
      "request_id" => "integer"
      "duration" => "float"
      "counter" => "integer"
    }
  }
}
output {
  if [docker][container][name] == "ggr" {
    elasticsearch {
      hosts => "elasticsearch:9200"
      index => "ggr-%{+YYYY.MM.dd}"
    }
  } else if [docker][container][name] == "selenoid" {
    elasticsearch {
      hosts => "elasticsearch:9200"
      index => "selenoid-%{+YYYY.MM.dd}"
    }
  } else {
    elasticsearch {
      hosts => "elasticsearch:9200"
      index => "browsers-%{+YYYY.MM.dd}"
    }
  }
}

In the input section we declare that logs are expected to be sent by Beats to port 5044. In the filter section we parse incoming logs from Ggr and Selenoid by splitting them into columns and removing useless fields. Note that the conditional logic is based on the Docker container name: if you are using alternative names you will need to modify the values in the conditions accordingly. In the output section we send processed logs to ElasticSearch using separate index names for Ggr, Selenoid and browser container logs.
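To make the grok patterns less opaque, here is a rough Python equivalent of the Selenoid pattern applied to a log line. The sample line below is made up to match the pattern’s shape; real Selenoid messages may differ in content:

```python
import re

# Rough Python analogue of the Selenoid grok pattern from the filter above
SELENOID_LINE = re.compile(
    r"(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2}) (?P<time>\S+) "
    r"\[(-|(?P<request_id>\d+))\] \[(?P<status>\S+)\] \[(?P<data>.*?)\]"
    r"(?: \[(?P<optional_data>.*?)\])?(?: \[(?P<duration>[\d.]+)s\])?$"
)

# Hypothetical log line shaped like the pattern, not copied from a real server
line = "2017/12/11 03:36:39 [42] [SESSION_DELETED] [firefox-57.0] [vnc] [0.15s]"
m = SELENOID_LINE.match(line)
print(m.group("status"), m.group("request_id"), m.group("duration"))
# prints: SESSION_DELETED 42 0.15
```

Logstash does essentially the same with every incoming line, then the mutate blocks drop the raw message and cast the numeric columns to proper types.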

To start the entire stack only two steps are required:

  1. Copy the configuration above to /etc/logstash/pipeline/pipeline.yml.
  2. Start the stack using docker-compose.yml file shown before:
$ docker-compose -f /path/to/docker-compose.yml up -d

Now let’s start sending some data, e.g. from the Selenoid machine, with Filebeat. To do this we first create a small YAML configuration file /etc/filebeat/filebeat.yml as follows:

filebeat.autodiscover:
  providers:
    - type: docker
      templates:
        - condition:
            not:
              contains:
                docker.container.image: filebeat
          config:
            - type: docker
              containers.ids:
                - "${data.docker.container.id}"
logging.metrics.enabled: false
output.logstash:
  hosts: ["elk.example.com:5044"]

This file configures Filebeat to watch the logs of any container whose image name does not contain the word filebeat (we will also start Filebeat itself as a Docker container) and send them to elk.example.com:5044. Having created this file, use the following docker-compose.yml to start Filebeat:

version: '3'
services:
  filebeat:
    image: docker.elastic.co/beats/filebeat:6.2.1
    user: root
    restart: unless-stopped
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /etc/filebeat/filebeat.yml:/usr/share/filebeat/filebeat.yml:ro

We can now open Kibana at http://elk.example.com:5601/ and select which ElasticSearch indexes to use.

According to the pipeline.yml shown above, we should select the ggr-*, selenoid-* and browsers-* indexes respectively (a new index is created every day). Now we can run browser sessions and see logs appearing in the Kibana “Discover” panel.

That’s it! All source files for this example can be found in the repository.

Dealing with Selenium-specific Metrics

In addition to system metrics such as load average or memory consumption, it is very important to analyze application-specific metrics to better understand what happens inside the application. For Selenium, possible specific metrics could be: browser usage, overall and per version; the total number of sessions being run in parallel; the total number of tests waiting for browsers; and so on. With the standard Java-based Selenium server, even the trivial task of getting these metrics is not possible out of the box. How could you do this?

In the majority of Selenium versions the only way to get browser consumption metrics is to implement your own Selenium extension as described in the documentation. You have to use the Selenium API and standard Java Servlets to implement a custom HTTP handler. A simple statistics servlet class can look like this:

package com.aerokube.selenium;
import com.google.common.io.ByteStreams;
import org.openqa.grid.web.servlet.RegistryBasedServlet;
import javax.servlet.ServletConfig;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;
public class HubStatServlet extends RegistryBasedServlet {

    public HubStatServlet() {
        super(null);
    }

    @Override
    public void init(ServletConfig config) throws ServletException {
        super.init(config);
    }

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response) throws IOException {
        response.setContentType("text/plain");
        response.setCharacterEncoding("UTF-8");
        response.setStatus(200);
        Map<String, Browser> table = new HashMap<String, Browser>() {
            @Override
            public String toString() {
                return entrySet().stream()
                        .map(e -> String.format("%s %s", e.getKey(), e.getValue()))
                        .collect(Collectors.joining("\n"));
            }
        };
        getRegistry().getAllProxies().forEach(
                p -> p.getTestSlots().forEach(
                        slot -> {
                            Map<String, Object> caps = slot.getCapabilities();
                            String browserName = String.format(
                                    "%s-%s",
                                    caps.get("browserName"),
                                    caps.get("version")
                            );

                            if (!table.containsKey(browserName)) {
                                table.put(browserName, new Browser());
                            }

                            Browser browser = table.get(browserName);
                            if (slot.getSession() == null) {
                                browser.increaseFree();
                            } else {
                                browser.increaseUsed();
                            }
                        }
                )
        );

        byte[] out = table.toString().getBytes("UTF-8");
        response.setContentLength(out.length);
        try (InputStream in = new ByteArrayInputStream(out)) {
            ByteStreams.copy(in, response.getOutputStream());
        } finally {
            response.getOutputStream().close();
        }
    }

    private class Browser {

        private int used;
        private int free;

        void increaseUsed() {
            this.used++;
        }

        void increaseFree() {
            this.free++;
        }

        @Override
        public String toString() {
            return String.format("%s %s", used, used + free);
        }
    }
}

After implementing such a servlet you have to compile this code, place the resulting JAR on the Selenium Hub classpath and run the Hub with the argument -servlets com.aerokube.selenium.HubStatServlet. Now you can access statistics data using the following request:

$ curl -s http://selenium-hub.example.com:4444/grid/admin/HubStatServlet

Too complicated, right? And knowing that from time to time you would have to modify this code because of Selenium API changes, would you like to do this?!

Fortunately, since its first releases Selenoid has come with a built-in /status API returning detailed usage statistics in JSON format. You get this data with a simple request:

$ curl http://selenoid-host.example.com:4444/status
{
  "total": 10,
  "used": 1,
  "queued": 0,
  "pending": 0,
  "browsers": {
    "chrome": {
      "62.0": {},
      "63.0": {}
    },
    "firefox": {
      "57.0": {
        "my-user": {
          "count": 2,
          "sessions": [
            {
              "id": "37809fc9-37b5-4537-a23e-34df28637228",
              "container": {
                "id": "2a82d79b690a0148fdf59c3af97d3a73df63108090318746df2fa48642410a6e",
                "ip": "172.17.2.221"
              },
              "vnc": false,
              "screen": "1920x1080x24",
              "caps": {
                "browserName": "firefox",
                "version": "57.0",
                "screenResolution": "1920x1080x24",
                "enableVNC": false,
                "enableVideo": false,
                "videoName": "",
                "videoScreenSize": "1920x1080",
                "videoFrameRate": 0,
                "name": "",
                "timeZone": "",
                "containerHostname": "",
                "applicationContainers": "",
                "hostsEntries": ""
              }
            }
          ]
        }
      },
      "58.0": {}
    },
    "opera": {
      "50.0": {},
      "51.0": {}
    }
  }
}

In addition to overall server statistics, this API shows information about concrete browser sessions classified by browser version and HTTP user. This information can easily be transformed into charts. Let’s do this!
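Before wiring up real tooling, it helps to see how little work is needed to turn this JSON into gauge values suitable for a time-series database. A small Python sketch (the payload below is the /status response from above, trimmed for brevity):

```python
import json

# /status payload from above, trimmed to the fields we aggregate
status = json.loads("""
{
  "total": 10, "used": 1, "queued": 0, "pending": 0,
  "browsers": {
    "chrome": {"62.0": {}, "63.0": {}},
    "firefox": {"57.0": {"my-user": {"count": 2, "sessions": []}}, "58.0": {}},
    "opera": {"50.0": {}, "51.0": {}}
  }
}
""")

# One gauge per browser version: sessions currently running across all users
gauges = {
    f"{browser}_{version}": sum(user["count"] for user in users.values())
    for browser, versions in status["browsers"].items()
    for version, users in versions.items()
}

print("used/total:", status["used"], "/", status["total"])
print(gauges)  # e.g. gauges["firefox_57.0"] == 2
```

A metrics collector only has to run this kind of aggregation on a schedule and write the resulting numbers to the database, which is exactly what the components below do.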

To see the charts we need three main components:

  • A ready-to-use tool that periodically polls metrics from the /status API and sends them to a time-series database. For example this can be Telegraf, a lightweight daemon with a lot of input and output plugins doing just this.
  • A time-series database instance to store the metrics. In this example we’ll use InfluxDB, but there are a lot of alternatives.
  • A web UI to show the charts: Grafana.

All sources of this example are located in the selenoid-grafana-example repository. To deploy Grafana and InfluxDB to a remote host grafana.example.com you can use the following docker-compose.yml:
version: '3'
services:
  influxdb:
    image: influxdb:alpine
    container_name: influxdb
    ports:
      - "8086:8086"
    volumes:
      - ./data/influxdb:/var/lib/influxdb
    environment:
      INFLUXDB_REPORTING_DISABLED: "true"
      INFLUXDB_DB: telegraf
      INFLUXDB_USER: telegraf
      INFLUXDB_USER_PASSWORD: supersecret
  grafana:
    build: ./grafana
    container_name: grafana
    volumes:
      - ./data/grafana:/var/lib/grafana
    ports:
      - "3000:3000"
    links:
      - influxdb
    environment:
      GF_AUTH_ANONYMOUS_ENABLED: "true"
      GF_AUTH_ANONYMOUS_ORG_ROLE: "Admin"
      INFLUXDB_URI: "http://influxdb:8086"
      INFLUXDB_DB: telegraf
      INFLUXDB_USER: telegraf
      INFLUXDB_USER_PASSWORD: supersecret

Here we use the official InfluxDB container. The Grafana container is based on the official one and supports environment variables to pass InfluxDB settings.

Now you can go to the Selenoid host and deploy Telegraf with a similar docker-compose.yml:
version: '3'
services:
  telegraf:
    image: telegraf:latest
    container_name: telegraf
    network_mode: "host"
    volumes:
      - ./telegraf.conf:/etc/telegraf/telegraf.conf:ro
    environment:
      INFLUXDB_URI: "http://grafana.example.com:8086"

For this to work you also need a Telegraf configuration file to be present in the same directory.
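For reference, a minimal telegraf.conf could look like the sketch below. This is an illustration under assumptions, not the exact file from the example repository: it uses Telegraf’s generic HTTP input with the JSON data format, which keeps the numeric fields (total, used, queued, pending) from the /status payload.

```toml
# Hypothetical minimal telegraf.conf (NOT the exact file from the repository):
# poll Selenoid /status and write the numeric fields to InfluxDB.
[agent]
  interval = "10s"

# Generic HTTP input: the JSON data format keeps numeric fields such as
# total, used, queued and pending; nested per-browser counters are
# flattened into longer field names.
[[inputs.http]]
  urls = ["http://localhost:4444/status"]
  data_format = "json"

# Telegraf substitutes ${INFLUXDB_URI} from the container environment
[[outputs.influxdb]]
  urls = ["${INFLUXDB_URI}"]
  database = "telegraf"
```

The INFLUXDB_URI variable is the one passed to the Telegraf container in the docker-compose.yml above.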

Check that everything is running and open Grafana at http://grafana.example.com:3000. The last step is to create a dashboard. Although you can do this yourself, we have an example dashboard showing the most important metrics. To import it go to the Grafana Import Dashboard screen and type the dashboard’s numeric ID: 3632. You should now have a dashboard called "Selenoid Stats". Just run some tests against Selenoid and you will have everything sent to this dashboard.

Now you can see browser consumption information as well as important system metrics, such as load average and memory consumption, that can influence test execution speed.

Conclusion

I hope you now have a lot more arguments to convince your systems administrator to install and maintain a lightweight Selenium cluster for your team. Browser automation has never been so simple and efficient. While you are reading these lines, we certainly continue doing our best to make you even happier. See you soon!