Selenium: Storing Data Efficiently

Hi there! I hope you have already forgotten about the main Selenium performance and stability issues by replacing Selenium server with Selenoid. If not, check my previous articles.

When running Selenium tests you can certainly spend your time staring at the browser screen while a test executes. However, as the total number of tests grows, the two main ways of understanding what happened in a particular Selenium session are its logs and its video recording, if present. The more sessions you have, the more such files you get, and the more important it becomes to store them efficiently. Today I would like to show how to do that.

Where to Store Selenium Artifacts

But what are the most popular ways of storing and sharing Selenium artifacts with colleagues? Let’s have only one requirement for candidates: a file should be easily accessible with a standard browser on any popular platform (Windows, Linux and Mac).

As an experienced developer from the 2000s or even the 1990s, you certainly remember FTP servers, which were a very popular way of sharing files some time ago. By default FTP transfers data and access credentials unencrypted and does everything very slowly, and FTPS is not so widely used, so forget about it.

Then you may also recall SFTP or SCP, both working on top of SSH and very popular among Mac and Linux users. The main problems with them are the lack of support in Microsoft Windows and the fact that they are mainly used through the command line, which is a nightmare for beginners. Also, browsers do not support getting files via these protocols by default.

I think most of you have used BitTorrent to download some files. The main problem with it is that it requires installing a client application and, most annoyingly, BitTorrent is a peer-to-peer protocol that uses client machines to store the file. When no other clients are online, you will not be able to download the desired file.

The most popular protocols supported by any browser are HTTP and HTTPS. They are used to transfer web page content to the end user. What about installing a popular web server and using it to serve Selenium artifacts? Possible but not reliable. To deliver a really fault-tolerant web server you have to install it in multiple datacenters behind a reliable load balancer. Then, since an upload request goes to only one replica, you have to somehow synchronize uploaded files among replicas. Every web server instance should have reliable storage, i.e. use RAID volumes that continue to work if some of the disks fail.

And even doing this does not resolve all the issues. When you have to store billions of files, you can reach the maximum number of files allowed by the file system. For example, modern file systems such as NTFS or ext4 allow no more than 2^32 - 1 = 4 294 967 295 (about four billion) files. And the last nail in the coffin: uploading large files to traditional HTTP servers is slow, sometimes even slower than FTP.

Too many issues with HTTP, right? Is there a ready-to-use protocol with HTTP file access, built-in reliability and data storage redundancy, possibly available on the popular cloud platforms? Certainly! It is called S3. Introduced in the Amazon Web Services cloud in 2006, it has since become a de facto standard protocol for file storage in public and private clouds. What distinguishes S3 from other protocols is the ability to store an unlimited number of files while being sure that multiple copies of your data are automatically saved to independent storage volumes. You get flexible access control, and all popular cloud platforms such as AWS, Google Cloud or DigitalOcean provide S3-compatible storage. Finally, S3 is so popular and so cheap that a lot of ready-to-use open-source client libraries exist for different programming languages.

And how about corporate networks? Big companies very often require all data to be stored inside the network. In that case you can start an open-source S3-compatible server such as Minio. This server is fully compatible with the S3 protocol and can use different backends to store the data, from a regular file system to any custom corporate storage system.

So, having considered all the facts above, we chose to support S3 file uploading in Selenoid as a built-in feature. Let’s see how to use it.

Using S3 with Selenoid

Selenoid has built-in support for uploading saved videos and log files to any S3-compatible storage. Not all users need this feature, so by default we provide Selenoid binaries and Docker images without it. To use S3, let’s first build Selenoid with this feature enabled. With Golang 1.11 or above installed, do the following:

1. Clone Selenoid source code:

$ git clone https://github.com/aerokube/selenoid.git

2. Compile Selenoid with S3 feature enabled:

$ cd selenoid
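$ GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build \
-tags 's3' -ldflags "-s -w"

Note the -tags 's3' flag, which enables the S3 feature, and the -ldflags "-s -w" flag, which reduces the final binary size by stripping debug information. GOOS=linux is required because the resulting binary is packaged into a Linux Docker image in the next step.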

3. Build Docker image:

$ docker build -t selenoid:s3-latest .

Enabling the S3 feature was simple, right? Now let’s start Selenoid configured to automatically upload files to S3. To do this we need to know:

  1. S3 endpoint, i.e. a base HTTP URL providing S3 API
  2. A pair of access credentials: an access key (an equivalent of user name) and a secret key (an equivalent of password).
  3. S3 bucket name, i.e. root directory name to upload Selenoid artifacts
  4. An optional S3 key signature algorithm (S3v2 or S3v4)

All this information is usually provided by your S3 API provider. Once you have it, start Selenoid as follows:

$ docker run -d --name selenoid --restart always \
-e OVERRIDE_VIDEO_OUTPUT_DIR=/opt/selenoid/video/ \
-p 4444:4444 -v /etc/selenoid:/etc/selenoid:ro \
-v /opt/selenoid/video:/opt/selenoid/video/ \
-v /opt/selenoid/logs:/opt/selenoid/logs/ \
-v /var/run/docker.sock:/var/run/docker.sock \
selenoid:s3-latest \
-conf /etc/selenoid/browsers.json \
-video-output-dir /opt/selenoid/video \
-log-output-dir /opt/selenoid/logs \
-s3-endpoint 'https://storage.googleapis.com' \
-s3-region 'us-west-1' \
-s3-bucket-name 's3-bucket-name' \
-s3-access-key 's3-access-key-value' \
-s3-secret-key 's3-secret-key-value'

That’s it! Now when you start running tests against Selenoid, you will see videos and logs in the S3 bucket.

S3 Extras

With basic S3 uploading working, let’s do more tricks. By default all uploaded videos are saved as <session-id>.mp4 and logs as <session-id>.log, where <session-id> is the Selenium session identifier. With many running sessions this looks like a single "directory" with thousands of similar files inside, which is inconvenient to browse. Selenoid can easily distribute uploaded files over a set of directories named after session metadata values such as browser name, version, platform and so on. If you have read the S3 documentation thoroughly, you know that S3 is a flat key-value storage without directory support. However, by convention the / (forward slash) symbol in a key is treated by S3 web interfaces as a path separator, and such a UI shows you "directories" with files inside. For example, the following two keys...

selenoid/firefox_63.0/video.mp4
selenoid/firefox_63.0/logfile.log

… will be shown as if the video.mp4 and logfile.log files were stored in the selenoid/firefox_63.0 directory. Returning to Selenoid: to define how files are stored in the S3 bucket, just provide a so-called S3 key pattern. This is done with the -s3-key-pattern flag, or with the s3KeyPattern capability if you wish to include some data from the test, e.g. the test case name. Such a key pattern can include one or several of the supported placeholders described in the documentation. For example, $browserName will be replaced by the respective browserName capability value, $sessionId by the Selenium session ID and $date by the current date:

-s3-key-pattern 'my-prefix/$browserName/$date/$sessionId/$fileType$fileExtension'
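
To get a feel for what such a pattern produces, here is a minimal shell sketch that mimics the substitution locally. It is not Selenoid's implementation: the expand_key and key_dir helpers, the sample values, the YYYY-MM-DD date format and the assumption that $fileExtension carries the leading dot (as the pattern suggests) are all made up for illustration. Check the documentation for the exact placeholder semantics.

```shell
#!/bin/sh
# Illustration only: mimics placeholder substitution in an S3 key pattern.
# expand_key is a hypothetical helper, not Selenoid code.
# Usage: expand_key PATTERN BROWSER DATE SESSION_ID FILE_TYPE FILE_EXTENSION
# Values must not contain "|", "&" or "\" because they are passed to sed as-is.
expand_key() {
    printf '%s\n' "$1" | sed \
        -e "s|\\\$browserName|$2|g" \
        -e "s|\\\$date|$3|g" \
        -e "s|\\\$sessionId|$4|g" \
        -e "s|\\\$fileType|$5|g" \
        -e "s|\\\$fileExtension|$6|g"
}

# key_dir KEY: the "directory" an S3 web UI would show for a key,
# i.e. everything before the last "/".
key_dir() {
    printf '%s\n' "${1%/*}"
}

# Single quotes keep the shell from expanding the placeholders itself.
pattern='my-prefix/$browserName/$date/$sessionId/$fileType$fileExtension'

# Sample session values (assumed, not taken from a real run).
video_key=$(expand_key "$pattern" firefox 2018-11-20 abc123 video .mp4)
log_key=$(expand_key "$pattern" firefox 2018-11-20 abc123 log .log)

echo "$video_key"
echo "$log_key"
echo "shown under: $(key_dir "$video_key")"
```

With these sample values both files of one session end up under the same prefix, so an S3 web interface would present them as a single per-session "directory".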

By default Selenoid uploads both video and log files, but sometimes you may want to upload only video files or only log files. This can be achieved by including or excluding files with the -s3-include-files or -s3-exclude-files flags. These flags support shell-style wildcards like *.log or *.mp4. For example, to upload only video files add one more flag:

-s3-include-files '*.mp4'

The rest of the files will be ignored.
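
If you are unsure what a wildcard will match, you can try it in a plain shell first, because a case statement uses the same style of patterns. The sketch below only illustrates the matching: should_upload is a hypothetical helper and the file names are made up. It models a single include pattern and does not try to reproduce how Selenoid itself evaluates or combines the two flags.

```shell
#!/bin/sh
# Illustration only: shell-style wildcard matching with a case statement.
# should_upload PATTERN FILE succeeds when FILE matches the wildcard PATTERN.
should_upload() {
    case "$2" in
        # $1 is deliberately left unquoted so it is treated as a pattern
        $1) return 0 ;;
        *)  return 1 ;;
    esac
}

# Hypothetical artifact names for one session
for f in abc123.mp4 abc123.log; do
    if should_upload '*.mp4' "$f"; then
        echo "upload $f"
    else
        echo "skip   $f"
    fi
done
```

Swapping the pattern for '*.log' selects the log file instead of the video.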

You should now know the right way to automatically store your Selenium artifacts in S3 storage. If you have any questions, feel free to contact me in our Telegram support channel: https://t.me/aerokube