Crows Nest, The Sauce Connect Manager
Over the cloud
We love to have our test automation run remotely (i.e., in the cloud). In some cases, the local environment isn’t sufficient for the huge test automation environment matrix. Just imagine that your analytics team requires the tests to cover all of these:
- latest two Chrome releases
- latest two Firefox releases
- latest Safari
- latest Internet Explorer
- latest Microsoft Edge
- Windows 10
- Windows 8
- macOS Sierra
Taking the maintenance cost into consideration, do you really want to do this all by yourself?
Automated test environment cloud providers, such as Saucelabs, give you access to all these environments. By pointing our tests to their plug-n-play environments, we can throw those concerns away, enjoy some coffee and wait for the results.
Penetration
Protected by the firewall, an internal network isn’t supposed to be seen from outside the company. But does that mean the test that runs over the cloud cannot see the server that runs in the internal network? No, not at all. Automated test environment cloud providers have already thought of it. For example, Saucelabs provides a proxy named Sauce Connect to allow tests to penetrate the firewall.
Having a way to grant remote tests access to our internal network is gold. It means you can treat the cloud as your local environment. All of our GitHub pull request verification tasks at WalmartLabs rely on Sauce Connect. When a developer opens a PR against the Git repo, Jenkins will:
- Fetch the PR, build the local environment, and run some preliminary checks
- Open Sauce Connect on the Jenkins node
- Trigger tests over Saucelabs cloud
- Close Sauce Connect
All of our PR verifications are mocked and self-contained, meaning the server/system under test runs on the Jenkins node in the internal network, and no external request is made from the Jenkins node during the test. Without a proxy technology such as Sauce Connect, we would not be able to run them over the cloud.
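The open, test, close sequence above can be sketched as a context manager that guarantees the tunnel is shut down even when the tests fail. This is only an illustration, not our Jenkins job: `tunnel`, `verify_pull_request` and `FAKE_TUNNEL` are hypothetical names, and a sleeping Python process stands in for the real Sauce Connect command line. A real run would also wait until the tunnel reports that it is ready before triggering the tests.

```python
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def tunnel(command):
    """Start a tunnel process, yield it, and always stop it afterwards."""
    proc = subprocess.Popen(command)
    try:
        yield proc
    finally:
        # Close the tunnel even if the test run raised an exception.
        proc.terminate()
        try:
            proc.wait(timeout=30)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()


def verify_pull_request(tunnel_command, run_tests):
    """Open the tunnel, run the remote tests, then close the tunnel."""
    with tunnel(tunnel_command):
        return run_tests()


# Stand-in for the real Sauce Connect binary so the sketch runs anywhere.
FAKE_TUNNEL = [sys.executable, "-c", "import time; time.sleep(60)"]
```

Calling `verify_pull_request(FAKE_TUNNEL, run_tests)` with any callable returns whatever the callable returns, and the stand-in process is gone by the time the call comes back.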
Secured Penetration
The proxy is quite useful, but on the other hand, is it safe to give a third party full access to our internal network? You already know the answer. We have to limit external access to something controllable.
The DMZ
Working together with our InfoSec and Network Engineering teams, we created a DMZ (demilitarized zone) network. All requests from Saucelabs to our internal network have to go through this DMZ network. More strictly, we only allow the DMZ network to access internal Jenkins nodes through a given range of ports.
This setup means we have to set up Sauce Connect in the DMZ network.
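One practical consequence of the port restriction is, presumably, that the server under test on a Jenkins node has to listen on a port inside the allowed range. A minimal sketch of picking such a port follows. The range `20000` to `20010` and the name `pick_allowed_port` are made up for illustration; the real range is whatever the firewall rule permits.

```python
import socket

# Hypothetical range; the real one is whatever the firewall rule allows.
ALLOWED_PORTS = range(20000, 20011)


def pick_allowed_port(ports=ALLOWED_PORTS, host="127.0.0.1"):
    """Return the first port in `ports` that can currently be bound on `host`."""
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # Port is taken, try the next one.
            return port
    raise RuntimeError("no free port inside the allowed range")
```

The probe socket is closed before the port is returned, so another process could still grab the port before the server binds it. A sturdier version would hand the bound socket straight to the server.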
The Sauce Connect Pool
Instead of launching a new Sauce Connect for each Jenkins test run, we created a Sauce Connect pool in which a number of Sauce Connects are ready to serve when a test starts.
This pool is both good and bad news. The good news is that Sauce Connect supports pool mode natively (it is called High Availability Mode). The bad news is that there are some known issues in its High Availability Mode, such as:
- Performance of Sauce Connect goes down the longer it runs.
- There is no way to revive a dead Sauce Connect by itself. When it’s dead, it’s dead.
Tedious manual operation is also required for its maintenance (and we aren’t big fans of anything manual). We wanted something that monitors this pool automatically in case one or more Sauce Connects in the pool die, and that also restarts all Sauce Connects in the pool periodically to counter the performance degradation.
The Crows-Nest
To address all these issues, we created Crows-Nest. Crows-Nest monitors all the Sauce Connect pools (yeah, we actually run two pools for performance reasons). Crows-Nest does the following things:
- Checks the availability of each Sauce Connect
- Launches new Sauce Connects to replace the dead ones
- Periodically reports Sauce Connect health and status to statsd
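The three duties above boil down to one monitoring pass that is repeated on a timer. The sketch below is not Crows-Nest's actual code (see its README for that); `Tunnel`, `supervise_once` and the three callbacks are hypothetical, and they are passed in so that the loop can be exercised without a real Sauce Connect.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tunnel:
    name: str
    alive: bool = True


def supervise_once(
    pool: List[Tunnel],
    is_alive: Callable[[Tunnel], bool],
    launch: Callable[[str], Tunnel],
    report: Callable[[str, int], None],
) -> List[Tunnel]:
    """Run one monitoring pass and return the refreshed pool."""
    refreshed = []
    for t in pool:
        healthy = is_alive(t)
        # Report the health we observed, before any replacement happens.
        report(f"{t.name}.alive", 1 if healthy else 0)
        # Keep healthy tunnels, launch a replacement for dead ones.
        refreshed.append(t if healthy else launch(t.name))
    return refreshed
```

A supervisor would call `supervise_once` every few seconds and feed the returned pool into the next pass.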
The details of the design and implementation can be found in its README. For your convenience, Crows-Nest is even dockerized.
Launching a Crows-Nest is easy. It can either run as a standalone process or as a daemon with pm2.
Since Crows-Nest sends its status to statsd, we can visualize the reports and get a better idea of how stable our Sauce Connects are and how well they serve the tests.
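For readers who haven't used statsd: a metric is just a short text line sent over UDP, which is why reporting is cheap enough to do on every check. The sketch below sends a gauge using the plain statsd line format. The metric names and the helpers `gauge_packet` and `send_gauge` are invented for illustration and are not the names Crows-Nest uses.

```python
import socket


def gauge_packet(name: str, value: int) -> bytes:
    """Format a gauge in the plain-text statsd line protocol: <name>:<value>|g"""
    return f"{name}:{value}|g".encode("ascii")


def send_gauge(name: str, value: int, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget a gauge over UDP; 8125 is the conventional statsd port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(gauge_packet(name, value), (host, port))
```

For example, `send_gauge("crowsnest.sc-1.alive", 1)` would mark a hypothetical tunnel `sc-1` as healthy. Because UDP is connectionless, the call succeeds even when no statsd server is listening.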
When there is a glitch in either your network or Saucelabs, Sauce Connect might fail to restart. Crows-Nest allows a maximum of 10 retries if needed.
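A bounded retry like this can be sketched in a few lines. Two things here are assumptions rather than documented Crows-Nest behavior: that the first attempt does not count as a retry (so up to 11 attempts in total), and that a fixed pause separates attempts. `start_with_retries` is a hypothetical helper.

```python
import time


def start_with_retries(start, max_retries=10, delay_seconds=0.0):
    """Call `start` until it succeeds, allowing `max_retries` retries after the first try.

    Returns the number of retries that were needed.
    """
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            start()
            return attempt
        except Exception as error:
            last_error = error
            if attempt < max_retries:
                # Give the network or the provider a moment to recover.
                time.sleep(delay_seconds)
    raise RuntimeError(f"gave up after {max_retries} retries") from last_error
```

The returned retry count is the kind of number that can be reported alongside the tunnel status.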
So our typical Crows-Nest scenario is:
- One Crows-Nest manages multiple Sauce Connect processes
- Crows-Nest restarts all Sauce Connects one by one at 3 a.m. PST
- Crows-Nest reports Sauce Connect status (start time, stop time, retry number, current status) to statsd and Grafana
- Crows-Nest keeps monitoring Sauce Connect status and restarts any Sauce Connect that dies
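The nightly rolling restart in this scenario has two parts: working out when the next 3 a.m. PST is, and restarting the tunnels one at a time so the rest of the pool keeps serving tests. Here is a rough sketch of both, with hypothetical helper names. It uses a fixed UTC-8 offset for PST and therefore ignores daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-8 offset for PST; this deliberately ignores daylight saving time.
PST = timezone(timedelta(hours=-8), "PST")


def next_restart(now: datetime, hour: int = 3) -> datetime:
    """Return the next occurrence of `hour`:00 PST strictly after `now`."""
    local = now.astimezone(PST)
    candidate = local.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= local:
        candidate += timedelta(days=1)
    return candidate


def rolling_restart(names, restart):
    """Restart tunnels one at a time so the rest of the pool keeps serving."""
    restarted = []
    for name in names:
        # `restart` is assumed to return only once this tunnel is back up.
        restart(name)
        restarted.append(name)
    return restarted
```

`next_restart` expects a timezone-aware `now`, for example `datetime.now(timezone.utc)`.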
We run Crows-Nest as a supervisor process in our DMZ network inside WalmartLabs. With Crows-Nest, we have successfully migrated all tests on Jenkins into the DMZ network under secure monitoring.
Surprise with Crows-Nest: Test Performance Improvement
In our comparison of test performance between a single Sauce Connect and a Crows-Nest monitored Sauce Connect pool, we found a very interesting side effect of using the pool: tests run approximately 10% to 15% faster.
What’s Next
Crows-Nest can also work as a Sauce Connect manager. Our next goal for Crows-Nest is to integrate it into Magellan as its Sauce Connect manager, so our users can debug their tests in a setup similar to the one our Jenkins has.
Embrace The Cloud
The ability to run our internal automation tests over the cloud makes our lives much easier. Better utilities and tools help make test runs easier and more consistent. If you’re seeing similar issues with your Sauce Labs setup, I recommend you try out Crows-Nest.