Scaling secure tunnels for browser testing

Testing is critical to Walmart’s successful e-commerce business. Hundreds of development teams across the world write code, and all of it needs to be tested before it reaches production. But supporting every browser and device version to test on is a herculean task in itself.

We achieve this through a vendor, Sauce Labs, which does just that. You give them a Selenium script along with a browser and version, and they execute the tests on that browser. Most of the time, these tests make service calls back into our internal network, which creates a new problem: we can’t open up all of our test APIs to the public.

Sauce Labs offers Sauce Connect™ tunnels, which allow us to create a secure connection into our own network!

Sauce Connect™ is a proxy server that opens a secure connection between a Sauce Labs virtual machine running your browser tests, and an application or website you want to test that’s on your local machine or behind a corporate firewall

We have set up multiple DMZ boxes to handle this inbound traffic from the Sauce Connect™ tunnels.

While this sounds like the perfect setup, there is still some plumbing that needs to be done to ensure scalability and availability. We also ran into the following two issues:

  1. A Sauce Connect™ tunnel would become unresponsive with no notification
  2. Sauce Labs recommends restarting each Sauce Connect™ tunnel once every 24 hours

Introducing Lookout & Raven!

  • Raven is an application that launches a Sauce Connect™ tunnel in the DMZ
  • Lookout is an application that monitors the health of the Raven(s)

Figure: Lookout, Raven, and Orchestrator architecture

A Raven’s sole purpose is to start and stop a Sauce Connect™ tunnel. The health check logic is pushed off to the Lookout application because Sauce Labs rate limits authenticated requests.

For authenticated requests, these limits are imposed for individual user account

The limit is 3. If we had more than 3 Ravens using the same credentials for a tunnel, we would rate limit ourselves just by checking the status of the tunnels.

Instead, we push this logic onto the Lookout application. A single Lookout application is deployed for a cluster of Ravens that share the same tunnel id. This allows us to check the status for a collection of tunnel ids with the same credentials. Once Lookout acquires the list of active tunnels, it writes that list to a file. All of our Ravens can then check the file to see whether their tunnels are still active. If a Raven’s tunnel is NOT in the active list, the Raven terminates itself.
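
The file-based handoff between Lookout and the Ravens could be sketched roughly as follows. This is a minimal illustration, not the actual Lookout or Raven code: the JSON file format and the function names write_active_tunnels and is_tunnel_active are assumptions made for the example, and the call that fetches the active tunnel list from Sauce Labs is left out entirely.

```python
import json
import os
import tempfile


def write_active_tunnels(path, tunnel_ids):
    """Lookout side (hypothetical): publish the active tunnel ids to a file.

    The list is written to a temporary file first and then moved into place,
    so a Raven never reads a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(sorted(tunnel_ids), handle)
    os.replace(tmp_path, path)


def is_tunnel_active(path, tunnel_id):
    """Raven side (hypothetical): is our tunnel in the published list?"""
    try:
        with open(path) as handle:
            return tunnel_id in json.load(handle)
    except FileNotFoundError:
        # Assumption: if Lookout has not written the file yet, keep running
        # rather than terminating a tunnel that may be perfectly healthy.
        return True
```

In this sketch a Raven would call is_tunnel_active on a timer and exit when it returns False, leaving the restart to the process manager.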

All of our Ravens and Lookout applications are controlled by a single configuration file and a process manager, which together we call our Orchestrator.

Orchestrator solves the following problems:

  • availability requirements by having the process manager restart the Raven any time it terminates itself
  • requirements by deploying to multiple DMZ machines
  • scalability requirements by allowing us to add/remove/restart tunnels without impacting other running tunnels
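
The restart behavior in the first point can be illustrated with a small supervisor loop. This is only a sketch of the idea under stated assumptions: the post does not say which process manager the Orchestrator uses, and the supervise function, its max_starts parameter, and the delay are hypothetical names chosen for the example. In practice an off-the-shelf process manager would normally play this role.

```python
import subprocess
import time


def supervise(command, max_starts=None, delay_seconds=1.0):
    """Start `command` and start it again every time it exits.

    `command` is an argument list such as the one that launches a Raven.
    `max_starts=None` means restart forever; a number caps the total starts,
    which is mainly useful for testing. Returns how many times it was started.
    """
    starts = 0
    while max_starts is None or starts < max_starts:
        if starts:
            # Brief pause so a process that fails instantly does not spin.
            time.sleep(delay_seconds)
        subprocess.run(command)  # blocks until the process terminates
        starts += 1
    return starts
```

Because each supervised command runs in its own loop, adding, removing, or restarting one tunnel in this model does not touch the processes behind the others.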

Impact of these applications

  • Zero manual restarts for unresponsive tunnels since Lookout and Raven were deployed (previously we needed a manual restart ~4 times a week)
  • No customer test case issues reported because of a tunnel outage (previously we received ~1 customer issue a week)
  • Zero impact on existing tunnels or running test cases when starting new tunnels

Conclusion

Ensuring that code is tested on multiple browsers and versions is very important to any company with a web presence. There are so many different versions of browsers that setting up the correct infrastructure to handle that is a great undertaking. Take advantage of the tools that are already available and use them to your benefit.