Scaling Selenium with OpenResty
Selenium has been the de facto standard tool for browser/UI automation for quite some time.
The creators of Selenium also created Selenium Grid for scaling UI testing. The grid consists of a Selenium hub and several Selenium nodes; the nodes host the browsers, providing a way to run many UI tests simultaneously. Selenium also provides easy ways to customize and extend the existing capabilities. For most use cases the hub/node architecture scales pretty well and doesn't need much customization.
Need for higher availability and scale
Although the architecture supports scaling the Selenium nodes, it doesn't make it easy to scale the hub itself. A typical deployment has a single hub with several nodes attached to it.
The mapping of sessions to browser instances is maintained in memory inside the hub, so if the hub fails, the whole grid becomes unusable.
A simple way to get some level of HA is to run at least two hubs.
Challenge with multiple hubs
Requirement of sticky sessions
Since Selenium hubs can't talk to each other, distributing load across hubs requires some kind of stickiness: once a session is created on one hub, every subsequent request for that session must land on the same hub.
Session stickiness is not a new thing. For browser-based clients it is typically implemented with cookies: the LB inspects the cookie passed with each request to decide which upstream server the request should be forwarded to.
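For example, open-source nginx can key its upstream hash on a cookie; a minimal sketch, with an illustrative cookie name and server addresses:

```nginx
upstream app_servers {
    # requests carrying the same "route" cookie always hash to the same server
    hash $cookie_route consistent;
    server 10.0.0.21:8080;
    server 10.0.0.22:8080;
}
```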
Cookie-based stickiness ruled out
Selenium automation clients are not browsers, though, and don't carry cookies across requests, so passing this extra bit of information via a cookie was not an option for our use case.
Client-IP-based stickiness
One option we have been using so far is stickiness based on the client IP: all calls originating from one client IP always land on the same hub. While this works in most scenarios, the load is not distributed evenly across the hubs.
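In nginx this is a one-line upstream directive (hub addresses are illustrative):

```nginx
upstream selenium_hubs {
    # hash on the client ip: one client always reaches the same hub
    ip_hash;
    server 10.0.0.11:4444;
    server 10.0.0.12:4444;
}
```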
This technique solves the HA problem as such, but it is not a great solution for scaling when the load is uneven.
Uneven load and uneven test cases in test suites
A large test suite with a huge number of UI tests can completely choke the single hub on which all of its tests land.
To make our test infrastructure more reliable and robust, we had to explore a different avenue.
OpenResty to the rescue
OpenResty is a software bundle of nginx with a number of high-quality Lua libraries included. It lets you customize load-balancing behavior and add custom behaviors inside nginx itself.
Design
We can add OpenResty as a proxy layer in front of the hubs.
We deploy a couple of OpenResty instances in front of the hubs, and the LB forwards each request to one of these instances in round-robin fashion (earlier, the LB forwarded requests directly to a hub using client-IP hashing).
The OpenResty instances work as reverse proxies, and every hub instance is an upstream server for each OpenResty instance.
The Magic
Request forwarding works as follows:
- Each new-session request is forwarded to one of the hubs in round-robin fashion.
- Once the session is created on a hub and the response comes back to the OpenResty instance, the proxy inspects the response body, finds the session id, appends the upstream hub's IP (encoded as a number) to it, and returns the modified response to the client. For example, a hub at 10.0.0.11 encodes to 167772171, so session id abc123 would become abc123-167772171.
- The client thus receives a session id with the upstream hub embedded in it.
- Every subsequent operation on that session includes this session id in the URL path, so each request carries the upstream hub's IP.
- The proxy inspects the request URL and extracts the session id.
- It rewrites the URL path, removing the embedded host-IP part.
- It sets the upstream host to the specific hub found in the IP section of the session id.
- It then forwards the request to that hub.
- The hub sees the session id in exactly the format it generated and continues to process as usual.
Sample nginx conf with embedded Lua code:
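What follows is a minimal, illustrative sketch of such a configuration. The hub addresses, port, shared-dict name, and the exact `"sessionId"` matching are assumptions; a real deployment would need more hardening.

```nginx
# Assumed: two hubs at 10.0.0.11 and 10.0.0.12 on port 4444, and a JSON
# response body containing "sessionId":"...".
http {
    lua_shared_dict hub_state 1m;

    upstream selenium_hubs {
        server 0.0.0.1;   # placeholder; the real peer is chosen in Lua
        balancer_by_lua_block {
            local balancer = require "ngx.balancer"
            local ok, err = balancer.set_current_peer(ngx.ctx.hub_host, 4444)
            if not ok then
                ngx.log(ngx.ERR, "failed to set peer: ", err)
                return ngx.exit(500)
            end
        }
    }

    server {
        listen 4444;

        # New-session requests: pick a hub in round-robin fashion.
        location = /wd/hub/session {
            rewrite_by_lua_block {
                local hubs = { "10.0.0.11", "10.0.0.12" }   -- assumed hub ips
                local i = ngx.shared.hub_state:incr("rr", 1, 0) % #hubs + 1
                ngx.ctx.hub_host = hubs[i]
            }

            # The body filter below changes the body length.
            header_filter_by_lua_block { ngx.header.content_length = nil }

            # Buffer the response, then append the hub ip (as a number)
            # to the session id before it reaches the client.
            body_filter_by_lua_block {
                local buf = (ngx.ctx.buf or "") .. (ngx.arg[1] or "")
                if not ngx.arg[2] then       -- more chunks to come
                    ngx.ctx.buf, ngx.arg[1] = buf, nil
                    return
                end
                local a, b, c, d =
                    ngx.ctx.hub_host:match("(%d+)%.(%d+)%.(%d+)%.(%d+)")
                local ipnum = a * 16777216 + b * 65536 + c * 256 + d
                ngx.arg[1] =
                    buf:gsub('("sessionId"%s*:%s*"[^"]+)', "%1-" .. ipnum, 1)
            }

            proxy_pass http://selenium_hubs;
        }

        # Subsequent requests: the session id in the path carries the hub ip.
        location ~ ^/wd/hub/session/ {
            rewrite_by_lua_block {
                local prefix = "/wd/hub/session/"
                local sid = ngx.var.uri:match("^" .. prefix .. "([^/]+)")
                local ipnum = sid and sid:match("%-(%d+)$")
                if not ipnum then
                    return ngx.exit(400)     -- no embedded hub ip
                end
                local n = tonumber(ipnum)
                ngx.ctx.hub_host = string.format("%d.%d.%d.%d",
                    math.floor(n / 16777216) % 256,
                    math.floor(n / 65536) % 256,
                    math.floor(n / 256) % 256, n % 256)
                -- Strip the embedded ip so the hub sees its original id.
                ngx.req.set_uri(prefix .. sid:sub(1, #sid - #ipnum - 1)
                                .. ngx.var.uri:sub(#prefix + #sid + 1))
            }
            proxy_pass http://selenium_hubs;
        }
    }
}
```

The buffering in the body filter is there because nginx hands the response body to Lua in chunks; the id rewrite can only happen once the whole body is available.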
Taking a step further
While the OpenResty proxy solves scale and availability, we can use the same layer for more features built on top of the Selenium hub.
- Rate limiting: even though the proxy helps scale the hubs and nodes, there are scenarios where a spike in the number of new sessions exceeds the available resources. Currently all requests are put in a queue and processed when resources become available, which delays session creation. Different test cases/frameworks have different timeout settings; some proceed, but many get stuck. Often, even when a session times out at the client/test-case layer, the hub still goes on and creates it. These unused sessions get cleaned up later, but they make the resource crunch even worse. To solve this we can add rate limiting: based on capacity and traffic patterns, only a certain number of sessions are allowed in the queue, and beyond that we return HTTP 429 with details such as the current number of active sessions and an estimated time after which to retry (a sketch follows this list).
- Monitoring data: from the proxy layer we can collect monitoring data about sessions being requested, time to create a new session, active session time, and so on, which can give further insight into how to optimize infrastructure usage (a second sketch follows).
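A minimal sketch of the rate limit, attachable to the new-session location from the earlier config; the shared-dict name, the threshold of 50 queued requests, and the 30-second retry hint are all illustrative:

```nginx
# requires: lua_shared_dict session_counts 1m;   (in the http block)
location = /wd/hub/session {
    access_by_lua_block {
        local dict = ngx.shared.session_counts
        local pending = dict:incr("pending", 1, 0)
        if pending > 50 then                  -- illustrative capacity
            ngx.status = 429
            ngx.header["Content-Type"] = "application/json"
            ngx.say('{"error":"too many pending new-session requests",',
                    '"pending":', pending - 1, ',"retryAfterSeconds":30}')
            return ngx.exit(ngx.HTTP_OK)      -- body already emitted
        end
    }
    log_by_lua_block {
        -- runs for accepted and rejected requests alike, undoing the incr
        ngx.shared.session_counts:incr("pending", -1)
    }
    proxy_pass http://selenium_hubs;
}
```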
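For the monitoring data, a small log-phase hook is enough to start with; this sketch only writes timings to the nginx error log, whereas a real setup would likely push them to a metrics backend:

```nginx
location = /wd/hub/session {
    proxy_pass http://selenium_hubs;
    log_by_lua_block {
        -- request_time covers the full client-facing latency of the call
        local took = tonumber(ngx.var.request_time) or 0
        ngx.log(ngx.NOTICE, "new-session status=", ngx.status,
                " took=", took, "s upstream=", ngx.var.upstream_addr)
    }
}
```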
References
Selenium: https://www.seleniumhq.org/
Selenium Grid: https://www.seleniumhq.org/docs/07_selenium_grid.jsp
OpenResty: https://openresty.org/en/