Testing SDKs at Optimizely (Part 1)

Ali Abbas Rizvi
Engineers @ Optimizely
6 min read · Dec 2, 2020

I’ve been at Optimizely for a little over six years. In that time I’ve had the good fortune to work on many parts of the application. I built several of our technology integrations during my first year and then got the opportunity to be the first engineer working on the Optimizely Full Stack product. With Full Stack, I got to build a product from the ground up and be in the driver’s seat as we expanded our products and features to appeal to world-class product development teams. There have been a ton of learnings along the way, and I hope to share something here that may unlock a new way of using Optimizely in your own product development.

The engineering teams at Optimizely cover a wide range of areas. Engineers here get to work on the web application, our data ingestion and analytics platform, our snippet build and delivery system, or one or more of the Full Stack SDKs. Beyond these product-centric teams, we have a number of engineers working on our billing platform, as well as engineers in Site Reliability Engineering (SRE) roles.

While each team has its own charter and responsibilities, there is a set of cultural qualities that every team exhibits. One of these is finding ways to use the Optimizely Full Stack product itself. Eating our own dogfood allows us to develop better products and understand how our customers use them. It also helps us support our customers with use cases like this one, where we used Optimizely to build and ship features in our own SDKs.

Our hope here is to share a use case your team may not have considered, and to show how you can leverage Optimizely’s feature management capabilities to unlock new ways of testing.

I’m a member of the Developer Experience team at Optimizely, which is responsible for building and delivering our Full Stack SDKs. As we build out new features in the SDKs, we need to ensure that:

  • Our SDKs meet the highest quality standards and are rigorously tested. We take this very seriously because our SDKs sit in the critical path of our customers’ code.
  • Our SDKs behave the same functionally across all languages. This is very important because a good majority of our customers use SDKs across their entire stack, and different parts of that stack may be written in different programming languages, so the offering and behavior must be consistent throughout.

During the early days of Optimizely Full Stack, releasing our SDKs was extremely tedious. While we had rigorous unit tests in each of our SDKs, we also had to verify that each SDK returned the correct response and recorded metrics correctly. Metrics are central to the Optimizely Full Stack product since they tell us how an experiment is performing. Because of how counting works, it was not straightforward to set up tests for the scenarios where we were checking the results of experiments.

To this end, we came up with a spreadsheet of tests, and prior to each SDK release we would go through all of them manually. This manual testing had some issues:

  • It was time-consuming. Testing an SDK manually was easily 2–3 hours of work, and with each release the feature set grew, further increasing the time to test.
  • Issues were caught late. We couldn’t run manual tests all the time, so show-stopping issues usually surfaced close to an SDK’s release.

We needed a system that would allow us to overcome these shortcomings and improve time to delivery for our SDKs. To accomplish this we came up with the following setup.

The Setup

Test applications

For each SDK we have a lightweight web application that wraps the top-level SDK methods. All of these applications conform to a uniform Swagger specification that represents each of the APIs provided by the SDK. This way we can write a universal set of tests and run them against each application, thereby confirming that the SDK is functioning properly.

As an example, the applications have an /activate endpoint that wraps the activate method in the SDK. When we introduce a new API in the SDK, we add a corresponding endpoint in the application as well.
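
To make this concrete, here is a minimal sketch (not our actual test application) of what such a wrapper could look like for the JavaScript SDK using Express. The route, request, and response shapes below are illustrative assumptions; the real applications follow the shared Swagger specification.

const express = require('express');
const optimizelySdk = require('@optimizely/optimizely-sdk');

const app = express();
app.use(express.json());

// In the real test applications the datafile/SDK key is supplied by the harness.
const optimizelyClient = optimizelySdk.createInstance({
  sdkKey: process.env.OPTIMIZELY_SDK_KEY,
});

// POST /activate wraps the SDK's activate(experimentKey, userId, attributes).
app.post('/activate', (req, res) => {
  const { experiment_key, user_id, attributes } = req.body.params || {};
  const variation = optimizelyClient.activate(experiment_key, user_id, attributes);
  res.status(200).json({ result: variation });
});

app.listen(3000, () => console.log('Test application listening on port 3000'));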

The applications are all dockerized, leaving the test runner to worry only about hitting the application endpoints and retrieving and comparing the responses.

Test definitions

The tests are defined using Gherkin syntax. This lets us use plain English to describe the steps in each test, so people in different roles (developer, QA, designer, product manager) can read a test and easily see what it is checking. They can take the acceptance criteria, find the corresponding test, and quickly confirm whether all the requisite testing is happening.

Taking our example of writing a test for the activate endpoint further, a test looks something like this:

When activate is called with arguments
"""
experiment_key: my_experiment
user_id: my_user
"""
Then the result should be "variation_a"

Test runner

For the test runner we use Cucumber. The individual steps are defined in JavaScript, and based on the API being exercised, we make a request to the corresponding endpoint on the application running a particular SDK.

For our activate example, the step definition looks something like this:

const { When } = require('@cucumber/cucumber');

// API_CALL_TIMEOUT, this.load and this.makeRequest are defined elsewhere in the test harness.
When('activate is called with arguments', {
  timeout: API_CALL_TIMEOUT
}, async function (args) {
  // Parse the Gherkin doc string into request parameters.
  const params = this.load(args);
  const requestBody = {
    params: params,
  };
  try {
    // Hit the /activate endpoint on the test application wrapping the SDK.
    this.response = await this.makeRequest('/activate', {
      requestBody,
      expectedStatus: 200
    });
  } catch (error) {
    console.log('Error while activating.', error.message);
  }
});

We define multiple other steps in JavaScript, and when these steps are encountered while running a test, they are translated into the appropriate actions. Here, when the runner sees the line "activate is called with arguments", it makes a call to /activate with the given arguments and confirms that we get a 200 response.
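
The this.load and this.makeRequest helpers used above come from plumbing shared by all step definitions via Cucumber’s custom World. Here is a minimal sketch of what that could look like, assuming an axios-based implementation; the actual harness may differ.

const { setWorldConstructor } = require('@cucumber/cucumber');
const axios = require('axios');

// Base URL of the dockerized test application under test (assumed for illustration).
const BASE_URL = process.env.TEST_APP_URL || 'http://localhost:3000';

class SdkWorld {
  // Parse the "key: value" lines of a Gherkin doc string into a params object.
  load(docString) {
    const params = {};
    docString.split('\n').forEach((line) => {
      if (!line.trim()) return;
      const [key, ...rest] = line.split(':');
      params[key.trim()] = rest.join(':').trim();
    });
    return params;
  }

  // POST to an endpoint on the test application and check the response status.
  async makeRequest(path, { requestBody, expectedStatus }) {
    const response = await axios.post(`${BASE_URL}${path}`, requestBody, {
      validateStatus: () => true, // let us assert the status ourselves
    });
    if (response.status !== expectedStatus) {
      throw new Error(`Expected status ${expectedStatus} but got ${response.status}`);
    }
    return response.data;
  }
}

setWorldConstructor(SdkWorld);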

The received response is then compared against the expected response when the runner encounters the "Then the result should be" statement, for which there is another step definition in JavaScript.
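
A sketch of what that comparison step could look like (the result field and the exact assertion are assumptions chosen to match the wrapper sketch above):

const { Then } = require('@cucumber/cucumber');
const assert = require('assert');

// Compare the response captured by the When step against the expected value.
Then('the result should be {string}', function (expectedValue) {
  assert.strictEqual(this.response.result, expectedValue);
});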

In this way, we have defined steps for each API, which lets us quickly hit SDK endpoints, compare the responses against the expected responses, and report success or failure. We will talk about how this setup helped us solve our issues in part 2 of this blog.

The Process

When building out a feature, we first invest time in describing its acceptance criteria. Once that is settled, and before writing any SDK code, we write tests (using Gherkin syntax as described earlier) to validate as much of the acceptance criteria as we can. Depending on the feature, we may also have to update the test applications themselves (say, if we are introducing a new API).

Next, we make the changes in one SDK and make sure it conforms to and passes the tests we defined. Once we are happy and confident about the feature in one SDK, we go ahead and implement the functionality in the other SDKs.

Now, certain features may not be supported in some SDKs yet, which definitely happens since features are added and released at different times across our SDKs. We always strive to keep releases close together, but if an SDK is good to go, we release the functionality anyway. As a result, the test runner may try to run a new test against an SDK that doesn’t support the functionality yet.

How we manage this test-to-functionality disconnect is something we will cover in part two!
