Designing & Deploying a Web SDK — Part Two

Tal Kot
AppsFlyer Engineering
8 min read · Jun 17, 2020

In our previous post we discussed how our web SDK was designed and engineered so that other in-house teams could integrate it into their own projects. The post introduced the building blocks of the product: the code snippet, the CDN, cloud storage (bucket), and the SDK itself. Most importantly, it reviewed the different design options and why we eventually chose one over the others.

This post will cover the deployment process end-to-end, from the basics through more complex questions:

  • What is the deployment process? What is the workflow from the developer’s branch to the user’s web browsers in production?
  • What is the testing workflow? At which point does QA get involved?
  • How do you manage to work in a cross-team environment without any coding or production conflicts?

All these questions will be answered in this post, so sit tight.

Designing our CI/CD Process — Take 1

The first CI/CD pipeline we built was as follows:

  1. Create versioned artifact for each plugin (semi-automated)
    Whenever a plugin’s code is updated, a new versioned artifact is created for it. Part of this step still requires manual intervention, which is why it is only semi-automated.
  2. Unit test each plugin (automated)
    Each plugin has a different code base and different functionality. To ensure that none of these plugins was corrupted during creation, unit tests run on each plugin automatically.
  3. Cloud storage — store versioned plugins (automated)
    Push the final and tested plugin files to a private bucket for internal use.
  4. Build and deploy SDK permutations to public cloud storage (automated)
    Since the SDK teams now work on a shared “base SDK” project and build the relevant plugins for each SDK, each time a project is updated, multiple files are created to support all the possible runtime options.
  5. Purge CDN (manual)
    Access the CDN platform and purge the CDN cache. Distributing the new SDK version to the world as fast as possible requires the purging of all the cache layers that the CDN provider has.
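The purge in step 5 was still manual at this stage. As a rough illustration of what automating it might look like, here is a minimal sketch assuming Akamai's Fast Purge v3 API (a real endpoint: POST to `/ccu/v3/invalidate/url/production` with a JSON list of URLs, signed with EdgeGrid credentials). The hostname and URL list are placeholders, not our real paths, and authentication is omitted.

```python
import json

# Hypothetical SDK URLs to purge; placeholders, not our real paths.
SDK_URLS = [
    "https://some-sdk-location.appsflyer.com/sdk1",
    "https://some-sdk-location.appsflyer.com/sdk2",
]

def build_purge_request(urls):
    """Build the path and body for an Akamai Fast Purge v3 invalidation.

    The real call is a POST to /ccu/v3/invalidate/url/production,
    signed with EdgeGrid credentials (omitted in this sketch).
    """
    return {
        "path": "/ccu/v3/invalidate/url/production",
        "body": json.dumps({"objects": urls}),
    }

request = build_purge_request(SDK_URLS)
print(request["path"])
```

Even in this reduced form, the point stands: the purge is a single API call per release, which makes the manual step an obvious candidate for automation later.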

Our first CI/CD pipeline drawbacks

  1. Too many variables -
    Trying to identify each commit's state, while also versioning each plugin, resulted in many variables that had to be evaluated whenever the CI/CD pipeline executed.
  2. Single development environment -
    We have X teams working on Y versions at any given time. We didn't take into consideration that we would need X*Y environments to run our testing flows. The single dev environment became a bottleneck, with developers queuing up to use it.
  3. Single test environment -
    With only a single testing environment, new developments could not be tested simultaneously, resulting in yet another long queue of commits waiting to be approved for deployment. Updating the test suite for each new deployment also took a long time, adding even more delay to the bottleneck.
  4. Too many operations platforms -
    We work with many different DevOps platforms in-house: GitLab, Jenkins, Akamai, AWS, Vault, Artifactory, and more. It is extremely difficult for each developer to know how to work with all of them, let alone understand each platform's role in the CI/CD pipeline.

Design philosophy

The first design philosophy was basically to create everything from scratch.

Since this project was quite sensitive, as each and every change would eventually be visible on each of our clients' websites, we wanted to have as much control over the flow as possible. As it turned out, this approach had its downsides.

Our pipeline process needs to support our exponential growth from two perspectives:
First, it needs to scale to serve the growth of our clients' needs, enabling us to roll out new features rapidly.
Second, it needs to support our growing development teams without compromising quality.

The maintenance overhead of the whole ecosystem we had created was massive; even the simplest task was extremely difficult to execute.

We learned quickly that the DIY philosophy would be impossible to maintain, and we decided to try a new approach.

We understood that this whole CI/CD flow was not going to work for many reasons, the most significant being that it was an entire product of its own! CI/CD is not an easy thing to build, and building it in a way that supports all the relevant use cases would take months of development.

So we regrouped and set out with a new guiding philosophy of leveraging a CI/CD platform that already works.

Let's see how we managed to tackle the bottlenecks mentioned above using the in-house tools we already had.

Let's tackle the bottlenecks

First Bottleneck — Too many variables => Predefined variables

Previously, we tried to explore each commit state by building dedicated scripts, responsible for manipulating the pipeline behavior.

After some investigation, we found that GitLab's integrated CI/CD supported our requirements, making it easy to adopt and integrate into our pipeline.

GitLab already provides all the environment variables we originally injected ourselves: for example, the commit message and description, the branch name, and even the list of changed files. There was no longer any need to write code for these fetch operations; they are covered simply by reading the predefined variables inside the GitLab runners.
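As a small sketch of what this buys us: inside a GitLab runner, values such as `CI_COMMIT_BRANCH`, `CI_COMMIT_MESSAGE`, and `CI_COMMIT_SHA` (all documented GitLab predefined variables) are simply present in the environment. The decision function below is illustrative only, not our actual pipeline logic, and the "release:" message convention is a made-up example.

```python
import os

def describe_commit(env=os.environ):
    """Read GitLab's predefined CI variables straight from the environment.

    CI_COMMIT_BRANCH, CI_COMMIT_MESSAGE, and CI_COMMIT_SHA are documented
    GitLab predefined variables; no custom injection code is needed.
    """
    return {
        "branch": env.get("CI_COMMIT_BRANCH", ""),
        "message": env.get("CI_COMMIT_MESSAGE", ""),
        "sha": env.get("CI_COMMIT_SHA", ""),
    }

def is_release_commit(info):
    # Illustrative convention (not ours): a commit message starting with
    # "release:" would trigger the production stages of the pipeline.
    return info["message"].startswith("release:")
```

The pipeline scripts shrink to pure decision logic over values GitLab has already prepared.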

Plus, this tool was already in our tech stack.

First bottleneck — complete.

Second Bottleneck — Single Development Environment => Multiple Development Environments

We understood that we had to create a way for developers to work without conflicting with their teammates. Each developer who opens a merge request should get a working, isolated environment to use. We can test our code locally, but to fetch our code into a testing site via the snippet, we need to store the permutations somewhere and fetch them, in real time, from an isolated environment.

Once again, the solution came out of the box, combining GitLab CI/CD features, Akamai CDN services, AWS S3, and JFrog's Artifactory storage.

In the previous design, the responsibility for creating the SDK permutations was shared between two platforms, GitLab and Jenkins, which required middleware storage for one to pass data to the other. Now that we use a single platform for the CI/CD pipeline, this is no longer needed.

First Step — Restructuring the plugins tree in Artifactory
We created a source root dev folder for all the active branches, where each branch included its latest plugin files. We maintained this storage for internal usage only, such as backward-compatibility tests.

Second Step — Connecting Gitlab to Artifactory
With each commit pushed (dev or prod), the pipeline creates artifacts in Artifactory under the relevant branch folder.
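A sketch of the resulting layout: with a per-branch folder under a dev root, the artifact path can be derived from the branch name alone. The repository name and naming convention below are hypothetical, and the actual upload (an authenticated HTTP PUT to the Artifactory path) is omitted.

```python
def artifact_path(branch, plugin, version):
    """Build the Artifactory target path for a plugin artifact.

    Layout (hypothetical): <repo>/dev/<branch>/<plugin>-<version>.js
    The repository name "web-sdk-local" is a placeholder.
    """
    repo = "web-sdk-local"
    return f"{repo}/dev/{branch}/{plugin}-{version}.js"
```

Because the path is a pure function of the branch, the pipeline can upload every commit's artifacts without any coordination between teams.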

Third Step — Restructuring the SDK buckets in S3

We created a dev-dedicated bucket where each branch stored its own SDK permutations.

Fourth Step — Adding flexibility to our Akamai CDN rules

Up until now, we had used Akamai's Cloudlets rules with static URL params indicating the required SDK permutation. By adding a regex rule for the development environments, we could now inject the branch name into the URL to redirect to the relevant development environment.

For example, let's say developer 1 is working on a branch called "branch-1", and developer 2 is working on a branch called "branch-2". Each receives a URL of this form (which serves the code for their branch's SDK exactly as it would behave in the production environment):

https://some-sdk-location.appsflyer.com/dev/<branch-name>/.

Changing the inputs (the branch name and the permutation hash) in this URL causes it to fetch the relevant SDK permutation.
To fetch the required files, developer 1 calls:
https://some-sdk-location.appsflyer.com/dev/branch-1/sdk1
Developer 2 will execute the call with this URL:
https://some-sdk-location.appsflyer.com/dev/branch-2/sdk1
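The URL scheme above can be captured in a tiny helper. The hostname comes from the example URLs in this post; the function itself is purely illustrative.

```python
def dev_sdk_url(branch, permutation):
    """Build the branch-isolated dev URL for a given SDK permutation.

    Mirrors the scheme from the examples:
    https://some-sdk-location.appsflyer.com/dev/<branch>/<permutation>
    """
    base = "https://some-sdk-location.appsflyer.com"
    return f"{base}/dev/{branch}/{permutation}"
```

Any test site's snippet can then point at its own branch simply by swapping one path segment.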

Done. That is it.

With minimal work, we managed to resolve a huge bottleneck. We now have the ability to create isolated SDK environments dynamically, thus enabling the growth of our development teams.

Third Bottleneck — Single Test Environment => Branch-based Test Environment

Once parts 1 and 2 were in place, testing became significantly simpler.

Now that we had a branch-dedicated environment, we could update the test-suites to receive a dynamic parameter indicating the location of the SDK files relevant to the branch.

By triggering the Jenkins jobs with a branch-name parameter, we gave developers a way to run the SDK test suites whenever they want, making each commit's pipeline run in an isolated environment.
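As a sketch of the trigger mechanism: Jenkins exposes a `buildWithParameters` endpoint for parameterized jobs, so the branch name can be passed as a query parameter. The Jenkins host, job name, and parameter name below are placeholders, and authentication (a user API token) is omitted.

```python
from urllib.parse import urlencode

def jenkins_trigger_url(job, branch):
    """Build the URL that triggers a parameterized Jenkins job.

    Jenkins' standard remote-access API: POST to
    <jenkins>/job/<name>/buildWithParameters?<params>.
    Host, job, and parameter names here are placeholders.
    """
    base = "https://jenkins.example.internal"
    return f"{base}/job/{job}/buildWithParameters?" + urlencode({"BRANCH_NAME": branch})
```

The test job then resolves the branch-specific SDK location from the parameter, so every pipeline run tests against its own isolated environment.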

Last Bottleneck — Too Many Operations Platforms => One Centralized Pipeline

We understood that, for the project to succeed, monitoring and controlling the CI/CD flow had to be clear and accessible to all developers.

This was made possible by having one centralized pipeline (built on GitLab's CI/CD tools) holding all the credentials required for communicating with the other platforms involved.

This pipeline also captured a lot of previously undocumented knowledge that each team would otherwise have needed in order to deploy their SDK successfully. It now takes care of everything, and developers don't need to access external platforms manually.

Conclusion

Significant changes within an organization often start with the mindset, not with the code. By changing our approach and philosophy, we were able to solve some huge pain points for the company.

From a single-step pipeline that was extremely hard to work with, and almost impossible to understand and maintain, we evolved into something far more practical that introduced many new benefits along the way. We successfully deleted a substantial amount of code while gaining new capabilities.

All the tech was available in-house, so we didn't even need to introduce new tooling, and we realized we didn't really need to reinvent the wheel: many of our pains had already been taken care of by someone else. The company's growth in such a short period of time requires us to engineer and design all of our products for the long term. We don't have the luxury of hacking something together and thinking "we will create something better in the future", because the future arrives very rapidly in a hyper-growth company.

From day one we needed to create a product designed for 100X scale, not just for today's user base. This trial-and-error process allowed us to build a robust, resilient system that will serve us going forward as well.
