Published in eBayTech

The Staging Dichotomy: Part One

A two-part series on how eBay turned an impeding staging environment into its biggest asset for developer productivity.

The Staging Dichotomy: Part One Cover Image

Do We Really Need Staging?

The benefits of having a staging environment versus the cost of maintaining one may appear lopsided. And there is some truth to that. Maintaining a fully functional staging environment is hard, really hard. Even if you make it ideal, without a proper system in place, it quickly starts regressing. Moreover, with software releases transitioning from waterfall to continuous delivery, the changes become small and incremental, opening the possibility of direct canary testing in production. There is even an excellent InfoQ talk titled “Production — Designing for Testability” on this subject.

Options

One option would be to create a separate zone in production that is not exposed to the public and is open only to internal eBay traffic. Developers can deploy the outgoing software in this zone and run their entire suite of integration test cases before deploying to production. We indeed have a zone in eBay like this, and it is called pre-production. The issue here, though, is that the data source behind pre-production is the same as production. This means all your test data creation and mutation happen alongside production data. When we tried this in the past, it ended up being an analytics nightmare, where the continuous runs skewed production metrics. Creating a “test” versus “customer” metrics dimension helped a little. However, the data corruption ran deep into production databases and became a real issue. Even with data teardown being part of the test suites, the massive scale of integration tests run continuously across the entire marketplace can flip the production data store into an egregious state.

Context Matters

All of the above approaches have a fundamental limitation, and this is where context matters. eBay is an ecommerce platform. Transactions are essential to whatever we do. And when there are transactions, there are payments involved. We are talking about actual items, transacting between genuine sellers and buyers with real money. The margin of error has to be minuscule. It is just not possible to execute all your test cases in production. Even if we start with a tiny amount of traffic, we need to ensure that all the dependent services work harmoniously to keep the transactions accurate. These services are also rapidly changing, and the assumption that they will just work when put together in production is not worth the risk. Especially when payment is involved, even in the smallest quantities.

Software Release Pipeline

The Problem with the Problem Statement

Developers were all entrenched in the notion of a broken staging environment, but the statement “staging is broken” was unactionable and lacked specificity. Yes, we all knew that staging was broken, but what did that really mean, and how could we work toward a solution?

Generic statements are easy to make but difficult to act on.

As appealing as such a problem may seem to solve, its vagueness can make you spin in circles without reaching the desired outcome. So as a first step, the core staging team set out to break the generic statement, “staging is broken,” into specific problems that would enable us to design holistic solutions. Specificity was the key here.

Actual (or Actionable) Problems

We embedded ourselves into the software development workflow of a few critical applications to understand the actual bottlenecks of staging. After a thorough firsthand experience, we derived the following conclusions.

Screenshots of search and item pages with low-quality staging data for the query “boys shoes.”
  • A vicious cycle — the chicken and egg problem. The application teams faced challenges in keeping their functionality up and running in staging, citing the lack of data and infrastructure. And since the applications were stale and not deployed regularly, the infra teams were not incentivized to monitor and scale up the systems. We were trapped in a vicious cycle.

Data

A common and well-established idea proposed to address data issues is to create quality data in large quantities before executing the test cases and tear it down once done. Most organizations have well-defined APIs to create data; why not leverage them? In reality, though, this is easier said than done.

Take a subset of production data and move it to staging in a privacy-preserving way.

eBay has 1.5 billion listings in production. Just a tiny subset of the listings (0.1%, or roughly 1.5 million), along with its dependency graph, should be sufficient to execute all the test cases confidently. We have to make sure that the subset is well-distributed to cover the breadth of eBay inventory. Because the data comes from production, it naturally lends itself to high quality. But the most important thing to us was privacy.

Production to staging data pipeline

Subsetting

At eBay, everything starts with a listing. The goal of subsetting is twofold: identify the listing IDs that are required to execute all our test cases, and plot a course to fetch all the required and auxiliary information associated with those listings. To begin with, we took one domain (item page) and extracted all the regression test cases necessary to certify a release to production confidently. These included even the rare and complex data scenarios. From those test cases, we formulated a set of SQL queries that ran against our Hadoop clusters. The queries included listings from all sites and across all categories based on hundreds of item and user flags. The final output is a list of unique listing IDs that specifically target the domain test cases.
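The real queries are SQL run against Hadoop and are not reproduced in this post. As a rough illustration only, the idea of turning test-case data requirements into a de-duplicated list of listing IDs can be sketched in Python. Everything named here (the `Listing` fields, the scenario names, and `select_listing_ids`) is a hypothetical stand-in, not eBay's schema or tooling.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Listing:
    # Hypothetical, heavily simplified listing record.
    listing_id: int
    site: str
    category: str
    sku_count: int
    has_best_offer: bool


# Each test scenario is a predicate over a listing. In a SQL-based
# pipeline this role would be played by WHERE clauses.
SCENARIOS: Dict[str, Callable[[Listing], bool]] = {
    "multi_sku_item": lambda l: l.sku_count > 1,
    "best_offer_item": lambda l: l.has_best_offer,
    "german_site_item": lambda l: l.site == "DE",
}


def select_listing_ids(listings: Iterable[Listing], per_scenario: int) -> List[int]:
    """Pick up to `per_scenario` listings for every scenario and return
    the sorted, de-duplicated union of their IDs."""
    picked: Dict[str, List[int]] = {name: [] for name in SCENARIOS}
    for listing in listings:
        for name, matches in SCENARIOS.items():
            if len(picked[name]) < per_scenario and matches(listing):
                picked[name].append(listing.listing_id)
    return sorted({lid for ids in picked.values() for lid in ids})


SAMPLE = [
    Listing(1, "US", "Shoes", 12, False),
    Listing(2, "DE", "Shoes", 1, True),
    Listing(3, "US", "Phones", 1, False),
    Listing(4, "DE", "Toys", 3, False),
]
SELECTED_IDS = select_listing_ids(SAMPLE, per_scenario=1)
```

Capping the number of listings per scenario is one simple way to keep the subset small while still covering every scenario; a single listing can satisfy several scenarios, which is why the final step de-duplicates.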

Anonymization

Once the set of production tables from which data will be copied is identified, the workflow alerts our information security and privacy teams, and the pipeline is halted. It is a deliberate step to ensure that none of the data leaves the production zone without the review and approval of our security and compliance systems. The halt only happens when a new table is recognized or an existing table is modified. So our daily runs (explained below), configured only with previously approved tables, are mostly uninterrupted. Within each table, there is also a set of PII-related columns that are flagged by default to be anonymized.
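The post does not describe how the gate or the anonymization is implemented. A minimal sketch of the two ideas, assuming a schema fingerprint is used to detect new or modified tables and a keyed hash is used to mask flagged columns, could look like the following. The registry, the column names, and all function names are hypothetical, and keyed hashing is only one possible masking technique.

```python
import hashlib
import hmac
from typing import Dict, List, Set

# Hypothetical registry: table name -> fingerprint of the schema that
# security and privacy reviewers approved.
APPROVED: Dict[str, str] = {}

# Hypothetical default set of PII column names.
PII_COLUMNS: Set[str] = {"email", "phone", "full_name"}


def schema_fingerprint(columns: List[str]) -> str:
    """Stable hash of a table's column list (order-insensitive)."""
    return hashlib.sha256(",".join(sorted(columns)).encode()).hexdigest()


def needs_review(table: str, columns: List[str]) -> bool:
    """True for a new table or one whose schema changed since approval."""
    return APPROVED.get(table) != schema_fingerprint(columns)


def approve(table: str, columns: List[str]) -> None:
    """Record that reviewers approved this exact schema."""
    APPROVED[table] = schema_fingerprint(columns)


def anonymize_row(row: Dict[str, str], secret: bytes) -> Dict[str, str]:
    """Replace PII values with a truncated keyed hash; pass the rest through."""
    masked = {}
    for column, value in row.items():
        if column in PII_COLUMNS:
            digest = hmac.new(secret, value.encode(), hashlib.sha256).hexdigest()
            masked[column] = digest[:16]
        else:
            masked[column] = value
    return masked
```

A deterministic keyed hash maps the same input to the same output, so rows that referred to the same user before masking still line up afterward. Whether that property is acceptable is a privacy decision in its own right.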

Merging and Post Processing

The anonymized data moves from the production zone to the staging zone, adhering to all our firewall protocols. Now comes the merger, whose primary responsibility is to insert the subsetted, anonymized production data into the corresponding staging tables. In the actual implementation, there is much more nuance to it. For instance, remapping previously migrated sellers to their new items is a complex and costly endeavor. A good side effect of the merger is that it helps identify schema differences between staging and production tables, which had accumulated through prolonged staging misuse.
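To make the merge step and its schema-difference side effect concrete, here is a small sketch that uses an in-memory SQLite database as a stand-in for a staging table. The table, the column names, and the helper functions are assumptions for illustration; the actual merger handles far more, such as the seller remapping mentioned above.

```python
import sqlite3
from typing import Dict, List, Set


def table_columns(conn: sqlite3.Connection, table: str) -> Set[str]:
    """Column names of `table`. The table name is trusted, not user input."""
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


def schema_diff(source_cols: Set[str], staging_cols: Set[str]) -> Dict[str, Set[str]]:
    """Columns present on one side but not the other."""
    return {
        "missing_in_staging": source_cols - staging_cols,
        "extra_in_staging": staging_cols - source_cols,
    }


def merge_rows(conn: sqlite3.Connection, table: str, rows: List[Dict[str, object]]) -> int:
    """Upsert rows into a staging table, refusing to run on schema drift."""
    if not rows:
        return 0
    diff = schema_diff(set(rows[0]), table_columns(conn, table))
    if diff["missing_in_staging"]:
        raise ValueError(f"staging table {table} lacks columns: {diff['missing_in_staging']}")
    cols = sorted(rows[0])
    col_list = ", ".join(cols)
    placeholders = ", ".join("?" for _ in cols)
    sql = f"INSERT OR REPLACE INTO {table} ({col_list}) VALUES ({placeholders})"
    conn.executemany(sql, [tuple(row[c] for c in cols) for row in rows])
    conn.commit()
    return len(rows)
```

Checking the schema before writing means drift surfaces as an explicit error rather than as a partially loaded table.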

Discovery and Feedback Loop

Now that high-quality data was available in staging, a way to query it exclusively for all automation needs became paramount. We have existing APIs to fetch items, users, orders, transactions, etc. However, all of them were built with customer and business intent in mind, not for how developers or quality engineers would use them in their automation scripts. Just like the difficulties of using existing APIs for data creation, there is no straightforward way, for instance, to fetch a bunch of items that have more than 10 SKUs and 40 images. It becomes an arduous process. To solve this, we created a Discovery API and UI tool (codenamed Serendipity), which integrates seamlessly with automation scripts. The API only queries the migrated data, which is watermarked with a special flag during migration. The filters in the API are modeled on how engineers write test cases, so they do not have to worry about entity relationships or microservice decoupling.
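The Serendipity API itself is not documented in this post, so the following is only a sketch of the kind of test-oriented filter described above: return watermarked items with more than a given number of SKUs and images. The `StagingItem` fields, the `migrated` flag name, and `discover_items` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class StagingItem:
    item_id: int
    sku_count: int
    image_count: int
    migrated: bool  # hypothetical watermark flag set during migration


def discover_items(
    items: Iterable[StagingItem],
    skus_gt: Optional[int] = None,
    images_gt: Optional[int] = None,
    limit: int = 10,
) -> List[StagingItem]:
    """Return up to `limit` watermarked items that exceed the given thresholds."""
    results: List[StagingItem] = []
    for item in items:
        if not item.migrated:
            continue  # only data brought over by the pipeline is eligible
        if skus_gt is not None and item.sku_count <= skus_gt:
            continue
        if images_gt is not None and item.image_count <= images_gt:
            continue
        results.append(item)
        if len(results) >= limit:
            break
    return results
```

An automation script could then ask for `discover_items(catalog, skus_gt=10, images_gt=40, limit=5)` without knowing which services or tables hold SKU and image data.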

Expansion

What started as a proof of concept with one domain, 11 tables, and a few thousand items has expanded to the whole marketplace. Today, we have over a million high-quality listings in staging, along with their associated upstream/downstream dependencies. They serve the automation needs of a majority of our applications. Every day, 25,000 items/orders are migrated from production to staging, and the data is spread across 200+ tables, 7,000 categories, and 20 different DB hosts. Beginning this year, we expanded the pipeline to NoSQL databases. This includes MongoDB, Cassandra, Couchbase and eBay’s open-sourced NoSQL offering Nudata. The pipeline architecture is the same for NoSQL, with the curated listing IDs used as keys for subsetting.

Conclusion

That’s a wrap for part one. In this post, we started with the dichotomy of whether staging should exist at all. We explained why we decided to pursue staging, outlined the problem statements and discussed how we addressed the first problem, which is data. In the next and final post, we will go over how we brought infrastructure stability, how we turned the vicious cycle into a virtuous one, and the system we put in place to prevent us from regressing.
