Uber fulfillment platform migration at Scale: Part -1

Jalaj Agrawal
4 min readSep 3, 2022

--

Uber is evolving to become the essential for day-to-day consumer needs, At a very high level, when a consumer clicks on the Get A Ride or Get Food button, Uber capture the user’s intent and fulfill it by matching it with the right set of providers nearby. Fulfillment models the user’s intent as demand, and any active provider session that can fulfill this demand as supply in its systems. Uber developed platforms to orchestrate and manage the lifecycle of ongoing orders, jobs, and providers across millions of active user sessions around the world. In essence, fulfillment is a foundational platform capability that powers existing lines of businesses and enables rapid scaling of new verticals seamlessly.
The platform handles millions of concurrent user sessions and billions of trips per year, across thousands of cities globally. Uber handle billions of database transactions every day. There are hundreds of microservices at Uber that rely on the platform as a source accurate state of any ongoing order, job, or a driver and a delivery person session. Now, the events that are generated by this platform, they’re used by hundreds of offline data sets to make critical business decisions.

Uber Fulfillment Architecture

When you open the Uber rider app and request a ride, the request goes through a series of checks and validation, and it eventually triggers fulfillment to create a new order for the consumer. This order represents the intent of the consumer, and Uber translate the intent to individual jobs
that need to be processed. And Uber store all of this information in Spanner. Now, these jobs are periodically read by our matching engine, and offered to any open provider or supplier nearby.

Now, Uber used to store all of this data in our on-prem database, and Uber recently moved to Spanner.

Challenge on-prem database in this original architecture

Uber leveraged a NoSQL storage engine, Cassandra, to maintain the real time state of all fulfillment entities on prem. In order to maintain some notion of serialized ability, on top of Cassandra Uber used Ringpop to provide individual entity-level serialization. There were a few major challenges, but let me highlight a couple of them. Cassandra is built on the paradigm
of eventual consistency. In addition to that, in a multi-data center setup, it’s really hard to guarantee low-latency quorum writes with Cassandra. As Uber’s products and services evolved, Uber started seeing complex storage interactions that required multi-row and multi-table writes. So Uber had to build an application layer framework to orchestrate this operation by leveraging the saga pattern. With the underlying storage platform at Cassandra being eventually consistent, it often led to inconsistencies that had to be manually corrected.

What it translated in the real world would be, a rider seeing two different drivers dispatched to them. Imagine you being out there near your office, and there are two drivers who arrive to pick you up. It’s a bit confusing for anyone out there. As Uber scale globally to millions of active users, even a notion of consistency became harder to achieve at our application level. In addition to all of this, as Uber scale, It started seeing horizontal scaling bottlenecks, as well in the overall architecture, and really, because of the way Uber started with our application layer with Ringpop. This is when Uber started tracing our steps back to the properties of a traditional SQL-based storage engine that provides asset guarantees. Uber started defining a transactional storage engine that would provide consistency as the primary capability. One of the things that has been in mainstream technology was New SQL storage engines that combine the scalability of a NoSQL architecture with strong ACID guarantees. This is when Uber came across Google Cloud Spanner.

Thaught process to move from NoSQL to New SQL

When Uber were thinking about the next chapter for Uber’s fulfillment platform, the focus on consistency became one of the primary evaluation criteria, along with high resiliency and availability. Based on their requirements and various benchmarks, they evaluated various New SQL storage engines that were out there — CockroachDB, FoundationDB, and Cloud Spanner. Cloud Spanner provided all the functional requirements, it scaled horizontally with their benchmarks, and provided us with a managed solution for cluster management and maintenance.

We will continue in Second part of this article with following link.

https://medium.com/@jalajagr/uber-fulfillment-platform-migration-at-scale-part-2-cdcb42743b8a

--

--