Uber Fulfillment Platform Migration at Scale: Part 2

Jalaj Agrawal
Sep 3, 2022


This is a continuation of the previous article on Uber’s fulfillment platform migration:

https://medium.com/@jalajagr/uber-fulfillment-platform-migration-at-scale-part-1-e0fb1227c939

Fulfillment architecture with Spanner

Uber leveraged Spanner’s North America multi-region configuration as the storage engine for all fulfillment entities. The fulfillment services run in Uber’s own operational regions, which are also in North America, and they make a sequence of network calls to Spanner, deployed in Google Cloud, for every transaction that a user makes.
Every user request, and every system operation, results in a transaction against one or more rows across one or more tables in Cloud Spanner. With Spanner, Uber is able to provide a consistent view of all of its data to internal as well as external customers, while running on this hybrid of on-premise and cloud infrastructure. For an app like Uber, keeping latency to a minimum is the biggest factor.
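As a rough illustration of that pattern, here is a minimal sketch using the Cloud Spanner Go client, committing one user action as a single read-write transaction that touches rows in more than one table. The table names, columns, states, and database path are assumptions for illustration, not Uber’s actual schema.

```go
package main

import (
	"context"
	"log"

	"cloud.google.com/go/spanner"
)

// acceptOffer commits one user action as a single read-write transaction
// against Cloud Spanner, updating rows in two (hypothetical) tables atomically.
func acceptOffer(ctx context.Context, client *spanner.Client, offerID, supplierID string) error {
	_, err := client.ReadWriteTransaction(ctx, func(ctx context.Context, txn *spanner.ReadWriteTransaction) error {
		// Read the current state of the offer row.
		row, err := txn.ReadRow(ctx, "Offers", spanner.Key{offerID}, []string{"State"})
		if err != nil {
			return err
		}
		var state string
		if err := row.Column(0, &state); err != nil {
			return err
		}
		if state != "CREATED" {
			return nil // offer already handled; nothing to do
		}
		// Mutate one or more rows across one or more tables in the same transaction.
		return txn.BufferWrite([]*spanner.Mutation{
			spanner.Update("Offers", []string{"OfferId", "State"}, []interface{}{offerID, "ACCEPTED"}),
			spanner.Update("Suppliers", []string{"SupplierId", "State"}, []interface{}{supplierID, "DISPATCHED"}),
		})
	})
	return err
}

func main() {
	ctx := context.Background()
	// Database path is illustrative.
	client, err := spanner.NewClient(ctx, "projects/p/instances/i/databases/fulfillment")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()
	if err := acceptOffer(ctx, client, "offer-123", "supplier-456"); err != nil {
		log.Fatal(err)
	}
}
```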

Together with Google Cloud Platform’s networking team, Uber’s global networking infrastructure team built an extremely resilient and highly reliable network infrastructure to support Uber’s workload. To achieve this, Uber separated the networking infrastructure into two major components:

a physical layer, which consists of the interconnections between Uber and the cloud vendors, and
a logical layer, which consists of virtual connections on top of the physical layer to achieve redundancy.

Uber built redundancy across these layers by having multiple routers and local access points at each physical network route, and an additional layer of redundancy with the logical connections on top.
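To make the two-layer idea concrete, here is a minimal sketch, with invented names, of logical connections layered on top of physical routes and failing over to the first path whose physical route is healthy. This is not Uber’s actual tooling, just an illustration of the redundancy model.

```go
package main

import (
	"errors"
	"fmt"
)

// PhysicalRoute represents one interconnect location with its local routers.
type PhysicalRoute struct {
	Location string
	Healthy  bool
}

// LogicalConnection is a virtual connection pinned on top of a physical route.
type LogicalConnection struct {
	Name     string
	Physical *PhysicalRoute
}

// pickPath returns the first logical connection whose underlying physical route
// is healthy, giving redundancy at both layers.
func pickPath(conns []LogicalConnection) (*LogicalConnection, error) {
	for i := range conns {
		if conns[i].Physical.Healthy {
			return &conns[i], nil
		}
	}
	return nil, errors.New("no healthy network path available")
}

func main() {
	primary := &PhysicalRoute{Location: "pop-a", Healthy: false} // simulate a failed route
	secondary := &PhysicalRoute{Location: "pop-b", Healthy: true}
	conns := []LogicalConnection{
		{Name: "vlan-attachment-1", Physical: primary},
		{Name: "vlan-attachment-2", Physical: secondary},
	}
	path, err := pickPath(conns)
	if err != nil {
		panic(err)
	}
	fmt.Println("routing traffic over", path.Name, "via", path.Physical.Location)
}
```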

Uber also leverages Private Google Access, so that all traffic to Google APIs is routed through the Cloud Interconnect VLAN attachments.
This reduces the need to route traffic over the public internet and provides additional reliability.
To validate this architecture, the teams came up with a benchmarking suite that extensively validates all the physical and logical network routes, so that any inefficiencies in the setup are found very quickly. This was extremely critical for the success of the overall project.
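At its core, a route-validation benchmark like the one described could measure latency to the same endpoint over each configured route and flag anything outside a budget. The sketch below is a simplified, hypothetical version of that idea; the source IPs, target, and threshold are made up for illustration and are not Uber’s actual test setup.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// benchmarkRoute opens a TCP connection from a given local source address
// (one per route under test) to the target and reports how long it took.
func benchmarkRoute(localIP, target string) (time.Duration, error) {
	dialer := net.Dialer{
		LocalAddr: &net.TCPAddr{IP: net.ParseIP(localIP)},
		Timeout:   2 * time.Second,
	}
	start := time.Now()
	conn, err := dialer.Dial("tcp", target)
	if err != nil {
		return 0, err
	}
	defer conn.Close()
	return time.Since(start), nil
}

func main() {
	// One source IP per physical/logical route under test (illustrative values).
	routes := map[string]string{
		"route-a": "10.0.1.10",
		"route-b": "10.0.2.10",
	}
	target := "spanner.googleapis.com:443" // reached via Private Google Access in practice
	const budget = 50 * time.Millisecond

	for name, src := range routes {
		latency, err := benchmarkRoute(src, target)
		if err != nil {
			fmt.Printf("%s: FAILED (%v)\n", name, err)
			continue
		}
		status := "ok"
		if latency > budget {
			status = "inefficient"
		}
		fmt.Printf("%s: %v (%s)\n", name, latency, status)
	}
}
```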

Migration Strategy

This was extremely tricky. One thing Uber agreed upon from the beginning was that an internal platform re-architecture should not impact consumers in any way. So Uber focused on building the technology that would seamlessly migrate any ongoing user session from the old architecture to the new Spanner-based architecture. Since the data models and database topology of the old platform were significantly different from the new architecture that leveraged Spanner, any kind of one-shot bulk migration of data was ruled out.
Given that most of this data is ephemeral and continuously changing every minute, migrating it through backups would only result in loss of data.
So Uber built a system that intercepts a request from a user session and routes it to the world where the user session started. If there are any ongoing orders for the session, Uber does not migrate the user session until the order is completed. For an open user session with no active orders, Uber switches it to the new architecture backed by Spanner; a sketch of this decision follows at the end of this section. The team automated all of this through tooling, and tested the tooling rigorously in testing, staging, and shadow environments. Uber even set up test cities, where hundreds of riders and drivers and their real-world behavior were simulated, to make sure the impact of the migration could be observed.

These migrations were done one city at a time, in some cases in batches of cities, and took more than six months to complete. Spreading the migration over six months also helped Uber continuously migrate business features from the old stack to the new stack; there were more than 120 business features to migrate. From an overall technical and execution perspective, it was a very difficult and challenging project that Uber undertook as a team.
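The session-migration rule described above boils down to a simple decision: a session with an active order stays on the old stack until the order completes, and an idle session is switched to the new Spanner-backed stack. Here is a minimal sketch of that logic, with all types and names invented for illustration rather than taken from Uber’s code.

```go
package main

import "fmt"

type Stack string

const (
	OldStack Stack = "old"
	NewStack Stack = "new"
)

type UserSession struct {
	ID           string
	ActiveOrders int
	Stack        Stack
}

// routeRequest intercepts a request and sends it to the stack where the session
// currently lives, migrating the session first if it is safe to do so.
func routeRequest(s *UserSession) Stack {
	if s.Stack == OldStack && s.ActiveOrders == 0 {
		// No ongoing orders: safe to switch this session to the new architecture.
		s.Stack = NewStack
	}
	return s.Stack
}

func main() {
	idle := &UserSession{ID: "rider-1", ActiveOrders: 0, Stack: OldStack}
	busy := &UserSession{ID: "rider-2", ActiveOrders: 1, Stack: OldStack}

	fmt.Println(idle.ID, "served by", routeRequest(idle)) // migrated to "new"
	fmt.Println(busy.ID, "served by", routeRequest(busy)) // stays on "old" until the order completes
}
```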
