Pioneering Hyperlocal Control: The Journey from City to Zone-based Serviceability

Aman Sharma
Published in PharmEasy Tech
Aug 8, 2023 · 10 min read

In the complex sphere of software architecture, our hurdles often unfold like intricate tales. Imagine a local shop that steps into the digital world. At first, it serves the whole city, but delivery times differ based on how far the customer is. The shop owner might set up small outlets within the city to offer better ETAs, depending on the revenue each area brings in. But when the shop decides to go national, a conundrum arises. Running everything from a central shop leads to longer delivery times, while setting up multiple outlets is expensive. So, the shop opts for a few larger stores nationwide, ensuring product availability and improved delivery times.

Now visualize a situation where the shop entrusts the sale of certain items to local retailers who wish to limit their service to specific pin codes, diverging from the full coverage provided by the shop's branch. When a customer searches for a product, newly added through a local retailer, in an area this retailer does not service, that retailer's listing must remain hidden. On the flip side, when a customer resides in an area that both the branch and the local retailer cater to, the system must determine which listing to present, based on a carefully curated set of parameters.

A few years ago, PharmEasy found itself in a similar situation, operating primarily through a handful of large retailers mapped internally to a supply city, which was in turn mapped to demand cities, and hence catering to a vast geographical region (i.e. thousands of pin codes). These retailers carried massive inventories, easily encompassing 200k–300k listings.

Supply city based design with no control on partial serviceability

The supply city based construct described above had a limitation: it could not accommodate a retailer who wished to serve only part of a city rather than its entire expanse. Our city-wide construct simply didn't support this "partial serviceability". As a result, potential partnerships with franchisees and other local retailers stalled because of our system's inability to handle hyperlocal visibility.

I had the privilege of being part of the exhilarating, almost year-long journey that followed. We transitioned from a supply city-based model to a more precise, zone-based approach. This significant shift vastly impacted our serviceability model and set us on a whirlwind learning adventure filled with in-depth discussions, strategic planning, and meticulous execution. This article aims to illuminate this crucial architectural transformation, weaving together its technical nuances and overarching business goals into a compelling narrative. Let's embark on this journey, retracing the steps we took, the problems we solved, and the insights we gathered.

Conceptualizing Serviceability, Zone Listing Service and the ETA Engine

It's essential to define our terms properly to ensure a shared understanding. So, let's take a moment to clarify some of the key concepts involved in our recent transition to a zone-based serviceability model.

Serviceability primarily refers to the ability to deliver a product to a specific location. It is essential to distinguish it from availability. A product can be available, but it may not be serviceable due to various reasons, such as logistics constraints or business policies.

Zone Listing Service is our back-end system specifically designed to manage listing serviceability and compute the best listing among all the serviceable listings for a particular product in a location. To understand this service better, we first need to define Zone and Listing:

  • A Zone is a cluster of pin codes. We have created these clusters to simplify and optimise our operations, since working directly with individual pin codes could potentially lead to significant data overload during serviceability changes and impact our indexing processes. We’ve designed these zones so that no two zones overlap, ensuring a clear demarcation of territories.
  • A Listing refers to a unique combination of a retailer and a product. Given that multiple retailers can sell the same product, our system creates unique listings to distinctly identify each combination.
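
To make these definitions concrete, here is a minimal sketch (in Java, with illustrative names only; this is not our actual schema) of how a zone, a listing, and a zone-scoped listing might be modelled, along with one plausible way of picking a top listing:

```java
import java.math.BigDecimal;
import java.time.Duration;
import java.util.*;

// Illustrative sketch only: names, fields, and the ranking policy are assumptions,
// not PharmEasy's actual schema or selection logic.
public class ZoneListingModel {
    record Zone(String zoneId, Set<String> pinCodes) {}          // non-overlapping cluster of pin codes
    record Listing(String listingId, String retailerId, String productId) {}

    // A listing's serviceability and ranking signals within a single zone.
    record ZoneListing(String zoneId, Listing listing, boolean serviceable,
                       Duration eta, BigDecimal price) {}

    // One plausible "top listing" policy for a (zone, product) pair: fastest ETA, then lowest price.
    static Optional<ZoneListing> topListing(Collection<ZoneListing> candidates) {
        return candidates.stream()
                .filter(ZoneListing::serviceable)
                .min(Comparator.comparing(ZoneListing::eta)
                               .thenComparing(ZoneListing::price));
    }
}
```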

The ETA (Estimated Time of Arrival) Engine is another crucial back-end system designed to compute and provide accurate ETA information for each 'Listing <> Zone' pair to consumer systems. This robust engine uses a range of inputs, including product attributes, logistics configurations (e.g. last-mile SLA), retailer location attributes, and Just-in-Time (JIT) procurement configurations. It also takes into account multiple policies when providing serviceability information for a listing. Hence, serviceability is a function of logistics policies (based on the source pin code), business policies, and retailer policies.
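
As a rough mental model, and assuming hypothetical policy interfaces rather than the engine's real contract, the serviceability decision can be pictured as an AND over the policy families, with the ETA promise built up from time components such as JIT procurement lead time and last-mile SLA:

```java
import java.time.Duration;

// Rough sketch of "serviceability as a function of policies"; the interfaces and
// time components below are illustrative assumptions, not the engine's real contract.
public class EtaSketch {
    interface Policy { boolean allows(String listingId, String zoneId); }

    static boolean isServiceable(String listingId, String zoneId,
                                 Policy logistics, Policy business, Policy retailer) {
        // Every policy family must permit the listing for the zone.
        return logistics.allows(listingId, zoneId)
            && business.allows(listingId, zoneId)
            && retailer.allows(listingId, zoneId);
    }

    static Duration eta(Duration jitProcurementLeadTime, Duration lastMileSla) {
        // For a JIT-procured listing, the promise is procurement lead time plus last-mile SLA.
        return jitProcurementLeadTime.plus(lastMileSla);
    }
}
```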

With these definitions, we aim to provide a clear understanding of our system and the shift towards a zone-based model.

A Sea Change: Transitioning from City to Zone

At a high level, the shift from a city to a zone-based approach has allowed us to implement hyperlocal control. Prior to this transition, a seller’s inventory was visible to an entire city. This limitation prevented us from supporting a more detailed geographic scope.

Zone based architectural shift in the construct

The introduction of zones, however, brought about several key enhancements:

  1. Sellers can now serve a handful of zones, enabling a hyperlocal focus.
  2. National and regional shipping can be easily managed.
  3. Finer control on policies, including Logistics, Partner, and Product, is now possible.
  4. Sellers can now sell to a larger number of zones, irrespective of district, state, or regional boundaries.

While the benefits of this transformation are clear, it’s vital to recognise the significant technical investments required to facilitate the change. Our transition from a city-centric model to a zone-based structure, which accommodates the extended marketplace construct with hyperlocal features, has inevitably escalated complexity and resource demands.

Grappling with Technical Challenges

The transition to zones has considerably expanded the information footprint our system must handle. To give you a sense of the scale, a single update to a listing belonging to any large retailer now triggers a 'fanout', i.e. information propagation to all serviceable zones. This change in architecture has our system processing an astonishing one million zone-listing events per minute at peak times, solely due to this fanout effect.

Let’s break down this calculation in a more readable and understandable way:

  1. A large retailer listing is typically serviceable in about ~500–700 zones.
  2. A local retailer listing usually services about ~10–20 zones.

Now, let’s check the operations involved for a large retailer listing:

  • Each large retailer listing change signal results in approximately ~500–700 write operations (one for each zone).
  • For each listing, a comparable number of read operations (proportional to the writes in the worst case) is required to determine the top listing in each zone. We try to minimise these database reads with cache (Redis) lookups, but the first read per zone is unavoidable, again resulting in ~500–700 read operations.
  • If a top listing changes, an additional operation is required to write this change to the database, update the cache, and pass this information downstream. On average, this results in about ~100–300 additional operations per listing.

Adding these together, we arrive at a total of roughly 1,100–1,700 (±10%) database operations per listing attribute change event.
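
To see where these numbers come from, here is a minimal sketch of the fanout triggered by a listing change. The store, cache, and publisher interfaces are hypothetical placeholders, but the shape of the work, one write and one top-listing read per serviceable zone plus a conditional update when the top listing changes, mirrors the arithmetic above:

```java
import java.util.List;
import java.util.Optional;

// Minimal fanout sketch. ZoneListingStore, TopListingCache and EventPublisher are
// hypothetical interfaces used only to illustrate the per-zone operation count above.
public class ListingFanout {
    interface ZoneListingStore {
        void upsertZoneListing(String zoneId, String listingId);                 // ~1 write per zone
        Optional<String> computeTopListing(String zoneId, String productId);     // ~1 read per zone (worst case)
        void saveTopListing(String zoneId, String productId, String listingId);  // only when the top changes
    }
    interface TopListingCache {
        Optional<String> get(String zoneId, String productId);                   // Redis lookup before the DB
        void put(String zoneId, String productId, String listingId);
    }
    interface EventPublisher {
        void publishTopListingChanged(String zoneId, String productId, String listingId);
    }

    void onListingChanged(String listingId, String productId, List<String> serviceableZones,
                          ZoneListingStore store, TopListingCache cache, EventPublisher publisher) {
        for (String zone : serviceableZones) {                        // ~500-700 zones for a large retailer
            store.upsertZoneListing(zone, listingId);                 // write fanout
            String previousTop = cache.get(zone, productId).orElse(null);
            String newTop = store.computeTopListing(zone, productId).orElse(null);
            if (newTop != null && !newTop.equals(previousTop)) {      // ~100-300 of these fire per listing change
                store.saveTopListing(zone, productId, newTop);        // extra write
                cache.put(zone, productId, newTop);                   // keep the cache warm
                publisher.publishTopListingChanged(zone, productId, newTop);  // notify downstream
            }
        }
    }
}
```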

To put this into perspective at scale, let’s consider our overall system:

We handle around 10 million listings, of which 6 million belong to large retailers (most with an average of ~500–700 serviceable zones) and the rest come from other retailers. Since not every retailer stocks every product, and not every listing (i.e. unique retailer-product combination) is serviceable in every zone, in reality we handle about ~1.68 billion zone listings and ~400 million top listings in total.

Our system operates at an impressive scale on the non-user path, handling a throughput of 600k–800k Kafka events per minute. At the same time, our database performs about ~2 million operations per minute. This level of performance is crucial to keep up with the vast amount of data we need to process and synchronise.

High Level Design — Zone Listing Service and ETA engine interaction in our non user path pipeline

This dramatic increase in load necessitated a comprehensive review and optimisation of our services, both internally and across interfaces, to prevent potential resource exhaustion. We evaluated a variety of database solutions, including ScyllaDB, DynamoDB, Cassandra, and HBase, each offering different trade-offs in terms of cost, maintainability, and observability.

We also embraced the Reactive paradigm, introducing it into our ecosystem. This approach promotes an asynchronous programming model that allows for more efficient use of resources, especially under high load conditions.
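
As an illustration of this style, the sketch below shows a backpressure-aware consumer built with reactor-kafka; the topic name, consumer group, and processing step are assumptions made for the example, not our actual pipeline:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import reactor.core.publisher.Mono;
import reactor.kafka.receiver.KafkaReceiver;
import reactor.kafka.receiver.ReceiverOptions;
import reactor.kafka.receiver.ReceiverRecord;

import java.util.Collections;
import java.util.Map;

// Minimal sketch of a backpressure-aware reactive Kafka consumer using reactor-kafka.
// Broker address, topic, group id and processZoneListingEvent() are illustrative assumptions.
public class ReactivePipelineSketch {
    public static void main(String[] args) {
        ReceiverOptions<String, String> options = ReceiverOptions.<String, String>create(Map.of(
                ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092",
                ConsumerConfig.GROUP_ID_CONFIG, "zone-listing-consumer",
                ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class,
                ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class))
            .subscription(Collections.singleton("zone-listing-events"));      // hypothetical topic

        KafkaReceiver.create(options)
            .receive()                                       // Flux<ReceiverRecord<String, String>>
            .flatMap(ReactivePipelineSketch::processZoneListingEvent, 64)     // bounded concurrency = backpressure
            .subscribe();
    }

    static Mono<Void> processZoneListingEvent(ReceiverRecord<String, String> record) {
        // ... fan out to zones, recompute top listings, etc. (placeholder) ...
        record.receiverOffset().acknowledge();               // acknowledge after the (placeholder) processing
        return Mono.empty();
    }
}
```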

Among the many optimisations made during the development phase, one was choosing and setting up our database for this scale. To optimise performance, we decided to use ScyllaDB (a drop-in replacement for Cassandra) in combination with DataStax drivers and CQL prepared statements, instead of the reactive drivers provided by Spring. This approach allowed us to leverage the high performance and scalability of DataStax's drivers, which are specifically designed for Cassandra, and the efficiency of prepared statements in CQL. This technical decision played a crucial role in enhancing the responsiveness of our application and enabling it to handle the high data volume efficiently. In forthcoming articles, we shall delve deeper into the nuances of how our systems manage this expansive scale.
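
A minimal sketch of this combination, assuming hypothetical keyspace, table, and column names, might look like the following; the key point is that the statement is prepared once and bound per write, and writes are executed asynchronously:

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.BoundStatement;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;

import java.net.InetSocketAddress;
import java.util.concurrent.CompletionStage;

// Minimal sketch of the DataStax driver + CQL prepared-statement approach.
// Host, keyspace, table and column names are illustrative assumptions.
public class ZoneListingDao {
    private final CqlSession session;
    private final PreparedStatement upsert;

    ZoneListingDao() {
        session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("scylla-node-1", 9042))   // hypothetical host
                .withLocalDatacenter("dc1")
                .withKeyspace("serviceability")                                  // hypothetical keyspace
                .build();
        // Prepared once, bound many times: avoids re-parsing the query on every fanout write.
        upsert = session.prepare(
            "INSERT INTO zone_listing (zone_id, listing_id, serviceable, eta_minutes) VALUES (?, ?, ?, ?)");
    }

    CompletionStage<?> upsertAsync(String zoneId, String listingId, boolean serviceable, int etaMinutes) {
        BoundStatement bound = upsert.bind(zoneId, listingId, serviceable, etaMinutes);
        return session.executeAsync(bound);   // non-blocking write, suited to the fanout volumes above
    }
}
```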

By making thoughtful decisions and technical updates, we’ve managed to handle a rise in data volume without letting costs get out of hand. At the same time, we’ve retired older parts of our code that weren’t as efficient. This has helped us improve our overall system and put us in a strong position to add new features in the future.

Tying it Together: The Business Impact

The integration of Listing Serviceability and the ETA Engine was a crucial move for our business. It:

  1. Expanded our platform’s product range by enabling both regional and national shipping for retailers.
  2. Boosted platform availability and introduced more detailed, accurate Estimated Time of Arrival (ETA) predictions.

This shift to the Listing Serviceability and ETA Engine systems has opened up several exciting possibilities, including:

  1. Wider service areas for Hyperlocal retailers.
  2. The ability to replicate national inventory.
  3. The introduction of innovative delivery modes such as Hyperlocal and Hybrid.

In sum, moving from a city-focused to a zone-based model is a clear demonstration of the power of flexibility in software architecture. As we continue our journey, we remain committed to embracing change, leading in innovation, and constantly improving our service delivery.

Overcoming Technical Hurdles: Problem Statements Addressed

The shift towards a zone-centric model was an ambitious undertaking that presented several technical challenges. Below, I highlight a few problem statements that we encountered and effectively tackled during this transition, each of which merits further exploration in future articles.

  1. Database Selection for the Use Case: The extensive data traffic generated by the new construct demanded a robust database. After more than a month of rigorous discussions and load testing, we opted for a suitable database, weighing factors such as cost, maintainability, and observability.
  2. Resource Optimisation with E2E Reactive Stack: Given the high throughput in non-user pipelines and an enormous amount of data, we introduced the Reactive paradigm into our ecosystem for optimal resource usage. This included reactive Kafka pipelines with backpressure capabilities.
  3. Workload Segregation via Priority Pipelines: To ensure permissible propagation Service Level Agreements (SLAs) with active throttling, we designed a priority-based pipeline. Multiple activities (product onboarding by large retailers, serviceability changes, etc.) can create a burst of events in the system, resulting in millions of messages in Kafka. This contention can delay our critical updates, leading to stale information being served to customers.
  4. Full Sync Re-design: To sync top-listing data to the downstream layer, we redesigned our full sync to support ~400 million top listings with a Kafka-based flow, enabling a throughput of ~1 million messages per minute per thread on the producer side. The consumer can now throttle consumption at its own pace.
  5. Event Sidelining Design: Since dropping even a single zone-listing event can leave a listing stale in that zone and introduce inconsistency into the system, it was of utmost importance to ensure eventual consistency in the worst-case scenario. To address this, we developed an event sidelining design to make sure each listing stays in sync with the source (a minimal sketch of this pattern follows the list below).
  6. Intelligent Serviceability Propagation: The communication between the ETA (Promise) Engine and the Zone Listing Service was improved to enable a smarter approach to Just-in-Time (JIT) procurable listing creation. Since JIT listings are third-class citizens compared to actual listings, we added intelligent flows to throttle their processing so that more critical work executes faster.
  7. Priority Based Full Sync Design: We designed the full sync so that the most revenue-generating listings can be synced on demand, handling worst-case scenarios efficiently.
  8. Configuration-based Deployment Rollout: We devised a method to go live with the zone-centric model gradually, without a fresh deployment. A failure during the zone shift could have hampered the entire business, so the rollout was planned with zero downtime and finer controls (rollback, monitoring, etc.). For a period, we wrote to both constructs, the supply city model and the zone model, so that we could switch back on the user path if required.
  9. API Gateway Rewrite**: We fully rewrote the API gateway handling our user-path interactions, for better resiliency, efficient use of goroutines, circuit breakers, multithreading, and day-zero observability.
  10. Deprecation of Outdated Codebase**: We managed to deprecate the Monolith and older APIs, making our system more agile and efficient.
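
As referenced in point 5 above, here is a minimal sketch of the event sidelining pattern; the topic name and handler interface are assumptions, not our production design. The idea is simply that a failed event is parked rather than dropped, so a replay job can later restore consistency with the source:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Minimal sketch of event sidelining: a failed zone-listing event is parked on a
// separate "sideline" topic instead of being dropped, so a replay job can later
// bring the zone back in sync with the source. Topic name and handler are assumptions.
public class EventSideliner {
    private static final String SIDELINE_TOPIC = "zone-listing-events.sideline";  // hypothetical
    private final KafkaProducer<String, String> producer;

    EventSideliner(KafkaProducer<String, String> producer) { this.producer = producer; }

    void handle(ConsumerRecord<String, String> event, ZoneListingHandler handler) {
        try {
            handler.apply(event.key(), event.value());       // normal fanout / top-listing update
        } catch (Exception e) {
            // Never silently drop: park the event so eventual consistency can be restored by replay.
            producer.send(new ProducerRecord<>(SIDELINE_TOPIC, event.key(), event.value()));
        }
    }

    interface ZoneListingHandler { void apply(String key, String value) throws Exception; }
}
```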

Each of these problem statements represents a significant learning experience that we navigated through as we moved to a more granular, zone-based system. In the following series of articles, we will dive deeper into each of these, offering a detailed view of how we tackled these challenges and the key takeaways from each. Stay tuned to learn more about our journey in pioneering hyperlocal control.

In wrapping up, I'd like to express my heartfelt appreciation to everyone who contributed to the successful realisation of the Zone Listing Service and ETA Engine. Almost all the teams came together to make this work.

First and foremost, I'd like to thank our leaders Santosh Pawar, Sanjay Kumar, Yogesh Pandey, Vivek Singh, Sankshep Malhotra, Karan Sehgal, et al., who trusted in our abilities and provided the opportunities that led to this groundbreaking project.

A special acknowledgment goes to Abhinav Dubey, Aditya Jain, Aditya Garg, Kishalay Kumar Singh, Gyayak Sanghi, Pramod Reddy, Nirav Gokulgandhi, et al., whose invaluable insights and dedication have been instrumental in making this intricate system work. The collective effort of our broader team at PharmEasy (Catalog Team and Supply Chain) has also been crucial in reaching this milestone.

** Not directly related to Zone Serviceability, but redesigned or re-architected during this effort.


Aman Sharma
PharmEasy Tech

Software architect with extensive experience designing and implementing complex distributed systems.