How to Build a Scalable & Unified Purchase History UX: Design to Implementation

By Joojo Dadzebo Amoquandoh Dontoh

More often than not, the need for a search engine, or a similar (if less sophisticated) product, becomes crucial wherever large amounts of data span many types of objects, elements or attributes. At airasia, consumers shop on a multi-product super app, purchasing services across the ecosystem. These include flight and hotel bookings, food and grocery delivery, financial services and e-commerce.

Problem Statement

With the airasia super app’s rapid product growth beyond airline tickets, consumers have been purchasing, and continue to purchase, a widening range of services. However, we began to realise that the absence of a unified purchase history (UPH) meant users had no way to track their past purchases in a consolidated view.

How do we enable a user experience whereby users can easily access and navigate their past purchases across all services?

This posed an engineering problem: designing and implementing an architecture that gives fast access to every purchased item regardless of how much the data grows, hence the need for scalability from day one.

Scalability by design is crucial for us because rolling the system out across all lines of business (LOBs) is a gradual yet expensive process, and it cannot afford to stall because of data magnitude and growth.

Design/Architecture

It is worth noting that keeping a reference document while deciding on an architecture pays off, as the process is usually iterative and involves trials and adaptations, especially as intel is gathered along the way. As mentioned earlier, implementation across the various lines of business is gradual, so we started by unifying all sources of flight and flight+hotel booking data.

The tech stack consists of:

  • Google Cloud Platform (GCP)
  • Google Cloud Functions
  • Google Kubernetes Engine (GKE)
  • Google Pub/Sub
  • Node.js
  • Cloud Firestore

The cloud infrastructure provider at airasia is Google Cloud Platform (GCP). One of GCP’s biggest selling points is Google Kubernetes Engine; being one of the best managed Kubernetes offerings out there, it has allowed us to deliver highly scalable and maintainable systems over time. With all of this in place, the final architecture of the system looked something like the diagram below:

Preliminaries

Though Cloud Functions are not part of our core system, they were used initially to monitor incoming data from the respective topics, which push data on user actions such as booking creation and updates. This was an important step to both review the payloads and measure the average frequency of incoming data, since such intelligence informs how we write efficient endpoints.
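For illustration, the monitoring function was little more than a Pub/Sub-triggered logger. The sketch below is ours for this article, not the production code, and the function name is an assumption:

```typescript
// Throwaway background Cloud Function (Pub/Sub trigger) used only for observation:
// log the shape and arrival time of each booking event so we can estimate
// message frequency before writing the real endpoints.
type PubSubMessage = { data?: string; attributes?: Record<string, string> };

export const inspectBookingEvents = (message: PubSubMessage, context: { timestamp: string }): void => {
  const payload = message.data
    ? JSON.parse(Buffer.from(message.data, 'base64').toString())
    : {};
  // Log keys rather than full values, to avoid dumping personal data into the logs.
  console.log(`[${context.timestamp}] event keys:`, Object.keys(payload), 'attributes:', message.attributes);
};
```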

Core

The core consists of the topics, the subscriptions and the service. The idea during the architecture and design phase was to create an event-driven strategy that runs with little to no supervision and both feeds and updates the data store. This resulted in a pub/sub setup with a subscription to every topic connected to the data sources.

Firestore

Bottlenecks commonly appear at the database level, so leveraging Firestore’s fast lookups is crucial to this project’s performance goals. Firestore is also built to scale almost without limit as the data grows.

The main design patterns involved in this project are the observer pattern and the adapter pattern. They suit the project because, at a fundamental level, information arrives through triggered events (the event-driven part) and is then digested for storage through a layer of adapters.
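To make the two roles concrete, here is a minimal, illustrative sketch (the names are invented for this article): the observer side is notified whenever a source publishes an event, and an adapter normalises the payload before it is stored.

```typescript
// Minimal illustration (not production code) of how the two patterns combine:
// observers are notified when a source publishes an event, and an adapter
// translates the source-specific payload before it reaches storage.
interface Adapter<TIn, TOut> {
  adapt(raw: TIn): TOut;
}

class BookingEvents<TIn> {
  private observers: Array<(raw: TIn) => void> = [];

  subscribe(observer: (raw: TIn) => void): void {
    this.observers.push(observer);
  }

  notify(raw: TIn): void {
    this.observers.forEach((observer) => observer(raw));
  }
}

// Wiring: every notification is digested through an adapter before storage.
function wire<TIn, TOut>(events: BookingEvents<TIn>, adapter: Adapter<TIn, TOut>, store: (item: TOut) => void): void {
  events.subscribe((raw) => store(adapter.adapt(raw)));
}
```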

In finer detail, the architecture accommodates numerous data sources publishing to their respective topics; the data is then pushed via subscriptions to an adapter layer, which digests it into storage. Though most of the architecture hinges on event-based processing, it also has a data pull engine that can pull data segmented by date ranges. This works well for historical data and doubles as a backup process in case the pub/sub flow ever fails.

Implementation Sneak Peeks

Since this is a multi-team effort, implementation began with discussions among stakeholders (engineering and product managers, leads, owners and team engineers) to decide on a data model that is reasonably generic yet expandable enough to cover most forms of order items. This happened alongside identifying the most important data points from the various sources.

Schema design

After several iterations, we settled on dividing the data model into three main parts: ownership, core and metadata. The ownership part creates the human relation to the whole data object by housing data connected to the consumer, such as various IDs, names and emails. The core section handles data directly connected to the purchased item; it has the most flexibility in terms of attributes and significance, and this flexibility allows for extensive data evolution and growth. Finally, the metadata section, as the name suggests, carries attributes that describe the core data, such as its source, channel and provider. This scheme gives us a structured yet amendable core.
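To illustrate the split, a representative (not exact) version of the model could look like this; the field names are assumptions for the purpose of this article:

```typescript
// Representative sketch of the three-part purchase-item model. The exact field
// names in production differ; the ownership / core / metadata split is the point.
interface UnifiedPurchaseItem {
  ownership: {
    userId: string;          // consumer identifiers
    name?: string;
    email?: string;
  };
  core: {
    itemType: string;        // e.g. "flights", "hotels"
    status: string;
    bookingDate: string;     // ISO 8601
    // Intentionally loose: item-specific attributes evolve per line of business.
    [attribute: string]: unknown;
  };
  metadata: {
    source: string;          // originating system
    channel?: string;        // e.g. web, mobile
    provider?: string;
  };
}
```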

Dataflow & Pipeline

Fundamentally, data is siphoned into the system via an event-driven strategy. All data sources have a dedicated pub/sub topic where data, whether created or updated, is published.
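For context, publishing a booking event on the source side with the Node.js Pub/Sub client looks roughly like the sketch below (the topic name is illustrative):

```typescript
// Sketch of the source side: a booking service publishes each create/update
// event to its dedicated topic. The topic name here is illustrative.
import { PubSub } from '@google-cloud/pubsub';

const pubsub = new PubSub();

async function publishBookingEvent(booking: object): Promise<string> {
  const data = Buffer.from(JSON.stringify(booking));
  // publishMessage resolves with the server-assigned message ID.
  return pubsub.topic('flight-booking-events').publishMessage({ data });
}
```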

The ingestion engine of avalon subscribes to these topics, waiting to be invoked in the form of a webhook push. The data then goes through a transformation phase, where it is adapted to the expected schema definition. Data delivered to this webhook can be JSON or XML.

The avalon engine requires every data source to have a pub/sub topic where it publishes data, with a corresponding adapter on avalon’s side to digest the incoming information.

This ensures that integrating a new data source does not require modifying the code base; instead, it keeps the code open for extension through the addition of new models or adapter implementations.
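A hedged sketch of what that can look like in practice follows; the route, adapter names and payload fields are assumptions for this article rather than the production code. The point is that the push handler never changes, while each new source simply registers its own adapter.

```typescript
// Illustrative Express push endpoint. Pub/Sub push subscriptions POST an
// envelope whose message.data field is base64-encoded; the handler decodes it
// and hands it to the adapter registered for that source. Onboarding a new
// source means adding one adapter entry; the handler itself is never modified.
import express from 'express';

interface PurchaseItem {
  ownership: { userId: string };
  core: Record<string, unknown>;
  metadata: { source: string };
}

type Adapter = (raw: Record<string, any>) => PurchaseItem;

const adapters: Record<string, Adapter> = {
  'flight-bookings': (raw) => ({
    ownership: { userId: raw.customerId },
    core: { itemType: 'flights', status: raw.status, bookingDate: raw.createdAt },
    metadata: { source: 'flight-bookings' },
  }),
  // A new line of business only adds another entry here.
};

async function saveItem(item: PurchaseItem): Promise<void> {
  // Persistence is sketched separately below; stubbed here to keep this self-contained.
  console.log('would persist', item.metadata.source, item.ownership.userId);
}

const app = express();
app.use(express.json());

app.post('/ingest/:source', async (req, res) => {
  const adapter = adapters[req.params.source];
  if (!adapter) return res.status(422).send('unknown source');

  const raw = JSON.parse(Buffer.from(req.body.message.data, 'base64').toString());
  await saveItem(adapter(raw));
  res.status(204).send(); // acknowledge so Pub/Sub does not redeliver
});

app.listen(8080);
```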

Following basic design processes, we plugged an endpoint into a subscription in our staging environment to monitor data entering Firestore while making the necessary tweaks. Importantly, to prevent duplication, we check for a record’s existence before insertion and make sure updates properly overwrite the existing document. As a precaution, raw data (data that has not been passed through the adapter layer) is stored in a separate collection for backup purposes.
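The write path can be sketched roughly as follows, assuming each item carries a stable identifier (such as a booking reference) that doubles as the Firestore document ID; the collection names are illustrative.

```typescript
// Sketch of the write path: back up the raw payload, then upsert the adapted
// item using a deterministic document ID so redeliveries never create duplicates.
import { Firestore } from '@google-cloud/firestore';

const db = new Firestore();

async function saveItem(
  itemId: string,
  item: Record<string, unknown>,
  raw: Record<string, unknown>,
): Promise<void> {
  // Keep the untransformed payload in a separate collection as a backup.
  await db.collection('raw-purchase-events').doc(itemId).set(raw);

  const ref = db.collection('purchased-items').doc(itemId);
  const existing = await ref.get();
  if (existing.exists) {
    await ref.set(item, { merge: true }); // update: overwrite the changed fields
  } else {
    await ref.create(item);               // first sight of this item: insert
  }
}
```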

Lookup

Items are most commonly retrieved by user identification along with the necessary pagination markers. However, the model also allows querying by individual attributes or combinations of them, such as item type, provider (source), status, customer names and so on. The ultimate goal for this engine is the ability to retrieve items of any type as long as the needed attributes are present, and the system is built to handle more complex queries down the line. Here’s a sample query that returns the last 10 flights a particular user booked via our platform:

/avalon/purchased-items?page=1&limit=10&itemType=flights&locale=en-gb&provider=source1
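Under the hood, a request like this maps fairly directly onto Firestore filters plus limit/offset pagination. The sketch below is illustrative; the field and collection names are assumptions, and the combined filters would need a composite index.

```typescript
// Hedged sketch of how the lookup endpoint could resolve the sample query:
// equality filters on the indexed attributes, ordered by booking date,
// with simple limit/offset pagination.
import { Firestore } from '@google-cloud/firestore';

const db = new Firestore();

async function getPurchasedItems(userId: string, itemType: string, provider: string, page = 1, limit = 10) {
  const snapshot = await db
    .collection('purchased-items')
    .where('ownership.userId', '==', userId)
    .where('core.itemType', '==', itemType)
    .where('metadata.provider', '==', provider)
    .orderBy('core.bookingDate', 'desc')   // requires a composite index
    .offset((page - 1) * limit)
    .limit(limit)
    .get();

  return snapshot.docs.map((doc) => doc.data());
}
```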

Challenges

1. Cross Team Collaboration

One of the challenges encountered in this project was coordinating the various teams, which took considerable time management, cohesion and understanding. Switching data sources on the front end had to be done carefully to prevent over-sharing or under-sharing of data. This meant involving not just tech leads and program managers, but also the engineers in charge of the various modular parts of the system. Deciding on a stable data model was likewise a collective effort.

2. Scalable Schema Definition

Laying the foundation for a schema that is scalable and progressive enough was a challenge, especially given how important it is for data retrieval and lookup. Since data is ingested from, and queried across, different data sources, the schema design had to be generic yet descriptive enough.

3. Historical Data

From the outset, the architecture had to be open enough to accommodate other forms of data transfer, which posed the challenge of avoiding over-engineering while keeping the architecture minimal. As the architecture diagram above shows, that simplicity, particularly around the processing unit, is what allows a pull mechanism for data transfer (highlighted in blue). The pull mechanism was designed to handle the challenge of syncing historical data into the central data store: it simply retrieves the data from the source and passes it through the adapter layer, preparing it to be stored alongside recent data. Moreover, this alternate process can be a lifesaver if the event-driven flow ever fails. More to come on this in a subsequent article!
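For a rough idea of what the pull engine amounts to, here is a hedged sketch; the fetch function stands in for the real source client and is not the production implementation:

```typescript
// Illustrative sketch of the pull engine: fetch records from a source in
// date-bounded segments, run each one through the same adapter layer the
// event-driven flow uses, and persist it alongside recent data.
type RawBooking = Record<string, unknown>;

async function fetchBookings(source: string, from: Date, to: Date): Promise<RawBooking[]> {
  // Placeholder for the real source API or database call.
  return [];
}

async function pullSegment(
  source: string,
  from: Date,
  to: Date,
  adapt: (raw: RawBooking) => Record<string, unknown>,
  save: (item: Record<string, unknown>) => Promise<void>,
): Promise<number> {
  const rawItems = await fetchBookings(source, from, to);
  for (const raw of rawItems) {
    await save(adapt(raw)); // same adapter + write path as the push flow
  }
  return rawItems.length;   // handy for backfill and reconciliation reporting
}
```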

Conclusion

Though a solid architecture was put together for this project, it is worth noting that many changes and improvements may happen along the way. To mitigate the effect of these changes on both the team and the project, it is vital to maintain close and consistent communication between all stakeholders through daily standups and meetings. These meetings help with scrutinising suggestions, assessing feasibility, setting priorities and managing time. The general idea of this project is to develop and improve the experience based on the initial design, so change is always welcome.

If you found this article helpful and applicable for your organization, don’t forget to click the 👏🏾 sign multiple times, leave a comment and share this article with your friends 🤟 See you in the next one! 😉
