MTH Data Store Selection Journey

Akhil Kohli
Walmart Global Tech Blog
4 min readMay 13, 2019

Introduction

Member Transaction History ( MTH ) is an omni channel read layer for club / online orders. It serves as single source of truth for online / club orders. Below are few use cases supported by MTH:-

  • To provide a reliable and quick access of the order data using a canonical order model.
  • Single source for all Order Lookups & Modifications while providing a consistent interface.

Use Cases :-

  1. Get Order Details based on orderID
  2. Get Order Details based on phoneNo or emailID
  3. Get Order Details based on membershipID and Data Range / itemNo
  4. Get Order Details based on phone number or emailID and OrderStatus and Date Range
  5. Get Order Count based on user details / date range ( aggregation )

Data store selection journey

Member Transaction History use cases at top level are divided in 4 broad level categories:- Key Value Use Cases, Filter Pointed Query Use Cases, Pre-defined Complex aggregation Use Cases, Field value faceting / search / dynamic query Use Cases.

  1. Key Value Use Cases:- MTH supports various key-value look up use cases like retrieve Order Details based on OrderId / retrieve Return Details based on ReturnOrderId. A key/value store is essentially a large hash table. You associate each data value with a unique key, and the key/value store uses this key to store the data by using an appropriate hashing function. The hashing function is selected to provide an even distribution of hashed keys across the data storage. Most key/value use cases only support simple query, insert, and delete operations. To modify a value (either partially or completely), an application must overwrite the existing data for the entire value. In most implementations, reading or writing a single value is an atomic operation and offers low latency. We evaluated various Azure data stores like Azure Redis / Cosmos / SQL / Cassandra. Based on our performance test, we ruled out Azure SQL as it does not scale. Azure Redis is in-memory data store and works better for caching use cases. We decided to go ahead with Azure Cosmos DB. Azure Cosmos DB is globally distributed, horizontally scalable, multi-model database service and provides low latency for key value look ups.
  2. Dynamic Query ( Faceting / GroupBy / Filtering Use Cases ):- MTH supports various dynamic query patterns like retrieve Order count grouped by Ship Node / Carrier Method ( parameters can be defined by caller ). The key characteristics of a search engine database are the ability to store and index information very quickly, and provide fast response for search requests. We evaluated Azure Cosmos / Azure Search / Azure SQL for this use case. Based on our performance test, Azure SQL does not scale as it is relational data store. Azure Cosmos currently does not support group-by / facet pivoting. We decided to go ahead with Azure search as it supports all the features required for dynamic querying. However, based on performance test we observed high latency with Azure search and are in talks with Microsoft team to enhance it.
  3. Analytics / Complex Aggregation Use Cases:- MTH supports various analytics and complex aggregation use cases which requires streaming. It deals with streams of data that are captured in real-time and processed with minimal latency to generate real-time (or near-real-time) analytical reports. We evaluated Azure Databricks / Azure stream analytics for the above use cases. We decided to go with Azure Databricks as it is fast, easy and collaborative Apache Spark–based analytics service. It supports Python, Scala, R and SQL, as well as deep learning frameworks and libraries like TensorFlow, Pytorch and Scikit-learn.
  4. Filter Pointed Query Use Cases:- MTH supports various filtered query use cases like retrieve Order Details based on MembershipID / First name / Last name / Date Range in a paginated way. We are querying on meta data for these use cases. We evaluated Azure Cosmos / Azure Search / Azure SQL for above use cases. Based on our performance test, Azure SQL does not scale as it is relational data store. Azure cosmos does scale and latency is low for pointed queries, however the pagination feature is yet to be rolled out. Azure search does provide us required functionality( pagination / faceting etc ), however latency is high compared to Azure Cosmos. For now, we have decided to go with Azure Cosmos for the pointed filter query use cases as we require low latency.

Based on our observations, we can conclude that there is no one datastore which caters to all solutions. Each datastore comes with its pros / cons. We need to consider various factors while selecting the right datastore:- Data format, Data size, Scale and structure, Data relationships(one to many / many to many), Consistency model, Schema flexibility, Concurrency, Data lifecycle( hot / cold data collection ), Region availability, Auditing, Throughput, Cost effectiveness, Security. These factors play a crucial role as we decide to go ahead with Key Value data store, Document data store, Column based data store, Graph based data store or Search based data store.

--

--