Creating Hyper-personalized Retail Data Product using Telco CDR

The Challenge

Building a data product that hyper-personalizes consumer offerings has always been a focus area for B2C enterprises. Retail chains in particular invest in data products that drive personalized campaigns and deliver a cohesive message across every point of consumer contact. Tailoring such a product with the consumer's near-real-time location is an ambition for many data product managers. However, location information is not something a B2C business can easily access. Even when it is available, it is usually not real time, which limits how far personalization can be explored.

The Solution

Snowflake Secure Data Sharing and Marketplace address these limitations by enabling enterprises to share live data securely and in real time.

Overview

We decided to build a data product for the B2C business that combines the customer demographics held by the business with the near-real-time location of the customer's handheld mobile device, which is available from the telecom operator the customer subscribes to in that region.

This exercise has a twofold benefit. First, it helps the B2C business run focused, point-in-time, targeted campaigns. Second, it opens data monetization opportunities for the telecom operator in that region.

Technical Assumptions

  1. The mobile device is in close proximity to the cell tower. Hence the distance is calculated from the cell tower to the store location.
  2. The Starbucks menu is the same across all the stores in that region.

The Deep Dive

Every telecom operator has rich information about its subscribers' locations. This near-real-time information is available in the call detail records (CDRs) in the telecom billing system. It is not publicly available; however, the telco can use it to create insights based on customers' movements. These are valuable insights that a B2C enterprise can act on. There is great potential for the telecom industry to monetize these insights and help B2C enterprises enrich their personalized product campaigns.

For our simulation we have considered Starbucks as the B2C partner. The telecom partner is not referenceable as of this writing. The B2C enterprise has the demographics and purchase history of its customers. This information can be analyzed to identify a set of loyal or frequently visiting customers and offer them an attractive deal on a selected product based on their purchase history and product preferences. If we can deliver this offer to the targeted customer just when they are near the B2C outlet, we can increase the likelihood of them visiting the store to redeem it. Delivering a customized offer just when the customer is within a defined drivable distance of the outlet is only possible if both parties (the B2C enterprise and the telecom operator) agree to participate in a secure data sharing exercise powered by Snowflake.

Technical Implementation Details

The entire simulation was created on a Snowflake instance set up on AWS. Here is a detailed process flow diagram for the entire operation:

The Retailer Operations Summary

In our simulation the retailer operations are primarily based on the following data assets:

  • Retail store locations
  • Customer demographics along with loyalty points
  • Customer purchase history
  • Product Catalog
  • Weather data from public API for the region of business

For the product recommendations, we applied a classic Market Basket Analysis algorithm along with third-party dataset integrations. Market Basket Analysis is a data mining technique that retailers use to increase sales by better understanding customer purchasing patterns. It analyzes purchase history to reveal product groupings and products that are likely to be purchased together. We analyzed over 100 million transactions to understand which retailer products could be paired together to increase the effectiveness of a promotion sent to a customer. We added variables such as weather conditions and demographics to increase the accuracy of the recommendation.

To deploy an efficient model, we leveraged Snowpark for Python and imported the apriori and association_rules functions from the mlxtend (Machine Learning Extensions) Python library. We published the results directly in Snowflake in an object, which was shared in the data clean room for consumption by the telecom operator.
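
To make the idea concrete, here is a minimal sketch of association-rule scoring in plain Python. It is a simplified, pairs-only stand-in for the mlxtend workflow described above, not the code used in the simulation. The function name pair_rules, the thresholds, and the sample baskets are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Score item-pair rules by support, confidence and lift.

    Returns tuples (antecedent, consequent, support, confidence, lift),
    sorted by lift in descending order. Only pairs are considered, so this
    is a simplification of a full Apriori search over larger itemsets.
    """
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs | lhs)
            if confidence >= min_confidence:
                lift = confidence / (item_counts[rhs] / n)
                rules.append((lhs, rhs, support, confidence, lift))
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


# Hypothetical baskets, for illustration only.
SAMPLE_BASKETS = [
    ["latte", "croissant"],
    ["latte", "croissant", "muffin"],
    ["latte", "muffin"],
    ["espresso", "croissant"],
    ["latte", "croissant"],
]

if __name__ == "__main__":
    for lhs, rhs, support, confidence, lift in pair_rules(SAMPLE_BASKETS):
        print(f"{lhs} -> {rhs}: support={support:.2f} "
              f"confidence={confidence:.2f} lift={lift:.2f}")
```

On these sample baskets the highest-lift rule is muffin -> latte, because every basket with a muffin also contains a latte. A lift above 1 suggests the two products are bought together more often than chance would predict, which is the kind of signal a paired promotion relies on.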

The Telecom Operations Summary

Sample call detail records (CDRs) collected from a mediation system are pushed into an Apache Kafka cluster set up on an EC2 instance. Call events are then ingested into Snowflake using Snowflake streaming ingestion. Once the JSON CDRs are loaded into Snowflake, a series of data engineering operations kicks off; the details are highlighted in the data flow diagram. The process uses Amazon Location Service to calculate the drivable distance from the nearest cell tower to the store. It is assumed that the distance between the cell tower and the actual mobile device can be ignored, given that the average distance between two cell towers in a metropolitan area is less than two miles. After deduplication in the staging layer, the process keeps only those phone numbers that are present in the customer loyalty dataset, which the retailer makes available via the data clean room.
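
The deduplicate-then-filter step can be pictured with a small Python sketch. This is only an illustration of the logic, not the Snowflake pipeline itself: the JSON field names (caller, cell_id, event_time), the function name eligible_events, and the sample records are assumptions made for this example.

```python
import json


def eligible_events(raw_cdrs, loyalty_numbers):
    """Drop duplicate CDRs, then keep only callers in the loyalty set.

    raw_cdrs is an iterable of JSON strings; loyalty_numbers is a set of
    phone numbers. The field names used here are hypothetical.
    """
    seen = set()
    kept = []
    for line in raw_cdrs:
        cdr = json.loads(line)
        key = (cdr["caller"], cdr["cell_id"], cdr["event_time"])
        if key in seen:
            continue  # duplicate event already processed
        seen.add(key)
        if cdr["caller"] in loyalty_numbers:
            kept.append(cdr)
    return kept


# Hypothetical records: the first two are duplicates of each other.
SAMPLE_CDRS = [
    '{"caller": "+15550001", "cell_id": "T-17", "event_time": "2023-05-01T10:00:00Z"}',
    '{"caller": "+15550001", "cell_id": "T-17", "event_time": "2023-05-01T10:00:00Z"}',
    '{"caller": "+15550002", "cell_id": "T-04", "event_time": "2023-05-01T10:01:00Z"}',
]

if __name__ == "__main__":
    for event in eligible_events(SAMPLE_CDRS, {"+15550001"}):
        print(event["caller"], event["cell_id"])
```

Deduplicating before the membership check keeps the later, more expensive distance steps from running twice for the same call event.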

For our simulation we have capped the loyalty points eligible for promotion at 9000, a value that can be changed on demand. Next, the process calculates the haversine distance between the cell tower where the identified call originated and every store within 5 km of that tower. The store nearest to the cell tower is marked for promotion.
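
As a rough sketch of this step, the haversine formula gives the great-circle distance between two latitude/longitude points, and the nearest store within the radius is then a simple minimum. The function names, the store dictionary layout, and the mean Earth radius of 6371 km are assumptions for this example, not details taken from the actual pipeline.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_store(tower, stores, max_km=5.0):
    """Return (store_id, distance_km) for the closest store within max_km.

    tower is a (lat, lon) tuple; stores maps a store id to (lat, lon).
    Returns None when no store lies within the radius.
    """
    best = None
    for store_id, (lat, lon) in stores.items():
        distance = haversine_km(tower[0], tower[1], lat, lon)
        if distance <= max_km and (best is None or distance < best[1]):
            best = (store_id, distance)
    return best


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    stores = {"store-a": (0.0, 0.01), "store-b": (0.0, 0.02), "store-c": (1.0, 1.0)}
    print(nearest_store((0.0, 0.0), stores))
```

The haversine distance is a straight-line measure over the Earth's surface, so it works as a cheap pre-filter; the on-road distance for the selected store can differ noticeably from it.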

Once this is done, the process calculates the on-road drivable distance between the cell tower and the selected store using Amazon Location Service. Note that the process first checks whether the distance for this source and destination pair has already been calculated in the last three hours. If so, the drivable distance is pulled from the cache. The AWS service is invoked only when no unexpired cache entry exists, which helps prevent service throttling.
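
The cache lookup can be sketched as a small time-to-live (TTL) cache in Python. This is an in-memory illustration under stated assumptions, not the implementation used in the simulation: the class name DistanceCache is hypothetical, and the fetch argument stands in for whatever function actually calls the location service.

```python
import time

THREE_HOURS_SECONDS = 3 * 60 * 60


class DistanceCache:
    """Cache drivable distances per (origin, destination) pair with a TTL."""

    def __init__(self, fetch, ttl_seconds=THREE_HOURS_SECONDS, clock=time.monotonic):
        self._fetch = fetch          # hypothetical callable hitting the routing service
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # (origin, destination) -> (stored_at, distance)

    def get(self, origin, destination):
        key = (origin, destination)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]          # fresh entry: no service call needed
        distance = self._fetch(origin, destination)
        self._entries[key] = (now, distance)
        return distance


if __name__ == "__main__":
    # Stand-in for the real service call; returns a fixed distance in km.
    cache = DistanceCache(lambda origin, destination: 4.2)
    print(cache.get("tower-17", "store-a"))
    print(cache.get("tower-17", "store-a"))  # served from the cache
```

Because many calls originate from the same tower and map to the same nearest store, keying the cache on the tower and store pair means repeated lookups within the three-hour window cost nothing against the service quota.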

In the final stage, the process picks up the promoted product for the identified phone number from the product recommendation dataset that the retailer shares via the data clean room (DCR).

The Pipeline Monitoring App

This is a Streamlit app that helps monitor the campaign.

Here is a sample from the final dataset that the pipeline produces. A well-curated message built from this dataset is shared with the end customer via SMS or an app notification.

Handling Data Security and Privacy

Sharing information about customers and the product portfolio with the telecom operator is always a security challenge. It has to happen through a secured channel, and every access pattern needs to be defined and governed. A Snowflake data clean room is well suited to these requirements and has been implemented in our simulation. Features like dynamic data masking and role-based access policies validate the authorization for every call to the shared data layer.

The customer loyalty dataset also has an attribute that records the customer's agreement to participate in this promotion. The customer may choose to abstain from this campaign. In that case the solution ensures that such a customer is never a part of the promotions.

Future Upgrades and Possibilities

  1. Getting access to handheld device location information. This will help us calculate a more precise driving distance between the device and the store.
  2. Working with a bespoke product menu for each store.
  3. Looking up the inventory of each store while promoting the recommended product.
  4. Increasing the throughput of the AWS location service so that promotions are shared more quickly.

Contributors to the solution:

  1. Adrian Gonzalez: https://www.linkedin.com/in/adrian-gonzalezc/
  2. Vikash Kumar: https://www.linkedin.com/in/aws-vikash/
  3. Jonathan Tao: https://www.linkedin.com/in/jonathan-tao-55874413/

Please note opinions expressed in this post are solely my own and do not represent the views or opinions of my employer.
