The How and Why of Personalizing Experiences for eCommerce

From a data scientist’s perspective.

Meet a stranger at a party and one way to gain the attention of the person is to talk about things that he or she is interested in. This comes naturally to some of us who can infer interests from just listening to the other person talk or by observing their behaviour.

But most of us are also capable of probing our way into understanding the other person’s likes and dislikes. We ask questions like “What’s your favourite music?” so we can start a conversation based on mutual interests.

Let’s consider something that’s seemingly unrelated — Moore’s law. It’s the observation that the number of transistors in a dense integrated circuit doubles approximately every two years.

And here’s something else: data storage costs have fallen dramatically in the last few years. And we’re all leaving copious digital footprints every day on the services we use over the internet.

From $437,500 per gigabyte in 1980 to a mere $0.03 per gigabyte in 2014, it’s a steep fall.

With Personalization, There’s Always Room for Improvement

So what do all these observations mean for eCommerce businesses? They make it possible to personalize interactions online.

Everyone wants to get a shopper’s attention. So they create distinctive customer experiences inferred from the data trail that people leave. In fact, personalization is the holy grail of eCommerce companies today.

94% of companies agree that personalisation is critical to current and future success.

And for technology teams, it’s almost become an obsession. Every site you use on a regular basis, from Amazon to Medium to Netflix, personalizes recommendations based on your data. And they’re looking for better and better ways to do this.

For instance, Netflix put the problem to the open community as the $1M Netflix Prize. The winners built a solution that was an ensemble of Matrix Factorization (which the community generally called SVD, Singular Value Decomposition) and Restricted Boltzmann Machines (RBM), which improved the RMSE (Root Mean Squared Error) of the predicted ratings.
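To make the two terms above concrete, here is a minimal sketch of matrix factorization trained with stochastic gradient descent, scored with RMSE. It is not the prize-winning ensemble: the toy ratings, the function names (rmse, factorize, predict) and the hyperparameters are all assumptions made for illustration.

```python
import math
import random


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def factorize(ratings, n_users, n_items, n_factors=2,
              epochs=1000, lr=0.02, reg=0.02, seed=0):
    """Learn user and item factor vectors from (user, item, rating) triples."""
    rng = random.Random(seed)
    users = [[rng.uniform(0.0, 0.5) for _ in range(n_factors)]
             for _ in range(n_users)]
    items = [[rng.uniform(0.0, 0.5) for _ in range(n_factors)]
             for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            # Error of the current dot-product prediction for this rating.
            err = r - sum(pu * qi for pu, qi in zip(users[u], items[i]))
            for f in range(n_factors):
                pu, qi = users[u][f], items[i][f]
                # Gradient step with L2 regularization on both factors.
                users[u][f] += lr * (err * qi - reg * pu)
                items[i][f] += lr * (err * pu - reg * qi)
    return users, items


def predict(users, items, u, i):
    """Predicted rating: dot product of the user and item factor vectors."""
    return sum(pu * qi for pu, qi in zip(users[u], items[i]))


# Made-up toy data: (user index, item index, rating on a 1-5 scale).
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
users, items = factorize(ratings, n_users=3, n_items=3)
train_rmse = rmse([predict(users, items, u, i) for u, i, _ in ratings],
                  [r for _, _, r in ratings])
print(round(train_rmse, 3))
```

The number printed is the error on the training ratings only. A real evaluation would hold out ratings the model has never seen, which is how the Netflix Prize was scored.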

But soon, Netflix’s focus moved to a more sophisticated algorithm when people started streaming movies more and more. They started treating personalized recommendations as a problem of scoring, ranking and sorting on a horizontal dimension. Vertically, they began to look at page optimization and filtering.

Right now, personalization is predominantly discussed as a problem of clustering users into homogeneous groups and applying rules to these groups, as the prize winners demonstrated. While this provides a good baseline to kick-start a personalization use case, there is also great scope for improvement.

Looking at users’ intent within the realm of your business, rather than at who the user is, is one such approach.

Welcome to the world where big data meets small data. Big data is about machines, while small data is about people. Small data knows what the tracked object is doing. If you want to know why, big data could be your answer.

As Netflix understands, personalization might not fall into one particular bucket. Is it a horizontal/vertical problem or a clustering/ranking problem or a big data/small data problem to solve? Yes.

Insights from the Fashion Space

In the fashion world, understanding the category a user wants to shop in, predicting the occasion he or she wants to shop for, and learning their likes and dislikes in colors and patterns are all pieces of the puzzle.

For this example, I’m considering color similarity as a basis for personalization.

Usually, the distance between two colors on a spectrum is based on contrast. For example, black and white lie far apart, while red and orange lie closer together. So when you buy two items together, you might expect them to lie far apart: you don’t buy things that look too similar, at least not at the same time.

But some of our algorithms, which detect similarity in the color space for the items that were bought together the most, suggest otherwise. The color space shows all the colors present in one particular product (a top, a dress or a shoe), and the width of each color band indicates the proportion of that color in the product (for example, more white than blue).
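One way to turn such a color space into a number is sketched below. This is an illustration under stated assumptions, not the algorithm we run in production: each product is represented as a list of (RGB color, proportion) pairs, the distance between two colors is plain Euclidean distance in RGB, and the palettes themselves are made up. The names color_distance and palette_distance are hypothetical.

```python
import math


def color_distance(color_a, color_b):
    """Euclidean distance between two (r, g, b) tuples."""
    return math.dist(color_a, color_b)


def palette_distance(palette_a, palette_b):
    """Proportion-weighted average distance between two color spaces.

    Each palette is a list of ((r, g, b), share) pairs whose shares sum to 1.
    The result is the expected distance between one color drawn from each
    palette, with colors drawn in proportion to their band width.
    """
    total = 0.0
    for color_a, share_a in palette_a:
        for color_b, share_b in palette_b:
            total += share_a * share_b * color_distance(color_a, color_b)
    return total


# Made-up palettes: two mostly white tops with some navy, and a black top.
top_1 = [((255, 255, 255), 0.7), ((0, 0, 128), 0.3)]
top_2 = [((250, 250, 250), 0.6), ((30, 30, 140), 0.4)]
black_top = [((0, 0, 0), 1.0)]

print(palette_distance(top_1, top_2))
print(palette_distance(top_1, black_top))
```

Two caveats on this sketch: RGB distance is only a rough proxy for how different two colors look (a perceptual space such as CIELAB is closer to human vision), and the measure is not zero for two identical multi-color palettes, so it is best used to compare pairs against each other rather than read as an absolute value.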

The figure below shows the color space from two specific tops that were bought together most frequently from one of our clients’ stores. Let’s call them Top 1 and Top 2.

[Figure: color spaces of Top 1 and Top 2]

Similarly, these are the color spaces of the two shoes that were bought together most frequently. Let’s call them Shoe 1 and Shoe 2.

[Figure: color spaces of Shoe 1 and Shoe 2]

Surprisingly, we found that users weren’t buying contrasting items in terms of color from the same category. In fact, the distance between items that are frequently bought together is quite small; that is, they look rather similar.

What this showed us was that you can’t rely on traditional ways of thinking. This correlation, which works so well within the same category, may fail when we look at cross-category recommendations or at another website. But again, without looking at the data, we can’t make that assumption.
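Looking at the data starts with something simple: counting which items from the same category land in the same order. The sketch below assumes orders are available as lists of item IDs along with an item-to-category lookup. The order data, the IDs and the function names are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(orders, category_of):
    """Count how often two items of the same category share an order."""
    counts = Counter()
    for order in orders:
        # Sort the de-duplicated items so each pair has one canonical key.
        unique_items = sorted(set(order))
        for item_a, item_b in combinations(unique_items, 2):
            if category_of[item_a] == category_of[item_b]:
                counts[(item_a, item_b)] += 1
    return counts


def most_frequent_pair(counts, category_of, category):
    """Return the most co-purchased pair in a category, or None if empty."""
    in_category = {pair: n for pair, n in counts.items()
                   if category_of[pair[0]] == category}
    if not in_category:
        return None
    return max(in_category, key=in_category.get)


# Made-up orders and categories.
orders = [
    ["top_1", "top_2", "shoe_1"],
    ["top_1", "top_2"],
    ["top_1", "top_3"],
    ["shoe_1", "shoe_2", "top_2"],
    ["shoe_1", "shoe_2"],
]
category_of = {
    "top_1": "tops", "top_2": "tops", "top_3": "tops",
    "shoe_1": "shoes", "shoe_2": "shoes",
}

counts = co_purchase_counts(orders, category_of)
print(most_frequent_pair(counts, category_of, "tops"))
print(most_frequent_pair(counts, category_of, "shoes"))
```

Dropping the same-category check inside co_purchase_counts gives the cross-category counts, which is where the color pattern would need to be verified again rather than assumed.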

And this kind of insight relies predominantly on user behaviour and intent, and not just on a grouping based on customer attributes. It also helps eCommerce companies considerably improve how well their recommendations match what shoppers want.

The bottom line is, there’s a great payoff in engaging the user through personalised recommendations that cross-sell and upsell effectively without the hard sell. Rating through explicit feedback and inference through implicit feedback are two ways of gauging user intent.

Remember, at the end of the day, all you want to do is connect meaningfully with the new friend you met at the party.


Hariharan is a data scientist who tinkers with code, numbers, thoughts and statistics at Mad Street Den — a computer vision-based AI startup whose cloud-based AI platform offers a wide range of AI products to businesses across the globe in fashion, retail, IoT, robotics, gaming and more.