Customer Segmentation: Taking a Page out of the Computer Vision Book

Published in Zero One Group, Jun 11, 2020

By Anthony Khong, Founder and SVP of Data Analytics at Zero One Group

In recent months, Zero One Technology has been working with Stream Intelligence on a project that involves, amongst other things, automated, data-driven customer persona generation for one of Indonesia's retail giants. A major part of the project is to develop an insightful and actionable segmentation of the customers.

To achieve that, we made use of non-negative matrix factorisation, or NMF for short, an algorithm first popularised in computer vision and topic modelling. In this tech blog, we give a brief account of how we adapted the algorithm to come up with customer segments.

Why Customer Segmentation?

Customer segmentation refers to the process of dividing a company's customer base into groups that share common demographic or behavioural characteristics. Understanding the different types of customers is important for formulating a coherent set of targeted strategies, such as brand positioning, one-to-one marketing and individually targeted recommendations. On some occasions, it has even been used to identify missing segments in the customer portfolio and to launch aggressive acquisition campaigns (Groysberg, 2018).

An example of a persona profile by Salminen et al. (2018)

Customer segmentation can also be used as a foundation for customer persona generation. Defined as "a fictitious person representing an underlying customer or user group", a customer persona crystallises a specific segment into an archetype, often with a concrete visualisation, a background story and a history of transactions with the retailer. The main benefit of this approach is that it gives key decision makers a realistic, shared mental model of the different types of customers. However, persona generation is not without its critics, who usually point to its lack of verifiability and actionability. See Salminen et al. (2018) for a more detailed account.

We are currently witnessing the mass collection of rich datasets, such as app engagement statistics and shopping behaviour, coupled with the increasing availability of online analytics data from sources such as social media. Enabled by this wealth of data, automated, data-driven customer segmentation is fast becoming accessible to many retailers and tech platforms. It is often the low-hanging fruit and the first step towards understanding their customers better.

Non-Negative Matrix Factorisation

Lee and Seung popularised the NMF algorithm in 1999 as a way to learn parts-based representations of objects: concretely, facial features from images of faces and semantic features from text.

Like other matrix factorisation techniques such as principal component analysis (PCA) and independent component analysis (ICA), NMF decomposes a matrix into two smaller matrices which, when multiplied, approximate the original matrix. What distinguishes NMF is that the original matrix and both factors are constrained to be non-negative.
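In symbols, a common formulation (the squared-error version studied by Lee and Seung) can be written as below. The notation is our own: rows of V index samples (faces, or later customers), columns index features (pixels, or later behaviours), and k is a number of components chosen by the analyst. Each row of V is approximated by a non-negative weighted sum of the k rows of H (the canonical patterns), with weights given by the matching row of W (the individual map). The last line shows Lee and Seung's multiplicative updates, where the products and divisions are element-wise.

```latex
V \approx W H, \qquad
V \in \mathbb{R}_{\ge 0}^{n \times m}, \;
W \in \mathbb{R}_{\ge 0}^{n \times k}, \;
H \in \mathbb{R}_{\ge 0}^{k \times m}, \;
k \ll \min(n, m)

\min_{W \ge 0,\; H \ge 0} \; \lVert V - W H \rVert_F^2

H \leftarrow H \odot \frac{W^\top V}{W^\top W H}, \qquad
W \leftarrow W \odot \frac{V H^\top}{W H H^\top}
```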

An illustration of how the NMF algorithm learns canonical facial patterns and individual maps from facial images.

Taking the example of facial images, the NMF algorithm decomposes a large collection of faces into individual facial-pattern maps and shared canonical facial patterns. Unlike the eigenfaces learned through PCA, NMF allows only additive combinations of non-negative components, so each canonical facial pattern resembles an actual part of the face, such as the eyes, nose, mouth or chin. An important characteristic of the individual facial-pattern maps is that they tend to be sparse: each map assigns little or no weight to most facial patterns, so that each face is a combination of only a select few.
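For readers who want to see the mechanics, here is a minimal from-scratch sketch of NMF in NumPy using Lee and Seung's multiplicative update rules for squared error. Rows of `V` are samples and columns are features. The function name `nmf_multiplicative` and the toy "parts" data are our own illustrative assumptions, not the implementation used in the project.

```python
import numpy as np


def nmf_multiplicative(V, k, n_iter=500, eps=1e-10, seed=0):
    """Factorise a non-negative V (n x m) into W (n x k) and H (k x m).

    Minimises the squared Frobenius error ||V - WH||^2 with Lee and
    Seung's multiplicative updates. Both factors start positive and are
    only ever multiplied by ratios of non-negative terms, so they stay
    non-negative.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k))
    H = rng.random((k, m))
    for _ in range(n_iter):
        # eps keeps the denominators away from zero
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy "images" of 6 pixels built from 3 hypothetical, non-overlapping parts.
parts = np.array([
    [1, 1, 0, 0, 0, 0],  # think "eyes"
    [0, 0, 1, 1, 0, 0],  # think "nose"
    [0, 0, 0, 0, 1, 1],  # think "mouth"
], dtype=float)
rng = np.random.default_rng(1)
# Each of 20 images mixes a random subset of the parts with random strengths.
weights = rng.random((20, 3)) * (rng.random((20, 3)) > 0.5)
V = weights @ parts

W, H = nmf_multiplicative(V, k=3)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
```

In practice a library implementation, such as scikit-learn's `NMF`, would usually be preferred; the point here is only that the updates never introduce negative values.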

But what makes NMF, and not PCA, suitable for segmentation? To answer this, imagine having to segment people based on their facial images. The algorithm discovers a specific canonical facial pattern, say, a particular kind of nose, and identifies every person with that nose. This essentially groups a number of people into a similar-nose segment. By repeating the process for every canonical facial pattern, we obtain an overlapping set of segments based on shared facial features. There is no parallel to this process in PCA, precisely because PCA allows negative and non-sparse individual maps. After all, what does it mean for a face to exhibit a particular eigenface, only to have other eigenfaces subtracted from it?
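The overlapping-segment idea can be sketched in a few lines of NumPy. The maps matrix `W` (people x canonical patterns), the helper `overlapping_segments` and the cut-off of 0.3 are made-up values for illustration. The second half checks numerically why PCA scores resist the same reading.

```python
import numpy as np

# Hypothetical non-negative individual maps: 5 people x 3 canonical patterns
# (e.g. column 0 could be "a particular kind of nose"). Illustrative values.
W = np.array([
    [0.9, 0.0, 0.1],
    [0.0, 0.8, 0.0],
    [0.7, 0.6, 0.0],
    [0.0, 0.0, 0.5],
    [0.4, 0.0, 0.6],
])


def overlapping_segments(W, threshold=0.3):
    """Map each pattern index to the rows whose weight reaches `threshold`.

    A row can join several segments or none, so segments may overlap.
    The threshold is an arbitrary illustrative cut-off.
    """
    return {k: np.flatnonzero(W[:, k] >= threshold) for k in range(W.shape[1])}


segments = overlapping_segments(W)
for k, members in segments.items():
    print(f"pattern {k}: people {members.tolist()}")

# By contrast, PCA scores come from mean-centred data, so each score column
# sums to zero and, unless it is all zeros, must contain negative entries.
X = np.random.default_rng(0).random((5, 4))
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * S  # projections of each row onto the principal components
print("columns sum to zero:", np.allclose(scores.sum(axis=0), 0))
print("contains negatives:", bool((scores < 0).any()))
```

Here person 2 lands in segments 0 and 1, and person 4 in segments 0 and 2, which is exactly the kind of overlap described above.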

Customer Segmentation using NMF

Fast forward two decades, and An et al. (2018) used NMF to segment viewers of the AJ+ YouTube channel. Instead of image pixels, they used views of specific videos to construct the NMF matrix. Using six canonical behavioural patterns, they found that the top viewer groups of each pattern show clear separation in terms of location, age and gender.
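A rough sketch of the "top viewer group" idea: for each pattern, take the rows with the largest weights and profile them on a demographic attribute. The maps, the group size of 50, the helper `top_members` and the `age_group` attribute are all synthetic assumptions, so no real separation should be expected; this only illustrates the mechanics and is not a reproduction of An et al.'s analysis.

```python
import numpy as np


def top_members(W, n_top):
    """Indices of the n_top rows with the largest weight on each pattern.

    Returns an array of shape (n_patterns, n_top).
    """
    # Sorting the negated weights gives descending order per column.
    return np.argsort(-W, axis=0)[:n_top].T


rng = np.random.default_rng(42)
n_viewers, k = 1000, 6  # six patterns, echoing the study's setting
W = rng.gamma(0.3, size=(n_viewers, k))  # synthetic, skewed non-negative maps
age_group = rng.choice(["18-24", "25-34", "35+"], size=n_viewers)  # synthetic

groups = top_members(W, n_top=50)
for pattern, members in enumerate(groups):
    labels, counts = np.unique(age_group[members], return_counts=True)
    share = {str(label): round(float(count) / len(members), 2)
             for label, count in zip(labels, counts)}
    print(f"pattern {pattern}: {share}")
```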

With this study in mind, it is straightforward to see how the technique translates to customer segmentation in retail. Instead of image pixels or video views, we can use shopping behaviours, which are commonly available to many retailers. In particular, we can construct our NMF matrix from each customer's total spend broken down by product, category, time of purchase and payment method.
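One way to build such a matrix with pandas is sketched below, assuming a transactions table with hypothetical columns `customer_id`, `category`, `daypart`, `payment` and `amount`. Each behaviour dimension becomes a block of total-spend columns, and the blocks are placed side by side so every entry stays non-negative. The helper `spend_by` is our own illustrative name.

```python
import pandas as pd

# Hypothetical transactions; column names and values are assumptions.
tx = pd.DataFrame({
    "customer_id": ["c1", "c1", "c2", "c2", "c3"],
    "category":    ["dairy", "snacks", "dairy", "baby", "snacks"],
    "daypart":     ["morning", "evening", "morning", "morning", "evening"],
    "payment":     ["cash", "card", "e-wallet", "e-wallet", "cash"],
    "amount":      [12.0, 5.5, 8.0, 30.0, 4.0],
})


def spend_by(df, column):
    """Total spend per customer for each value of `column`, one feature each."""
    wide = df.pivot_table(index="customer_id", columns=column,
                          values="amount", aggfunc="sum", fill_value=0.0)
    wide.columns = [f"{column}={value}" for value in wide.columns]
    return wide


# Customers x behaviours matrix, ready to be factorised.
X = pd.concat([spend_by(tx, c) for c in ["category", "daypart", "payment"]],
              axis=1)
print(X)
```

Whether to scale or log-transform spends before factorisation, so that a few high-value categories do not dominate, is a design choice this sketch leaves open.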

A high-level summary of how to carry out behavioural segmentation from transactions data.

Intuitively, the algorithm bundles co-occurring shopping behaviours into separate sets to form canonical shopping patterns, very much like how it discovers a kind of nose, eyes or chin. Each customer can thus be summarised by a sparse behavioural map, which represents a combination of a few shopping patterns.
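A minimal sketch with scikit-learn's `NMF` on synthetic data: two hypothetical shopping patterns are planted in a customer x behaviour matrix, and the fitted `components_` (patterns x behaviours) are read off by their largest entries, while `fit_transform` returns the behavioural maps (customers x patterns). The behaviour names, planted patterns and parameters such as `n_components=2` are illustrative assumptions; with real data the number of patterns has to be chosen, for example by weighing reconstruction error against interpretability.

```python
import numpy as np
from sklearn.decomposition import NMF

behaviours = ["dairy", "baby", "snacks", "beer",
              "morning", "evening", "e-wallet", "cash"]
rng = np.random.default_rng(0)
# Two hypothetical patterns, a "family morning shop" and an "evening snack
# run"; each synthetic customer mixes them with random non-negative weights.
true_H = np.array([
    [5, 8, 0, 0, 6, 0, 4, 0],
    [0, 0, 6, 5, 0, 7, 0, 3],
], dtype=float)
true_W = rng.gamma(0.5, size=(200, 2))
X = true_W @ true_H + rng.random((200, 8)) * 0.1  # small non-negative noise

model = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500)
W = model.fit_transform(X)  # customers x patterns (behavioural maps)
H = model.components_       # patterns x behaviours (shopping patterns)

for k, row in enumerate(H):
    top = np.argsort(row)[::-1][:4]  # four heaviest behaviours
    print(f"pattern {k}:", [behaviours[j] for j in top])
```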

Shopping Patterns: the Bridge between Customers and Behaviours

One way to think about the canonical shopping patterns is as a bridge that links every customer both to their segments and to their shopping behaviours. This relationship is useful in a number of ways.

For example, suppose we would like to assign each customer to a specific segment. We would simply look up each customer's top shopping pattern (i.e. an argmax operation).
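In NumPy this is essentially a one-liner on the customers x patterns map matrix. The matrix below is hypothetical, and labelling customers whose map is all zeros as -1 ("unassigned") is our own assumption. Note that `argmax` breaks ties in favour of the lower pattern index.

```python
import numpy as np

# Hypothetical behavioural maps: 5 customers x 3 shopping patterns.
W = np.array([
    [0.10, 2.30, 0.00],
    [1.70, 0.20, 0.40],
    [0.00, 0.00, 0.90],
    [0.60, 0.60, 0.00],  # a tie: argmax picks the lower index
    [0.00, 0.00, 0.00],  # exhibits no pattern at all
])

# Strongest pattern per customer, or -1 when the map is all zeros.
segment = np.where(W.max(axis=1) > 0, W.argmax(axis=1), -1)
print(segment)
```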

Another example is to summarise a particular customer's shopping habits. We first look up, say, their top three shopping patterns, and then the top behaviours of each of those patterns.
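A sketch of such a summary, assuming a customer's behavioural map `w` and a patterns x behaviours matrix `H` of the kind NMF produces. All names and values are hypothetical, as are the helper `summarise` and its defaults of three patterns and two behaviours per pattern.

```python
import numpy as np

behaviours = ["dairy", "baby", "snacks", "beer", "morning", "evening"]
# Hypothetical shopping patterns (rows) over behaviours (columns).
H = np.array([
    [4.0, 6.0, 0.0, 0.0, 5.0, 0.0],
    [0.0, 0.0, 5.0, 4.0, 0.0, 6.0],
    [3.0, 0.0, 2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 2.0, 2.5],
])
w_customer = np.array([0.2, 1.5, 0.0, 0.7])  # one customer's behavioural map


def summarise(w, H, names, n_patterns=3, n_behaviours=2):
    """List (pattern, weight, top behaviours) for a customer's top patterns."""
    summary = []
    for k in np.argsort(w)[::-1][:n_patterns]:
        if w[k] <= 0:  # skip patterns the customer does not exhibit
            continue
        top = np.argsort(H[k])[::-1][:n_behaviours]
        summary.append((int(k), float(w[k]), [names[j] for j in top]))
    return summary


for entry in summarise(w_customer, H, behaviours):
    print(entry)
```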

Finally, to identify cross-selling opportunities, we can start from a base product and look up the shopping patterns associated with it. From these patterns, we can then identify other associated products and the customers who strongly exhibit those patterns.
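The cross-selling lookup can be sketched in the same way. Here a pattern counts as associated with the base product if it holds at least a quarter of that product's total weight across patterns, and a customer counts as strongly exhibiting a pattern when their weight on it is at least 0.5. The product names, matrices, the function name `cross_sell` and all thresholds are hypothetical choices for illustration.

```python
import numpy as np

products = ["diapers", "baby wipes", "formula", "beer", "chips", "soda"]
# Hypothetical shopping patterns (rows) over products (columns).
H = np.array([
    [5.0, 4.0, 3.0, 0.0, 0.0, 0.2],
    [0.0, 0.0, 0.0, 4.0, 5.0, 3.0],
    [2.0, 0.0, 0.0, 3.0, 0.5, 0.0],
])
# Hypothetical behavioural maps: 5 customers x 3 patterns.
W = np.array([
    [1.2, 0.0, 0.0],
    [0.0, 0.9, 0.1],
    [0.1, 0.0, 1.1],
    [0.8, 0.3, 0.0],
    [0.0, 0.0, 0.0],
])


def cross_sell(base, H, W, names, pattern_share=0.25, n_products=2,
               min_weight=0.5):
    """Map each associated pattern to (companion products, target customers)."""
    j = names.index(base)
    total = H[:, j].sum()
    if total == 0:  # base product appears in no pattern
        return {}
    # Patterns holding a meaningful share of the base product's weight.
    patterns = np.flatnonzero(H[:, j] / total >= pattern_share)
    results = {}
    for k in patterns:
        ranked = [i for i in np.argsort(H[k])[::-1] if i != j]
        companions = [names[i] for i in ranked[:n_products]]
        customers = np.flatnonzero(W[:, k] >= min_weight).tolist()
        results[int(k)] = (companions, customers)
    return results


print(cross_sell("diapers", H, W, products))
```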

Final Thoughts

It has never been easier for retailers to access large amounts of customer data. Some retailers even unknowingly sit on a treasure trove of raw data that can be processed, analysed and shaped into actionable insights. We have described one example of how we provided such insights to our client.

Although we have focused solely on retail, NMF has been applied to a wide range of problems over the past decade. For instance, it has been used for movie recommendations, community discovery and even hyperspectral unmixing. It has proven to be a versatile pattern-discovery algorithm!

NMF is by no means a fancy, state-of-the-art machine learning algorithm, but it gets the job done. At Zero One, we are constantly training our teams to simplify our solutions, whilst maintaining rigour and keeping our eyes on the business needs. Simple and effective, that's how we like it!

Resources

Almohri, Haidar, Ratna Babu Chinnam, and Mark Colosimo. "Data-Driven Analytics for Benchmarking and Optimizing Retail Store Performance." arXiv preprint arXiv:1806.05563 (2018).
An, Jisun, et al. “Customer segmentation using online platforms: isolating behavioral and demographic segments for persona creation via aggregated user data.” Social Network Analysis and Mining 8.1 (2018): 54.
Gillis, Nicolas. "The why and how of nonnegative matrix factorization." Regularization, Optimization, Kernels, and Support Vector Machines (2014): 257–291.
Groysberg, Boris, and Annelena Lobb. “California Closets: Organizing the Customer Experience.” (2018).
Lee, Daniel D., and H. Sebastian Seung. “Learning the parts of objects by non-negative matrix factorization.” Nature 401.6755 (1999): 788–791.
Salminen, Joni, et al. “Generating cultural personas from social data: a perspective of Middle Eastern users.” 2017 5th International Conference on Future Internet of Things and Cloud Workshops (FiCloudW). IEEE, 2017.
Salminen, Joni, et al. “Are personas done? Evaluating their usefulness in the age of digital analytics.” Persona Studies 4.2 (2018): 47–65.
Salminen, Joni, Soon-gyo Jung, and Bernard J. Jansen. “The Future of Data-driven Personas: A Marriage of Online Analytics Numbers and Human Attributes.” 21st International Conference on Enterprise Information Systems, ICEIS 2019. SciTePress, 2019.
Yang, Jaewon, and Jure Leskovec. “Overlapping community detection at scale: a nonnegative matrix factorization approach.” Proceedings of the sixth ACM international conference on Web search and data mining. 2013.
Zhang, Sheng, et al. “Learning from incomplete ratings using non-negative matrix factorization.” Proceedings of the 2006 SIAM international conference on data mining. Society for Industrial and Applied Mathematics, 2006.

Follow Zero One Group on Instagram, Twitter, Facebook and LinkedIn. Visit our website at www.zero-one-group.com.
