Identifying behavioral personas with cluster analysis

Kadri Lenk

Published in

Pipedrive R&D Blog

10 min read · Mar 29, 2022

A practical guide to understanding and detecting behavioral personas

tl;dr

Behavioral personas describe your customers based on their interactions with your product.
Identifying your behavioral personas helps you to better understand your users and how well your product meets their needs.
At Pipedrive, we used product usage data for clustering. The algorithm we used was non-negative matrix factorization (NMF).
We have identified 7 behavioral personas, which we use to analyze feature usage, target users for product research, augment other analyses and more.

Intro

In this article, we’ll review behavioral personas and explain what they are good for and how they differ from traditional buyer personas.

We will also walk you through how we at Pipedrive have identified our behavioral personas, what data we used, which algorithm we picked and why.

Finally, we’ll introduce you to Pipedrive’s behavioral personas.

If you are new to behavioral personas and clustering, this article should serve as a practical guide to why they matter and which tools you can use to identify them.

Let’s dive right in.

Do you know your users?

Knowing your users is key to a successful user experience. Understanding their goals, wants and needs lets you design a product that your customers will love using.

There are numerous ways to gain insights into users, including market research, user interviews, surveys, analytics and more. Combining these tactics should provide you with a solid overview of your user base.

One typical output of such research is personas. However, adding the data you have on your users’ interactions with your product yields an even more powerful variation of the traditional persona: the behavioral persona.

What are behavioral personas?

Traditional user personas (also known as buyer personas) are semi-fictional depictions of your target customers. These personas encompass the key traits of your audience like demographics, interests, motivations and more.

They are mostly based on qualitative data like user interviews, surveys and market research. These personas tend to be broad and “soft,” as the relationship between them and your product may be indirect. Nevertheless, they are a valuable asset that helps understand people’s goals, values and attitudes.

Behavioral personas describe your customers based on what they do: their interactions with your product rather than demographics or market data. These personas give you a clear picture of your existing customer base–what groups of users it consists of and how many there are in each group.

Differences between the traditional persona and the behavioral persona

What are behavioral personas good for?

Understanding how different groups of users engage with your product helps you design it to provide the best possible value and experience for your users.

For example, since Pipedrive is a sales CRM, it’d be reasonable to assume that the majority of our users are salespeople. But do all salespeople use Pipedrive the same way? What about sales managers: What do they value in a product? And what other sales-supporting roles does Pipedrive serve?

Knowing the answers to these questions should help us to better understand our users and how well our product fits their needs.

Benefits of behavioral personas

Compared with traditional customer personas, behavioral personas offer some significant benefits but also some potential drawbacks.

Benefits of behavioral personas:

Quantifiable–Since you assign a persona to each of your users, you’ll know exactly how many representations of each persona exist in your user base.
Cost-effective–Unlike traditional personas, behavioral personas don’t require several people to conduct weeks of research. Depending on your data, tools and skillset, it’s possible to have the preliminary results ready within a day.
Scalable–Once you have identified the personas, you can continuously apply the model to new users.

Things to keep in mind:

You must have data first–There is no way you can identify behavioral personas if you don’t have data on your users’ behavior. Event tracking, web analytics, logs–anything helps.
Leaves room for interpretation–Two users with the same persona could have very different roles and responsibilities in real life. For example, at Pipedrive, “deals” and “activities” would usually describe the behavior of a salesperson. However, it could also fit the behavior of a different user who uses Pipedrive for project management.
Descriptive–Since behavioral personas are based on past behavior, they tell you what users did, not what they are missing or wish they could do.

How did we identify the behavioral personas at Pipedrive?

The data

At the heart of identifying behavioral personas lies the data.

Depending on your situation, the data you have available may vary. In Pipedrive’s case, we are leveraging product usage data.

Product usage data is the data produced by users who interact with your (software) product. It provides a quantitative measurement of how, when and where your product is used, which helps you understand its performance.

Read more about product usage analytics in Pipedrive:

Product usage analytics at Pipedrive

How Pipedrive set up product analytics and usage tracking process


At Pipedrive, we continuously track the interactions of our users with our product. We call these interactions events. An event is triggered when a user clicks on an item (usually a button). Having this event data provides us with a decent overview of our product’s performance, user engagement and more.

Another use case for event data is detecting behavioral patterns and, ultimately, behavioral personas. To identify these personas, we decided to use a machine learning algorithm. More specifically, we went with non-negative matrix factorization.

Note: To identify the personas, it is enough to include only the key data that covers the most fundamental interactions with your product. At Pipedrive, for example, we have hundreds of different events in place, whereas for personas, we only took into account 40 key ones.

Non-negative matrix factorization

Non-negative matrix factorization (NMF) is an unsupervised machine learning algorithm for finding two non-negative matrices W and H whose product approximates the non-negative input matrix V.
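Written out, the factorization looks as follows. The symbols n (number of users) and m (number of events) are introduced here for illustration, and the squared Frobenius norm is only one common choice of objective; the article does not state which objective was used at Pipedrive.

```latex
V \approx W H, \qquad
V \in \mathbb{R}_{\ge 0}^{n \times m},\quad
W \in \mathbb{R}_{\ge 0}^{n \times k},\quad
H \in \mathbb{R}_{\ge 0}^{k \times m}

\min_{W \ge 0,\; H \ge 0} \; \lVert V - W H \rVert_F^2
```

Each row of W holds one user’s weights over the k components, and each row of H holds one component’s weights over the m events.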

This factorization has multiple applications like dimensionality reduction, text mining, image processing and more.

NMF also has an inherent clustering property: the number of components you pass to the model is also the number of clusters you’ll get.

How exactly does clustering with NMF work?

Let’s walk through the process below.

Illustration of NMF: The matrix V is represented by matrices W and H, which, when multiplied, approximately reconstruct V.

The input matrix V is a matrix where each row represents a user, each column represents an event, and the values are the counts of events triggered by the corresponding user in the observed time period.
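As a minimal sketch of how such a matrix could be assembled, assume the raw data is an event log with one row per triggered event. The column names (`user_id`, `event`) and the event names are hypothetical, not Pipedrive’s actual schema.

```python
import pandas as pd

# Hypothetical raw event log: one row per triggered event.
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "event": [
        "deal_added", "deal_won", "deal_added",
        "user_invited", "billing_opened",
        "email_sent", "deal_added", "email_sent", "email_sent",
    ],
})

# Rows = users, columns = events, values = number of times the user
# triggered that event in the observed period (0 if never).
V = pd.crosstab(events["user_id"], events["event"])
print(V)
```

Because the values are counts, the matrix is non-negative by construction, which is what NMF requires.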

Another input NMF requires is the number of components (k), which is also the number of personas you’ll get.

After fitting the NMF model to the matrix V, we’ll receive matrices W and H.

Factor matrix W gives us the cluster of each user, which we can determine by taking the argmax of that user’s row: the index of the maximum factor value.

For example, if we run NMF with 5 components (i.e., 5 clusters) and matrix W factors for one specific user are [0.0041, 0.2729, 1.8886, 0.0291, 0.9963], then the cluster for that user is 2 because 1.8886 is the maximum value in that array and its index is 2 (counting starts from 0).
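The fitting and argmax steps can be sketched with scikit-learn as below. The data is synthetic, and the hyperparameters (`init`, `max_iter`, `random_state`) are illustrative assumptions rather than the settings used at Pipedrive.

```python
import numpy as np
from sklearn.decomposition import NMF

# Synthetic stand-in for V: 200 users x 8 events, non-negative counts.
rng = np.random.default_rng(0)
V = rng.poisson(lam=2.0, size=(200, 8)).astype(float)

k = 5  # number of components = number of clusters
model = NMF(n_components=k, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(V)  # shape (n_users, k)
H = model.components_       # shape (k, n_events)

# Cluster of each user = index of the largest factor in that user's row of W.
clusters = W.argmax(axis=1)

# The single-user example from the text.
example = np.array([0.0041, 0.2729, 1.8886, 0.0291, 0.9963])
print(example.argmax())  # 2
```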

Factor matrix H will give us the most dominant events for each cluster, so you’ll know what each cluster is about. We can determine these events by sorting the factor values and mapping the corresponding indices back to event names.

For example, if we had 7 events in our input matrix, then the factors for one particular cluster in matrix H could be [5.5902, 1.9799, 5.4623, 51.133, 2.0531, 11.4006, 37.7517]. If we sort this array in descending order and replace the values with their original indices, we get [3, 6, 5, 0, 2, 4, 1]. This means that event 3 is the most dominant in that cluster, followed by event 6, then 5 and so on. By looking at the most dominant events in each cluster, you should get a good understanding of each of the clusters.
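The ranking step from this example can be reproduced with NumPy. The event names are hypothetical placeholders; in practice they would be the column labels of V.

```python
import numpy as np

# Hypothetical event names, in the same order as the columns of V.
event_names = [f"event_{i}" for i in range(7)]

# One row of H: the factors of a single cluster (values from the text).
h_row = np.array([5.5902, 1.9799, 5.4623, 51.133, 2.0531, 11.4006, 37.7517])

# argsort sorts ascending, so reverse it to rank events from most
# to least dominant.
order = np.argsort(h_row)[::-1]
print(order.tolist())  # [3, 6, 5, 0, 2, 4, 1]

# Map the indices back to event names, e.g. the top three events.
top_events = [event_names[i] for i in order[:3]]
print(top_events)  # ['event_3', 'event_6', 'event_5']
```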

Note: In case you are interested in implementing NMF in Python, you can find it in the scikit-learn library.
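With scikit-learn, a fitted model can also be reused to assign personas to users it has not seen, which is what makes the approach scalable. The sketch below uses synthetic data and assumed hyperparameters; the new users’ matrix must have the same event columns, in the same order, as the training matrix.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(1)
# Synthetic stand-ins: existing users and newly signed-up users,
# both described by the same 10 events.
V_train = rng.poisson(lam=2.0, size=(300, 10)).astype(float)
V_new = rng.poisson(lam=2.0, size=(20, 10)).astype(float)

model = NMF(n_components=4, init="nndsvda", max_iter=500, random_state=0)
model.fit(V_train)

# transform() keeps H fixed and only solves for the new users' rows of W.
W_new = model.transform(V_new)
new_personas = W_new.argmax(axis=1)
print(new_personas)
```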

How do you know the best number of personas?

As with other clustering algorithms, NMF requires you to provide the number of components (personas) as input. But how do you choose this number?

At Pipedrive, we used a simple approach: trial and error. We ran the algorithm with different numbers of components and kept the result that made the most sense.

To identify actionable behavioral personas, you should find a balance between personas that are too broad and personas that are too granular.

Cluster users into two groups, and you might miss some other relevant personas that emerge when increasing the number of clusters. On the other hand, having a higher number of personas where one of them makes up <1% of your user base is probably impractical and not actionable.
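One way to structure this kind of trial and error is to fit the model for several values of k and look at how large the smallest cluster is each time. This is a sketch on synthetic data, not the procedure used at Pipedrive; the range of k and the hyperparameters are assumptions.

```python
import numpy as np
from sklearn.decomposition import NMF

# Synthetic stand-in for the user x event count matrix.
rng = np.random.default_rng(42)
V = rng.poisson(lam=2.0, size=(500, 12)).astype(float)

results = {}
for k in range(2, 8):
    model = NMF(n_components=k, init="nndsvda", max_iter=500, random_state=0)
    W = model.fit_transform(V)
    labels = W.argmax(axis=1)

    # Share of users assigned to each cluster; a very small share
    # suggests a persona that is too granular to act on.
    shares = np.bincount(labels, minlength=k) / len(labels)
    results[k] = {
        "smallest_share": float(shares.min()),
        "reconstruction_err": float(model.reconstruction_err_),
    }
    print(k, results[k])
```

The reconstruction error on the training data generally keeps shrinking as k grows, so it cannot pick k on its own. It is best read alongside the cluster sizes and the interpretability of the dominant events.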

I encourage you to play around with the data, see what works and what doesn’t, learn and iterate.

If you are interested in a more data-driven approach, a common way of finding the optimal number of components for NMF is to use cross-validation. Here is a nice illustrated tutorial on how to do this.

Why NMF?

We explored other clustering algorithms as well (including k-means, a common default for clustering). However, NMF consistently produced better results on our data: it grouped users more evenly and the distinctions between the groups were more obvious.

The main benefits of NMF in behavioral personas clustering:

Robust to outliers–Even if some users have oddly high volumes for some events, NMF doesn’t skew the clusters towards unusual patterns.
Handles high-dimensionality–NMF can capture the uniqueness of each cluster, even if the number of different events as input to the algorithm is exceptionally high.

Note: If you are interested in learning more about NMF, please see this article.

Personas at Pipedrive

At Pipedrive, we have identified 7 distinct clusters of users where each cluster or persona can be characterized by their distinct behavior.

The personas and their relative size are illustrated below.

Pipedrive’s 7 behavioral personas and their relative proportions

Who are Pipedrive’s behavioral personas?

Sales–all the main actions are typical of a salesperson: advancing deals in the pipeline, closing them, planning activities, sending emails, etc.
Data Entry–higher than average activity in adding various entities like deals, contact persons and organizations
Contact Manager–mainly focused on working with contacts
Admin–top actions are all related to administrative work: inviting new users, customizing pipeline, managing billing, etc.
Mobile–using our mobile platform (either Android or iOS) as their primary tool
Lead Manager–mainly focused on working with leads
Team Lead–very active in using Pipedrive’s Insights feature (reporting and goals) and working in the deal forecast view

Please note that the behaviors of the personas are not mutually exclusive. For example, it does not mean that our mobile platform is only used by the Mobile persona. It’s just that the Mobile persona uses the mobile platform much more frequently than other personas (while, at the same time, using our web platform much less frequently than other personas).

Use cases for behavioral personas at Pipedrive

Below are some of the main use cases where we have benefitted from knowing our behavioral personas.

Getting to know the users

When we first ran the analysis to identify Pipedrive’s behavioral personas, we treated it as an exploratory data analysis. It was exciting to learn about the behaviors of different types of salespeople as well as other personas who have a supporting role in the sales process. Knowing the behavioral personas has helped us to gauge how well our product is serving the needs of our target audience.

Analyzing feature usage

Not all features that we build are targeted at 100% of our user base. For example, when we build something that is aimed at the managerial role, we can measure the performance of this feature in the context of the target persona.

Targeted user research

When conducting product research, we try to reach out to those users who we think are the best people to talk to. Identifying users’ personas has helped us to better target the people for more high-quality research.

Inputs to other analyses

We have used the personas as an input for other analyses. For example, when predicting the win probability of a deal, we get more accurate results when we take into account the persona of the user who’s working on that deal.

Continuous detection of new roles

As we are expanding our product offering, we keep an eye out for emerging new personas. For example, since the launch of Pipedrive’s lead management software, we not only see the growing usage of it but also see that many of our customers have a dedicated role for managing leads. As we continue to release new products, we can estimate whether we are successful at onboarding new roles to Pipedrive.

Conclusions

User research is sometimes seen as a time-consuming and expensive process–but it doesn’t need to be.

With behavioral personas, you can get to know your user base quickly and affordably. Needless to say, this process shouldn’t replace that of talking to customers. Still, it offers a data-driven approach to learning who your customers are.
