Using data design to understand your customers (Part 2)

by Tom Matcham, Lusine Tarkhanyan and Stuart George from Method

Published in Method Perspectives, Jul 26, 2019 (7 min read)

You don’t need an extensive data lake to ensure your business is built on real customer insight.

In this article we explain how you can use readily available data sets to help build user archetypes, enabling you to make informed decisions about your brand, product and service strategy.

In our second article on data-driven Archetype creation, we show you how we use data design to cluster potential audience interests and accelerate the creation of Archetypes. This approach works for any public or private data that is graph-based.

Where do we start?

To focus our efforts, we first need to establish the business problem we are trying to solve. This could be gaining a better understanding of the target population, creating a brand strategy, or understanding competitor products or related industries. At this stage, design thinking plays an important role in identifying the right seeds. Our designers use qualitative strategic thinking as well as a number of different sources (e.g. surveys, desk research, interviews or client knowledge) to identify the gaps in our knowledge, where to focus and how to formulate the right research questions.

We call these initial interest areas seeds; they are people, hashtags, companies, etc. We use these seeds as key starting points for our data-driven exploration of potential customer behaviours.

For example, in a recent project for a maker of high-end audio devices, we started with their brand name and closest competitors as seeds, and added some areas of interest adjacent to the brand and sector, such as fashion, homes and interiors. We also looked at brands from outside their sector with a similar heritage or positioning. Knowing the brand as we did, we felt these areas would converge to bring us some interesting populations to focus on.

Once we have the seeds we then crawl the data sources.

Crawling for gold

Crawling refers to running a network of bots (preprogrammed pieces of code that run automated tasks over the Internet) that systematically browse publicly available data sources for a predefined seed and collect information on people who follow or are influenced by it. This technique allows us to sample the users who follow the seed account, identifying its current audience.

We then collect the corresponding metadata for each audience member. Mainly we look at what other accounts these sampled users also follow, which we call Interests. By making a direct connection between a seed and its audience, we are able to identify what the current audience of the seed is interested in besides the seed. It’s important to point out that, in the case of public data, no personally identifiable data is crawled, as this infringes privacy laws in some countries. We are looking for areas of general interest and how they connect, not to identify individuals.
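As a rough illustration of this step, the sketch below counts which other accounts a seed’s sampled audience follows. The `follows` mapping, the account names and the `audience_interests` helper are all hypothetical stand-ins for crawled data, not our production crawler.

```python
from collections import Counter

# Hypothetical crawl result: each sampled audience member, mapped to the
# accounts they follow. Identifiers are anonymised placeholders.
follows = {
    "member_1": {"seed_brand", "hifi_magazine", "interiors_blog"},
    "member_2": {"seed_brand", "hifi_magazine", "fashion_label"},
    "member_3": {"seed_brand", "interiors_blog"},
}


def audience_interests(follows, seed):
    """Count how many followers of `seed` also follow each other account."""
    counts = Counter()
    for accounts in follows.values():
        if seed in accounts:
            # Everything the member follows besides the seed is an Interest.
            counts.update(accounts - {seed})
    return counts


interests = audience_interests(follows, "seed_brand")
print(interests.most_common())
```

Accounts followed by many audience members surface at the top of the list, which gives a first view of what the audience cares about besides the seed.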

This crawling process on average takes two weeks before we move on to the analysis.

Analyse

The analysis is split into a number of parts, some automated and some manual.

For each of the audiences, we build a graph, that is, a set of pairwise relationships between audience members and their interests. Once we collate the graph representation for each member, we group all the members’ interests in such a way that the interests in the same group are more similar to each other (in some way) than to those in other groups. The formation of an audience’s interests into distinct groups is called unsupervised graph clustering.
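To make the idea of a graph as pairwise relationships concrete, here is a minimal sketch that turns (member, interest) pairs into an adjacency map. The edge list and the `build_graph` helper are hypothetical; a real pipeline would more likely use a graph library.

```python
from collections import defaultdict

# Hypothetical (member, interest) pairs produced by the crawl.
edges = [
    ("member_1", "hifi_magazine"),
    ("member_1", "interiors_blog"),
    ("member_2", "hifi_magazine"),
    ("member_2", "fashion_label"),
]


def build_graph(edges):
    """Return an undirected adjacency map over members and interests."""
    graph = defaultdict(set)
    for member, interest in edges:
        graph[member].add(interest)
        graph[interest].add(member)
    return dict(graph)


graph = build_graph(edges)
print(sorted(graph["hifi_magazine"]))  # members who share this interest
```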

Unsupervised graph clustering allows us to identify groups of audience members who have common interests. By analysing subgroups of those interests rather than the whole audience, we can provide a more detailed picture of audience behaviour, at the cost of slightly greater analytical complexity.

To perform unsupervised graph clustering, we have to transform our interest graph representation into numeric vectors, i.e. ordered lists of numbers. This is because the vast majority of unsupervised learning techniques take a set of numeric vectors as input. There are many different approaches to this graph transformation; the simplest is binary encoding. However, this representation is very high dimensional and sparse (it contains one entry in the vector for each interest in the dataset) and fails to capture many nuances of the graph structure.
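The binary encoding mentioned above can be sketched in a few lines. The member names, interests and the `binary_encode` helper are hypothetical; the point is that every interest in the dataset becomes one position in the vector, which is why the vectors grow long and mostly zero on real data.

```python
def binary_encode(member_interests):
    """Return the sorted interest vocabulary and one 0/1 vector per member."""
    vocabulary = sorted(
        {interest for interests in member_interests.values() for interest in interests}
    )
    vectors = {
        member: [1 if interest in interests else 0 for interest in vocabulary]
        for member, interests in member_interests.items()
    }
    return vocabulary, vectors


# Hypothetical members and the interests they follow.
member_interests = {
    "member_1": {"hifi_magazine", "interiors_blog"},
    "member_2": {"hifi_magazine", "fashion_label"},
}

vocabulary, vectors = binary_encode(member_interests)
print(vocabulary)           # ['fashion_label', 'hifi_magazine', 'interiors_blog']
print(vectors["member_1"])  # [0, 1, 1]
print(vectors["member_2"])  # [1, 1, 0]
```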

For example, binary encoding cannot capture a relationship such as ‘entity A follows entities B and C, and entity B follows entity C’. Here A is connected to C both directly and through B, and that indirect path cannot be represented in a flat binary vector. To solve this problem, we use the graph2vec algorithm.

Graph2vec uses a neural network (an algorithm inspired by, but not identical to, neurons in the human brain) to embed information about the relationship between the audience and their interests from a graph representation into a vector. The neural network ‘learns’ the graph embeddings at the same time as it learns to predict which sub-structures are likely to co-occur in a graph. We get the numerical vector representation of the graph by extracting the neural network’s internal representation of each graph.
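To give a feel for what ‘sub-structures’ means here: the graph2vec paper builds its vocabulary of sub-structures with Weisfeiler-Lehman relabelling and then trains a doc2vec-style model over that vocabulary. The sketch below covers only the first part, in simplified form. It assumes plain string labels instead of the compressed labels a real implementation would use, and it leaves out the neural network training entirely. The function name, node names and labels are hypothetical.

```python
from collections import Counter


def wl_subgraph_features(adjacency, labels, iterations=2):
    """Count Weisfeiler-Lehman sub-structure labels for one graph.

    adjacency: dict mapping node -> set of neighbouring nodes
    labels:    dict mapping node -> initial string label
    """
    features = Counter(labels.values())
    current = dict(labels)
    for _ in range(iterations):
        updated = {}
        for node, neighbours in adjacency.items():
            # A node's new label combines its own label with the sorted
            # labels of its neighbours, describing a slightly larger
            # neighbourhood on every pass.
            neighbour_labels = sorted(current[n] for n in neighbours)
            updated[node] = current[node] + "(" + ",".join(neighbour_labels) + ")"
        current = updated
        features.update(current.values())
    return features


# Hypothetical graph: one audience member following two interests.
adjacency = {"m": {"a", "b"}, "a": {"m"}, "b": {"m"}}
labels = {"m": "member", "a": "interest", "b": "interest"}

features = wl_subgraph_features(adjacency, labels, iterations=1)
print(features)
```

Each graph then behaves like a document whose words are these sub-structure labels, which is what allows a document-embedding model to learn one vector per graph.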

The advantage of graph2vec over other algorithms is that the neural network learns an efficient, compressed representation of graphs from the data itself. In other words, it learns heuristics that are specific to our dataset.

We are looking to produce clusters of audience interests that are similar in some way; effectively, these become a foundation for further exploration of our archetypes. To produce our clusters, we run the k-means clustering algorithm on our graph2vec graph embeddings.
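For readers unfamiliar with k-means, here is a minimal pure-Python sketch of the algorithm on toy two-dimensional vectors. The `embeddings` values are made up, and the deterministic farthest-point initialisation is a simplification chosen to keep the example reproducible; in practice a library implementation with a more robust initialisation would be the usual choice.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's algorithm; returns (assignments, centroids)."""
    # Deterministic farthest-point initialisation: start from the first
    # vector, then keep adding the vector farthest from the chosen centroids.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(list(farthest))

    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        for i, vector in enumerate(vectors):
            assignments[i] = min(
                range(k), key=lambda c: squared_distance(vector, centroids[c])
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Hypothetical 2-D embeddings; real graph2vec vectors have many more dimensions.
embeddings = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
assignments, centroids = kmeans(embeddings, k=2)
print(assignments)  # [0, 0, 1, 1]
```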

We then analyse each of the clusters that we discover by looking at the common interests and calculating cluster-level statistics such as the average number of posts, the average number of interests and the probability that a member of the cluster has liked a post by one of the seeds.
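The cluster-level statistics listed above can be computed along these lines. The record fields (`posts`, `interests`, `liked_seed_post`) and the sample values are hypothetical placeholders for the crawled metadata.

```python
from statistics import mean

# Hypothetical per-member records after clustering.
members = [
    {"cluster": 0, "posts": 120, "interests": 40, "liked_seed_post": True},
    {"cluster": 0, "posts": 80, "interests": 60, "liked_seed_post": False},
    {"cluster": 1, "posts": 10, "interests": 300, "liked_seed_post": False},
]


def cluster_statistics(members):
    """Summarise each cluster with simple averages and a like probability."""
    by_cluster = {}
    for record in members:
        by_cluster.setdefault(record["cluster"], []).append(record)
    return {
        cluster: {
            "size": len(rows),
            "avg_posts": mean(r["posts"] for r in rows),
            "avg_interests": mean(r["interests"] for r in rows),
            # Share of members who liked at least one seed post.
            "p_liked_seed_post": sum(r["liked_seed_post"] for r in rows) / len(rows),
        }
        for cluster, rows in by_cluster.items()
    }


stats = cluster_statistics(members)
print(stats[0])
```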

Finally, we use natural language generation to summarise cluster similarities and differences.

Create the report

Although a lot of the analysis is performed automatically, designers are still required to provide high-fidelity insights into the clusters. Clustered metadata is presented to our designers through a web-based tool where they can name clusters, pick representative images and perform further exploratory analysis.

Whenever the designer saves their results in the report tool, a presentation is generated, creating sections for each cluster and slides for high-level graph analysis, an introduction and conclusion.

How do you use these archetypes?

We then bring our clustered interest areas back into the design process to inform the focus of more qualitative research, which brings deeper behavioural insight to the mix. We compile all of these inputs to create a set of archetypes. Archetypes are a window into the real world: they represent typical real user behaviours and provide realistic insights into the behaviour of your target end-users. They are a common shared reference that the business can align on, and they provide a starting point for decision making across a number of different areas. We use archetypes as a lens to sharpen our ideas, guide the design of products and services, and shape how you communicate about them. Using data techniques allows us to move faster and at a wider scale. This process gives us both breadth and depth, which helps us build better and more representative Archetypes.

Our clients use archetypes as a filter to support decision making within their organisations at both a tactical and strategic level, for example by testing concepts with interviewees from each archetype to evaluate a concept’s desirability.

It’s also worth remembering that you don’t need to target your current users; if you are entering a new market or targeting a different segment, data-driven techniques can be used to inform decision making around new products or services that fit the needs of your potential customers.

Data Design techniques form part of our design process. We do not use them in isolation. We are creating a powerful set of tools to speed up the design process, increase accuracy and save costs. Leveraging user data and insights to inform business decisions is no longer the privilege of the few. We see data as a material that is all around us. With the right tools and focus you can navigate the initial complexity and use it to power innovation, product strategy, brand creation and much more. We believe the deep empathy for the user that design offers can help interpret and bring data down to a human, usable scale.

Illustrations in this article are by Daniel Robson.