An introduction to Human Centred Data Science and what it means in practice (pt. 2)

Na Xiao Shuang
Synthesis Partners
Published in
7 min read · Oct 1, 2020

Click here for part 1 of the Intro to Human Centred Data Science series.

Here at Synthesis, we call our approach to understanding data Human Centred Data Science.

In simple terms, Human Centred Data Science refers to understanding data by understanding the humans and the context shaping it. Recognizing the human context lets us better understand which data to look at, and how to interpret it.

But what exactly does Human Centred Data Science mean from a technical perspective? How does it differ from regular data science? What are the steps and processes we use to achieve it?

In the second part of this introduction to Human Centred Data Science, I’ll be laying out the theoretical framework behind our approach, and taking you step by step through it.

HOW WE DO IT

We split the Human Centred Data Science framework into four phases: Frame, Layer, Iterate, and Narrate.

1. FRAME

The framework begins with understanding the context of questions, key audiences and final use cases.

We often get excited about experimenting with new methodologies and exploring the possibilities within data, but it’s crucial to first define the problem by asking the following:

  • What are the key needs?
  • Who will be using the information?
  • What are they planning to do with the findings?

A clear understanding of the problem sets us up for the whole process, allowing us to find impactful insights and actionable solutions.

Example: Understanding Natural Beauty in India

For our Understanding Natural Beauty’s Second Homecoming in India piece, the team wanted to explore how the rising global interest in natural beauty played out in the Indian context — where natural beauty was more norm than trend. The desired output was a content piece written for global beauty brands, identifying opportunities to play in this interesting new context.

2. LAYER: Clean, accurate, simple and smart dataset

Human Centred Data Science analysis doesn’t need the biggest dataset. Instead, it relies on building a relevant and accurate dataset that can reveal interesting human truths, often where no existing dataset exists.

Selecting and layering data sources

Everyone has different intentions on different platforms (Image from Doug the Pug)

When choosing a relevant data source, we consider:

1. The behavioural insights it provides. For example,

  • Search: implicit interest
  • Reviews: explicit feedback
  • Social Content: what and how do people share
  • Conversations: what and how do people discuss

2. Target audience for analysis (e.g. mainstream consumers vs. niche groups)

3. Category (e.g. gaming, beauty, food, sports)

4. Market (e.g. India, Japan, China)

It is important to realise that each data source is imperfect and has its own limitations and biases. Platforms may be skewed towards certain groups of users (e.g. by age), certain user intentions, or be used differently depending on cultural context in different markets.

Layering data from a variety of sources (e.g. e-commerce + search + social media) is a core aspect of Human Centred Data Science. It helps you to embrace imperfect data to get a holistic understanding of the topic.

Example: Understanding Natural Beauty in India

We decided to layer e-commerce with search data to gain a holistic perspective of the natural beauty category in India. E-commerce and search both reflect and dictate consumer behavior. Search interest reflects consumer curiosity around natural beauty, while e-commerce data reflects the consumption of natural beauty products.

Keeping the market and topic interest in mind, the team selected nykaa.com, a leading Indian e-commerce platform. Also, Nykaa (unlike global platforms like Sephora) makes ‘Natural’ a category of its own, allowing us to zoom in directly to relevant products.

Layering these behavioural datasets revealed an interesting contradiction. Even though Indian search interest in natural beauty was flat, natural brands still dominated the category in terms of quality and volume of reviews on nykaa.com.

Our hypothesis that natural is the norm was validated — people are not actively searching for it, but they expect natural ingredients and benefits in all the products that they buy.
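To make the layering step concrete, here is a minimal Python sketch of how a search signal and an e-commerce signal could be put side by side. All numbers are invented toy values, not figures from our analysis, and the field names (`is_natural`, `review_count`) and the thresholds are assumptions made purely for illustration.

```python
from statistics import mean

# Invented toy values for illustration only, NOT data from the analysis.
search_interest = [50, 51, 49, 50, 50, 51]  # hypothetical monthly search index
reviews = [
    {"brand": "A", "is_natural": True, "review_count": 1200},
    {"brand": "B", "is_natural": False, "review_count": 300},
    {"brand": "C", "is_natural": True, "review_count": 500},
]


def relative_change(series):
    """Fractional change from the first half of a series to the second half."""
    half = len(series) // 2
    first, second = mean(series[:half]), mean(series[half:])
    return (second - first) / first


def natural_review_share(rows):
    """Share of all reviews that belong to brands tagged as natural."""
    total = sum(r["review_count"] for r in rows)
    natural = sum(r["review_count"] for r in rows if r["is_natural"])
    return natural / total


# Layer the two sources into one record so they can be read together.
layered = {
    "search_change": relative_change(search_interest),
    "natural_review_share": natural_review_share(reviews),
}

# Assumed thresholds: "flat" means under 5% change, "dominant" means over 50% share.
flat_search_but_dominant = (
    abs(layered["search_change"]) < 0.05 and layered["natural_review_share"] > 0.5
)
```

A flag like `flat_search_but_dominant` is only a prompt for a human to ask why the two sources disagree. It does not explain the contradiction by itself.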

Building datasets

In building a dataset, we take these steps:

  1. Identify interesting information and variables for data segmentation and analysis. Also question what is not available in the data source
  2. Capture the period of interest (e.g. current vs. changing landscape)
  3. Get to a more accurate dataset by:
  • Making sure it is related to the topic of interest (minimal noise in the data)
  • Narrowing down to target audience for analysis using platform-specific behaviours

Example: Understanding Natural Beauty in India

We wanted to identify the products people were most engaged with and excited about right now. To do this, we calculated a product quality score based on product rating and number of reviews. The score highlights the highly rated products at each price point and lets us evaluate brand performance.

Rating scores were normalised before being fed into the quality scoring algorithm. People rate emotionally, not quantitatively, and are more likely to give 5 stars or 1 star than 3 stars. Since the natural category is more loved than hated in general, all consumer ratings in our dataset fell in the range of 4.2–5 stars. Hence, we normalised rating scores so that we could spot the products that resonate the most within the natural category.
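As an illustration of the idea, the sketch below shows one possible way to build such a score in Python. It is not the exact algorithm we used: the min-max normalisation, the log scaling of review counts, the equal 0.5 weights and the toy products are all assumptions chosen to keep the example short.

```python
import math


def min_max_normalise(values):
    """Rescale values to the 0-1 range; returns zeros if all values are equal."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def quality_scores(products, rating_weight=0.5):
    """Combine normalised rating and normalised log review count per product.

    The weighting and the log scaling are illustrative assumptions.
    """
    norm_ratings = min_max_normalise([p["rating"] for p in products])
    # log1p dampens the effect of a few products with very many reviews.
    norm_reviews = min_max_normalise([math.log1p(p["reviews"]) for p in products])
    return {
        p["name"]: rating_weight * r + (1 - rating_weight) * v
        for p, r, v in zip(products, norm_ratings, norm_reviews)
    }


# Invented toy products; ratings sit in the narrow 4.2-5 band described above.
products = [
    {"name": "a", "rating": 4.2, "reviews": 10},
    {"name": "b", "rating": 4.6, "reviews": 100},
    {"name": "c", "rating": 5.0, "reviews": 1000},
]
scores = quality_scores(products)
```

Because ratings are rescaled within the observed band, a 4.2 and a 5.0 end up at opposite ends of the scale instead of looking almost identical.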

It’s important to make sure the dataset is clean before proceeding to further analysis. Looking at outliers and anomalies in the data is a great place to start cleaning our dataset. For example, the team spotted some products from more mainstream brands appearing in the luxury section. We realized these were combo packs, not individual products, and cleaned the data to remove these bundled items.
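A cleaning check along these lines can be sketched in Python. This is a hypothetical example, not our actual pipeline: the keyword pattern, the price ratio of 3 and the toy products are all assumptions. It flags items for review when the name looks like a bundle or when the price is far above the median for that brand.

```python
import re
from collections import defaultdict
from statistics import median

# Assumed keywords that hint at a bundled item; a real list would need tuning.
BUNDLE_PATTERN = re.compile(r"\b(combo|pack of \d+|set of \d+|kit)\b", re.IGNORECASE)


def flag_bundles(products, price_ratio=3.0):
    """Return names of products that look like bundles or brand price outliers."""
    prices_by_brand = defaultdict(list)
    for p in products:
        prices_by_brand[p["brand"]].append(p["price"])
    brand_median = {b: median(prices) for b, prices in prices_by_brand.items()}

    flagged = []
    for p in products:
        looks_bundled = bool(BUNDLE_PATTERN.search(p["name"]))
        price_outlier = p["price"] > price_ratio * brand_median[p["brand"]]
        if looks_bundled or price_outlier:
            flagged.append(p["name"])
    return flagged


# Invented toy catalogue for illustration only.
catalogue = [
    {"brand": "X", "name": "Face Wash", "price": 200},
    {"brand": "X", "name": "Shampoo", "price": 250},
    {"brand": "X", "name": "Hair Oil", "price": 220},
    {"brand": "X", "name": "Skincare Combo", "price": 1500},
    {"brand": "Y", "name": "Serum", "price": 2000},
    {"brand": "Y", "name": "Night Cream", "price": 2400},
]
to_review = flag_bundles(catalogue)
```

The function only flags candidates. Whether a flagged item is really a combo pack is still a judgement call for someone who knows the category.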

3. ITERATE: Iteration of good analysis

At the heart of Human Centred Data Science are multidisciplinary thought experiments that use contrasting perspectives to generate hundreds of observations. By continuously iterating, we push analysis beyond the descriptive by asking ‘why, why, why?’

Iterating starts with generating different observations by cutting the data from different angles and comparing them to our hypotheses to understand what is happening.

  • Does it align with our initial hunches?
  • Are there any surprises?

Next is to find out why it’s happening:

  • Who is behind the data points? Was it due to cultural preference or platform behaviors?
  • What are the motivations and drivers?
  • Is the dataset clean enough and do we need to go back to the cleaning process and refine the dataset?

Lastly, we refine our insights to reduce human and data bias:

  • Should we look at it from a different angle?
  • What is missing? Are there any additional data sources to support or explore it?

Example: Understanding Natural Beauty in India

The next step was to understand which cues, benefits, and ingredients make a natural product successful in this market. The team discovered interesting patterns in the data that led to new insights about the category and validated the initial hunches of our category experts:

  • Local vs. international packaging colour story: international brands are more earthy and understated, cueing luxury with or without cueing natural, while local brands use more vibrant, exciting colours.
  • Ingredients across price points: super-premium and premium brands pair local Indian ingredients (e.g. honey, sandalwood, coconut and turmeric) with luxe packaging, formats and quality formulations to differentiate themselves from cheaper offerings.

4. NARRATE: Data driven storytelling and strategies for impactful actions

The last step of the process is to convey the findings to our audiences with simple yet powerful data charts. Through compelling storytelling supported by effective visualizations, we back up the story so that key audiences can confidently take action.

In the process, we try to:

  • Understand our audience and how they interpret data, so that we can come up with concise, easy-to-understand data explanations.
  • Give meaning to numbers by explaining them in relation to the problem and the audience, and providing useful benchmarks to contextualize the significance of data points.
  • Bring findings to life with meaningful stories & examples that provide actionable insights
*each bottle represents 11 products on Nykaa.com

Example: Understanding Natural Beauty in India

The team decided to lead with the breakdown of international vs. local brands at each price point, contextualized using culture.

A product grid with icons was used to represent 8*20*11 = 1760 products at a glance, using colour overlays to represent the breakdown of international and local brands at different price points.
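The arithmetic behind such a grid can be sketched in Python. In this sketch, the segment counts are invented, the reading of 8 and 20 as rows and columns is an assumption, and the largest-remainder rounding is one possible choice rather than the method we used. It splits the 160 icons between segments so that the icon counts always add up to the full grid.

```python
PRODUCTS_PER_ICON = 11
ROWS, COLS = 8, 20  # assumed orientation of the 8 x 20 grid
TOTAL_ICONS = ROWS * COLS
TOTAL_PRODUCTS = TOTAL_ICONS * PRODUCTS_PER_ICON  # 8 * 20 * 11 = 1760


def allocate_icons(segment_counts, total_icons):
    """Split icons across segments in proportion to product counts.

    Uses largest-remainder rounding so the icons sum exactly to total_icons.
    """
    total = sum(segment_counts.values())
    exact = {k: v * total_icons / total for k, v in segment_counts.items()}
    icons = {k: int(x) for k, x in exact.items()}
    leftover = total_icons - sum(icons.values())
    # Hand the remaining icons to the segments with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - icons[k], reverse=True)
    for k in by_remainder[:leftover]:
        icons[k] += 1
    return icons


# Invented split of the 1760 products, for illustration only.
segments = {"local": 1000, "international": 760}
icons = allocate_icons(segments, TOTAL_ICONS)
```

Rounding each segment independently can leave the grid one icon short or one icon over, which is why the sketch allocates the leftover icons explicitly.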

Using cultural context lets us provide actionable insights for brands. There has been a surge in interest in traditional beauty rituals and ayurvedic practices, as seen by the explosive rise of natural brand Patanjali in 2015. Looking at the Super Premium category, it becomes clear that winning products combine luxury with traditional ingredients and evocative packaging.

Conclusion

These steps — Frame, Layer, Iterate, Narrate — form the theoretical framework of Human Centred Data Science we use to approach problems.

We’d love to hear how you approach human-data problems and the steps you use, as we work on refining our own processes and methods. Let us know in the comments below!
