An introduction to Human Centred Data Science and what it means in practice (pt. 1)

Michelle Lee

Published in Synthesis Partners, Sep 17, 2020

Written by Michelle Lee and Xiao Shuang Na

This post is part 1 of our Introduction to Human Centred Data Science series; see part 2 for the continuation.

Here at Synthesis, we call our approach to understanding data Human Centred Data Science.

Human Centred Data Science refers to understanding data by understanding the humans and the context shaping it. Recognizing the human context lets us better understand which data to look at, and how to interpret it.

But what exactly does Human Centred Data Science mean from a technical perspective? How does it differ from regular data science? What are the steps and processes we use to achieve it?

In this first part of the introduction, we'll give some examples of why and when to use Human Centred Data Science.

Embracing imperfect data

Open data is an incredibly powerful resource. The huge amount of detailed, freely accessible data on online behaviour gives us a unique window into people’s thoughts, feelings, and actions.

However, much of this data is imperfect. It's easy to forget that data isn't just numbers; it's people. Data reflects humans, who are fractured, messy, and evolving. It's often full of bias: a single data source may reflect just one aspect of behaviour, or only one group of people.

What you need is better data, not bigger data: data that is more relevant and accurate, rather than simply more of it. Humanizing data (understanding the people and setting behind it) helps you select the most relevant data set.

We can break down our considerations into three types: human drivers, data context, and giving meaning to numbers.

Human drivers: How do cultural norms and their shifts, along with platform mechanics, incentives, and biases, influence the way people behave online?

Different platforms have different mechanics and incentives, shaping how we interpret engagement statistics. Photo by Georgia de Lotz on Unsplash.

Example: Instagram vs. Naver likes

Liking a post on Instagram is as simple as double-tapping. But liking a post on Naver blog, Korea’s biggest blogging platform, requires you to open the post and scroll all the way to the bottom.

Platform mechanics and incentives mean that a like on Naver blog is not equivalent to a like on Instagram. Because it takes more effort, a Naver blog like is a stronger signal of engagement, so the same like count carries more weight on Naver blog than on Instagram.
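One way to respect this difference when comparing posts across platforms is to rank each post against others on the same platform instead of comparing raw like counts. The Python sketch below does that with within-platform percentile ranks. The like counts, the platform keys and the helper names (`percentile_rank`, `normalized_engagement`) are all hypothetical, and percentile ranking is just one normalization picked for illustration, not a method the platforms or this article prescribe.

```python
from bisect import bisect_right

# Hypothetical like counts per post, grouped by platform (toy data, not real figures)
likes = {
    "instagram": [120, 340, 95, 1500, 610],
    "naver_blog": [8, 25, 3, 40, 12],
}


def percentile_rank(value, population):
    """Share of posts in `population` whose like count is <= value."""
    ordered = sorted(population)
    return bisect_right(ordered, value) / len(ordered)


def normalized_engagement(platform, like_count):
    """Place a post's likes on a 0-1 scale relative to its own platform only."""
    return percentile_rank(like_count, likes[platform])


if __name__ == "__main__":
    ig = normalized_engagement("instagram", 340)
    nv = normalized_engagement("naver_blog", 25)
    print(f"Instagram post: {ig:.2f}, Naver blog post: {nv:.2f}")
```

With these toy numbers, 25 Naver blog likes rank higher within their platform (0.80) than 340 Instagram likes do within theirs (0.60), even though the raw count is far smaller.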

Example: Japanese social media users

In Japan, many people keep one social media account for everyday life and connecting with friends, and separate accounts where they post solely about their hobbies and interests.

Looking only at those hobby accounts gives the appearance that there are no casual hobbyists in Japan, only hardcore ones whose lives revolve around their hobbies. Understanding the cultural context helps us see that this impression is misleading, and prompts us to look for other ways to find the bigger picture.

Data context: What audience is the data created for? How is it shaped?

Example: Rotten Tomatoes critics vs. audiences

On movie rating site Rotten Tomatoes, movies with high critic scores can receive much lower audience scores. The Last Jedi, for instance, has a 48-point gap between its critic and audience ratings.

That’s because critics and audiences use different evaluation criteria: critics tend to judge movies from an aesthetic perspective, while audiences care more about enjoyment. Which score you include in your dataset depends on the question you want to answer.

Give meaning to numbers: What do the numbers mean in relation to the problem and the audience? How should we understand scale?

Visualization of TripAdvisor local versus tourist reviews in the USA, by The Pudding

Example: TripAdvisor locals vs. tourists

Over 60% of TripAdvisor reviews are written by tourists. As a result, chains like Starbucks and major attractions like Times Square receive high ratings, even though locals may consider them tourist traps.

High ratings tell you little if you’re trying to understand the local perspective. Filtering reviews by where reviewers are from can help, as explored in this data visualization by The Pudding.
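As a minimal sketch of that kind of filter, assume each review record carries a field for where the reviewer lives. The `Review` class, its `reviewer_home` field and the ratings below are all hypothetical, not TripAdvisor's actual data format. Comparing all reviewers with locals only might then look like this in Python:

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Review:
    place: str
    place_city: str
    reviewer_home: str  # hypothetical field: where the reviewer says they live
    rating: int         # 1-5 stars


# Toy reviews, invented for illustration
reviews = [
    Review("Times Square", "New York", "London", 5),
    Review("Times Square", "New York", "Chicago", 5),
    Review("Times Square", "New York", "Sydney", 4),
    Review("Times Square", "New York", "New York", 2),
    Review("Times Square", "New York", "New York", 1),
]


def average_rating(rows, locals_only=False):
    """Mean rating, optionally keeping only reviewers who live in the place's city."""
    if locals_only:
        rows = [r for r in rows if r.reviewer_home == r.place_city]
    return mean(r.rating for r in rows) if rows else None


if __name__ == "__main__":
    print("All reviewers:", average_rating(reviews))
    print("Locals only:", average_rating(reviews, locals_only=True))
```

With this made-up data, the overall average is 3.4 while locals alone average 1.5. Matching on a city string is a simplification; real location data would need cleaning first.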

Example: TMall 4.9 scores

Many products on TMall, China’s mammoth e-commerce platform, have customer ratings of 4.9. This is because TMall automatically assigns a 5-star rating to transactions the buyer doesn’t rate, making 5-star reviews the norm.

Product rating scores thus give us very little information about product quality. To measure this, we have to find other approaches and metrics.
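If your data records which ratings were left by buyers and which were filled in automatically, one simple alternative is to set the defaults aside and also track how many orders were actively rated. The sketch below assumes such a flag exists. The flag is hypothetical, and real platform data may not expose it. The orders are invented for illustration.

```python
from statistics import mean

# Toy order records: (stars, rated_by_buyer). The rated_by_buyer flag is a
# hypothetical field; real platform data may not say which ratings were defaults.
orders = [
    (5, False), (5, False), (5, False), (5, False), (5, False),
    (5, False), (5, True), (4, True), (2, True), (3, True),
]

# Average over every order, defaults included (what a headline score reflects)
headline = mean(stars for stars, _ in orders)

# Keep only ratings a buyer actually gave
explicit = [stars for stars, rated in orders if rated]

# Share of orders with a buyer-given rating: another signal worth tracking
rated_share = len(explicit) / len(orders)

if __name__ == "__main__":
    print(f"Headline average:         {headline:.1f}")
    print(f"Buyer-given ratings only: {mean(explicit):.1f}")
    print(f"Share of orders rated:    {rated_share:.0%}")
```

Here the headline average is 4.4, but the four ratings buyers actually gave average 3.5, and only 40% of orders were rated at all.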

Answering complex questions

The kinds of questions we answer for our partners are broad and complex.

They might involve anticipating consumers’ needs and wants based on shifts in online conversation, identifying shifting perceptions surrounding an idea through patterns of search data, or defining emerging subcultures through behavioural data.

These questions can be approached from multiple perspectives, and have no clearly defined answers. They can’t be separated from their social dimension, and the social world is interconnected and complex.

Layering is a key principle of Human Centred Data Science that helps tackle complex questions.

Weaving together data from a variety of sources is a way to embrace imperfect data and gain a holistic understanding of a topic. Each source can provide a different perspective or angle on the issue at hand.
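Here is a minimal Python sketch of layering, under loose assumptions. Three hypothetical sources (search interest, social mentions, review volume) report a signal for the same placeholder topics in different units. Each source is rescaled to the range 0 to 1 so the layers can sit side by side. They are kept separate rather than averaged, and the gap between the highest and lowest layer flags topics where sources disagree and a closer, human look may be warranted. The source names, topics and numbers are invented, and min-max scaling is one choice among many.

```python
# Hypothetical topic-level signals from three sources, in different units (toy numbers)
sources = {
    "search_interest": {"topic_a": 80, "topic_b": 35, "topic_c": 60},
    "social_mentions": {"topic_a": 1200, "topic_b": 900, "topic_c": 300},
    "review_volume":   {"topic_a": 45, "topic_b": 10, "topic_c": 50},
}


def min_max(values):
    """Rescale one source's values to 0-1 so sources with different units are comparable."""
    lo, hi = min(values.values()), max(values.values())
    span = (hi - lo) or 1  # avoid dividing by zero when every value is equal
    return {k: (v - lo) / span for k, v in values.items()}


def layer(sources):
    """Return each topic's rescaled score per source, keeping the layers separate."""
    scaled = {name: min_max(vals) for name, vals in sources.items()}
    # Only keep topics that every source covers
    shared = set.intersection(*(set(vals) for vals in sources.values()))
    return {t: {name: scaled[name][t] for name in scaled} for t in sorted(shared)}


def disagreement(per_source):
    """Gap between the highest and lowest layer; a large gap flags a topic to look into."""
    return max(per_source.values()) - min(per_source.values())


if __name__ == "__main__":
    for topic, per_source in layer(sources).items():
        rounded = {k: round(v, 2) for k, v in per_source.items()}
        print(topic, rounded, "spread:", round(disagreement(per_source), 2))
```

With these numbers, topic_a scores high in every layer, while topic_c tops review volume but sits at the bottom for social mentions: the kind of disagreement that calls for understanding the people behind each source.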

From data to actionable insights

People don’t intuitively relate to numbers.

Instead, they relate to stories. We instinctively gravitate towards narratives about what people do, and why they do it.

Human Centred Data Science helps us present data in a way that is meaningful, relatable, and actionable.

Linking the data to the humans behind it helps you connect with your audience. It lets them contextualize the data in real world terms, and understand why it’s relevant to them.

This allows us and our partners to generate strategies for impactful action. Recognizing the human drivers behind data lets us come up with actionable insights that are relevant to people’s needs and desires.

Conclusion

Human Centred Data Science works best for genuinely complex problems, whether the complexity lies in human behaviour, multiple data sources, or the need to innovate.

It can help you embrace imperfect data and identify meaningful plans of action.

We’re still working on bringing the Human Centred Data Science approach to every human-data problem we see; every platform, every project, and every person teaches us something new. We’d love to hear your thoughts, so let us know in the comments below!