In need of a human-centric data revolution

Published in Dataveyes Stories

Aug 23, 2019

This cover represents the most frequently used terms in this article, their relationships and their musicality.

(A French version of this article is available here)

At the beginning of May in London, we took part in the Strata Data Conference, a well-known international event about data. It was an opportunity for us to stand out in an increasingly complex and congested ecosystem, to dive deep into current trends, and to refine our understanding of the issues many actors face.

Be data-driven, or die

Since its debut in 2011, the Strata Data Conference has been the place to discuss topics at the forefront of the data-driven world. Data-driven, or AI-first as it is more fashionably known, is a term encapsulating all the concerns of big companies in recent years. According to the recruitment platform Indeed, demand for the data scientist profile alone has increased by 344% since 2013, growing by 29% each year. Is being data-driven the major difference between the innovative new cool kids and older, more established companies? Fearing that the innovation train would leave them behind, many companies decided to create a data lab (or data foundry, or data factory).

Are data labs failing?

What is a data lab if not data, and humans who master data science? The word “lab” here emphasises the experimental nature of the projects that might be implemented. On paper, the data lab seems to be a good idea: a team within an organisation dedicated to making the most of data and extracting value from this essential business asset of a 21st-century company. But between structures that are often far from business operations and complex topics like data quality, accessibility and governance, the model seems to struggle to create value.

During Strata, I discovered a study by NewVantage Partners that asked top managers and C-level executives about the transformation of their companies. The study shows a strong awareness of the importance of mastering data: in 2018, 97.2% of the leaders who responded said that their companies were investing in Big Data and AI initiatives in order to become data-driven. However, the 2019 study reflects an impression of failure: the percentage of top managers identifying their companies as data-driven has decreased over the last three years (37.1% in 2017, 32.4% in 2018, 31% this year). This perception is at odds with their investments in data projects: according to the 2019 survey, 92% of respondents are accelerating their investments in Big Data and AI.

I conclude that despite the vital and well-known importance of becoming data-driven, only a minority of leaders believe their company is actually getting there. How can we explain this impression, despite a growing financial and organisational effort?

Myth #1: data labs are all about data scientists

The first problem companies faced in creating a data lab was the difficulty of recruiting candidates with technical skills in a new and extremely volatile environment.

Companies first hired data scientists, and only data scientists. They were seen as a miracle solution, profiles at ease with every component of data science: from the finest mathematical knowledge to the most complex database computing, to an ability to conceptualise business and industrial processes. These people are computer scientists, statisticians and business consultants rolled into one. They are unicorns: we all dream of them, but they do not exist.

With this fantasy in mind, data-related positions are often defined vaguely, even though data jobs are becoming more and more specialised. For example, you cannot be a data scientist who is an expert in both computer vision and natural language processing. All the more so since a data worker’s everyday routine is less about machine learning and neural networks than about cleaning, structuring, and moving data.

Not to mention that data scientists alone can’t make good data products. For example, technical infrastructure can quickly become a problem, so in addition to data scientists it becomes necessary to recruit data engineers. Then other problems arise, such as the lack of skills dedicated to the actual crafting of software or interfaces. It’s endless! Each time, the initiatives start from scratch, time and budgets run out, and convincing results do not arrive.

In data labs, the competencies brought in-house tend not to match what is needed to identify, implement and disseminate use cases that bring value to the company: the roles are too compartmentalised and too far from the business, and they lack a global vision. The challenge has been approached the wrong way around, starting from the data and the hiring instead of focusing on what brings value to the business:

Let’s do something with data and hire data scientists.

To innovate with data, a data lab should bring in professionals other than data scientists: people capable of building strategic software and integrating it into an organisation. That means consultants, analysts, product managers, data engineers, developers, designers, etc. Such a diverse team breaks the isolation of the data lab and creates value for the organisation.

Myth #2: value is in the pipes

The second problem has been to take refuge in technological solutions, focusing on the tools and their technical capabilities, rather than trying to demonstrate the value that can be created from data.

Thus, many companies have focused on the data pipeline, the “data plumbing”, in order to be able to respond effectively to the operational challenges that arise from the daily work with data. They have invested heavily in technology platforms: cloud architecture, data science platforms, business intelligence tools, etc. The landscape has become more and more complex in recent years.

*Big Data and AI Landscape 2018 by Matt Turck, Demi Obayomi and FirstMark*

Who has not heard of a data lake project? Centralising all the data, structured and unstructured, is indeed a nice promise. But when nobody asks about usage, the usefulness of the data and the value to be drawn from it, this promise becomes the first reason for failure. Implementing these holistic technical architectures is time-consuming and expensive. There is a high risk of ending up with an infrastructure that does not respond effectively to the most impactful use cases. Some of the more mature companies have overcome the challenge of streamlining their data processing pipelines by investing heavily in data engineering, but this has required considerable resources… that not everyone can afford.

https://timoelliott.com/blog/cartoons/more-analytics-cartoons

It is much easier and less risky to develop a few use cases first, demonstrate their impact, and then work on their scalability in terms of technical architecture. Of course, setting up a data lake can be useful, but timing is an essential element of a well-managed strategy: it seems wiser to start by creating value with the most impactful projects, those with a measured impact on the business and the organisation.

The main challenge of the data lab is a cultural one

The latest version of the NewVantage Partners study shows that only 7.5% of data managers cite technology as an obstacle to accelerating the transformation of their company (11 points less than in 2018), compared to 62.5% who cite the human aspect (14 points more than in 2018). In most of the data labs I have observed, it is obvious that the human dimension of a data-driven enterprise is often put aside or treated at the end of the chain. The impact on work, on processes and on the organisation itself, and the way the value created is delivered, have been forgotten.

I have already written in a previous article about the place of humans in machine learning software and its impact on work. There I explained how seriously the tech “champion” companies are addressing the human aspect of creating value from data, and the related strategies they are currently implementing. As an example, I invite you to take a look at the dedicated teams of Google and Uber on this topic:

— the PAIR team within the Google Brain department, and their Big Picture group, at the forefront of research about data visualisation in machine learning;

— the visualisation department of Uber, that designs and builds interfaces for the whole company, and beyond.

These companies have realised that investing in data science skills and tools is not enough: it’s not just about training people in data science, it’s about bringing the use of data to people.

A data-driven business is using data outside of the data lab

Uber internally develops many custom-built, data-rich interfaces designed to serve business experts who are not data-savvy. In doing so, Uber turns each of its employees into a data scientist. For example, City Operations Managers, a key position at Uber dedicated to the operations in each city, have access to software that enables data exploration in a simple and intuitive way. To achieve this, Uber focuses on people who can translate data into business needs and usage, that is, into interfaces that employees can explore, analyse and understand in order to make better operational decisions. These interfaces allow the data to have a business impact. Uber’s competitive advantage lies in its interfaces, and therefore in its consultants, product managers, designers and developers, working hand in hand with its data scientists and data engineers. Together they make possible a virtuous circle of information management: the information available internally becomes richer, more widely shared and smarter. Interfaces thus combine human understanding and analysis capabilities with the machine’s ability to calculate, sort and rank.

An example is ODsee, designed and developed by Dataveyes for the Autonomous Operator of Parisian Transports (RATP): by transforming bulky and complex OD (Origin-Destination) data into strategic information for transport operators, the tool enables fine-grained management of mobility demand. **The interface became as important as the database**.

Data labs must become human-data labs

The data is not the situation, just as “the map is not the territory”: it is a representation, and therefore describes a situation more or less accurately. This is key to understand. The data is there to inspire people’s creativity and stimulate their analysis. Whether the core of a piece of data-rich software is descriptive, predictive, or even prescriptive, it is the interface that makes data work for the business. Interfaces enable knowledge creation and ongoing learning by giving people the capability to identify, interpret and communicate the pieces of the information puzzle, and thus develop useful business knowledge. Once analysed and understood by most experts, that data will bring interesting perspectives. Interfaces can speed up data-driven innovation in a company and ultimately help it get the most out of its data. Human-data interactions must be designed with business processes and decisions in mind, where their usage will be the most impactful.

Data projects can no longer afford to underinvest in this work. On the contrary, it must be the first priority, the keystone that ensures a project lasts.

*Illustration by Evelyn Münster on Twitter*

At Dataveyes, we put humans at the core of our approach, because we believe that data gains value when it reaches the hands and minds of users, rather than staying in data centres. Inspired by service design, our approach is user-centric and takes into account the maturity of the company’s internal culture. Our methodology leads us to analyse the company’s business processes and to structure its information landscape.

Making data work for people is key to an actual transformation of the company. Data labs seem to have put this aspect aside. Nothing is lost: it is time to change the approach, leverage the bullet time effect of interfaces on their data, and transform data labs into human-data labs.