FAILURE AND DATA SCIENCE

Gaïlé Lejay
TotalEnergies Digital Factory
11 min read · Oct 4, 2022

Failure is often seen as a sensitive subject, both in personal and professional environments.

Why is that?

Let’s start with some definitions. Here are two: “lack of success in doing or achieving something”; “a situation in which something does not work as it should”. Just by reading these, we can sense a negative connotation. So it is quite natural to want to avoid failure. But that is not the right way to think about it.

As Thomas Edison famously said about inventing the light bulb: “I have not failed. I have just found 10,000 ways that won’t work.” In other words: in order to innovate, we need to know what went wrong and learn from it.

I worked for two years in the USA, and let me tell you, the failure mindset differs remarkably between France and the USA. I spent my first year in a medical image processing laboratory in Nashville, Tennessee. When I met my professor, he asked me to work on an image segmentation project. He told me the goal, but nothing about the approach I should use. For three months I tried different methods, with no promising results. Eventually, I shared my work with him; within one minute he killed my project, saying that the approaches I had used were not suitable.

It took me at least a week to recover from that. I kept thinking that in France this would never have happened because of the constant supervision we get. What a waste of time! Except that it was not. Ever since then, each time I am stuck in my work, I do not hesitate to ask my colleagues for advice and recommendations. That changed everything for me and it still has an impact today.

Suffice it to say that failure does not have to be seen as a frightful thing. It is useful; it teaches us something so that we can do better next time. Failure is not comfortable, I can agree on that, but without it we do not grow. So let’s stop fearing it and responding defensively to it. Instead, let’s learn to fail better. Moreover, in some environments, avoiding failure is nearly impossible. That is the case in Data Science, the focus of this article.

Before we jump into a story about Failure and Data Science, here is another quote — we like them, don’t we?

“Have no fear of perfection — you’ll never reach it.” — Salvador Dali

STORYTELLING

A small aside: this story was inspired by my activities at TotalEnergies Digital Factory (TDF), but it does not pinpoint any specific project.

The fall is a busy time with lots of projects to complete before the end of the year. Everything needs to be done the right way to meet the deadlines. However, it is not so challenging for the Data Studio, as it has been more than two years since the team started its activity at TotalEnergies Digital Factory, and the mode of operation is now pretty solid. The Data Studio has worked on the de-risking of many data aspects of different business projects — go check my previous article “Data coach’s journey @ TDF” for more details.

This is how, during this stressful time, the team efficiently handles the launch of a new project. When first interacting with the business stakeholders, the data scientists and the coach quickly assess the type of work needed to reach the desired goals. By chance, this project is similar to one they handled previously, so they can tell right away who from the team should work on it.

All the key messages — data needs to be available, the target has to be known, the Subject Matter Expert (SME) must have dedicated time — are shared at the beginning of the project to ensure a great collaboration. The exploration will be quite short, with an expected duration of one to two months.

Rituals are put in place so that the stakeholders can follow the progress of the work. A lot of data vulgarization is done to make sure that everybody understands the technical limitations and that people stay on the same page, even if the initial goal needs to be adjusted — which ends up being the case for this project.

Indeed, during the data analysis step, the data scientists quickly find out that the labels that were supposed to have been identified are missing, which makes it impossible to do predictive modeling as originally planned. Thus, it is decided to use anomaly detection instead.
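To make this pivot concrete, here is a minimal, hypothetical sketch of unsupervised anomaly detection — a simple z-score rule on synthetic data. The data, the threshold and the variable names are all invented for illustration; a real project would typically use richer models such as Isolation Forests or autoencoders.

```python
import numpy as np

# Synthetic, unlabeled readings: 200 "normal" values plus two injected anomalies.
rng = np.random.default_rng(42)
readings = np.append(rng.normal(loc=0.0, scale=1.0, size=200), [9.5, -8.0])

# With no labels, we cannot train a supervised model, but we can still flag
# points that deviate strongly from the bulk of the data.
z_scores = np.abs((readings - readings.mean()) / readings.std())
anomalies = readings[z_scores > 4.0]  # threshold chosen for illustration only

print(anomalies)  # the two injected anomalies stand out
```

The catch, as the story shows next, is that a detector like this can only say “this point is unusual” — it cannot say why, which is where the business experts come in.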

At the end of the project, during the restitution (final review) meeting, the Data Studio shares its conclusions and recommendations. Unfortunately, the results are not satisfactory: the outliers detected by the algorithm cannot be explained by the business stakeholders. Thus, the prototype is not useful, as its results are not interpretable. The team warns the business people that Machine Learning (ML) is not suited to this particular subject.

The stakeholders who had followed the project are not surprised and are aligned with this conclusion. They have seen the different investigations done by the Data Studio and they understand the technical limitations. However, the other people who only came to the restitution meeting are not convinced. For them, the project should not come to an end. Instead, they would like further development on the ML part.

The Data Studio explains that this “lack of a result” should not be seen as a failure, but rather as a saver of time, energy and money. The exploration helped assess whether it was worth mobilizing a delivery team to work on the subject for the following months. If a team were to take the lead on this project, it would probably come to the same conclusion, with far more resources and money spent.

But of course, we cannot be 100% sure. Who knows, there might be an ML algorithm that the Data Studio did not try and that could have been successful? After all, the exploration did not last that long. Besides, why was the other, similar project successful? Well, to start with, the data was not the same and the labels were available. The Data Studio’s suggestion is to continue the exploratory work and to try different approaches, such as rule-based methods instead of ML. That will give a better idea of the feasibility and the added value of such a solution. However, this need for extra time makes it harder to meet the business deadlines. Long story short, the atmosphere at the end of the meeting is not ideal.
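To give a flavor of what “rule-based methods” could look like here, below is a hypothetical sketch. Every threshold, variable name and alert message is invented for illustration; in a real project the rules would be written with the SMEs.

```python
# Hypothetical rule-based detector: domain knowledge encoded as explicit,
# auditable rules instead of a trained model. Thresholds are illustrative only.
def check_reading(temperature_c, pressure_bar, vibration_mm_s):
    """Return the list of triggered alerts; an empty list means all clear."""
    alerts = []
    if temperature_c > 90.0:  # limit that would be set by the SME, not learned
        alerts.append("temperature above safe limit")
    if pressure_bar > 12.0 and vibration_mm_s > 4.5:  # combined condition
        alerts.append("abnormal pressure/vibration combination")
    return alerts

print(check_reading(95.0, 10.0, 2.0))  # ['temperature above safe limit']
```

Unlike the anomaly-detection prototype of the story, every alert produced this way is interpretable by construction — which is exactly what the stakeholders were missing.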

A couple of days later, while enjoying a delicious black coffee in the common area, a discussion starts on the topic of failure. It is clear that the mindset around it is quite different between data scientists and other business domain experts. In fact, there is a reason for that, and the data scientists start to explain it. People who work in Data Science are used to the “unknown”, to not knowing whether their models will lead to satisfactory results. Failure is all but inevitable, and the best way to deal with it is to overcome it, build resilience, and learn from it. Continuous adjustment and flexibility are key. Even the best-laid plans can be disrupted for many reasons, such as an initial scope that was too ambitious. Unfortunately, this great conversation has to end because of meetings starting soon.

Luckily, in response to the business domain experts’ request, a tailor-made training session on the topic “Failure and Data Science” is offered to the employees in the following weeks — how nice would it be to have it for real, right? Here are some notes taken by one of the attendees:

  • In Data Science, lots of experimentation is done and it does not consistently lead to tangible results. Data Science focuses on exploration and discovery — such as finding “hidden” information in the data and enlightening the business by bringing new insights.
  • It is quite common not to be able to assess the feasibility of a data science solution until later in the project. In addition, very often, the time required for each exploration or analysis step is unknown.

Here is a list of some of the main reasons a Data Science project fails or falls short:

  • Data is not available or is too siloed and not well organized
  • The data quality is poor
  • The data is not interpretable
  • There is not enough relevant data
  • There are no available labels or examples of events. Unsupervised ML can be an option in that case. However, this approach tends to be more challenging as it is an exploratory process that is harder to monitor, with no guarantee of results.
  • Many methods exist in Data Science, and sometimes the different approaches cannot all be tested due to limitations such as time constraints.
  • Sometimes, the technical complexity of the business domain and the business skills required are underestimated. When dealing with a very specific and technical business domain, the data scientists first need to build up their knowledge of the topic. In the case of industrial projects, field trips and training sessions with SMEs can help.
  • The SME is not able to dedicate enough time to the project. This is problematic because the iterative analysis of the algorithm’s results cannot be properly done. Without the expert’s feedback and interpretation it is hard to know if the results are relevant.
  • There are no indications from the expert on what is the right trade-off between a false negative (when there is an anomaly but no alert) and a false positive (false alert). In some safety-related use-cases, false negatives are simply not acceptable.
  • The business wishes to avoid “black box” models while having a high level of accuracy, which is not always possible because of the accuracy-interpretability trade-off.
  • The SME prefers to improve the model and reach the desired performance before deploying it. This approach can be more time-consuming, as improvements made offline cannot be tested in real-time conditions with potentially unexpected events.
  • There are not enough interactions between UX designers and data scientists, which means that key questions such as “how can the model be used by users?” are not answered.
  • Other essential questions are unanswered, such as “what does the business want to know?”, “what is the target?”, “what is the value of the project? how will it be measured?”. This lack of information makes it complicated to have a clear purpose in mind. In summary, for a Data Science project to succeed, its implementation needs to be aligned with the end-goals.
  • The data scientists face cultural challenges if the client’s mindset is not data-driven. The cultural difference can be an impediment to a successful adoption. In this context, it is harder to explain that sometimes, simpler solutions, such as rule-based methods, could be more successful. Vulgarization efforts are crucial.

The list above is only a small fraction of the obstacles the data scientists can face when developing a project. This explains their resilience to failure.
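The false-negative/false-positive trade-off mentioned in the list can be made concrete with a tiny sketch: it often comes down to where the alert threshold is placed on a model’s anomaly score. The scores and ground truth below are invented for illustration.

```python
# Hypothetical anomaly scores (0 = clearly normal, 1 = clearly anomalous)
# and the corresponding ground truth (1 = a real anomaly occurred).
scores = [0.05, 0.20, 0.35, 0.60, 0.80, 0.95]
truth  = [0,    0,    0,    1,    1,    1]

def confusion(threshold):
    """Count false positives (false alerts) and false negatives (missed anomalies)."""
    fp = sum(1 for s, t in zip(scores, truth) if s >= threshold and t == 0)
    fn = sum(1 for s, t in zip(scores, truth) if s < threshold and t == 1)
    return fp, fn

# A strict threshold avoids false alerts but misses real anomalies;
# a loose threshold catches everything at the price of false alerts.
print(confusion(0.90))  # (0, 2): no false alert, two missed anomalies
print(confusion(0.30))  # (1, 0): one false alert, nothing missed
```

The point of the corresponding bullet is that this choice cannot be made by the data scientists alone: only the expert can say how many false alerts are acceptable, especially in safety-related use-cases where missed anomalies are not.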

As mentioned, failure can be beneficial and here are some of the benefits:

  • It forces us to pause, to reflect, to implement new approaches, strategies and ideas. It leads to innovation and creativity. Failure-analysis helps us get new insights and information that are useful in our learning journey.
  • Failure teaches us flexibility and resilience, but also how to overcome obstacles and use change to our advantage. It keeps us nimble and helps us adopt a growth mindset. It can be seen as a stepping stone: those who fail best often learn best.
  • When we learn to fail and to get comfortable with making mistakes, we can better adapt to fast-changing systems. Additionally, it motivates us to take more risks instead of being afraid, cautious, safe or rigid. It is about celebrating the effort rather than the result.
  • “Failure is a delay, not a defeat. It is a temporary detour, not a dead end. Failure is something we can avoid only by doing nothing.” — Denis Waitley

After the training, the business domain experts understood that experimentation and exploration are a huge part of Data Science. They gained awareness of the high degree of uncertainty in this domain and of the fact that failure is often inevitable.

On the other side, thanks to the discussions, the data scientists better perceived the differences in failure mindset. They came to the conclusion that they had to work more on expectation management, by being more upfront with the end-users and by explaining the technical limitations right from the beginning. For that, they need to find the right level of explanation depending on the client’s profile. Challenge accepted.

On top of helping create greater mutual awareness, the discussions and training initiated other conversations and workshops aiming to identify areas for improvement. New ways of working are yet to be defined, but one thing is sure: the collaboration between data scientists, product managers, SMEs and others needs to be strengthened to co-construct and work as a united team.

END OF THE STORYTELLING

I will conclude with a quote — the last one, I promise:

“It is hard to fail, but it is worse never to have tried to succeed.” — Theodore Roosevelt

Text and illustrations by Gaïlé Lejay
