Resilience is key in data science: supporting SaMI for Brazil’s health system

Letícia Ange Pozza
Published in oddstudio · 6 min read · Jul 29, 2020

This is a summary of what we learned from one of the many projects we support at the Bill & Melinda Gates Foundation for the Grand Challenges Explorations (GCE) in data science and health. It is part of a series about these projects’ products and results. We will be posting about the others as their results are published. We hope you enjoy!

About the Grand Challenges Explorations and our work supporting projects from the Gates Foundation

For the past year, Odd.Studio has worked with more than 14 data science projects led by brilliant researchers in public health, computer science, epidemiology and many other related areas. Our company is part of a bigger strategy group that supports these diverse research teams, which have received USD 100,000 grants from the Bill & Melinda Gates Foundation (BMGF) and from the Ministry of Health in Brazil to look at ways to improve maternal and child health in Brazil by leveraging and exploring available data. Brazil has huge administrative health databases, one result of having a unified health system in the country.

The Grand Challenges Explorations in Data Science is a program from the BMGF that started in 2018 and is working to build a community in data science for health. We have 14 projects in their final stages in Brazil, selected from the first round, and 20 others starting in India and Africa. Soon, we will be an ecosystem of ~250 researchers connected worldwide, with incredible knowledge of data sources from all these locations, data science analysis techniques and machine learning models, treatment and analysis of huge volumes of data, and many other related topics.

All of it was built from a local perspective, with stakeholders involved from the beginning, helping define the topics that were eligible for proposals and connecting teams to local data partners and sources. This creates the potential to impact the lives of millions of women, mothers and children in these locations, as well as to support decision-makers and guide local stakeholders in learning and using these deliverables and results to shape public policy.

SaMI’s Journey: Interacting with Stakeholders to build a Data-Driven Platform to Support Decision-Making

One of the selected projects is from the Federal University of São Paulo in collaboration with Unicamp, where a team of researchers has built a platform (SaMI) to support decision-making. Among other visualisation tools and models, it can predict the probability of a newborn surviving its first 28 days of life, based on a list of attributes of the baby, the mother, and their location. To build the model, which reaches 94% accuracy, they used more than 40 million data samples from 2006 to 2016. After running the prediction, these variables are ordered to indicate which of them had the biggest impact on the outcome. You can run the model here (only in Portuguese), but they have a publication about it here in English.

Here is an example from one of the graphs explaining the model, indicating which of the inputs I gave the model had the biggest impact on the risk of neonatal death. The red indicates that the selected conditions in which the newborn would be born were not ideal, and that it has a 94% probability of not surviving its first 28 days, with the Apgar score and the Robson Classification being the main reasons for this outcome.
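To make the idea of "ordering the variables by their impact" a bit more concrete, here is a minimal Python sketch. It is not SaMI's model or explanation method: it assumes a toy logistic score, and the feature names, weights and bias are hypothetical values invented purely for illustration.

```python
import math

# Hypothetical weights and bias, invented for illustration only.
# They are NOT taken from the SaMI model or its publication.
TOY_WEIGHTS = {
    "low_apgar": 2.0,
    "high_risk_robson_group": 1.0,
    "low_birth_weight": 1.5,
}
TOY_BIAS = -4.0


def explain(features):
    """Return (risk probability, features ranked by absolute contribution).

    `features` maps each feature name in TOY_WEIGHTS to 0 or 1.
    """
    # In a linear score, each feature's contribution is weight * value.
    contributions = {
        name: TOY_WEIGHTS[name] * value for name, value in features.items()
    }
    score = TOY_BIAS + sum(contributions.values())
    # The logistic function turns the score into a probability.
    probability = 1.0 / (1.0 + math.exp(-score))
    # Order features so the biggest drivers of the outcome come first.
    ranked = sorted(
        contributions.items(), key=lambda item: abs(item[1]), reverse=True
    )
    return probability, ranked


probability, ranked = explain(
    {"low_apgar": 1, "high_risk_robson_group": 1, "low_birth_weight": 0}
)
for name, contribution in ranked:
    print(f"{name}: {contribution:+.1f}")
```

A real model with many interacting variables would need a dedicated explanation technique, but the output has the same shape: one prediction plus a ranked list of what drove it.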

But even though this is an amazing tool, when it was presented to our local stakeholders (technicians who were going to consume this data to plan strategies to reduce neonatal mortality in Brazil), the information was aggregated in a way that made it difficult for this group of people to carry it forward or even to analyse it. They were not ready to use that type of information, at least not at that moment.

It is actually pretty common for data science teams that have been working on something for months not to realize that they are many steps ahead of the user’s ability to consume it. There are a number of reasons this happens: not enough time to touch base, the request was not specific, the data doesn’t answer that specific request…

And this is where the beauty of the GCE program comes in: when discussing the possibilities, we understood that the model they needed was much simpler, in a way, but just as important. When we dug into their needs around predicting neonatal mortality in Brazil, one of the experts pointed out that predicting itself was not the problem, but seeing it in real time was. Apparently, there is a 4-month gap between neonatal deaths occurring in many areas of Brazil and the data they look at. So what they thought was a real-time problem, we understood as a predictive opportunity.

Why not use data science to predict where and when neonatal mortality rates would rise, trying to close the gap between the information they had access to and what would happen in the near future? With that information, they could plan 1, 2, 3 or 4 months ahead, knowing where they should focus. This insight was only possible through interacting with the stakeholders and consumers of this type of information.

Other research groups had already identified that there is a seasonal component to neonatal mortality in Brazil (in some regions more than others). So when the opportunity came, we were pretty confident that by using time series analysis they would be able to predict neonatal mortality with acceptable accuracy, especially for the intended purpose. The solid lines are the predicted rates for months ahead (red for 1 month, purple for 2, blue for 3 and yellow for 4) in a specific region in Brazil, Aracaju. The green, dotted line represents the real data. We’ll write more about this seasonal component when the results are officially published.
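For readers curious about what forecasting 1 to 4 months ahead from a seasonal pattern can look like, here is a minimal Python sketch of a seasonal-average baseline. It is not the SaMI team's method, whose results are not yet published. The function name `seasonal_forecast`, the 12-month season and the sample rates are all assumptions made up for illustration.

```python
from statistics import mean


def seasonal_forecast(rates, horizon=4, season=12):
    """Forecast `horizon` months ahead from a list of monthly rates.

    Each forecast is the average of past observations that fall on the
    same position in the seasonal cycle (for example, every past March).
    """
    if len(rates) < season:
        raise ValueError("need at least one full season of history")
    n = len(rates)
    forecasts = []
    for h in range(1, horizon + 1):
        target = n + h - 1  # index of the month being forecast
        # Past months sharing the target's position in the cycle.
        same_month = [rates[i] for i in range(target % season, n, season)]
        forecasts.append(mean(same_month))
    return forecasts


# Made-up monthly rates (two identical years), for illustration only.
pattern = [9.0, 8.5, 8.0, 8.2, 8.8, 9.5, 10.1, 10.4, 9.9, 9.3, 9.0, 8.9]
history = pattern * 2

for months_ahead, rate in enumerate(seasonal_forecast(history), start=1):
    print(f"{months_ahead} month(s) ahead: {rate:.1f}")
```

A baseline like this only captures the repeating yearly shape. A production model would also need to handle trends and regional differences, but even a simple baseline shows how a reporting delay can be turned into a planning horizon.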

And yes, we know that we are privileged to have access to the final user because of the program. Some researchers don’t even know where to start when we talk about accessing and considering the final user in their process, and others are more interested in writing their papers. This is one of the reasons we are supporting this community: to help translate their work into deliverables that are easier for local stakeholders to understand. So making sure we create these conversations and help translate for both parties is part of our role with the Foundation.

But how many ARE close to the final user and don’t even bother to include them in the development process? This leads to lots of projects being completely put aside because teams did not commit to small changes to a data science project or product that could have had real impact on a country’s population. It is not only about asking the right questions, it is about actually listening to the answers.

And for that, I completely applaud SaMI’s team and their spirit: they had the opportunity to make something useful and they took it. Even if it meant reevaluating the efforts and resources from their original plan, its importance to the greater public was put first. What they had done so far laid the groundwork for them to see a way of using data for good, and they had the knowledge to act fast on it, changing course mid-project without hesitation.

Being resilient when interacting and listening to your end user: a key competence for data scientists

I’ve seen many data scientists wishing to be seen and to be heard, and to develop THE project that will change their careers, but only as long as it means applying the craziest techniques and doing it on their terms.

I see a smaller group that has understood that data science is a support area, and that if you are not a good listener and a team player your project will probably remain in a drawer. SaMI’s team was open from the beginning, and supporting their final products has been a huge pleasure and learning opportunity for us. They’ve reminded us that:

  • Data science is about understanding your role and playing to your strengths, but being humble enough to accept (and ask for) help in areas where you don’t have a lot of experience;
  • Science can be applied to policy making when both parties are ready to contribute and listen, but sometimes being present requires letting go of personal beliefs and objectives to support a common, greater goal;
  • A “translator” who brings a systemic view can make things quicker and leverage opportunities that specialists tend not to see because they are deeply involved in their project’s details;
  • Visualising and showing your results in a compelling and simple way facilitates conversation between diverse areas of knowledge.

Our special thanks to Tiago Carvalho and their whole team for putting up with all our requests and nagging over these 18 months. We hope to see more of this project’s amazing results in the future, and you can always count on us to make it happen.
