Combating Malaria with AI and Satellite Imagery

How to adapt quickly as a team in a real-world project to maximize value generation.

Published in Omdena · Oct 28, 2020

One essential aspect we learned while working on this project was fluidity: the ability to reflect broadly on the challenges, react to changes, and pivot accordingly.

Omdena challenges are great opportunities both to kick-start efforts that generate positive social impact and to grow professionally.

It is a non-zero-sum game: the better you and your team become, the greater the positive impact you can generate in society.

The primary reason is that projects start from the business context and not from the data science context, unlike on platforms such as Kaggle. This means that rather than starting with clean data and an already defined metric to optimize, Omdena projects are fuzzier. One must first figure out how to frame the problem in data science terms, consider the availability and quality of the data, and decide what metric would be appropriate to measure the model’s impact. As a consequence, projects feel more real, and delivering one requires more than ‘hard’ technical data science skills. Here I want to talk about one skill I deem essential: fluidity, or the ability to reflect broadly on the challenges, react to changes, and pivot accordingly.

The context: AI to combat malaria

First, however, let me introduce the project we developed and mention some key aspects that made this project, and Omdena projects in general, challenging. We worked with Zzapp, a company that spearheads an innovative solution to combat malaria. Historically, one of the common methods to control malaria has been to control the mosquito populations that act as a vector for the infection. Once mosquitoes are in their adult, flying form, getting to them is an arduous endeavor. However, mosquitoes spend a good portion of their lives in larval form, and larvae require stagnant water bodies to grow. Thus it is easier to look for stagnant water bodies and control the larvae populations. This approach has worked before in eradication efforts. For it to work, it is paramount to account for most, if not all, water bodies. This is the reason you see campaigns against leaving pots of water out in the open, since they might become breeding grounds for mosquitoes.

Zzapp modernizes this approach with technology. Field workers use an app to mark water bodies; the app also tracks the workers surveying an area and supports later testing to estimate whether the water bodies need fumigating. One of the major tasks Zzapp has to figure out is how to survey new areas efficiently: can they tell where water might be present before sending the field workers? This is where we entered the action, and as you can see, it was quite an interesting problem to tackle. For a wonderful explanation of how we approached the technical DS aspects, I suggest you check this article by my teammates Frank, Lauren, and Tanmay.

A fluid mindset

Given this context, I would like to talk about challenges that are not technical at their core, or that required approaches going beyond what one might call technical or hard skills. I also want to explain why I felt we needed a fluid mindset to solve them.

Our first significant challenge was the setting of the project itself. It was a nonstandard undertaking, since there was no straightforward way to translate the business statement into a data science framework. In the beginning, we only knew that we should explore models to detect water using satellite images. Accordingly, during the first stages of the project, we expected that the images would let us use common deep learning techniques, with some fine-tuning, to detect water. Yet initial exploration of the images from multiple sources revealed a problem with this belief.

Sample images that ‘allegedly’ contain water, from two different data sources

Can you detect water here?

Then, by reviewing the literature on the topic, we confirmed that we were dealing with an open, broad problem. In a more traditional setting, one would look at the alternatives, decide on one to build upon, and work toward the solution. In theory, if you put enough effort into the modeling, you can improve the model little by little. In our case, since the problem statement was so broad, there was a tremendous risk in investing all our efforts in a single approach. We did not even know whether the data was sufficient to produce any results. Thus, uncertainty loomed over every approach, since there was no guarantee that any solution we tried would work despite our efforts.

Expectation vs. reality whilst developing a single approach to the problem

Exploring the space of possibilities

In these situations, we could learn from the philosophy of Bruce Lee:

“Empty your mind, be formless, shapeless, like water. If you put water into a cup, it becomes the cup. You put water into a bottle and it becomes the bottle. You put it in a teapot it becomes the teapot. Now, water can flow or it can crash. Be water, my friend.”

For me, this speaks of both adaptability and being able to tackle a problem from multiple perspectives. When we thought like this, we exploited the open nature of the problem to explore and experiment with innovative and disparate ideas. We played with different data sources, different ways to frame the data science problem (supervised/unsupervised), and different machine learning models. Granted, attempting such a thing by oneself would be ludicrous, yet such is the beauty of having an entire team willing to try fresh ideas. This mindset therefore requires both a personal and a collective commitment to the cause.

What a diverse team can accomplish by exploring multiple approaches

While pursuing parallel approaches collectively, it is also key that the way we test the approaches remains consistent for everybody. We had many discussions just to define how to keep the data formats consistent and how to give all models uniform inputs and outputs. One thing that helped us was talking with Zzapp, since the way they operationally divide the regions gave us the idea to divide the data in the same way. Our models would then work on the same patches of land. Although simple, this speaks to the importance of clear communication when facing these projects.
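To make the idea of shared patches concrete, here is a minimal sketch of how a satellite image could be cut into a fixed grid so that every model consumes the same inputs. It is an illustration under assumptions, not the code we used: the function name `split_into_patches`, the square non-overlapping grid, and the choice to drop incomplete edge patches are all hypothetical.

```python
import numpy as np


def split_into_patches(image, patch_size):
    """Split an (H, W, C) array into non-overlapping square patches.

    Assumption for this sketch: edge rows/columns that do not fill a
    whole patch are simply dropped.
    Returns an array of shape (n_patches, patch_size, patch_size, C)
    and the (row, col) pixel offset of each patch in the original image.
    """
    height, width, channels = image.shape
    n_rows = height // patch_size
    n_cols = width // patch_size

    # Crop to a whole number of patches in each direction.
    cropped = image[: n_rows * patch_size, : n_cols * patch_size]

    # Expose the grid as extra axes, then flatten the grid axes together.
    patches = (
        cropped.reshape(n_rows, patch_size, n_cols, patch_size, channels)
        .swapaxes(1, 2)
        .reshape(n_rows * n_cols, patch_size, patch_size, channels)
    )

    # Offsets let every team map a prediction back to the same piece of land.
    offsets = [
        (r * patch_size, c * patch_size)
        for r in range(n_rows)
        for c in range(n_cols)
    ]
    return patches, offsets
```

Returning the offsets alongside the patches means the outputs of different models can be compared patch by patch, whatever each model does internally.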

Constraints and communication

Even a big team has limited capacity to explore every approach. This means we must carefully balance how many resources we spend on each alternative. This is even more pressing in Omdena projects, since you have only a little over two months to tackle the challenge, and whatever you manage to build is what you deliver. To handle this limitation, we learned to value constant communication with the stakeholders (for us, Zzapp) as well as with the rest of the collaborators. During the project we presented our progress to Zzapp weekly and discussed with them what they thought might or might not work. This way we also benefited from their expertise in the field and from their previous modeling approaches to detect water. We also held an internal meeting on the same day just to talk and align. That meeting was great for checking which alternatives were working, and it allowed us to plan and follow the myriad tracks we were developing.

This mindset of thinking broadly and relying on your team to explore the possibilities also allows some nice synergies to form. At one point I reached a plateau while developing auto-encoder models² to create lower-dimensional representations of the images, so that we could train models on these representations instead of the images themselves. The initial results proved unsatisfactory. We were contemplating scrapping the idea when another teammate showed some results from his work. He used the spectral signature of pixels to discover differences in the composition of terrain through a technique called spectral unmixing¹. Then it clicked for me: instead of trying to reconstruct the image, we could try to reconstruct the histograms of the images. It turned out that the method for reconstructing images needed only minor changes to make histogram reconstruction work. We later used these representations in our models. From this experience, it’s clear to me that I would not have thought of this new angle if not for what my teammates were exploring.
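The histogram idea can be sketched in a few lines. The example below is a toy illustration under stated assumptions rather than our actual model: it uses a plain one-hidden-layer autoencoder written in NumPy (not the variational kind from the reference), it assumes pixel values scaled to [0, 1], and the names `band_histograms` and `train_autoencoder` as well as the bin count, code size, and learning rate are hypothetical choices.

```python
import numpy as np


def band_histograms(patch, bins=16):
    """Concatenate one normalized histogram per band of an (H, W, C) patch.

    Assumption: pixel values are already scaled to the range [0, 1].
    """
    hists = []
    for band in range(patch.shape[-1]):
        counts, _ = np.histogram(patch[..., band], bins=bins, range=(0.0, 1.0))
        hists.append(counts / counts.sum())
    return np.concatenate(hists)


def train_autoencoder(x, code_size=4, epochs=500, lr=0.5, seed=0):
    """Fit a tiny autoencoder to the rows of x with plain gradient descent.

    Returns (encode, losses): `encode` maps rows to low-dimensional codes,
    `losses` holds the mean squared reconstruction error per epoch.
    """
    rng = np.random.default_rng(seed)
    n_features = x.shape[1]
    w_enc = rng.normal(0.0, 0.1, (n_features, code_size))
    w_dec = rng.normal(0.0, 0.1, (code_size, n_features))
    losses = []

    for _ in range(epochs):
        # Forward pass: encode with tanh, decode linearly.
        code = np.tanh(x @ w_enc)
        recon = code @ w_dec
        err = recon - x
        losses.append(float(np.mean(err ** 2)))

        # Backward pass for the mean squared error.
        grad_recon = 2.0 * err / err.size
        grad_dec = code.T @ grad_recon
        grad_code = grad_recon @ w_dec.T
        grad_enc = x.T @ (grad_code * (1.0 - code ** 2))

        w_enc -= lr * grad_enc
        w_dec -= lr * grad_dec

    def encode(rows):
        return np.tanh(rows @ w_enc)

    return encode, losses
```

The point of the sketch is that only the input changes: the rows fed to the autoencoder are histograms instead of flattened images, and the resulting codes can then serve as features for downstream models. In practice one would likely build this in a deep learning framework rather than by hand.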

Overcoming internal conflict

However, the greatest challenge in getting used to this fluid mindset was learning to let go. No one wants to let their effort go to waste. We recognize this behavior as the ‘sunk cost fallacy’: people see more value in things they have already invested in, and the friction to change becomes greater the more has been ‘sunk’. In our project, this occurred with the work of an entire team. As I mentioned, there are some standard parts that most machine learning projects follow. One of these is annotating data to train models. We set up a team for that at the beginning of the project, and accordingly, they invested heavily in learning how to do their part efficiently. Meanwhile, other teams noticed that the manual annotations would not be helpful given the approaches we had in mind, which meant that we should probably dial down our annotation efforts. Still, some teammates wanted to see what they could do with annotations and carried these efforts into the latest stages of the project. While I believe the learning experience was valuable for my teammates within the project’s context, the effort would have been welcome in other approaches.

There was another similar experience that completely changed the project’s landscape. After one of the weekly progress meetings, Zzapp confirmed that one part of the project was not yet a pressing matter to them. The portion in question was an additional effort to quantify the risk of actual malaria presence: if we find a water body, would it have mosquitoes carrying malaria? Having some of us work on this addendum made sense at the beginning of the project, and we had dedicated a team of people to explore it. Here, we could move because we aimed for clear communication; the conversation with Zzapp gave us a tremendous incentive to stop working on malaria detection. This time, we pivoted faster and closed that line of research to focus on our main deliverable. Our earlier experience in the project had taught us how to pivot faster.

Closing words

I’m certain these learning experiences transcend this project, because openness, uncertainty, and unpredictability will be ubiquitous in real-world endeavors. The key is learning to be fluid. It will work as long as you keep an open mind when tackling the challenges and focus on communication with your team and your stakeholders.

Finally, I would like to highlight these articles by my teammates which also revolve around the ZzappMalaria AI project:

Tanmay’s article on data sourcing and our technical approach
Rasha’s article on her personal journey through the challenge

Also here’s the link to the project statement on Omdena’s website.

References

[1] Hardy, A., Oakes, G., & Ettritch, G. (2020). Tropical Wetland (TropWet) Mapping Tool: The Automatic Detection of Open and Vegetated Waterbodies in Google Earth Engine for Tropical Wetlands. Remote Sensing, 12(7), 1182. doi:10.3390/rs12071182

[2] Kingma, D., & Welling, M. (2019, December 11). An Introduction to Variational Autoencoders. Retrieved October 20, 2020, from https://arxiv.org/abs/1906.02691