IBM now helps stop Malaria spread with AI

AI models predict water body formation and mosquito growth

Oscar D. Lara Yejas
IBM Data Science in Practice
8 min read · Apr 22, 2021

By: Óscar Lara-Yejas, Shirley Han, Keely Wright, and Arnon Houri-Yafin

A member of the community reporting water bodies in African cities using the Zzapp Malaria mobile app
Zzapp Malaria is partnering with IBM’s AI to fight the disease

Malaria intervention has been shown to be the most effective way to save a human life. In 2019 alone, the disease caused over 400K deaths, more than COVID-19 caused in the US in 2020. Nearly 90% of those deaths occurred in sub-Saharan Africa and, sadly, most of the victims were children under the age of 5.

Zzapp Malaria is partnering with IBM to prevent the proliferation of mosquitoes that transmit malaria. Using AI, we are building tools to predict the formation of water bodies before they can become mosquito habitats. Health agencies can then use this information to optimize their larviciding efforts. Larviciding is an ecologically safe method to interrupt the development of larvae or pupae into adult mosquitoes.

Follow the mosquitoes… I mean, the water bodies

The deadliest animals on earth are neither sharks nor snakes but rather mosquitoes, with a terrifying record of over a million yearly human deaths worldwide. Out of over a dozen mosquito-borne illnesses, malaria is the deadliest.

Mosquitoes grow in smaller water bodies closer to villages and cities. Many of these water bodies appear and disappear over time following rainy/dry periods. Our approach is to analyze weather data and satellite images to predict the formation and evaporation of water bodies.

Zzapp Malaria

The Zzapp Malaria mobile and desktop app, which lets users report and visualize water bodies in African cities and villages
The Zzapp Malaria mobile app and dashboard

Zzapp Malaria is a startup company with the ambitious goal of eradicating malaria. They developed an AI-powered mobile application that allows for personalizing malaria elimination interventions in a number of African towns and villages.

As a result of their success, Zzapp Malaria was recently named one of the three IBM Watson AI XPRIZE finalists. The $5M AI XPRIZE competition challenges teams across the world to demonstrate how AI can tackle global issues.

The data

Two data sources were collected for this study: (1) weather data from the IBM Weather Operations Center and (2) satellite water body data from the Global Surface Water Explorer.

We’ve focused on three African cities where Zzapp is already operating: Zanzibar, Finote Selam, and Maputo. The three differ in character: Zanzibar is an island, Maputo is a coastal city, and Finote Selam lies inland. Over 16 years of data covering an area of more than 30,000 km² were analyzed in this study.

Screenshots of maps of the three African cities that are the focus of the water body analysis for malaria intervention
These three cities were the focus of our water body formation models.

Weather data

The IBM Weather Operations Center provides over six petabytes of geospatial-temporal datasets worldwide, including satellite imagery and weather data, along with user-friendly query and aggregation tools to help businesses derive useful insights.

We collected hourly weather data from 2004 to 2019 within the target locations (i.e., Zanzibar, Maputo, and Finote Selam), including temperature, precipitation, humidity, dew point, wind direction and speed, among others.

Geospatial Analytics. Different datasets available include weather, wildfire, agriculture, etc.
Screenshot of IBM Climate Data Explorer showing an image of the African continent and an image of the United States with climate hotspots
IBM Climate Data Explorer

Satellite images (water bodies)

The Global Surface Water Explorer (GSWE) is the result of a partnership among the European Commission’s Joint Research Centre, the UN Environment Programme, and Google. It maps the location and temporal distribution of water bodies around the world over the past 35+ years, including statistics on their size and change. Because the GSWE relies on satellite data, it only provides monthly snapshots of the water body distribution, which limits the temporal resolution of the analysis.

But not all water bodies are relevant in the context of malaria intervention. In fact, mosquitoes carrying malaria only live in smaller water bodies, so a pre-processing step is to calculate water body sizes for the target regions.

Animations of changes in water bodies over time in three African cities. These water bodies could potentially become habitats for mosquitoes carrying malaria
Water bodies over time in three African cities

The tools

Watson Studio offers end-to-end capabilities for every step of the model building process: data collection, curation, and exploration, as well as modeling and deployment. The platform includes visual tools such as AutoAI, which allow for building models in a few clicks, as well as the latest open source packages for the code-savvy data scientist. This article gives you an overview of why enterprises invest in data platforms in the open source era.

Watson Studio really accelerated every step in our AI project. From data preparation and validation to analysis and visualization. We were able to build models with an impressive performance in a very short time.

Arnon Houri Yafin, CEO, Zzapp Malaria

Visualizations built in Watson Studio, including correlation among features, as well as weather data over space
Visualizations built in Watson Studio. On the left is the correlation among aggregated weather features. On the right is the location of water bodies (blue/purple) on top of the heatmap of precipitation (green/red circles)

Counting water bodies

Water body data are retrieved as TIFF images. Each pixel of the original image corresponds to an area of 28 m × 28 m and indicates whether that location has water (left image below). Missing data and land are both treated as no water (center image below). The goal is then to find clusters of adjacent water pixels, which are the water bodies. This can be framed as a version of the counting islands problem and solved with the Depth-First Search (DFS) algorithm.

screenshot of three images. The one on the left is the original image that shows missing data, land, and water. The center image is the “water body” image which shows only “no water” and “water”. The image on the right is the relevant water body count image, which shows “no water”, “few water bodies” and “Many water bodies”
Relevant water body counts. Yellow regions on the right have the largest amount of relevant water bodies (i.e., water bodies smaller than 5 pixels)
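To make the counting step concrete, here is a minimal sketch of the counting islands idea on a tiny binary grid. It is an illustration rather than the project's actual code: the function names are hypothetical, and it assumes that "adjacent" means sharing an edge (4-connectivity) and that a relevant water body is one smaller than 5 pixels, as in the caption above.

```python
from typing import List

def water_body_sizes(grid: List[List[int]]) -> List[int]:
    """Return the pixel count of each cluster of adjacent water pixels (1 = water)."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Iterative DFS with an explicit stack, so large images
            # do not hit Python's recursion limit.
            stack = [(r, c)]
            seen[r][c] = True
            size = 0
            while stack:
                y, x = stack.pop()
                size += 1
                # Assumption: only edge-sharing neighbours count as adjacent.
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            sizes.append(size)
    return sizes

def count_relevant(grid: List[List[int]], max_pixels: int = 5) -> int:
    """Count water bodies strictly smaller than max_pixels pixels."""
    return sum(1 for size in water_body_sizes(grid) if size < max_pixels)

# Toy grid: two single-pixel bodies, one 2-pixel body, one 6-pixel body.
example_grid = [
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1],
]
relevant = count_relevant(example_grid)  # the 6-pixel body is excluded
```

Switching to 8-connectivity would only require adding the four diagonal offsets to the neighbour list.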

Original images are 4000 × 4000 pixels, covering roughly 111 km × 111 km, and are centered on their respective cities. For each image, water bodies are counted in 10 km × 10 km regions, as in the right image above.
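The arithmetic behind the regions can be sketched as follows. With the rounded 28 m pixel size, a 10 km region spans about 357 pixels per side and a 4000-pixel image spans about 112 km, in line with the roughly 111 km quoted above. The helper names are hypothetical, and assigning each water body to a region by one representative pixel is an assumption made for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple

PIXEL_SIZE_M = 28        # from the article: one pixel covers 28 m x 28 m
TILE_SIZE_M = 10_000     # 10 km x 10 km counting regions
IMAGE_SIDE_PIXELS = 4000

# Pixels along one side of a region, rounded down.
PIXELS_PER_TILE = TILE_SIZE_M // PIXEL_SIZE_M

# Approximate ground distance covered by one side of the image, in km.
IMAGE_SIDE_KM = IMAGE_SIDE_PIXELS * PIXEL_SIZE_M / 1000

def tile_of(row: int, col: int) -> Tuple[int, int]:
    """Map a pixel coordinate to the (row, col) index of its region."""
    return row // PIXELS_PER_TILE, col // PIXELS_PER_TILE

def count_per_tile(body_pixels: Iterable[Tuple[int, int]]) -> Counter:
    """Count water bodies per region, given one representative pixel per body."""
    return Counter(tile_of(r, c) for r, c in body_pixels)

# Hypothetical representative pixels for four water bodies.
example_bodies = [(0, 0), (100, 300), (400, 10), (3999, 3999)]
example_counts = count_per_tile(example_bodies)
```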

Average monthly number of water bodies for Maputo, Mozambique: some circles show higher average numbers than others based on what the algorithm determined
Average monthly number of water bodies for Maputo, Mozambique. Green/blue colors represent higher numbers of relevant water bodies. Yellow/red colors represent lower numbers of relevant water bodies.

The figure below shows that there’s a correlation between precipitation (represented by the black line) and water bodies (represented by the red line). Our hypothesis is that precipitation, along with other weather variables, can be useful to predict the number of water bodies.

graph showing Zanzibar’s average monthly precipitation versus number of water bodies from 2016 to 2019. The two lines appear to correlate strongly with one another, with the maxima and minima occurring at the same points in time.
Zanzibar’s average monthly precipitation (black) vs. number of water bodies (2016–2019)
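One simple way to quantify such a relationship is the Pearson correlation coefficient between the two monthly series. The sketch below is an assumption about how one might check it, not the analysis used in the study, and the numbers are made up purely to show the call.

```python
from math import sqrt
from typing import Sequence

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)

# Made-up monthly values, for illustration only (not data from the study).
precipitation_mm = [10, 50, 120, 80, 20, 5]
water_body_count = [3, 9, 20, 15, 5, 2]
r = pearson(precipitation_mm, water_body_count)
```

A value close to 1 would support the hypothesis that precipitation is a useful predictor. Because water bodies may form some time after rainfall, the same function could also be applied to a lagged copy of the precipitation series.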

Predicting water bodies

Once we understood where water bodies had been, the next goal was to predict their number in a given month and region. This is a spatiotemporal regression problem in which the features come from the weather data (measured hourly) and the labels are the water body counts (measured monthly). To align the two data sources in time and space, weather data were aggregated over 10 km × 10 km regions on a monthly basis, computing aggregate features such as mean, standard deviation, median, min, and max.
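The aggregation step can be sketched with the standard library alone. This is an illustration under stated assumptions: the `(timestamp, region_id, value)` record layout and the region identifiers are hypothetical, and the population standard deviation is used so that a month with a single reading still yields a value.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, median, pstdev
from typing import Dict, Iterable, Tuple

Reading = Tuple[datetime, str, float]  # (timestamp, region_id, value), hypothetical layout

def aggregate_monthly(readings: Iterable[Reading]) -> Dict[Tuple[str, int, int], Dict[str, float]]:
    """Collapse hourly readings into one feature row per (region, year, month)."""
    groups = defaultdict(list)
    for timestamp, region_id, value in readings:
        groups[(region_id, timestamp.year, timestamp.month)].append(value)
    return {
        key: {
            "mean": mean(values),
            "std": pstdev(values),  # population std; defined even for one reading
            "median": median(values),
            "min": min(values),
            "max": max(values),
        }
        for key, values in groups.items()
    }

# Tiny made-up sample: two hourly readings in January, one in February.
sample_readings = [
    (datetime(2019, 1, 1, 0), "tile_0_0", 1.0),
    (datetime(2019, 1, 1, 1), "tile_0_0", 3.0),
    (datetime(2019, 2, 1, 0), "tile_0_0", 5.0),
]
monthly_features = aggregate_monthly(sample_readings)
```

Each resulting row can then be joined with the monthly water body count for the same region and month to form one training example.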

Several regression algorithms were evaluated, including Linear Regression, Lasso, XGBoost Regressor, and Huber Regressor, among others. Each model was also tested with and without Principal Component Analysis. Metrics such as mean cross-validation score, K-fold CV average score, MSE, RMSE, and R² were used to select the best model for each city. Our results indicate that Huber Regressor performed best for Maputo and Finote Selam, whereas Lasso was better for Zanzibar.
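For readers unfamiliar with these metrics, the sketch below shows K-fold cross-validation scored with MSE, RMSE, and R². It is not the study's pipeline: a one-feature least-squares line stands in for the real models (Huber, Lasso, and so on), all names are hypothetical, and the folds are contiguous rather than shuffled, which is one possible choice for time-ordered data.

```python
from math import sqrt
from typing import Callable, Iterator, List, Sequence, Tuple

def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sqrt(mse(y_true, y_pred))

def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the target is constant")
    return 1 - ss_res / ss_tot

def k_fold_splits(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k contiguous folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Stand-in model: ordinary least squares on a single feature."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature has no variance")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def out_of_fold_predictions(xs, ys, k, fit=fit_line) -> List[float]:
    """Predict every sample with a model that never saw it during training."""
    preds = [0.0] * len(xs)
    for train, test in k_fold_splits(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = model(xs[i])
    return preds

# Synthetic data lying exactly on y = 2x + 1, for illustration only.
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 3, 5, 7, 9, 11]
preds = out_of_fold_predictions(xs, ys, k=3)
scores = {"mse": mse(ys, preds), "rmse": rmse(ys, preds), "r2": r2(ys, preds)}
```

Comparing candidate models then amounts to passing a different `fit` function and comparing the resulting scores for each city.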

A line chart showing the Huber Regressor predictions for Maputo in 2019: the predicted number of water bodies over the year versus the actual data. The lines appear to correlate closely, with local maxima and minima corresponding in time.
Predicted vs. actual number of water bodies for Maputo in 2019

The figure above shows predicted versus actual number of water bodies in Maputo. The fit was very close for the second half of 2019. We encountered quite a few issues with the data granularity and integrity that affected the quality of the model for the first half of 2019, yet the water body count trend is correct in 83% of the cases.
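The article does not spell out how trend correctness was measured. One plausible reading, assumed here for illustration only, is the share of month-to-month changes whose direction (up, down, or flat) matches between the predicted and actual series. The function name and sample values are hypothetical.

```python
from typing import Sequence

def _sign(value: float) -> int:
    """Return 1 for an increase, -1 for a decrease, 0 for no change."""
    return (value > 0) - (value < 0)

def trend_accuracy(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Fraction of consecutive steps where both series move in the same direction."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two series of the same length (at least 2 points)")
    matches = sum(
        _sign(actual[i] - actual[i - 1]) == _sign(predicted[i] - predicted[i - 1])
        for i in range(1, len(actual))
    )
    return matches / (len(actual) - 1)

# Made-up counts, not results from the study.
actual_counts = [5, 8, 6, 6, 9]
predicted_counts = [4, 7, 7, 5, 8]
accuracy = trend_accuracy(actual_counts, predicted_counts)  # 2 of 4 steps agree
```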

Although this study was carried out with coarse open-source water body data, Zzapp is making great progress in collecting very granular spatial and temporal water body data from users’ reports and drone imagery. We strongly believe that replicating the same techniques we have covered on the more granular dataset will likely lead to even better results.

screenshot of a list of deployed models predicted water bodies in Watson Studio
Deployed models predicting water bodies in Watson Studio

The top models for each city were deployed through Watson Studio, which provides an interface to put models in production as HTTP services. The models are accessible through APIs, and the corresponding client code is autogenerated in cURL, Java, JavaScript, Python, and Scala for use from web or mobile clients.
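As a rough idea of what calling such a service looks like from Python, the sketch below builds (but does not send) a JSON scoring request. Everything specific in it is a placeholder or an assumption: the endpoint URL, the token, the feature names, and the `input_data` payload shape, which is modeled on the style of autogenerated snippets. The snippet generated for a given deployment remains the authoritative reference.

```python
import json
from typing import Sequence
from urllib import request

def build_scoring_request(
    endpoint_url: str,
    token: str,
    fields: Sequence[str],
    rows: Sequence[Sequence[float]],
) -> request.Request:
    """Build a POST request carrying feature rows for a deployed model."""
    # Assumed payload shape: a list of feature names plus one list of values per row.
    payload = {
        "input_data": [
            {"fields": list(fields), "values": [list(row) for row in rows]}
        ]
    }
    return request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )

# Placeholder endpoint, token, and feature names (all hypothetical).
scoring_request = build_scoring_request(
    "https://example.invalid/deployments/maputo/predictions",
    "PLACEHOLDER_TOKEN",
    ["precip_mean", "temp_mean"],
    [[3.2, 27.5]],
)

# Sending it would look like this, given a real endpoint and token:
# with request.urlopen(scoring_request) as response:
#     result = json.load(response)
```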

How can I put AI to work?

“Working with the DSE team at IBM was a great opportunity... I learned a lot from their professionalism and attention to detail, and appreciated their ability to mix flexibility, curiosity, and real rigour in solving the challenge.”

Arnon Houri Yafin, CEO, Zzapp Malaria

IBM’s Data Science Elite team has been successfully helping enterprises adopt AI on their top use cases, in multiple industries such as healthcare, finance, retail, mining, manufacturing, among many others. Our remote data science engagement model does so while keeping folks safe during COVID-19.

We’re an army of data scientists hungry for data and passionate for new data science challenges. More information here:

About the authors

Óscar D. Lara-Yejas is a Senior Data Scientist with the IBM Data Science and AI Elite and one of the founding members of the IBM Machine Learning Hub. He helps the world’s top industries put AI to work, whether in healthcare, finance, manufacturing, government, or retail, among others. Óscar holds a Ph.D. in Computer Science and Engineering from the University of South Florida. He is the author of the book “Human Activity Recognition: Using Wearable Sensors and Smartphones”, and a number of publications on Big Data, Machine Learning, Human-centric sensing, and Combinatorial Optimization.

Shirley Han is a Data Scientist in IBM Technology Garage. She works on 6-week proofs of concept and minimum viable products for clients in different industries through design thinking and co-creation with clients. Shirley studied Cognitive Science with a Specialization in Machine Learning and Neural Computation, as well as Computer Science, at the University of California, San Diego. She is also a mentor for P-Tech students in Brooklyn.

Keely Wright is a Senior Technical Program Manager in IBM Data and AI, with over 15 years of software development program and portfolio management expertise. Over the past few years, she has immersed herself in AI, and was among the first 50 IBMers to earn the Deep Learning Practitioner badge as a pilot member of the IBM AI Skills Academy. She now advises product teams on applying Watson and other AI technologies in their offerings. Keely has a degree in Electrical Engineering from Texas Tech University.

Arnon Houri-Yafin is the founder and CEO of Zzapp Malaria. As a result of their success, Zzapp Malaria was recently named one of the three IBM Watson AI XPRIZE finalists. Arnon holds a B.A. in Philosophy, Politics and Economy (PPE) from the Hebrew University of Jerusalem, and has taught statistics courses in that program. Arnon has also been involved in the development of Parasight, a machine vision-based device for malaria diagnosis.

Senior Data Scientist, IBM Machine Learning Hub. The opinions expressed are my own and don’t necessarily represent those of IBM. @larayejas