IBM now helps stop Malaria spread with AI
AI models predict water body formation and mosquito growth
By: Óscar Lara-Yejas, Shirley Han, Keely Wright, and Arnon Houri-Yafin
Malaria intervention has been shown to be the most effective way to save a human life. In 2019 alone, the disease caused over 400,000 deaths, more than COVID-19 caused in the US in 2020. Nearly 90% of those deaths occurred in sub-Saharan Africa and, sadly, most of the victims were children under the age of 5.
Zzapp Malaria is partnering with IBM to prevent the proliferation of mosquitoes transmitting malaria. Using AI, we are building tools to predict the formation of water bodies before they can become mosquito habitats. Health agencies can then use this information to optimize their efforts for larviciding: an ecologically safe method to interrupt the development of larvae or pupae into adult mosquitoes.
Follow the mosquitoes… I mean, the water bodies
The deadliest animals on earth are neither sharks nor snakes but rather mosquitoes, with a terrifying record of over a million yearly human deaths worldwide. Out of over a dozen mosquito-borne illnesses, malaria is the deadliest.
Mosquitoes grow in smaller water bodies closer to villages and cities. Many of these water bodies appear and disappear over time following rainy/dry periods. Our approach is to analyze weather data and satellite images to predict the formation and evaporation of water bodies.
Zzapp Malaria
Zzapp Malaria is a startup company with the ambitious goal of eradicating malaria. They developed an AI-powered mobile application that allows for personalizing malaria elimination interventions in a number of African towns and villages.
As a result of their success, Zzapp Malaria was recently named one of the three IBM Watson AI XPRIZE finalists. The $5M AI XPRIZE competition challenges teams across the world to demonstrate how AI can tackle global issues.
The data
Two data sources were collected for this study: (1) weather data from the IBM Weather Operations Center and (2) satellite water body data from the Global Surface Water Explorer.
We’ve focused on three African cities where Zzapp is already operating: Zanzibar, Finote Selam, and Maputo. The three differ in geography: Zanzibar is an island, Maputo is a coastal city, and Finote Selam is situated inland. Over 16 years of data in an area of more than 30,000 km² were analyzed in this study.
Weather data
The IBM Weather Operations Center provides over six petabytes of geospatial-temporal datasets worldwide, including satellite imagery and weather data, along with user-friendly query and aggregation tools to help businesses derive useful insights.
We collected hourly weather data from 2004 to 2019 within the target locations (i.e., Zanzibar, Maputo, and Finote Selam), including temperature, precipitation, humidity, dew point, wind direction and speed, among others.
Satellite images (water bodies)
The Global Surface Water Explorer (GSWE) is the result of a partnership among the European Commission’s Joint Research Centre, the UN Environment Programme, and Google. It maps the location and temporal distribution of water bodies over the world for the past 35+ years, including statistics on their size and change. As the GSWE relies on satellite data, it only provides monthly snapshots of the water body distribution, which introduces limitations in the analysis.
But not all water bodies are relevant in the context of malaria intervention. In fact, mosquitoes carrying malaria only live in smaller water bodies, so a pre-processing step is to calculate water body sizes for the target regions.
The tools
Watson Studio offers end-to-end capabilities for every step of the model building process: data collection, curation, and exploration, as well as modeling and deployment. The platform includes visual tools such as AutoAI, which allow for building models in a few clicks, as well as the latest open source packages for the code-savvy data scientist. This article gives you an overview of why enterprises invest in data platforms in the open source era.
Watson Studio really accelerated every step in our AI project. From data preparation and validation to analysis and visualization. We were able to build models with an impressive performance in a very short time.
Arnon Houri-Yafin, CEO, Zzapp Malaria
Counting water bodies
Water body data are retrieved in the form of TIFF images. Each pixel of the original image corresponds to an area of 28 m × 28 m and represents whether that particular location has water or not (left image below). Missing data and land are considered no water (center image below). The goal is then to find clusters of adjacent water pixels, which are the water bodies. This problem can be interpreted as a version of the counting islands problem, and can be solved using the Depth First Search (DFS) algorithm.
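The counting step can be illustrated with a minimal Python sketch. It assumes the image has already been converted to a grid of 0s (no water) and 1s (water), and that "adjacent" means sharing an edge (4-connectivity); the article does not say which connectivity was used. The function name and the toy grid are hypothetical. Returning the size of each cluster, rather than only the count, also supports filtering out water bodies that are too large to be relevant.

```python
from typing import List

PIXEL_AREA_M2 = 28 * 28  # from the article: each pixel covers 28 m x 28 m


def water_body_sizes(grid: List[List[int]]) -> List[int]:
    """Return the pixel count of every 4-connected cluster of water pixels (value 1)."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r0 in range(rows):
        for c0 in range(cols):
            if grid[r0][c0] != 1 or seen[r0][c0]:
                continue
            # Iterative DFS: an explicit stack avoids Python's recursion
            # limit, which a recursive version could hit on large images.
            stack = [(r0, c0)]
            seen[r0][c0] = True
            size = 0
            while stack:
                r, c = stack.pop()
                size += 1
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    in_bounds = 0 <= nr < rows and 0 <= nc < cols
                    if in_bounds and grid[nr][nc] == 1 and not seen[nr][nc]:
                        seen[nr][nc] = True
                        stack.append((nr, nc))
            sizes.append(size)
    return sizes


# Hypothetical toy grid, not real satellite data.
toy = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]
sizes = water_body_sizes(toy)
print(len(sizes))                           # number of water bodies: 3
print([s * PIXEL_AREA_M2 for s in sizes])   # areas in square metres
```

Each pixel is pushed onto the stack at most once, so the running time is linear in the number of pixels.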
Original images are 4000 × 4000 pixels, which is 111 km × 111 km, or 1° × 1°, and these are centered on their respective cities. For each image, water bodies are counted in 10 km × 10 km regions as in the right image above.
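The arithmetic behind the regions can be sketched as follows. The image and region dimensions come from the text; the rounding to whole pixels and the handling of the partial regions at the image edge are assumptions, since the article does not describe them. The helper name `region_bounds` is hypothetical.

```python
import math

IMAGE_PX = 4000    # image side in pixels (from the article)
IMAGE_KM = 111.0   # image side in kilometres (from the article)
REGION_KM = 10.0   # region side in kilometres (from the article)

metres_per_pixel = IMAGE_KM * 1000 / IMAGE_PX           # 27.75
region_px = round(REGION_KM * 1000 / metres_per_pixel)  # 360
regions_per_side = math.ceil(IMAGE_PX / region_px)      # 12 (last one partial)


def region_bounds(image_px, region_px):
    """Yield (row_start, row_end, col_start, col_end) pixel bounds, end-exclusive."""
    for r in range(0, image_px, region_px):
        for c in range(0, image_px, region_px):
            # Assumption: regions at the right and bottom edges are clipped.
            yield r, min(r + region_px, image_px), c, min(c + region_px, image_px)


bounds = list(region_bounds(IMAGE_PX, region_px))
print(metres_per_pixel, region_px, regions_per_side, len(bounds))
```

The computed 27.75 m per pixel is consistent with the roughly 28 m × 28 m pixel size quoted earlier. Each set of bounds can then be used to slice the image before counting the water bodies inside it.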
The figure below shows that there’s a correlation between precipitation (represented by the black line) and water bodies (represented by the red line). Our hypothesis is that precipitation, along with other weather variables, can be useful to predict the number of water bodies.
Predicting water bodies
Having established where water bodies have been, the next goal is to predict their number in a given month and region. This is a spatiotemporal regression problem where features come from the weather data (measured hourly) and labels are the water body counts (measured monthly). To align these two data sources in time and space, weather data were aggregated in 10 km × 10 km regions on a monthly basis, computing aggregated features such as mean, standard deviation, median, min, and max.
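The aggregation can be sketched in plain Python as below. The record layout (timestamp, region identifier, value), the function name, and the sample readings are hypothetical, and the sketch uses the sample standard deviation, which the article does not specify. A dataframe library would express the same idea as a group-by.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, median, stdev


def monthly_features(readings):
    """Aggregate hourly (timestamp, region_id, value) records per region and month."""
    groups = defaultdict(list)
    for ts, region, value in readings:
        groups[(region, ts.year, ts.month)].append(value)

    features = {}
    for key, values in groups.items():
        features[key] = {
            "mean": mean(values),
            # stdev needs at least two points; fall back to 0.0 otherwise.
            "std": stdev(values) if len(values) > 1 else 0.0,
            "median": median(values),
            "min": min(values),
            "max": max(values),
        }
    return features


# Hypothetical hourly precipitation readings for one region.
readings = [
    (datetime(2019, 1, 1, 0), "r0", 2.0),
    (datetime(2019, 1, 1, 1), "r0", 4.0),
    (datetime(2019, 1, 1, 2), "r0", 6.0),
    (datetime(2019, 2, 1, 0), "r0", 1.0),
]
features = monthly_features(readings)
print(features[("r0", 2019, 1)])
```

Each resulting row (one region, one month) can then be paired with the water body count for the same region and month to form a training example.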
Different regression algorithms were evaluated, including Linear Regression, Lasso, XGBoost Regressor, and Huber Regressor, among others. Each model was also tested with and without Principal Component Analysis. Metrics like mean cross-validation score, K-fold CV average score, MSE, RMSE, and R² were evaluated to determine the best model for each city that we are observing. Our results indicate that Huber Regressor performed best for Maputo and Finote Selam, whereas Lasso was better for Zanzibar.
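A comparison of this kind can be sketched with scikit-learn as follows. This is not the study's actual pipeline: the data are synthetic, the hyperparameters (such as `alpha=0.1`, three principal components, and five folds) are illustrative assumptions, and XGBoost is left out to keep the example to a single library.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import HuberRegressor, Lasso, LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the monthly weather features and water body counts.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 6))
true_w = np.array([3.0, -2.0, 1.5, 0.0, 0.0, 1.0])
y = X @ true_w + rng.normal(scale=0.5, size=120)

regressors = {
    "linear": LinearRegression,
    "lasso": lambda: Lasso(alpha=0.1),
    "huber": HuberRegressor,
}
cv = KFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, make_model in regressors.items():
    for use_pca in (False, True):
        steps = [StandardScaler()]
        if use_pca:
            steps.append(PCA(n_components=3))
        steps.append(make_model())
        pipeline = make_pipeline(*steps)
        # Mean R^2 over the K folds; other scorers can be swapped in.
        scores = cross_val_score(pipeline, X, y, cv=cv, scoring="r2")
        results[(name, use_pca)] = float(scores.mean())

for key, score in sorted(results.items(), key=lambda kv: -kv[1]):
    print(key, round(score, 3))
```

Wrapping the scaler, the optional PCA step, and the regressor in one pipeline means they are refit on each training fold, so no information from the held-out fold leaks into preprocessing.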
The figure above shows predicted versus actual number of water bodies in Maputo. The fit was very close for the second half of 2019. We encountered quite a few issues with the data granularity and integrity that affected the quality of the model for the first half of 2019, yet the water body count trend is correct in 83% of the cases.
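One plausible way to quantify whether the trend is correct is to check how often the predicted month-over-month change has the same direction as the actual change. The sketch below assumes that reading; the article does not define the measure, and the function name and the numbers are hypothetical toy values rather than results from the study.

```python
def trend_agreement(actual, predicted):
    """Fraction of consecutive steps where predicted and actual changes share a sign."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equally long series with at least two points")

    def sign(x):
        return (x > 0) - (x < 0)

    matches = 0
    steps = len(actual) - 1
    for i in range(1, len(actual)):
        actual_dir = sign(actual[i] - actual[i - 1])
        predicted_dir = sign(predicted[i] - predicted[i - 1])
        matches += actual_dir == predicted_dir
    return matches / steps


# Hypothetical monthly water body counts.
actual = [10, 12, 9, 9, 15]
predicted = [11, 13, 12, 8, 14]
print(trend_agreement(actual, predicted))  # 0.75
```

In the toy series, three of the four month-over-month changes agree in direction; the step where the actual count stays flat while the prediction drops counts as a miss.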
Although this study was carried out with coarse open-source water body data, Zzapp is making great progress in collecting very granular spatial and temporal water body data from users’ reports and drone imagery. We strongly believe that applying the same techniques to the more granular dataset will lead to even better results.
The top models for each city were deployed through Watson Studio, which provides an interface to put models in production as HTTP services. The models are accessible through APIs, and the corresponding client code is autogenerated in cURL, Java, JavaScript, Python, and Scala for use from web or mobile clients.
How can I put AI to work?
“Working with the DSE team at IBM was a great opportunity... I learned a lot from their professionalism and attention to detail, and appreciated their ability to mix flexibility, curiosity, and real rigour in solving the challenge.”
Arnon Houri-Yafin, CEO, Zzapp Malaria
IBM’s Data Science Elite team has been successfully helping enterprises adopt AI on their top use cases, in multiple industries such as healthcare, finance, retail, mining, manufacturing, among many others. Our remote data science engagement model does so while keeping folks safe during COVID-19.
We’re an army of data scientists hungry for data and passionate for new data science challenges. More information here:
About the authors
Óscar D. Lara-Yejas is a Senior Data Scientist with the IBM Data Science and AI Elite and one of the founding members of the IBM Machine Learning Hub. He helps the world’s top industries put AI to work, whether it is in healthcare, finance, manufacturing, government, or retail, among others. Óscar holds a Ph.D. in Computer Science and Engineering from the University of South Florida. He is the author of the book “Human Activity Recognition: Using Wearable Sensors and Smartphones”, and a number of publications on Big Data, Machine Learning, Human-centric sensing, and Combinatorial Optimization.
Shirley Han is a Data Scientist in IBM Technology Garage. She works on 6-week proofs of concept and minimum viable products for clients in different industries through design thinking and co-creation with clients. Shirley studied Cognitive Science with a Specialization in Machine Learning and Neural Computation, as well as Computer Science, at the University of California, San Diego. She is also a mentor for P-Tech students in Brooklyn.
Keely Wright is a Senior Technical Program Manager in IBM Data and AI, with over 15 years of software development program and portfolio management expertise. Over the past few years, she has immersed herself in AI, and was among the first 50 IBMers to earn the Deep Learning Practitioner badge as a pilot member of the IBM AI Skills Academy. She now advises product teams in applying Watson and other AI technologies in their offerings. Keely has a degree in Electrical Engineering from Texas Tech University.
Arnon Houri-Yafin is the founder and CEO of Zzapp Malaria. As a result of their success, Zzapp Malaria was recently named one of the three IBM Watson AI XPRIZE finalists. Arnon holds a B.A. in Philosophy, Politics and Economy (PPE) from the Hebrew University of Jerusalem, and has taught statistics courses in that program. Arnon has also been involved in the development of Parasight, a machine vision-based device for malaria diagnosis.





