Data Science Project Helps Legionnaires’ Disease Investigators Worldwide
Spring 2021 MIDS Capstone award-winning project ‘TowerScout’ automatically detects cooling towers from aerial imagery.
Legionnaires’ disease is a severe pneumonia with a 1 in 10 fatality rate, and cooling towers containing Legionella bacteria are a common source of community outbreaks. Outbreak investigators need to find and test cooling towers to stop the outbreak and prevent more illnesses and deaths.
TowerScout is a tool for Legionnaires’ disease investigators who want to find cooling towers from aerial imagery. As UC Berkeley Master of Information and Data Science students, Jia Lu, Gunnar Mein, Thaddeus Segura, and Karen Wong trained neural network models to identify cooling towers in an area that a user searches through a web interface.
We interviewed the MIDS team to learn more about the project that won the Spring 2021 Hal R. Varian MIDS Capstone Award.
What inspired your project?
Karen: As a medical officer with the Centers for Disease Control and Prevention, I’ve worked on several disease outbreaks over my career including variant influenza, Ebola, and COVID-19. I know firsthand the importance of working as fast as possible during an outbreak. Legionnaires’ disease can cause deadly outbreaks, and outbreaks are often traced back to cooling towers that can harbor Legionella bacteria. But in the U.S., very few jurisdictions have a dataset of all their cooling tower locations, so when an outbreak occurs, investigators manually look at aerial imagery to identify nearby towers. The first day of an outbreak investigation might be spent doing this. A tool that can identify cooling towers quickly and accurately from aerial imagery means the investigation team can take steps to stop the outbreak faster, preventing more infections.
What was the timeline or process like from concept to final project?
Thaddeus: We were fortunate as a group that Karen came into Capstone with this project in mind, so once teams were selected, we were able to work immediately. However, we had to start from scratch as there was no pre-existing labeled dataset for this type of problem. Once we had labeled training data, we were able to get a baseline model established quickly so that we could begin gathering feedback from the CDC. We adopted a user-centered approach early on, so the end-state user experience for the CDC helped guide every decision we made.
We conducted multiple rounds of user testing with CDC Legionella investigators to understand the features and functionality they needed to make this a tool that they would actually want to use for an outbreak. While developing the UX and UI, we continued to gather more training data and fine-tune the model, and ultimately introduced a second-stage model to improve performance further. As we completed the technical work, we shifted our focus to user onboarding, developed multiple video walk-throughs, and pulled it all together on a website to act as a one-stop shop for new CDC users.
How did you work as a team, and as members of an online degree program?
Thaddeus: Working as a team was actually very easy because everyone on the team was so talented and motivated! To manage the work, we created a roadmap in Week 1 and aligned the work according to each individual’s strengths, but we modified the roadmap frequently as we learned more from our users. We met twice a week throughout the duration of the semester to stay aligned and on track, and we relied heavily on Google’s collaboration tools (Colab, Slides, Docs, Drive) to keep our work organized and accessible.
How did your I School curriculum help prepare you for this project?
Gunnar: W207 (Applied Machine Learning) and W266 (Natural Language Processing with Deep Learning), in particular, gave us a really good idea about the neural network technology we ended up using for this project. Really, the curriculum fed into many aspects of what we did: from hosting object detection and secondary classification in Python on AWS, to critically assessing our evaluation metrics and available data, to presenting to our stakeholders at the CDC and getting meaningful feedback from them.
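For readers curious how object detection followed by secondary classification can fit together, here is a minimal sketch in Python. It is not TowerScout's actual code: the names `Detection`, `two_stage_filter`, and `classify_crop`, as well as both threshold values, are hypothetical and chosen only for illustration. The idea is that a first-stage detector proposes candidate boxes, and a second-stage classifier is asked to confirm each one before it is reported.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (x_min, y_min, x_max, y_max) in image pixels
Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    box: Box
    detector_score: float  # confidence reported by the first-stage detector


def two_stage_filter(
    detections: List[Detection],
    classify_crop: Callable[[Box], float],
    detector_threshold: float = 0.25,   # assumed value, for illustration only
    classifier_threshold: float = 0.5,  # assumed value, for illustration only
) -> List[Detection]:
    """Keep detections that pass the detector threshold AND the second-stage check."""
    confirmed = []
    for det in detections:
        # Stage 1: discard low-confidence candidates from the object detector.
        if det.detector_score < detector_threshold:
            continue
        # Stage 2: a separate classifier scores the cropped region.
        if classify_crop(det.box) >= classifier_threshold:
            confirmed.append(det)
    return confirmed


# Stand-in for a trained classifier: accepts boxes of at least 1000 square pixels.
def fake_classifier(box: Box) -> float:
    area = (box[2] - box[0]) * (box[3] - box[1])
    return 1.0 if area >= 1000 else 0.0


candidates = [
    Detection((10, 10, 50, 50), 0.9),  # passes both stages
    Detection((60, 60, 80, 80), 0.4),  # passes stage 1, rejected in stage 2
    Detection((0, 0, 5, 5), 0.1),      # rejected in stage 1
]
confirmed = two_stage_filter(candidates, fake_classifier)
```

In a real system, `classify_crop` would wrap a trained image classifier rather than an area check. A second stage like this can reduce false positives at the cost of extra inference time per candidate.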
Do you have any future plans for the project?
Karen: We have some immediate opportunities to make this work visible to our target audience. TowerScout will be introduced at a national workshop for Legionnaires’ disease investigators. We’ve also been invited to present our work at the Council of State and Territorial Epidemiologists’ annual meeting. We are preparing a manuscript describing this work for publication in a peer-reviewed journal, where we hope to reach a broad public health audience, including the global public health community.
Gunnar: The project is already available as open source, but like any organically evolved system, it would benefit from some cleanup and generalization. Cooling towers are far from the only thing one can find from an aerial view.
We want to make sure that people feel encouraged to use our code for further research, and we look forward to what they will do with it.
How could this project make an impact, and who will it serve?
Karen: CDC has already used TowerScout to save valuable time in their Legionnaires’ disease investigations. This summer, we’ll be introducing TowerScout to frontline public health workers in state and local health departments. By publishing our work and sharing our code, we also hope that Legionnaires’ disease investigators worldwide will be able to adapt this technology to their needs.