Using Wikipedia to Forecast the Flu
With the help of Wikipedia, researchers are predicting disease outbreaks — and hitting the mark.
By Sara Del Valle and Dave Osthus
Flu season is coming. It’s easy to dismiss the flu as an inconvenience: a miserable one, to be sure, but something you just get through before moving on. In reality, it’s a serious disease with serious consequences. According to the Centers for Disease Control and Prevention (CDC), seasonal influenza strikes millions of people in the United States each year and sends about 200,000 of them to the hospital. Lost earnings and direct medical costs top $26 billion, and the total economic burden edges close to $90 billion a year. No one knows exactly how many people die from the flu in the United States each year, but estimates range from 3,000 to 49,000 deaths.
Forecasting the impact of not just the flu, but other infectious — and preventable — diseases such as HIV and measles could allow public health workers to focus on mitigation strategies and potentially save millions of lives around the world.
That’s why, at Los Alamos National Laboratory, we use mathematics, computer science, statistics and information about how disease develops and spreads to forecast the flu season and even next week’s sickness trends.
We make our forecasts with the help of official and internet-traffic data. For years, health officials have tracked the spread of the flu by monitoring reports that doctors file with their state health departments. In the United States, for instance, the CDC gathers these reports about influenza-like illness and distributes the information back to the health-care community. By the time it arrives, the information is weeks out of date. Recognizing the problem, the disease modeling community turned to the internet for a solution.
We and the rest of our team at Los Alamos went to work and discovered that as flu was breaking out, people around the world were reading Wikipedia articles on influenza and related subjects. So the freely available Wikipedia access logs provided real-time footprints of the flu marching through a population. Working with data from historical flu seasons, we then compared the Wikipedia access logs to reports on illnesses for the same periods from the CDC. The number of people reading flu articles went up at the same time people were getting sick. We then borrowed well-tested mathematical methods of prediction used in climatology and other fields, applied them to the data from the Wikipedia access logs and created real-time weekly forecasts for the last three flu seasons.
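For readers who like to see the idea in code, here is a minimal sketch of the comparison step: checking whether weekly page views rise and fall with reported illness, then using a straight-line fit to estimate the current illness level from page views alone. This is not our forecasting model. The numbers are synthetic and made up for illustration, and the names `weekly_page_views`, `weekly_ili_percent`, `pearson` and `fit_line` are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Ordinary least-squares fit: returns (slope, intercept) for y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


# Synthetic values for illustration only (NOT real Wikipedia or CDC data).
weekly_page_views = [1200, 1500, 2100, 3400, 5200, 6100, 4800, 3000]
weekly_ili_percent = [1.1, 1.3, 1.8, 2.9, 4.4, 5.0, 4.1, 2.6]

r = pearson(weekly_page_views, weekly_ili_percent)
slope, intercept = fit_line(weekly_page_views, weekly_ili_percent)

# Estimate this week's illness level from page views, before official reports arrive.
latest_views = 4000
estimate = slope * latest_views + intercept

print(f"correlation between page views and illness: {r:.3f}")
print(f"estimated illness level for {latest_views} views: {estimate:.2f}%")
```

A high correlation on past seasons is what makes the page views useful: they are available right away, while the official reports lag behind.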
The result? Forecasts that have been largely on target. For example, for the 2015–2016 flu season, we predicted it would be late (not peaking until February) and mild (fewer instances of flu). The CDC numbers bear this out.
We’re now beginning to predict the peak and severity of this flu season. A later-peaking flu season looks slightly more likely than an early one, but at this stage in the season, there is too much uncertainty to say for sure. There’s no crystal ball when it comes to predicting disease outbreaks. Instead, there’s a range of plausibility. Each week, as more information becomes available, the forecasting model is updated and the range of plausible forecasts is modified and constrained.
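The way a range of plausibility tightens week by week can be illustrated with a toy example. The sketch below is an assumption-laden stand-in, not our actual model: it generates a set of candidate bell-shaped seasons with different peak weeks, then discards any candidate that disagrees with the observations so far by more than a chosen tolerance. The "observed" season is simulated, and the curve shape, the tolerance and all function names are hypothetical.

```python
from math import exp


def season_curve(peak_week, peak_level, width, n_weeks):
    """Toy bell-shaped season: illness level for each week 0..n_weeks-1."""
    return [
        peak_level * exp(-((week - peak_week) ** 2) / (2 * width ** 2))
        for week in range(n_weeks)
    ]


def plausible_peak_range(observed, candidates, tolerance):
    """Earliest and latest peak week among candidates that match every observation so far."""
    kept = [
        peak_week
        for peak_week, curve in candidates
        if all(abs(curve[week] - value) <= tolerance for week, value in enumerate(observed))
    ]
    return (min(kept), max(kept)) if kept else None


N_WEEKS = 30

# Candidate seasons peaking anywhere from week 8 to week 24 (illustrative assumption).
candidates = [(pw, season_curve(pw, 5.0, 4.0, N_WEEKS)) for pw in range(8, 25)]

# Simulated "observed" season peaking at week 18 (made up, not real surveillance data).
simulated_season = season_curve(18, 5.0, 4.0, N_WEEKS)

ranges = []
for weeks_seen in (4, 8, 12, 16):
    seen_so_far = simulated_season[:weeks_seen]
    peak_range = plausible_peak_range(seen_so_far, candidates, tolerance=0.3)
    ranges.append(peak_range)
    print(f"after {weeks_seen} weeks of data, plausible peak weeks: {peak_range}")
```

Early in the season many peak weeks remain consistent with the data, so the range is wide. Each new week of observations can only rule candidates out, so the range narrows or stays the same as the season unfolds.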
Forecasting a flu outbreak could prevent a full-fledged epidemic by giving officials time to launch a broad-based vaccination program and widespread communication strategy, such as encouraging people to wash their hands and stay home when they are sick. On a personal level, that might spare you a few miserable days. Globally, it could save lives. (And don’t forget: get your flu vaccine if you haven’t already!)
Sara Del Valle is a mathematical and computational epidemiologist in the Information Systems and Modeling group at Los Alamos National Laboratory and leads the team forecasting influenza. Dave Osthus is a statistician and lead investigator on the project.
Originally published at www.huffingtonpost.com on November 15, 2016.