Big Data: The Flu and You

Big Data systems are constantly collecting your information, sometimes for your benefit. Still, some uses of Big Data have backfired, especially when the data is taken out of its social context.

Several years ago, Google began publishing projections of how widespread influenza was in the United States. Through Google Flu Trends, the company gave the public estimates of the proportion of the population infected with the flu. The numbers were based mostly on the volume of flu-related queries (or searches). Compared against the figures from the Centers for Disease Control and Prevention (CDC), Google’s estimates were quite accurate from 2008–2010. However, from 2011–2013, Google’s projections were consistently higher than the CDC’s. In fact, Nature reported in February 2013 that Google’s projection was double that of the CDC!

A graphical representation of Google’s overestimation of flu saturation (Source)

Clearly, Google’s projections, built on flu-related query data, started out accurate but soon drifted far above the CDC’s figures. Researchers suggest that increased media coverage prompted flu-related queries from completely healthy people, pushing Google’s projections to unprecedented levels.

As might be expected, this alarmingly high estimate caused a bit of a panic (because few knew that it was, in fact, an overestimate). The panic then drove more people to search Google for flu-related terms, pushing Google’s projections even higher. This created what Cathy O’Neil calls a “feedback loop” in her book Weapons of Math Destruction.
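To make the feedback loop concrete, here is a toy sketch in Python. It is not Google’s model, and every number in it is made up for illustration: `true_rate` stands in for the real share of people with the flu, and `panic_gain` is a hypothetical knob for how many extra searches by healthy people each published estimate triggers.

```python
def simulate_feedback(true_rate, panic_gain, weeks):
    """Toy feedback loop (illustrative assumption, not Google's actual method).

    Each week the published estimate equals the true rate plus searches
    from worried-but-healthy people, and those worried searches scale
    with the previous week's published estimate.
    """
    estimates = []
    estimate = true_rate  # week 0: the estimate starts out accurate
    for _ in range(weeks):
        worried = panic_gain * estimate   # panic driven by the last estimate
        estimate = true_rate + worried    # new estimate absorbs the panic
        estimates.append(estimate)
    return estimates


if __name__ == "__main__":
    # Hypothetical inputs: 5% truly infected, panic_gain of 0.5.
    for week, value in enumerate(simulate_feedback(0.05, 0.5, 10), start=1):
        print(f"week {week}: estimated rate = {value:.4f}")
```

In this toy model the estimate keeps climbing even though the true rate never changes. For any `panic_gain` below 1 it levels off at `true_rate / (1 - panic_gain)`, so the overestimate persists for as long as the panic does.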

The data from 2004–2015 is still publicly available (I put the United States’ data into an accessible format here). It shows a surge in flu-related queries in February 2013, which is consistent with the feedback loop. Again, we don’t know the nature of these queries or why they were made, only that they are flu-related. On its own, the data lacks the context of the social climate of the time.

In their article Critical Questions for Big Data, danah boyd and Kate Crawford write that “There is value to analyzing data abstractions, yet retaining context remains critical … Managing context in light of Big Data will be an ongoing challenge.”


Evidently, from 2008–2010, Google’s model matched the context and reasons behind flu-related queries. But starting in 2011, the social context of these queries changed as media coverage of the flu changed drastically. Because Google did not account for this change in context, its projections no longer accurately reflected how widespread the flu was. Hence, we can see that the social climate in which data is collected is paramount to our understanding of that data.
