Tweets, Posts, and Searches: Digital surveillance is the next horizon in healthcare analytics
Introduction
For more than a decade, academic studies have validated Twitter's ability to track and predict the spread of a variety of infectious and non-communicable diseases. However, government agencies and policymakers still rely heavily on traditional measures and analyses to track and predict health conditions. Twitter data mining is not only a successful tool for tracking and predicting health outcomes; it is also often quicker at identifying those outcomes. The COVID-19 pandemic has highlighted the need for early and accurate prediction of disease outbreaks. Given the importance of timely identification of outbreaks, epidemics, and pandemics, Twitter should be incorporated more seamlessly into the healthcare analytics process. Examining the current successes of Twitter data mining in predicting health conditions and infectious diseases further highlights the need for and importance of Twitter as a tool for disease surveillance.
Infectious Diseases
Avian Influenza
The case for using Twitter data mining to track and predict health issues and infectious diseases did not originate in the COVID-19 era. For example, a study analyzing 209,000 Tweets discussing avian influenza from July 2017 to November 2018 was able to identify the date, severity, and virus type of outbreaks from Tweets alone. A remarkable 75 percent of avian influenza outbreaks were identifiable from data mined from Twitter. Notably, one-third of all outbreaks could be identified via Twitter before official reports of the outbreak were made available.
Zika Virus
Similar to the avian influenza study, another study used time-stamped and geotagged Twitter data to track the 2016 Zika virus outbreak within the United States. The model successfully predicted an increase in Zika cases one week prior to actual case spikes. While the Twitter model was less accurate in areas with very few reported cases and in areas with large outbreaks, it was more accurate than alternative models used in public health surveillance. This example makes a case for using Twitter data mining in conjunction with other surveillance models to help narrow findings and better prepare for outbreaks.
Ebola Epidemic
Before both the avian influenza and Zika virus studies, Twitter data mining was used to monitor and track the Ebola epidemic of 2014. A study analyzing Tweets from 2011 to 2014 was able to alert to the epidemic in December 2013, three months prior to the official announcement of the Ebola epidemic. The analysis focused on two main symptoms of Ebola, fever and rash, and concluded that the model could be successfully replicated for other diseases and illnesses by updating the symptoms analyzed. Notably, this study focused on the countries that experienced the worst Ebola outbreaks, including Liberia, Guinea, and Sierra Leone. These countries have lower internet usage and fewer mobile subscribers, yet the model still held, suggesting that Twitter data mining can be successful even within areas of low internet usage.
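The symptom-based alerting described above can be illustrated with a minimal Python sketch. It is not the study's actual model: the function names, the four-week baseline, and the three-standard-deviation threshold are all assumptions chosen for illustration, and the input is assumed to be a list of (date, text) pairs already collected from Twitter.

```python
import re
from collections import Counter
from statistics import mean, stdev

# The two symptoms named in the text; swap in other terms for other diseases.
SYMPTOM_TERMS = {"fever", "rash"}

def weekly_symptom_counts(tweets):
    """Count tweets mentioning any symptom term, keyed by (ISO year, ISO week).

    `tweets` is an iterable of (datetime.date, text) pairs.
    """
    counts = Counter()
    for day, text in tweets:
        words = set(re.findall(r"[a-z]+", text.lower()))
        if words & SYMPTOM_TERMS:
            iso = day.isocalendar()
            counts[(iso[0], iso[1])] += 1
    return counts

def first_alert(counts, baseline_weeks=4, k=3.0):
    """Return the first week whose count exceeds the mean of the preceding
    `baseline_weeks` weeks by more than k standard deviations, else None.

    Simplification: weeks with no matching tweets are absent from `counts`
    and are therefore skipped rather than treated as zero.
    """
    weeks = sorted(counts)
    for i in range(baseline_weeks, len(weeks)):
        history = [counts[w] for w in weeks[i - baseline_weeks:i]]
        threshold = mean(history) + k * stdev(history)
        if counts[weeks[i]] > threshold:
            return weeks[i]
    return None
```

Replacing SYMPTOM_TERMS with a different symptom list is the only change needed to point the same sketch at another illness, which mirrors the study's claim that the approach can be replicated by updating the symptoms analyzed.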
COVID-19
Given these previous studies’ success, Twitter data mining was also used to track COVID-19 outbreaks globally. One study tracked both symptoms and preventative measures on Twitter to determine outbreaks in the United States and Canada. In Canada, 83 percent of COVID-19 waves were detected at least one week earlier than official reports. In the United States, 100 percent of outbreaks were identified in states that experienced the initial COVID-19 outbreak in March, and 78 percent of outbreaks were identified in states that did not experience the initial outbreak but did experience subsequent outbreaks. Identifications occurred one to two weeks prior to official reports. Notably, this study found that Twitter reflected trends in COVID-19 cases earlier than Google searches did in the beginning stages of the pandemic. Furthermore, Twitter was considerably better than Google searches at tracking discussions of preventative measures. Importantly, accurate discussions of preventative measures are often associated with fewer cases of COVID-19 within a community.
While the above study used symptoms such as cough and fever to determine COVID-19 outbreaks, another study used anti-science views, political ideology, and misinformation about the COVID-19 pandemic to determine where future outbreaks were expected. Mining 27 million Tweets posted by 2.4 million Twitter users from January 21, 2020, to May 1, 2020, the study successfully predicted early outbreaks of COVID-19 in Mountain West and Southern states. Findings from this study may also help determine how communication on Twitter can help reduce future outbreaks, combat misinformation, and provide accurate preventative information.
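One simple way to quantify a lead time such as the one to two weeks reported in the first COVID-19 study is to shift the weekly Twitter signal against weekly official case counts and find the shift at which the two series correlate best. The sketch below is only an illustration under that assumption; the cited studies may have measured lead time differently, and the function names and the four-week search window are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def best_lead(signal, cases, max_lag=4):
    """Return (lag, r): the number of weeks by which `signal` best leads `cases`.

    Both inputs are equal-length lists of weekly values. A lag of 2 means
    the signal in week t is compared with case counts in week t + 2.
    """
    best = (0, pearson(signal, cases))
    for lag in range(1, max_lag + 1):
        r = pearson(signal[:-lag], cases[lag:])
        if r > best[1]:
            best = (lag, r)
    return best
```

A lag of zero means the Twitter signal offers no head start over official counts; larger lags indicate an earlier warning. Short series make this estimate noisy, so in practice it would need many weeks of data.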
Non-communicable Diseases and Health Conditions
Heart Disease
Beyond infectious diseases, Twitter has been used to predict instances of non-communicable diseases and other health conditions. For example, a 2015 study used hostility and chronic stress markers, both well-known risk factors for heart disease, to help determine instances of heart disease within counties across the US. Tweets containing language reflecting negative relationships and emotions, disengagement, and anger were found to be associated with areas with increased risk of heart disease. Furthermore, Tweets displaying positive relationships and emotions and engagement were found to be protective factors. As has been a trend across Twitter-focused studies, analyzing Twitter language was better at predicting heart disease mortality than a model that combined 10 common demographic, socioeconomic, and other risk factors often used in clinical settings.
Asthma
Asthma impacts over 25 million individuals within the US and results in over two million ER visits annually, with a particularly severe impact on young children and adolescents. Current asthma surveillance systems often have data lags of up to two weeks, making it nearly impossible to gather the information necessary for timely interventions for the communities and individuals most impacted. However, a study using Twitter data and Google searches predicted the number of asthma-related emergency department visits in certain areas. The Twitter and Google data, in combination with environmental sensor data, predicted the number of asthma emergency department visits in near real time with 70 percent accuracy. This study demonstrated that Twitter can provide timely health data about both populations and individuals, and it highlights the need for Twitter data mining to be used in tandem with existing measures.
Suicide
Beyond non-communicable diseases such as heart disease and asthma, Twitter has also been used to track mental illness and instances of suicide or suicide attempts. A 2014 study filtered 1,659,274 Tweets over three months and identified 37,717 that suggested the author was at risk for suicide. Tweets were determined to be “at-risk” based on keywords or phrases stemming from clinically based suicide risk factors. The study concluded that states with higher rates of suicide, namely midwestern and western states, had a higher proportion of at-risk Tweets than states with lower rates of suicide. Not only did this study help to identify areas and states that experienced spikes in suicide, but it was also able to flag individuals at risk for suicide, suggesting again that Twitter can be a tool at both the population health and individual health levels.
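By the figures above, roughly 2.3 percent of the filtered Tweets (37,717 of 1,659,274) were flagged as at-risk. The state-level comparison can be sketched in Python as a per-state share of flagged Tweets. This is a simplified illustration, not the study's method: the phrase list is a hypothetical placeholder rather than the clinically derived list the study used, and the input is assumed to be (state, text) pairs.

```python
from collections import defaultdict

# Hypothetical placeholder phrases for illustration only; the study's
# clinically derived keyword list is not reproduced here.
RISK_PHRASES = ("feel hopeless", "cannot sleep")

def flagged_share_by_state(tweets, phrases=RISK_PHRASES):
    """Return {state: share of that state's tweets containing any phrase}.

    `tweets` is an iterable of (state, text) pairs. Matching is a plain
    case-insensitive substring test.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for state, text in tweets:
        totals[state] += 1
        lowered = text.lower()
        if any(phrase in lowered for phrase in phrases):
            flagged[state] += 1
    return {state: flagged[state] / totals[state] for state in totals}
```

Comparing shares rather than raw counts keeps states with more Twitter users from appearing riskier simply because they post more.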
Gun Violence
More recently, Twitter has been used to examine public health and population health concerns such as gun violence. Dr. Desmond Patton has studied how social media posts, including those on Twitter, can predict community issues. His findings go so far as to conclude that specific combinations of just two emojis can each represent aggression, substance use, or grief and loss. He further noticed that Tweets indicating grief and loss were followed just two days later by instances of violence, often retaliation among rival gangs. Dr. Patton's work indicates that Twitter and social media data mining can improve public health by recognizing instances of violence before they occur, interrupting such instances, and providing resources for grieving communities and communities troubled by substance use.
Conclusion
An analysis of 2,582 scholarly articles published from 2009 to 2020 which utilized Twitter as a data source concluded by describing Twitter as “an increasingly popular data source, and a highly versatile tool for health-related research.” The study noted that since 2015, publications using Twitter as a data source had been rapidly increasing. From food safety and bullying to cervical cancer and anxiety, Twitter has been used to analyze all aspects of public and population health with remarkable degrees of success.
The use of Twitter as a tool for health data analysis shows no signs of slowing down. Twitter is increasingly being used to study all facets and areas of public health and disease prevention. Twitter can identify public health trends quickly and accurately. The benefit extends from large populations and communities down to individual neighborhoods and individuals. The data available from Twitter is astounding, and it is beyond time to integrate Twitter data into healthcare analytics processes as a reputable and standardized tool for improving global public health.