Using Localized Twitter Activity for Red Tide Impact Assessment in Florida

by Andrey Skripnikov and Nathaniel Wagner

Social media is an invaluable tool for crowdsourcing in response to crisis situations. It allows relatively seamless and efficient two-way communication between governing bodies and the public, which, if harnessed properly, can benefit both parties. Platforms like Twitter, Facebook, Instagram and Flickr have been leveraged by local governments, police departments and response-planning agencies for tasks such as coordinating relief efforts during disasters (e.g., hurricanes, floods, typhoons), tracking the spread of epidemics and organizing neighborhood crime watch, to name just a few.

In this blog post we focus on an environmental disaster known as “red tide”. Red tide results from algal blooms of the dinoflagellate Karenia brevis. Such blooms discolor the water (hence the name “red tide”) and, more importantly, are extremely toxic to marine life and humans. Red tide generally causes large die-offs of marine life, with dead fish washing ashore and subsequently forcing closures of local beaches and businesses. Moreover, toxins are aerosolized out of the water column, contributing to respiratory or skin irritation that can lead to hospitalizations. While the impacts of red tide appear fairly localized, predominantly affecting coastal states (with Florida as our case study here), it may have reverberating effects across the entire country, for example through the seafood supply chain.

Photo Credits: South Florida Water Management District (left), Dale White (right)

While not the longest red tide event on record, the 2018 Florida red tide, through its severity and extent, brought national attention to the impacted Florida Gulf Coast communities. It was the first major red tide event since the broad public adoption of social media platforms, offering a potentially unique opportunity to assess complementary sources of information that can aid management response to disaster events. Residents and visitors turned to platforms like Twitter both to receive information and to communicate their own sentiments and experiences.

Work Done in Collaboration between New College of Florida’s Applied Data Science Master’s Program and Tampa Bay Estuary Program

In collaboration with the Tampa Bay Estuary Program, Nathaniel Wagner (a student in the Applied Data Science master’s program at New College of Florida) and I were tasked with analyzing localized Twitter activity as a reflection of local red tide impacts, as well as gauging public sentiment and the most discussed topics around the event. We chose Twitter largely because of its efficiency of communication (a result of the character limit imposed on tweets) and the ability to share, or “retweet”, information from other users, so that a single message can reach a much wider audience than the user’s immediate followers.

One of the biggest challenges in using social media text for crowdsourcing is separating signal from noise. To that end, we carefully double-checked the contents of the tweets returned by our search queries and then performed several data cleaning tasks. For example, we discarded tweets that only mentioned “red tide” as part of “red tide Rick” (a nickname for Florida’s then-governor Rick Scott) or “red tide party” (a reference to the Republican party), because those highly politicized messages had more to do with the upcoming elections than with local red tide conditions at the time of posting.
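The exclusion rule above can be sketched as a small regular-expression filter. This is a minimal illustration, not the code used in the study: the two patterns come from the examples in the text, and the helper name `is_relevant` is hypothetical. A tweet is kept only if “red tide” still appears after the political phrases are stripped out, so a tweet that uses the nickname but also discusses the actual bloom survives.

```python
import re

# Political phrases containing "red tide" that are not about the bloom itself.
# Patterns are taken from the two examples in the text; a real list may be longer.
POLITICAL_PATTERNS = [
    re.compile(r"red\s*tide\s*rick", re.IGNORECASE),
    re.compile(r"red\s*tide\s*party", re.IGNORECASE),
]
RED_TIDE = re.compile(r"red\s*tide", re.IGNORECASE)

def is_relevant(text: str) -> bool:
    """Keep a tweet only if it still mentions "red tide" once the
    political phrases have been removed."""
    for pattern in POLITICAL_PATTERNS:
        text = pattern.sub(" ", text)
    return RED_TIDE.search(text) is not None

tweets = [
    "Dead fish everywhere on the beach, red tide is awful today",
    "Vote out Red Tide Rick!",
    "The red tide party strikes again",
    "#RedTideRick ignored the red tide killing our beaches",
]
relevant = [t for t in tweets if is_relevant(t)]
```

Because `\s*` allows zero whitespace, the same patterns also catch hashtag forms such as “#RedTideRick”.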

Having cleaned the data, we wanted to see how well Twitter activity correlated with the red tide conditions actually measured over time. Specifically, we used Mote Marine Laboratory’s Beach Conditions Reporting System to obtain daily data on respiratory irritation and dead fish levels across 12 beaches on Florida’s west coast, while NOAA’s Harmful Algal BloomS Observing System provided Karenia brevis cell counts in Florida’s coastal waters (see Figure 1 below for the locations of the cell count samples). These data were also categorized by locale, from the county level (five counties of interest: Sarasota, Manatee, Pinellas, Hillsborough and Pasco) down to the city level and further down to the ZIP code level.

Figure 1: Karenia brevis cell count sample locations.

Figure 2(A) below shows the temporal dynamics of red tide on the Florida west coast from April 2018 to May 2019. Five peaks in red tide conditions can be distinguished, with the August and September peaks being the highest in terms of dead fish and respiratory irritation. Figure 2(B) shows per-capita Twitter activity on the “red tide” topic over the same period. There is solid correspondence between peak Twitter activity and the five peaks in red tide conditions, with August and September exhibiting the highest numbers of tweets.

By “explicit geo-tags” we mean tweets whose “tweeted from” location was explicitly marked by the user, which gives us more confidence that the tweet actually came from that location. “All geo-matches”, on the other hand, also include tweets from users who simply mention that location in their user profile; this indicates that the user resides in the area but does not guarantee that the tweet came from there. In fact, as one of the cleaning tasks, for tweets whose explicit geo-tag and user profile location conflicted (e.g., a user who lives in Manatee but tagged themselves in Pinellas), we prioritized the explicit geo-tag as the more reliable indicator of the user’s location at the time of the tweet.

Figure 2: Temporal dynamics of red tide (A) and Twitter activity (B) on the west coast of Florida, further broken down by county (C).
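The location-priority rule described above can be sketched as follows, assuming each tweet has already been parsed into an optional county from its explicit geo-tag and an optional county from the user’s profile. The `Tweet` record and all function names are hypothetical; the study’s own pipeline is not shown in this post.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Tweet:
    text: str
    geo_tag_county: Optional[str]  # county from the explicit "tweeted from" tag, if any
    profile_county: Optional[str]  # county parsed from the user's profile location, if any

def resolve_county(tweet: Tweet) -> Optional[str]:
    """Return the tweet's county, preferring the explicit geo-tag over the profile."""
    if tweet.geo_tag_county is not None:
        return tweet.geo_tag_county
    return tweet.profile_county

def explicit_geo_tags(tweets: List[Tweet]) -> List[Tweet]:
    """The "explicit geo-tags" subset: only tweets carrying a "tweeted from" tag."""
    return [t for t in tweets if t.geo_tag_county is not None]

def all_geo_matches(tweets: List[Tweet]) -> List[Tweet]:
    """The "all geo-matches" set: tweets located by either source."""
    return [t for t in tweets if resolve_county(t) is not None]

tweets = [
    Tweet("Dead fish on the beach", "Pinellas", "Manatee"),  # conflict: geo-tag wins
    Tweet("Coughing all day", None, "Hillsborough"),         # profile-only match
    Tweet("Red tide news", None, None),                      # no location at all
]
```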

Figure 2(C) illustrates the county-level correspondence between red tide conditions and per-capita Twitter activity from each respective county. The correlation is not as strong as for the aggregate Florida west coast data in Figures 2(A) and 2(B), but it is still substantial. Moreover, it shows the advantage of using explicit geo-tags rather than also including tweets matched by user profile information: the correlation is higher (0.79 vs. 0.72), because user-profile matches are strongly affected by Tampa users from Hillsborough (see the 2nd panel) who actively tweeted about red tide unfolding in neighboring areas while not experiencing as much direct impact themselves. Explicit geo-tags do not suffer from that issue, because people mostly geo-tag themselves only when red tide is actually happening where they currently are. We considered user-profile matches at all only because explicit geo-tags are scarce (only 1.5% of all posts on Twitter are geo-tagged) and we wanted to enlarge the pool of potentially relevant tweets.

We also investigated how the temporal correspondence between Twitter activity and dead fish levels changes across spatiotemporal levels, with the resulting correlations depicted in Figure 3 below. Unsurprisingly, correlations weaken steadily as we move to more hyper-localized spatial scales and higher temporal frequencies. Nonetheless, many of these correlations, especially at the county level regardless of temporal frequency, are still high enough to indicate strong correspondence with actual red tide conditions in the respective localities over time, which is by itself an interesting research finding.

Figure 3: Multi-level spatiotemporal correlations of local Twitter activity with dead fish levels observed in the area.

In addition, we performed sentiment analysis on the tweets, hoping it would improve the correlation between Twitter metrics and local red tide conditions. We applied a relatively simple sentence-based sentiment scoring method from R’s sentimentr package, making sure to assign negative sentiment to phrases indicating the presence of red tide in the area, while also accounting for negation (e.g., “there is no red tide here” would be scored positive due to the presence of “no”) and amplification (e.g., “extremely bad” would be scored more negatively than just “bad”). After assigning sentiment scores to all tweets, we conducted a sanity check by taking a random sub-sample of 1,000 tweets and verifying whether the assigned scores reasonably indicated red tide presence in the area. Unfortunately, these efforts did not yield any tangible improvement in correlations over the plain tweet counts used in Figures 2(B,C) and 3, which could be attributed in part to the fairly simplistic sentiment analysis methodology.
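To illustrate how negation and amplification change a lexicon-based score, here is a deliberately tiny scorer. It is not sentimentr’s algorithm and not the study’s configuration: the word lists, the three-word look-back window and the function names are all assumptions made for illustration. It treats “red tide” as a single negative term, as described above, flips the sign of a polar term when an odd number of negators precede it, and multiplies it by any amplifier in the same window.

```python
import re

# Hypothetical mini-lexicons for illustration only; sentimentr ships its own,
# much larger dictionaries of polarity terms and valence shifters.
POLARITY = {"red_tide": -1.0, "bad": -1.0, "awful": -1.0, "dead": -0.5,
            "clear": 0.5, "beautiful": 1.0}
NEGATORS = {"no", "not", "never", "isnt", "dont"}
AMPLIFIERS = {"very": 1.5, "really": 1.5, "extremely": 2.0}

def tokenize(text):
    """Lower-case, drop apostrophes, and merge "red tide" into one token."""
    text = text.lower().replace("’", "").replace("'", "")
    text = re.sub(r"red\s+tide", "red_tide", text)
    return re.findall(r"[a-z_]+", text)

def score(text, window=3):
    """Sum of polarity values, each adjusted by the `window` tokens before it:
    amplifiers multiply the value, an odd number of negators flips its sign."""
    tokens = tokenize(text)
    total = 0.0
    for i, token in enumerate(tokens):
        if token not in POLARITY:
            continue
        value = POLARITY[token]
        context = tokens[max(0, i - window):i]
        for word in context:
            value *= AMPLIFIERS.get(word, 1.0)
        if sum(word in NEGATORS for word in context) % 2 == 1:
            value = -value
        total += value
    return total
```

A fuller scorer would also need de-amplifiers and adversative conjunctions such as “but”, which sentimentr models as valence shifters and this sketch ignores.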

Lastly, we conducted a topical summary of the tweets, categorizing their contents by the types of concerns locals expressed at the time of the disaster. Table 1 below shows that environmental concerns (mentions of beaches, fish, water, climate, pollution, etc.) account for by far the largest number of mentions in red tide-related tweets. Health issues (respiratory problems, toxicity of red tide, etc.) are a distant second and economic concerns (damage to local businesses) third, though both topics were raised consistently. This ranking is in line with the results of Li et al. (2015), who found that newspaper coverage of the 2005–2006 red tide event focused predominantly on the associated environmental risks (mentioned in 80% of stories), followed by health concerns (48%), with economic impact covered the least (30%). Answering such questions can in turn have management implications: improving communication about red tide, both general education and real-time conditions, and prioritizing management actions such as marine debris cleanup, targeted assistance to waterfront businesses and their employees, or distribution of personal protective equipment (PPE).

Table 1: Most cited concerns on Twitter during the 2018 Florida red tide event, by category.
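A keyword-count version of the topical summary behind Table 1 might look like the sketch below. The keyword lists are hypothetical, seeded from the examples named in the text, and counting every keyword occurrence as one “mention” is our assumption; the post does not describe the exact categorization procedure.

```python
import re
from collections import Counter

# Hypothetical keyword lists, seeded from the examples named in the text.
CATEGORIES = {
    "environment": {"beach", "beaches", "fish", "water", "climate", "pollution"},
    "health": {"respiratory", "cough", "coughing", "toxic", "toxicity", "asthma"},
    "economy": {"business", "businesses", "tourism", "restaurant", "restaurants"},
}

def count_mentions(tweets):
    """Count keyword occurrences per category; one matching token = one mention."""
    counts = Counter()
    for text in tweets:
        for token in re.findall(r"[a-z]+", text.lower()):
            for category, keywords in CATEGORIES.items():
                if token in keywords:
                    counts[category] += 1
    return counts

sample = [
    "Dead fish all over the beach, water smells awful",
    "Coughing nonstop, this is toxic",
    "Local businesses are hurting",
]
mentions = count_mentions(sample)
ranking = [category for category, _ in mentions.most_common()]
```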

What Did We Learn and What’s Next?

The main takeaways from our analysis of Twitter activity during the 2018 Florida red tide event were:

1. Strong spatiotemporal correlations between red tide conditions and public response on Twitter. In particular, tweets explicitly geo-tagged to the area showed better correspondence with local conditions than tweets matched by user profile information only. The advantages of explicit geo-tags become even clearer at finer-grained levels, such as county, city, or ZIP code areas.

2. The strength of correlations was confirmed to decrease gradually at more hyper-localized scales (e.g., city or ZIP code level, as opposed to county level) and at higher temporal frequencies (daily or every three days, as opposed to weekly).

3. Sentiment analysis did not appear to improve on plain tweet counts in terms of the strength of correspondence between Twitter metrics and red tide conditions data.

4. Among the issues cited by the public throughout the disaster, environmental concerns were by far the most prevalent (90K+ mentions), with health a distant second (28K) and the economy third (12K).

As one may observe, there are promising signs for using Twitter as a crowdsourcing tool in monitoring and planning for a disaster event such as red tide. Our work has been published in the academic journal “Harmful Algae” and was also acknowledged in the media (see the “Florida Scientists: Social Media Can Track Toxic Algae” article in “Bradenton Herald” and “Government Technology”).

With that in mind, there are clear areas in need of improvement (e.g., using more sophisticated sentiment analysis techniques), along with other research questions that need answering for Twitter insights to be of real utility to local governments and planning agencies. To that end, New College of Florida intends to collaborate further with the Tampa Bay Estuary Program in the near future to design a public-facing web dashboard that, among other features, tracks recent Twitter activity on the topic of “red tide”, taking the pulse of the public’s specific experiences, requests, questions and concerns about the issue at a given point in time. Such a dashboard could further inform the response of local planning agencies during the next large-magnitude red tide event, improve communication between the public and the government, and, last but not least, increase public awareness of this critical environmental issue.

Andrey Skripnikov is an Assistant Professor of Statistics at New College of Florida, conducting research on time series, variable selection and sports analytics.

Nathaniel Wagner is a Senior Developer at Cienaga Systems. He graduated from New College of Florida with a master’s degree in Data Science in 2021.
