Development of a tool for disaster impact analysis and prediction

Nienke Adegeest
Published in Journey to gaia
Oct 29, 2018

Building on our previous article about the cluster and sentiment algorithm tool, which plots tweets about a given subject (e.g. the earthquake in Lombok on August 5, 2018) by location, topic and sentiment, we developed a complementary tool. This tool uses an algorithm that relies on tweets labeled by our partner company effect.ai; these labels allow us to train the algorithm to classify tweets into topics. We applied the tool to 24,458 tweets about the situation in Lombok, of which 499 were labeled. The resulting topics were “victims”, “type of disaster”, “aid”, “emergency response”, “mapping”, “corruption”, “donations”, “aid failure” and “scandals”.

Data engineering showed that “victims” and “type of disaster” were the most frequently assigned labels. These two labels were overrepresented in the data, skewing the labeled dataset towards them and overshadowing other labels that might provide a more informative classification of the data. With “victims” and “type of disaster” left out, the classification algorithm assigned the correct topic with an accuracy of 50%.

To improve this result, we took a closer look at the topics “victims” and “type of disaster” and concluded that (1) “type of disaster” is of no use for training the algorithm, as its outcome can be derived from the context; and (2) it could be useful to split “victims” into several subtopics, i.e. “impact”, “nr of deaths”, “survivors”, “support” and “tourism”. After due consideration, we decided to continue our analysis with the topics “impact of disaster”, “aid failure”, “external aid”, “local initiatives”, “emergency response”, “donations”, “mapping”, “scandals” and “corruption”.
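The post does not say which classification algorithm was used. As a minimal sketch, assuming a simple bag-of-words model, the snippet below uses multinomial Naive Bayes (our illustrative choice, not necessarily the one used here) to show the two steps described above: detecting overrepresented labels and measuring accuracy after leaving them out. The helper names, the 20% share threshold and the toy tweets are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def label_distribution(examples):
    """Count how often each topic label occurs in the labeled set."""
    return Counter(label for _, label in examples)


def overrepresented(counts, max_share):
    """Labels whose share of all labeled examples exceeds max_share (hypothetical rule)."""
    total = sum(counts.values())
    return {label for label, n in counts.items() if n / total > max_share}


def drop_labels(examples, excluded):
    """Remove examples whose label is in `excluded`."""
    return [(text, label) for text, label in examples if label not in excluded]


class NaiveBayesTopics:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing; illustrative only."""

    def fit(self, examples):
        self.doc_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in examples:
            self.doc_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        self.total_docs = sum(self.doc_counts.values())
        return self

    def predict(self, text):
        """Return the label with the highest log-posterior for `text`."""
        best_label, best_score = None, -math.inf
        vocab_size = len(self.vocab)
        for label, n_docs in self.doc_counts.items():
            label_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / self.total_docs)  # log prior
            for tok in tokenize(text):
                count = self.word_counts[label][tok]  # 0 for unseen tokens
                score += math.log((count + 1) / (label_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, examples):
    """Fraction of examples whose predicted topic equals the given label."""
    correct = sum(model.predict(text) == label for text, label in examples)
    return correct / len(examples)


# Invented toy examples (NOT the effect.ai labels).
TRAIN = [
    ("many victims after the earthquake", "victims"),
    ("death toll rises as victims are found", "victims"),
    ("injured victims taken to hospital", "victims"),
    ("strong earthquake hits lombok", "type of disaster"),
    ("another aftershock shakes lombok", "type of disaster"),
    ("tsunami warning after the earthquake", "type of disaster"),
    ("donate now to help lombok families", "donations"),
    ("please donate to the relief fund", "donations"),
    ("red cross aid arrives in lombok", "aid"),
    ("aid trucks deliver water and food", "aid"),
    ("volunteers mapping damaged buildings", "mapping"),
    ("drone mapping of damaged villages", "mapping"),
]
TEST = [
    ("families need donations now", "donations"),
    ("aid convoy with food and water", "aid"),
    ("mapping the damaged villages", "mapping"),
]

counts = label_distribution(TRAIN)
dominant = overrepresented(counts, max_share=0.2)
model = NaiveBayesTopics().fit(drop_labels(TRAIN, dominant))
print(sorted(dominant), accuracy(model, TEST))
```

A score on three invented tweets says nothing about real performance and is not comparable to the 50% reported above; on real data, accuracy would be measured on held-out labeled tweets.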

These topics are now being applied to a new dataset (Hurricane Michael in Florida, October 7, 2018) to continue training the algorithm. In addition, we analyzed the tweets about the earthquake and tsunami in Sulawesi (September 28, 2018) to investigate whether the following questions can be answered based on Twitter data:

  • Generally, what is the impact of a specific (natural) disaster?
  • What type of help is needed at what location?
  • Do the supply of and demand for help correspond?
  • Can we predict what type of help is needed after a specific (natural) disaster?

The findings of this investigation, together with further training of the algorithm, may help us carry out disaster impact analysis and prediction based on Twitter data.

Regarding the first question, we found a correlation between Twitter activity and specific aspects of an event. For example, in all three cases discussed (Lombok, Florida and Sulawesi), our analysis accurately shows when the earthquakes and their aftershocks, or the hurricane, took place: tweet activity was particularly high at those moments. We also developed a measure of the impact of a disaster that takes into account the occurrence of the words “victims”, “deaths”, “injured” and “missing” in tweets, the length of tweets, the number of retweets and the number of tweets per day. In addition, based on the tweets, we are now able to map the type of disaster that has taken place.
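The post does not publish the formula behind the impact measure. As a minimal sketch, assuming a simple weighted sum per day of the four ingredients listed above, the snippet below combines them into a daily score and flags days of unusually high tweet activity. The weights, the `Tweet` record, the peak rule (daily count above the mean plus k standard deviations) and the toy tweets are all hypothetical.

```python
import re
import statistics
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

# The four impact words named in the post, matched as exact lowercase tokens.
IMPACT_WORDS = {"victims", "deaths", "injured", "missing"}

# Hypothetical weights: the actual formula was not published.
W_VOLUME, W_KEYWORDS, W_RETWEETS, W_LENGTH = 1.0, 2.0, 0.5, 1.0
MAX_TWEET_LEN = 280  # scales mean tweet length to roughly [0, 1]


@dataclass
class Tweet:
    day: date
    text: str
    retweets: int


def keyword_hits(text):
    """Count impact-word tokens in a tweet (exact matches, so 'death' is not counted)."""
    return sum(tok in IMPACT_WORDS for tok in re.findall(r"[a-z]+", text.lower()))


def daily_impact(tweets):
    """Hypothetical weighted sum of volume, impact words, retweets and mean length, per day."""
    by_day = defaultdict(list)
    for t in tweets:
        by_day[t.day].append(t)
    scores = {}
    for day, day_tweets in sorted(by_day.items()):
        n = len(day_tweets)
        hits = sum(keyword_hits(t.text) for t in day_tweets)
        retweets = sum(t.retweets for t in day_tweets)
        mean_len = sum(len(t.text) for t in day_tweets) / n
        scores[day] = (
            W_VOLUME * n
            + W_KEYWORDS * hits
            + W_RETWEETS * retweets
            + W_LENGTH * mean_len / MAX_TWEET_LEN
        )
    return scores


def activity_peaks(tweets, k=1.0):
    """Days whose tweet count exceeds the mean daily count by more than k std devs.

    Only days with at least one tweet are included in the statistics.
    """
    counts = defaultdict(int)
    for t in tweets:
        counts[t.day] += 1
    daily = list(counts.values())
    mu, sigma = statistics.mean(daily), statistics.pstdev(daily)
    return sorted(day for day, c in counts.items() if c > mu + k * sigma)


# Invented toy tweets for illustration only (not the Sulawesi dataset).
TWEETS = [
    Tweet(date(2018, 9, 27), "quiet day at the beach", 0),
    Tweet(date(2018, 9, 28), "earthquake and tsunami hit the coast, many victims", 10),
    Tweet(date(2018, 9, 28), "deaths reported after the tsunami", 5),
    Tweet(date(2018, 9, 28), "hundreds injured and missing", 8),
    Tweet(date(2018, 9, 28), "tsunami hits the city", 2),
    Tweet(date(2018, 9, 28), "praying for everyone affected", 1),
    Tweet(date(2018, 9, 29), "death toll rises, victims still missing", 4),
    Tweet(date(2018, 9, 29), "aid is on its way", 0),
    Tweet(date(2018, 9, 30), "search and rescue continues", 0),
]

print(activity_peaks(TWEETS))
print(daily_impact(TWEETS))
```

Exact token matching is a deliberate simplification: a real measure would probably need stemming or a longer word list so that, for example, “death” also counts.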

Stay tuned for more updates!

Special thanks to data scientist Carlijn Bakker from Young Mavericks and machine learning specialist Jasper Adegeest from millennials.ai.
