Terror or Clickbait?

Jana Lehocká (Hošková)
Jun 2, 2019

What is on the rise — terrorist attacks and their efficiency or just the amount of media coverage?

Authors: Jana Hošková, Barbora Špetlíková

Mentors: Romana Mrázová and Václav Karban from Accenture

Introduction

Hello, we are Barbora and Jana, and we both attended the Digital Academy organized by Czechitas. We created this blog post to publish our final project and to walk you through our crazy journey through the data analysis course. Our project is inspired by data journalism, and its aim is to provide an interactive dashboard demonstrating how Czech media cover terrorism. But let’s start from the beginning…

As sociology and political science graduates, we have always been interested in society-wide problems. At the same time, we had the impression that the social sciences lack data analysis tools, with an Excel spreadsheet often being the most powerful one in use. Therefore, we decided to find a topic which resonates in the public sphere and to study it using our newly acquired IT skills.

Selecting a project

Inspired by our SQL classes, where we learned our first SELECTs using a database of terrorist attacks, we realized that a great deal of the news we had read over the last couple of years covered the threat of terrorism in Europe. Consuming the news about terrorist attacks, one could easily get the impression that we live in an ‘age of terrorism’ fueled by Muslim radicals. But does the amount of news about terrorism correspond to the actual risk of terrorism we face?

Media are powerful tools for shaping public opinion, and selective data can result in wrong inferences and wrong decisions. Selective information can in fact be more harmful than no information or fake news, as it gives us a distorted picture of reality. Taking this into consideration, we decided to show the reality of terrorism through the lens of data. The perfect tool for this goal was data journalism, which combines traditional storytelling skills with the range of digital information now available. Yet it is still held back by journalists’ limited technical knowledge.

Identifying a use case

After selecting the topic of our project, we needed to decide what its outcome would actually be. As the project is based on data journalism, we set the outcome as an interactive dashboard. We identified a data journalist as the ideal user of our project: someone interested both in a topic which resonates in the public sphere and in the numerical data used in the production and distribution of information. The general public interested in the topic, and especially students and social scientists, are other potential users of our dashboard. A dashboard user will receive comprehensive information on terrorist attacks and will be able to answer the following research question: What is on the rise, terrorist attacks and their efficiency or the amount of media coverage?

We decided to work with the time frame of 2013–2017. Analysing more than five years would have required scraping a very extensive dataset of media articles. Based on our preliminary research and consultation with selected media (iDNES.cz and ahaonline.cz), an online medium publishes approximately 30 articles a day in its news section, which is approx. 750–900 articles a month and around 10 000 articles a year.
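As a rough sanity check on these volumes, the arithmetic can be sketched as follows. The range of 25 to 30 articles a day is an assumption chosen because it reproduces the monthly figure quoted above; it is not a measured count.

```python
# Rough volume estimate for the news section of one online medium.
# Assumption: 25-30 articles a day, the range implied by 750-900 a month.
DAYS_PER_MONTH = 30
DAYS_PER_YEAR = 365


def estimate_volume(articles_per_day):
    """Return (articles per month, articles per year) for a daily rate."""
    return articles_per_day * DAYS_PER_MONTH, articles_per_day * DAYS_PER_YEAR


low_month, low_year = estimate_volume(25)
high_month, high_year = estimate_volume(30)
print(f"per month: {low_month}-{high_month}")
print(f"per year:  {low_year}-{high_year}")
```

Under these assumptions the yearly volume lands between roughly 9 000 and 11 000 articles per medium, which is consistent with the figure of around 10 000.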

Once we had a user and a time frame, we planned the individual stages of our project. As we decided to work with two data sources, we started with two separate processes of getting and cleaning the data. More information about getting the data follows in the next chapter. After that, we conducted a text analysis of the media articles, put the data together and prepared them for visualization so that the final product, a dashboard, could be created.

Getting data

We used two data sources for our project:

I. Global Terrorism Database (GTD)

We downloaded data on terrorist attacks from the Global Terrorism Database (GTD), created and published by the University of Maryland, together with the GTD Codebook, which describes the data collection methodology and the variables. The GTD is an open-source database which includes information on terrorist events around the world from 1970 until the end of 2017. It is the most comprehensive database of systematic data on domestic as well as international terrorist incidents, and it contains more than 180,000 cases. The information in the data makes it possible to build a comprehensive picture of terrorist attacks. The GTD Codebook defines a terrorist attack as the threatened or actual use of illegal force and violence by a non-state actor to attain a political, economic, religious, or social goal through fear, coercion, or intimidation.

II. Web scraping of media articles

In order to get data for our analysis, we decided to scrape online media. Inspired by the Media map made by Josef Slerka, who created a typology of Czech online media, we chose the following news sources:

  • mainstream media (in Czech hlavní proud) — iDNES.cz, novinky.cz
  • tabloid (in Czech bulvární média) — ahaonline.cz
  • political tabloid (in Czech politický bulvár) — ParlamentniListy.cz

Jakub Balada from Apify helped us with web scraping. He created a crawler to scrape articles from two servers (Novinky.cz, iDNES.cz) and explained to us how the crawler works. We then adjusted and configured the crawler following the instructions from Apify and downloaded the articles from the two other servers (ParlamentniListy.cz, ahaonline.cz). We obtained a total of 10 586 articles on our topic. See the crawler’s settings and an example of scraped articles in the pictures below.

Setting a crawler for web scraping at Apify’s platform
Articles successfully scraped from ahaonline.cz

Data cleaning and transformation

The terrorist database was in xlsx format. First, we read the Codebook to learn what kind of information the data contain and what their quality is. We then cleaned the data in Python (we chose the columns we needed and replaced some characters) and converted them to CSV. Afterwards, we uploaded the CSV to Keboola and created the final table in the Sandbox using SQL. See the picture below.

Creating a final table of terrorist events in Keboola
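The Python cleaning step described above (choosing columns, replacing characters, writing CSV) could look roughly like the sketch below. It is an illustration, not the code used in the project: the column names follow the GTD Codebook, but the selection of columns, the characters being replaced and the sample rows are all assumptions.

```python
import csv
import io

# Columns to keep. The names follow the GTD Codebook; which ones the
# project actually kept is an assumption.
KEEP = ["eventid", "iyear", "country_txt", "nkill", "nwound", "summary"]

# Made-up rows standing in for the xlsx export (not real GTD records).
raw_rows = [
    {"eventid": "1", "iyear": "2015", "country_txt": "Country A",
     "nkill": "2", "nwound": "5", "summary": "First line\nsecond line",
     "unused_column": "dropped"},
    {"eventid": "2", "iyear": "2016", "country_txt": "Country B",
     "nkill": "0", "nwound": "0", "summary": "Device found; no victims",
     "unused_column": "dropped"},
]


def clean_row(row, keep):
    """Keep the chosen columns and replace characters that break CSV import."""
    cleaned = {}
    for column in keep:
        value = str(row.get(column, ""))
        # Assumption: line breaks and semicolons are the troublesome characters.
        value = value.replace("\r", " ").replace("\n", " ").replace(";", ",")
        cleaned[column] = value.strip()
    return cleaned


def to_csv(rows, keep):
    """Serialize the cleaned rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=keep, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(clean_row(row, keep))
    return buffer.getvalue()


csv_text = to_csv(raw_rows, KEEP)
print(csv_text)
```

Keeping the column list in one place makes it easy to check it against the Codebook before the CSV is uploaded for the SQL step.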

The scraped media articles were in JSON format. We had four separate files that we cleaned, unified, merged and converted to Excel in Python. Each medium used a different date format, so we needed to unify them. Next, we created an article category (the media section in which the article was published). Last but not least, we deleted the ads that appeared at the end of some articles. The final document had the following columns: ‘article_id’, ‘source’, ‘url’, ‘category’, ‘name’, ‘date’, ‘opener’, ‘text’, ‘date_geneea’, ‘tags’, ‘place’.

Example of cleaning and creating the correct date format in Python
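For readers without the picture, here is a minimal sketch of the two cleaning steps mentioned above: unifying the date formats and cutting off trailing ads. The date formats and the ad marker are hypothetical placeholders; the real sites may use different ones.

```python
from datetime import datetime

# Hypothetical date formats, one per medium; the real sites may differ.
DATE_FORMATS = [
    "%d.%m.%Y %H:%M",
    "%Y-%m-%dT%H:%M:%S",
    "%d. %m. %Y",
    "%Y-%m-%d",
]

# Hypothetical marker that starts the ad block at the end of an article.
AD_MARKER = "Reklama"


def unify_date(raw):
    """Try each known format and return an ISO date string (YYYY-MM-DD)."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unknown date format: {raw!r}")


def strip_trailing_ad(text, marker=AD_MARKER):
    """Cut the article text at the last occurrence of the ad marker, if any."""
    head, separator, _tail = text.rpartition(marker)
    return head.rstrip() if separator else text


print(unify_date("7.1.2015 11:30"))
print(unify_date("2015-01-07T11:30:00"))
print(strip_trailing_ad("Text of the article. Reklama Buy now"))
```

Raising an error on an unknown format, rather than silently skipping the article, makes a fifth date format visible as soon as it appears.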

Text analysis

Once the data were prepared, the next step was to link each news article to a specific attack event. After consulting with our mentors and several Czechitas coaches, we realized that this was too ambitious a goal for several reasons:

  • There were around 60 000 terrorist events during 2013–2017.
  • Some events were a series of attacks rather than a single assault.
  • An article about terrorism might mention more than one event.
  • The majority of attacks are not covered by the media, as we do not often read about terrorist attacks in Asia, Africa, or the Middle East.

Therefore, we decided to look at our media articles and see which attacks were covered the most. We could then pick the most cited attacks and link the articles to the particular terrorist events.

For this part of our project we used Geneea’s NLP platform Frida. The software processes a given text and, based on embedded algorithms, identifies elements such as persons, locations, keywords and phrases. Before filtering the respective terrorist attacks, we used a training file to tell the software how to read our data properly. For instance, we set “Islámský stát” as equal to “IS” or “ISIL”, and “Hollanda” as equal to “François Hollande”. We also added some new entities which would help us filter the attacks, and categorized them. Afterwards, we selected eleven terrorist attacks and filtered the respective articles using the Frida platform. We chose attacks which were significantly covered by the media, and also included Volgograd and Istanbul as representatives of regions which usually do not receive much attention in the news, in order to compare them with the rest.
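The idea behind the training file, mapping several surface forms to one entity, can be illustrated with a plain Python lookup table. This is only a sketch of the concept: it is not Frida’s file format or API, and the `ALIASES` table and `canonical_entity` helper are hypothetical names.

```python
# Hypothetical alias table in the spirit of the training file described above.
# Keys are lower-cased surface forms, values are the canonical entity names.
ALIASES = {
    "is": "Islámský stát",
    "isil": "Islámský stát",
    "islámský stát": "Islámský stát",
    "hollanda": "François Hollande",
    "françois hollande": "François Hollande",
}


def canonical_entity(mention):
    """Map a surface form to its canonical entity; unknown mentions pass through."""
    cleaned = mention.strip()
    return ALIASES.get(cleaned.lower(), cleaned)


for mention in ["ISIL", "IS", "Hollanda", "Paříž"]:
    print(mention, "->", canonical_entity(mention))
```

With such a mapping in place, counts for “IS”, “ISIL” and “Islámský stát” add up under one keyword instead of being split across three.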

Analysed attacks:

To filter the attacks, we used the Dates filter to make sure that no article is older than the attack date, as well as the keywords and the custom entities we had added to the training file. Below you can see the print screen from Frida with the filtered articles for the Charlie Hebdo shooting, which occurred in January 2015. The last date for searching was set to 12 November 2015 to make sure that no articles about the terrorist attacks in Paris on 13–14 November would be selected. We could check the search results by looking at the most used keywords in the articles and by monitoring the Documents section containing all the filtered articles.

Print screen from Frida with filtered articles for Charlie Hebdo shooting
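The same filtering logic, a date window plus keywords, can be sketched in plain Python. This is not how Frida works internally; the `matches_attack` helper and the sample articles are made up for illustration, and only the column names (‘article_id’, ‘name’, ‘date’, ‘text’) and the end date of 12 November 2015 come from the text. The start date assumes the shooting took place on 7 January 2015.

```python
from datetime import date


def matches_attack(article, start, end, keywords):
    """True if the article was published in [start, end] and mentions a keyword."""
    published = date.fromisoformat(article["date"])
    if not (start <= published <= end):
        return False
    text = (article["name"] + " " + article["text"]).lower()
    return any(keyword.lower() in text for keyword in keywords)


# Made-up articles, not taken from the scraped data.
articles = [
    {"article_id": 1, "date": "2015-01-08", "name": "Attack on Charlie Hebdo", "text": "..."},
    {"article_id": 2, "date": "2015-11-14", "name": "Paris attacks", "text": "After Charlie Hebdo ..."},
    {"article_id": 3, "date": "2014-12-30", "name": "Charlie Hebdo cartoon", "text": "..."},
    {"article_id": 4, "date": "2015-03-01", "name": "Weather forecast", "text": "..."},
]

charlie_hebdo = [
    article for article in articles
    if matches_attack(article, date(2015, 1, 7), date(2015, 11, 12), ["Charlie Hebdo"])
]
print([article["article_id"] for article in charlie_hebdo])
```

Articles 2 and 3 mention the keyword but fall outside the date window, which is exactly the case the end date is meant to handle.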

In the same way, we filtered the rest of the attacks (see below the keywords for four selected attacks).

Most frequently used keywords in articles about Nice, Boston, Manchester and Berlin attacks

We created groups of articles based on the particular attacks and downloaded 11 tables, one for each attack (see the table below). Once we had the tables with articles, we joined them with the table of terrorist attacks, and started to visualize.

Groups of articles selected in Frida
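Joining the article groups with the table of attacks boils down to counting articles per attack and attaching that count to the attack row. The sketch below assumes a hypothetical join key called `attack` and uses made-up rows; the column names `nkill` and `nwound` follow the GTD Codebook.

```python
from collections import Counter

# Made-up stand-ins for the two tables; "attack" is a hypothetical join key.
attacks = [
    {"attack": "attack_a", "nkill": 3, "nwound": 10},
    {"attack": "attack_b", "nkill": 0, "nwound": 4},
    {"attack": "attack_c", "nkill": 1, "nwound": 0},
]
articles = [
    {"article_id": 1, "attack": "attack_a"},
    {"article_id": 2, "attack": "attack_a"},
    {"article_id": 3, "attack": "attack_b"},
]


def join_articles_to_attacks(attacks, articles):
    """Attach the victim total and the number of linked articles to each attack."""
    counts = Counter(article["attack"] for article in articles)
    joined = []
    for attack in attacks:
        row = dict(attack)
        row["victims"] = attack["nkill"] + attack["nwound"]
        # A Counter returns 0 for attacks that no article was linked to.
        row["article_count"] = counts[attack["attack"]]
        joined.append(row)
    return joined


joined = join_articles_to_attacks(attacks, articles)
for row in joined:
    print(row)
```

Keeping attacks with zero linked articles in the result matters for the later comparison of coverage against victims.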

Visualization in Tableau

Here we would like to show and comment on our findings using the Tableau visualization tool. The interactive dashboard was published on Tableau Public; see the Tableau Public link at the end of this post.

Dashboard 1: Map of Terrorist Attacks in 1970–2017

We took advantage of having extensive data on terrorist events going back to 1970 and looked at the evolution of terrorist attacks in the years 1970–2017 by creating an interactive time map. The user can look through the attacks, each depicted by a red dot. The size of the dot shows the number of victims. More details about an attack are displayed by clicking on its dot. For a video, see the YouTube link at the end of this post.

Dashboard 1. Map depicting the evolution of terrorism over time. For more details it is possible to click on a red dot. Important note: Incidents of terrorism from 1993 are not present because they were lost prior to START’s compilation of the GTD from multiple data collection efforts. Several efforts were made to recollect these incidents from original news sources. Unfortunately, due to the challenges of retrospective data collection, they were not fully successful. All 1993 attacks were excluded from the GTD data to prevent users from misinterpreting the low frequency in 1993 as an actual count.

Dashboard 2: Terrorism in 1970–2017

Four charts show successful terrorist attacks in the years 1970–2017 (excluding 1993). The first two charts show that the number of terrorist attacks as well as the number of victims is growing over time. A huge growth in attacks began in 2013. The peak was in 2014 when, for example, the war in Ukraine began and several wars, civil wars and conflicts took place in the Middle East and Asia. A huge number of the attacks took place in Iraq. The number began to rise rapidly when the U.S. formally withdrew all combat troops from Iraq in December 2011. In the summer of 2014, the Islamic State of Iraq and the Levant (ISIL) launched a military offensive in Northern Iraq and declared a worldwide Islamic caliphate, eliciting another military response from the United States.

Dashboard 2: Terrorism in 1970–2017 Info: For more details it is possible to display only selected regions

Looking only at Europe, we see that the situation is completely different. Neither the number of attacks nor the number of victims is growing rapidly, but we can observe individual peaks. The most striking are the years 2014 and 2015.

These peaks disappear when we remove the attacks on the territory of Ukraine, where the war began in 2014.

Most attacks in 1970–2017 took place in Iraq, Pakistan and Afghanistan (the Middle East and South Asia). Great Britain is the only European country which made it into the selection of the 12 countries with the most attacks.

The organization responsible for the most successful attacks is the Taliban, the second is the Islamic State of Iraq and the Levant (ISIL) and the third is the communist revolutionary organization Shining Path from Peru.

If we look only at Europe again, we can see that most attacks were carried out by the Irish Republican Army (IRA) and the Basque separatist group ETA. A rather interesting finding is that more attacks in Europe were committed by the Kurdistan Workers’ Party than by ISIL.

Dashboard 3: Terrorism in 2013–2017

Four charts show successful terrorist attacks in the years 2013–2017. The first two charts show that in 2013–2017 the most terrorist attacks and victims were in Asia and North Africa. The fewest attacks and victims were in Australasia and Oceania.

Dashboard 3: Terrorism in 2013–2017 Info: For more details it is possible to display only selected regions

The organizations responsible for the most successful attacks are the Islamic State of Iraq and the Levant (ISIL) and the Taliban.

Worldwide, the ratio of terrorist attacks in which no one was killed or injured to attacks with victims is 3:7.

Interestingly, the European ratio is just the opposite.

Info: For more details it is possible to display the ratio for selected attack types. (Assassination has the greatest efficiency and Facility/Infrastructure Attack the smallest.)
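The ratio itself is a simple count of attacks with and without victims, where victims are the sum of killed and injured. A minimal sketch, using a made-up sample that was constructed to reproduce the 3:7 split (it is not real GTD data):

```python
# Made-up sample of (killed, wounded) pairs, one per attack.
sample = [
    (0, 0), (2, 5), (0, 1), (0, 0), (1, 0),
    (3, 3), (0, 0), (0, 2), (4, 0), (1, 1),
]


def casualty_split(attacks):
    """Return (attacks without victims, attacks with victims)."""
    without = sum(1 for killed, wounded in attacks if killed + wounded == 0)
    return without, len(attacks) - without


without_victims, with_victims = casualty_split(sample)
print(f"{without_victims}:{with_victims}")
```

Running the same function on a subset of attacks, for example one region or one attack type, gives the per-group ratios shown in the dashboard.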

Dashboard 4: Map of Terrorist Attacks in Europe 2013–2017

The map shows successful terrorist attacks in Europe in years 2013–2017. The size of the red dot shows the number of victims. As we analysed articles in the Czech online media, which mostly focus on terrorist attacks in Europe, we decided to look closely at the situation in Europe.

Dashboard 4. Map of Europe depicting evolution of terrorism in time. For more details it is possible to click on red dots

Dashboard 5: Terrorism in Europe 2013–2017

Five charts show successful terrorist attacks in Europe in the years 2013–2017. The first two graphs show that in 2013–2017 Ukraine had the most terrorist attacks and victims, due to the ongoing war. Great Britain is second in the number of attacks but fourth in the number of victims. France is second in the number of victims. Russia is third in both the number of attacks and the number of victims.

Dashboard 5: Terrorism in Europe 2013–2017. Info: Victims are the sum of killed and injured in an attack

We also decided to analyse the number of victims per attack. While Ukraine has the most attacks and victims overall, the highest number of victims per attack is in Belgium. That means that there were not many attacks in Belgium, but they were brutal and “efficient”.
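The victims-per-attack measure can be sketched as a group-by over countries. The rows below are made up to show how a country with few attacks can still rank first; `country_txt`, `nkill` and `nwound` follow the GTD Codebook column names.

```python
from collections import defaultdict

# Made-up attacks, not real GTD records.
sample = [
    {"country_txt": "Country A", "nkill": 1, "nwound": 1},
    {"country_txt": "Country A", "nkill": 0, "nwound": 2},
    {"country_txt": "Country A", "nkill": 0, "nwound": 0},
    {"country_txt": "Country A", "nkill": 2, "nwound": 2},
    {"country_txt": "Country B", "nkill": 10, "nwound": 30},
]


def victims_per_attack(attacks):
    """Return {country: average number of victims (killed + wounded) per attack}."""
    attack_count = defaultdict(int)
    victim_count = defaultdict(int)
    for attack in attacks:
        country = attack["country_txt"]
        attack_count[country] += 1
        victim_count[country] += attack["nkill"] + attack["nwound"]
    return {c: victim_count[c] / attack_count[c] for c in attack_count}


ratios = victims_per_attack(sample)
ranking = sorted(ratios, key=ratios.get, reverse=True)
print(ratios)
print(ranking)
```

Country A has more attacks and Country B fewer, yet Country B leads the ranking, which mirrors the contrast between Ukraine and Belgium described above.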

The most efficient attack type is the Hostage Taking (Barricade Incident) attack, an act whose primary objective is to take control of hostages for the purpose of achieving a political objective through concessions or through disruption of normal operations. The most efficient terrorist organization is the Imam Shamil Battalion, a militant Islamist organization primarily active in the North Caucasus.

Dashboard 6: Trends in Media Coverage of Terrorism 2013–2017

By joining the data sets from the GTD and the media, we can see the trends in the media coverage of terrorism. As we focused on the media coverage of European attacks and the Boston attack, which attract more attention due to their geographical and cultural proximity to the reader, we compare the number of articles to the number of attacks which occurred in Europe and the USA. We can see that the number of attacks in Europe and the US remains more or less the same, while the number of articles grows rapidly. When we look closely at the articles and victims per attack, we can see that the number of victims per attack has several peaks rather than growing continuously. The count of articles, on the other hand, increases over time.

Dashboard 6: Trends in media coverage of terrorism 2013–2017. Info: Are you wondering what happened in 2017, when the number of victims per attack reached a peak? The Las Vegas shooting occurred in October 2017; 58 people were killed and 851 were wounded. Why were we not able to filter such a massive terrorist event? The reason is surprisingly simple. The event is labeled in the Czech media as a massacre in Las Vegas, omitting our key filtering words terrorist attack and terrorism. This example clearly demonstrates how catchy titles are used to attract the reader’s attention.

Dashboard 7: Selected Terrorist Attacks 2013–2017

Here we see that the attack covered the most by the media is the Charlie Hebdo shooting. At the same time, it can be seen that this attack claimed the fewest lives. The Nice attack, on the other hand, ranks only seventh in media coverage despite being massively destructive in terms of the number of victims. The third chart in the dashboard shows the 12 most used keywords in the articles (excluding the words terrorism and terrorist attack). Overall, the most common keyword related to terrorism is the Islamic state (in Czech Islámský stát), followed by Syria (Sýrie) and war (válka). The chart also reveals that the word refugee (in Czech uprchlík) belongs to the top 12 keywords, which is quite an interesting finding. We will look closely at this finding in Dashboard 9, dedicated to word clouds.

Dashboard 7: Selected Terrorist Attacks 2013–2017

Dashboard 8: Coverage of Attacks for Each Online Medium

If you would like to look closely at the coverage of attacks for each online medium, see the dashboard below.

Dashboard 8: Coverage of Attacks for Each Online Medium

Dashboard 9: Keywords

Are you wondering which keywords the Czech media use most frequently in their articles about terrorism? Let’s have a look. The words “terrorism” and “terrorist attack” were filtered out since it was obvious that they would be used the most. Again, Islámský stát (the Islamic state) ranks first in terms of frequency. Also, we can see that uprchlík (refugee) and migrace (migration) are frequently used words in the articles, indicating that the topic of terrorism might be linked to another widely discussed subject in the Czech media: migration and the European migrant crisis, sometimes also termed the refugee crisis. Indeed, if we take a closer look at the word cloud, we can see the keyword uprchlická krize (refugee crisis) represented in all the media. Moreover, the chart shows the frequent use of the words Islam, džihádisté (jihadists) and náboženství (religion), confirming the tendency to link the issue of terrorism with Islamic religion and Islamic radicalism. These findings offer room for further analysis.

Dashboard 9: Keywords
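The keyword ranking behind such a chart can be approximated with a frequency count that skips the excluded words. The sketch assumes that the ‘tags’ column holds a list of keyword strings per article and that the excluded Czech forms are “terorismus” and “teroristický útok”; both are assumptions, and the sample articles are made up.

```python
from collections import Counter

# Assumed Czech forms of the two filtered-out keywords.
EXCLUDED = {"terorismus", "teroristický útok"}

# Made-up articles; the real 'tags' column may be stored differently.
articles = [
    {"tags": ["Islámský stát", "Sýrie", "terorismus"]},
    {"tags": ["Islámský stát", "válka"]},
    {"tags": ["islámský stát", "Sýrie", "teroristický útok"]},
]


def top_keywords(articles, n=12, excluded=EXCLUDED):
    """Return the n most frequent keywords as (keyword, count) pairs."""
    counts = Counter()
    for article in articles:
        for tag in article["tags"]:
            key = tag.strip().lower()
            if key and key not in excluded:
                counts[key] += 1
    return counts.most_common(n)


for keyword, count in top_keywords(articles):
    print(keyword, count)
```

Lower-casing the tags before counting keeps spelling variants such as “Islámský stát” and “islámský stát” in one bucket.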

Dashboard 10: Bonus: Terrorism in the Czech Republic 2013–2017

Are you wondering what happened in the Czech Republic during these years?

2013

  • An Austrian foreigner detonated a suicide bomb in front of a block of flats in České Velenice. Only the bomber was killed in the blast.

2014

  • A letter bomb detonated at a pool manufacturing company in Orlov village. One civilian was killed and one was injured by the blast.
  • An explosive device planted underneath the vehicle of a Russian-speaking driver detonated in Prague.
  • An assailant set a police vehicle on fire in Usti nad Labem. The Proletarian Solidarity claimed responsibility for the incident.

2015

  • Assailants threw incendiary devices at the house of Defense Minister Martin Stropnicky in Prague.
  • Assailants set fire to two police vehicles in Most. The Revolutionary Cells Network (SRN) claimed responsibility for the incident.
  • Assailants set fire to a police vehicle near a metro station in Prague. The Revolutionary Cells Network (SRN) claimed responsibility for the incident.
  • Assailants set fire to a police vehicle near a railway station in Prague. The Revolutionary Cells Network (SRN) claimed responsibility for the incident.

2016

  • Assailants threw Molotov cocktails at a refugee center in Prague. At least one person was injured in the attack. No group claimed responsibility for the incident, however, sources attributed the attack to Neo-Nazis.

2017

  • A train crashed into a fallen tree along the railway in Mlada Boleslav. An anti-immigrant extremist (a 70-year-old pensioner) left notes at the scene of the attack in order to make it look like the attack had been perpetrated by jihadists. He did this in order to “provoke a backlash against Muslims” and immigration.
  • Another train crashed into a fallen tree along the railway in Bezdez.
  • Assailants set fire to the Corpus Christi church near Guty.

These attacks are usually not associated with terrorism in the Czech media. The only exception is the case of ‘a 70-year-old pensioner and crashed trains’.

Dashboard 10: Terrorism in the Czech Republic 2013–2017

Conclusion

Our analysis revealed that the number of terrorist attacks in the world, as well as the number of victims, is growing over time, with a massive increase in 2013 and 2014 due to the wars and conflicts in the Middle East and Asia, but also in Eastern Ukraine. In the Czech context, however, these events are not labeled as terrorist attacks but rather as the war in Syria, the war in Ukraine, the civil war in Libya or the war in Afghanistan. Therefore, we were not able to identify these events in our scraped articles, even though the word Syria, for instance, was one of the most frequently used words in the news. We found that Syria and Iraq serve as a sort of connecting point between the Islamic state and terrorist attacks in the news articles: most of the articles dedicated to a particular terrorist event in the West also refer to the Islamic state, the war in Syria and refugee issues in the same piece of news.

In Europe, on the other hand, neither the number of attacks nor the number of victims is growing rapidly, but we can observe individual peaks. Looking at terrorist attacks in Europe and the USA, we can see that the number of attacks remains static, while the number of articles is rising rapidly. At the same time, terrorist attacks became more brutal (efficient) in terms of victims per attack.

All in all, the total number of scraped articles was 10 586. The number of articles we used for analysing particular attacks was 1 378, which is quite a small share. So we wondered what the other articles addressing terrorism were about. We found that most of them speak about terrorism in general, depict it as a threat to Europe, warn against traveling to popular tourist destinations, and link it to the European refugee crisis. On top of that, we discovered some interesting facts in the Big Criminal Horoscope. If your zodiac sign is Leo, Sagittarius or Aquarius, you are prone to become a terrorist. For those of you concerned, please be careful!

Since the GTD is being gradually updated, this project can later be supplemented with new data about terrorist events. In the future, it might be interesting to look at the quotations in the articles and to find out who comments on terrorism the most and how they speak about it. One might look, for instance, at Czech politicians and check whether they use terrorist attacks and the fear of them to achieve political goals. To provide more comprehensive information about Czech media coverage, this pilot project might be extended by adding more media sources representing all media types (see the Media map). Another interesting option would be to look at social networks, such as Facebook and Twitter, and to analyse the communication and comments on the topic.

Thank you

Here we would like to thank the people around us who inspired us and motivated us throughout our incredible data analysis journey.

We would like to give special thanks to our mentors Romana Mrázová and Václav Karban from Accenture for their help, advice and inspiring early morning coffee meetings.

We also wish to thank Jakub Balada from Apify who helped us with web scraping.

Thanks should also go to Markéta Špinková from Geneea, who helped us understand how text analysis works and how to properly filter the information.

Thanks also to Radim Špetlík, who helped us with Python.

We cannot leave Czechitas without mentioning Pavel Chocholouš, who inspired us to choose the topic.

Many thanks to the Czechitas team and all our unicorn mates for making the Digital Academy a truly special event!

Links for our dashboard and video

Tableau public

Youtube
