Global Terrorism Database 2016 and the Effectiveness of Exploratory Data Analysis
Recently, the Consortium for the Study of Terrorism and Responses to Terrorism (START) released the 2016 edition of the Global Terrorism Database. START, where I have been interning this summer as a researcher on the Unconventional Weapons and Technology team, describes the Global Terrorism Database (GTD) like this:
The GTD includes systematic data on domestic as well as international terrorist incidents that have occurred during this time period and now includes more than 170,000 cases.
It is a lot of data. It is grim data. It’s data with features like “number of people killed” and “target type”. And I have seen firsthand the labor-intensive process used to collect and validate this data. Each row of the table is necessarily a lossy representation of a complex, tragic event. Terrorism (its consequences, manifestations, and causes) is one of the most difficult subjects a researcher can engage with. The GTD allows researchers to reason about terrorism in ways that transcend anecdotes and resist biases. To show what I mean, I did some quick exploratory data analysis on the new 2016 edition of the GTD. I hope to demonstrate how visualizing first, before you start down a path of more rigorous analysis, can inform and inspire your subsequent work. If you want to see how I did it, I’ve included the raw code at the bottom of this post.
What’s in the GTD?
One of the best ways to get a feel for a new dataset is to step back and evaluate how the variables are related. The plot above expresses the strength of the correlation between every pair of variables, for the newly added 2016 data, as a continuous color map. In one elegant representation, thousands of rows of data are distilled into a tool for thinking with. It’s also instructive to compare a plot like this with the same plot for the previous year. Here, in the simplest of all data analyses, we already see how the GTD can be used to highlight significant changes in global terrorism trends.
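For readers who want to build a plot like this themselves, here is a minimal pandas sketch (not the notebook code from this post). It assumes the GTD has already been loaded into a DataFrame and that columns follow the codebook naming, with `iyear` holding the year of the attack; check those names against your own copy. `correlation_shift` subtracts one year’s correlation matrix from another’s, which is one simple way to make the year-over-year comparison explicit. Keep in mind that Pearson correlation on integer-coded categories depends on how the codes happen to be ordered, so it is a rough signal for exploration, not a basis for inference.

```python
import pandas as pd


def year_correlations(gtd: pd.DataFrame, year: int) -> pd.DataFrame:
    """Pairwise Pearson correlations between numeric columns for one year.

    Assumes a GTD-style frame with an `iyear` column (codebook naming).
    """
    # iyear is constant within a single year, so drop it before correlating
    subset = gtd[gtd["iyear"] == year].drop(columns="iyear")
    # numeric_only=True skips text columns such as country_txt or gname
    return subset.corr(numeric_only=True)


def correlation_shift(gtd: pd.DataFrame, year: int, baseline: int) -> pd.DataFrame:
    """How much each pairwise correlation moved relative to a baseline year."""
    return year_correlations(gtd, year) - year_correlations(gtd, baseline)


def plot_correlations(corr: pd.DataFrame):
    """Render a correlation matrix as a heatmap with a diverging color map."""
    import matplotlib.pyplot as plt  # imported lazily so the helpers above work without it

    fig, ax = plt.subplots(figsize=(8, 8))
    image = ax.imshow(corr.values, cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=90)
    ax.set_yticks(range(len(corr.index)))
    ax.set_yticklabels(corr.index)
    fig.colorbar(image, ax=ax)
    return fig
```

Calling `plot_correlations(year_correlations(gtd, 2016))` and `plot_correlations(correlation_shift(gtd, 2016, 2015))` would give the single-year view and the change since the previous year.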
We can also use simple visualizations to satisfy basic curiosities. Like, is terrorism seasonal?
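A quick way to ask the seasonality question is to count attacks per calendar month. This sketch assumes the month lives in an `imonth` column and that, as I understand the codebook, a value of 0 marks an unknown month, so those rows are dropped rather than counted. Pooling all years like this lets a few unusually violent years dominate; normalizing within each year would be a natural next step.

```python
import pandas as pd


def attacks_by_month(gtd: pd.DataFrame) -> pd.Series:
    """Number of attacks in each calendar month, pooled across all years.

    Assumes an `imonth` column where 1-12 are months and 0 means unknown.
    """
    known = gtd[gtd["imonth"].between(1, 12)]
    counts = known["imonth"].value_counts().sort_index()
    # Reindex so a month with no attacks shows up as 0 instead of vanishing
    return counts.reindex(range(1, 13), fill_value=0)
```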
A visual, exploratory process also enables us to contextualize new data in light of the historical record. Notice how each of these plots has the effect of calling to mind new questions. “Well that is strange. What on earth accounts for that?”
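Putting a new year in historical context can start with nothing more than attacks per year. The sketch below (again assuming an `iyear` column) summarizes one year against every earlier year and plots the full series with the new year marked; the summary fields are illustrative choices, not a standard.

```python
import pandas as pd


def year_in_context(gtd: pd.DataFrame, year: int) -> dict:
    """Compare one year's attack count with all earlier years."""
    per_year = gtd.groupby("iyear").size()
    current = int(per_year.get(year, 0))
    earlier = per_year[per_year.index < year]
    return {
        "attacks": current,
        "mean_of_earlier_years": float(earlier.mean()),
        # how many earlier years were worse than this one
        "earlier_years_with_more_attacks": int((earlier > current).sum()),
    }


def plot_attacks_per_year(gtd: pd.DataFrame, highlight: int):
    """Line plot of attacks per year with one year highlighted."""
    import matplotlib.pyplot as plt  # lazy import: only needed for plotting

    per_year = gtd.groupby("iyear").size()
    fig, ax = plt.subplots()
    ax.plot(per_year.index, per_year.values, marker="o")
    ax.axvline(highlight, color="red", linestyle="--")  # mark the newly added year
    ax.set_xlabel("Year")
    ax.set_ylabel("Number of attacks")
    return fig
```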
We can see how the attacks in 2016 are broken down by attack type and by target type. And if we find a puzzling correlation, it could be the beginnings of a new insight.
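A breakdown like this is a one-liner once you filter to the year of interest. This sketch assumes the human-readable labels sit in columns such as `attacktype1_txt` and `targtype1_txt`, per the codebook naming; it returns each category’s share of that year’s attacks.

```python
import pandas as pd


def breakdown(gtd: pd.DataFrame, year: int, column: str) -> pd.Series:
    """Share of a year's attacks falling into each category of `column`.

    `column` might be, e.g., "attacktype1_txt" or "targtype1_txt" (assumed names).
    """
    in_year = gtd[gtd["iyear"] == year]
    # normalize=True turns raw counts into proportions that sum to 1
    return in_year[column].value_counts(normalize=True)
```

Calling it once with the attack-type column and once with the target-type column gives the two breakdowns side by side.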
We can also use the data to gain some clarity on specific groups.
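Here is a minimal sketch for profiling a single perpetrator group. It assumes a `gname` column for the group name and an `nkill` column for fatalities (missing values are treated as contributing nothing to the sum). It also assumes that attacks with no attributed perpetrator are labeled `"Unknown"`, which is how I understand the codebook; verify that before relying on the filter. Group names must match the dataset’s spelling exactly.

```python
import pandas as pd


def most_active_groups(gtd: pd.DataFrame, year: int, n: int = 10) -> pd.Series:
    """The n groups with the most attacks in a given year."""
    # "Unknown" is assumed to mark unattributed attacks, so exclude it
    in_year = gtd[(gtd["iyear"] == year) & (gtd["gname"] != "Unknown")]
    return in_year["gname"].value_counts().head(n)


def group_profile(gtd: pd.DataFrame, group: str) -> pd.DataFrame:
    """Attacks and reported fatalities per year for one named group."""
    attacks = gtd[gtd["gname"] == group]
    by_year = attacks.groupby("iyear")
    return pd.DataFrame({
        "attacks": by_year.size(),
        # sum() skips NaN, so unknown fatality counts add nothing
        "killed": by_year["nkill"].sum(),
    })
```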
Accounting for Geography
The GTD features the approximate coordinates of each attack in the dataset. Unfortunately, my Python mapping skills are rather lacking at the moment. These maps were produced with the wonderful data visualization platform Tableau.
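For anyone who wants to stay in Python, a bare-bones alternative to a real mapping tool is simply to scatter longitude against latitude. This sketch does that with matplotlib; it draws no coastlines and uses no map projection (libraries such as Cartopy or GeoPandas add those). The `latitude` and `longitude` column names are assumptions based on the codebook, and the range check is there to drop missing or malformed coordinates before plotting.

```python
import pandas as pd


def valid_coordinates(gtd: pd.DataFrame) -> pd.DataFrame:
    """Rows with present, in-range coordinates (assumed column names)."""
    coords = gtd[["longitude", "latitude"]].dropna()
    in_range = coords["latitude"].between(-90, 90) & coords["longitude"].between(-180, 180)
    return coords[in_range]


def plot_attack_locations(gtd: pd.DataFrame):
    """Plain lon/lat scatter: no basemap, no projection."""
    import matplotlib.pyplot as plt  # lazy import: only needed for plotting

    coords = valid_coordinates(gtd)
    fig, ax = plt.subplots(figsize=(12, 6))
    # tiny, translucent markers so dense regions read as darker areas
    ax.scatter(coords["longitude"], coords["latitude"], s=1, alpha=0.3)
    ax.set_xlim(-180, 180)
    ax.set_ylim(-90, 90)
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    return fig
```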
Let’s do some deeper exploration in one region. For the rest of this article, we will be using data on attacks in the United States.
Again, we can use the raw correlations to give us some ideas.
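Narrowing to one country and listing the strongest pairwise correlations is a handy way to pick what to plot next. This sketch assumes a `country_txt` column whose value for the U.S. is `"United States"`; the ranking by absolute correlation is my own choice for surfacing candidates, not part of any standard workflow.

```python
import pandas as pd


def country_subset(gtd: pd.DataFrame, country: str) -> pd.DataFrame:
    """All attacks recorded for one country (assumed `country_txt` column)."""
    return gtd[gtd["country_txt"] == country]


def strongest_correlations(frame: pd.DataFrame, top: int = 5) -> pd.Series:
    """The `top` variable pairs with the largest absolute correlation."""
    corr = frame.corr(numeric_only=True)
    cols = list(corr.columns)
    # Keep each unordered pair once (upper triangle, no diagonal)
    pairs = pd.Series({
        (a, b): corr.loc[a, b]
        for i, a in enumerate(cols)
        for b in cols[i + 1:]
    }).dropna()  # constant columns yield NaN correlations
    return pairs.sort_values(key=lambda v: v.abs(), ascending=False).head(top)
```

For example, `strongest_correlations(country_subset(gtd, "United States"))` would list the pairs worth plotting first.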
I noticed a fair amount of correlation between the attack type and target type variables. Plotting the data allows us to explore their relationship.
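A contingency table is one direct way to see how two categorical variables relate. This sketch assumes the integer codes live in `attacktype1` and `targtype1`; with `normalize=True` each row shows how one attack type is distributed across target types, which is often easier to compare than raw counts.

```python
import pandas as pd


def attack_target_table(gtd: pd.DataFrame, normalize: bool = False) -> pd.DataFrame:
    """Cross-tabulate attack type (rows) against target type (columns).

    With normalize=True, each row sums to 1: the share of that attack
    type's incidents aimed at each target type.
    """
    return pd.crosstab(
        gtd["attacktype1"],
        gtd["targtype1"],
        normalize="index" if normalize else False,
    )
```

The resulting table can be drawn as a heatmap in the same way as a correlation matrix.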
Here we arrive at one downside of exploratory data visualization. If categorical variables are encoded with integers, it can be difficult to ‘read’ plots that project one variable onto another. Because I am familiar with the GTD data, I happen to know that the correlation we see on the left-hand side of the plot is between “Armed Assault” and “Government”. But if you are staring down a new dataset, you will have to keep the codebook handy.
Finally, we see that it’s easy to return to a chronological view after we have narrowed down our scope to a single geographical region. There is nothing stopping us from further narrowing our scope if we so choose. For instance, we could see the number of attacks against military targets in the United States through time.
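Narrowing further is just a matter of stacking filters before counting. This sketch assumes `country_txt` and `targtype1_txt` columns and that the military target category is labeled `"Military"`; check the exact label in your copy. It fills in years with no matching attacks so the time series has no silent gaps.

```python
import pandas as pd


def attacks_per_year_on(gtd: pd.DataFrame, country: str, target_type: str) -> pd.Series:
    """Yearly count of attacks on one target type within one country."""
    subset = gtd[(gtd["country_txt"] == country) & (gtd["targtype1_txt"] == target_type)]
    counts = subset.groupby("iyear").size()
    # Cover every year in the dataset, so years without attacks appear as 0
    years = range(int(gtd["iyear"].min()), int(gtd["iyear"].max()) + 1)
    return counts.reindex(years, fill_value=0)
```

For instance, `attacks_per_year_on(gtd, "United States", "Military")` would produce the series behind a plot like the one described here.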
We can also ‘zoom in’ to see trends in a tighter band of time.
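Zooming in usually means switching to a finer time resolution over a shorter window. This sketch builds a monthly series from the assumed `iyear` and `imonth` columns, drops rows whose month is unknown (assumed to be coded 0), and fills months with no attacks with zero.

```python
import pandas as pd


def monthly_counts(gtd: pd.DataFrame, start_year: int, end_year: int) -> pd.Series:
    """Attacks per month between start_year and end_year, inclusive."""
    window = gtd[gtd["iyear"].between(start_year, end_year) & gtd["imonth"].between(1, 12)]
    # Build a first-of-month date for each attack, then bucket by month
    dates = pd.to_datetime(pd.DataFrame({
        "year": window["iyear"],
        "month": window["imonth"],
        "day": 1,
    }))
    counts = dates.dt.to_period("M").value_counts().sort_index()
    full = pd.period_range(f"{start_year}-01", f"{end_year}-12", freq="M")
    return counts.reindex(full, fill_value=0)
```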
From Exploration to Analysis
I hope I have demonstrated why it is important to develop the skills that allow you to visualize the important data in your life. These visualizations are not just evidentiary products; they’re exploratory instruments. Visualizations can help you reason about complex phenomena, like terrorism, and pick up on trends that might otherwise have remained hidden.
If you enjoyed this post, let me know. If there is enough interest, I will do a line-by-line explanation of the Python notebook that generated these plots. I would also like to hear whether you want to see other articles like this one. For instance, would you be interested in seeing how deep neural nets could be used in policy research or to make policy decisions? It turns out that the GTD data is perfect for training a simple DNN classifier.
You can contact me at email@example.com or @ryan_t_w