Bayesian Analysis on Terrorism in South Asia

Najiyah Khan
Feb 17, 2017

In the project described below, I used a Bayesian analysis to see whether there is a statistically significant relationship in the number of terrorist attacks between Pakistan and Afghanistan.

Map of Afghanistan and Pakistan

The Data:

The Global Terrorism Database (GTD) is available through the National Consortium for the Study of Terrorism and Responses to Terrorism (START) at the University of Maryland. It contains a vast amount of information on terrorist attacks from 1970 through 2015 (data for 2016 will be available in the summer of 2017). It holds over 150,000 entries, classified by dozens of factors, including country, region, terrorist group, type of attack, and number of people killed. The GTD describes terrorism “…as the threatened or actual use of illegal force and violence by a non‐state actor to attain a political, economic, religious, or social goal through fear, coercion, or intimidation.”

From 1970–2015, the following five countries experienced the highest number of terrorist attacks.

1 - Iraq: 18,770
2 - Pakistan: 12,768
3 - India: 9,940
4 - Afghanistan: 9,690
5 - Colombia: 8,077

With this information I pulled the data for Pakistan and Afghanistan (both are classified in the South Asian region in the dataset).
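
As a rough illustration of the ranking and filtering step, here is a minimal Python sketch. It is not the code used for this project: the sample rows are made up, and the field names `country` and `region` are assumptions rather than the GTD's actual column names.

```python
from collections import Counter

# Made-up sample rows standing in for the GTD; the real file has 150,000+ entries.
# The field names "country" and "region" are assumptions for this sketch.
SAMPLE = [
    ("Iraq", "Middle East & North Africa"),
    ("Iraq", "Middle East & North Africa"),
    ("Iraq", "Middle East & North Africa"),
    ("Pakistan", "South Asia"),
    ("Pakistan", "South Asia"),
    ("Afghanistan", "South Asia"),
    ("Colombia", "South America"),
]
rows = [{"country": c, "region": r} for c, r in SAMPLE]


def top_countries(rows, n=5):
    """Return the n countries with the most recorded attacks, largest first."""
    counts = Counter(row["country"] for row in rows)
    return counts.most_common(n)


def select_countries(rows, countries):
    """Keep only the rows whose country is in the given collection."""
    wanted = set(countries)
    return [row for row in rows if row["country"] in wanted]


ranking = top_countries(rows, n=2)
subset = select_countries(rows, ["Pakistan", "Afghanistan"])

print(ranking)
print(len(subset), "rows kept for Pakistan and Afghanistan")
```

With the real dataset, the same two steps would produce the top-five list above and the Pakistan/Afghanistan subset used in the rest of the analysis.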

Number of Bombings and Explosions in Pakistan from 1970–2015

I created histograms depicting the number of bombings and explosions for each country from 1970–2015.
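
The counts behind a chart like this can be sketched in a few lines of Python. This is an illustration under assumptions, not the original plotting code: the records are invented, and the attack-type label `"Bombing/Explosion"` is assumed to match how the dataset names this category.

```python
from collections import Counter

# Invented records as (country, year, attack type); not real GTD values.
records = [
    ("Pakistan", 1995, "Bombing/Explosion"),
    ("Pakistan", 2009, "Bombing/Explosion"),
    ("Pakistan", 2009, "Bombing/Explosion"),
    ("Pakistan", 2009, "Armed Assault"),
    ("Pakistan", 2013, "Bombing/Explosion"),
    ("Pakistan", 2013, "Bombing/Explosion"),
    ("Pakistan", 2013, "Bombing/Explosion"),
    ("Afghanistan", 2013, "Bombing/Explosion"),
]


def yearly_counts(records, country, attack_type="Bombing/Explosion"):
    """Count attacks of one type per year for one country, sorted by year."""
    counts = Counter(
        year
        for rec_country, year, kind in records
        if rec_country == country and kind == attack_type
    )
    return dict(sorted(counts.items()))


def text_histogram(counts):
    """Render yearly counts as one line of '#' marks per year."""
    return "\n".join(f"{year} {'#' * n}" for year, n in counts.items())


pakistan_counts = yearly_counts(records, "Pakistan")
print(text_histogram(pakistan_counts))
```

The same yearly counts could be passed to any plotting library to draw the bars.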

Given the turmoil and instability in the region, it is not surprising to see the large jump in attacks for the years after 2000.

The destructive presence of the Taliban, the War on Terror, and the porous border between these two countries are possible explanations for the increase.

Methodology:

My Bayesian approach was to build a prior from all bombings/explosions in the South Asia region from 1970 through 2000. I used Pakistan and Afghanistan as my two populations, and for each of them I used the bombings/explosions from 2001 through 2015. Through EDA, I saw that the number of these attacks was similar for both countries, spiking in the last years of the dataset (2010–2015). In the ranking of total attacks above, Pakistan is second (with 12,768) and Afghanistan is fourth (with 9,690). Three of the five countries with the highest number of attacks (Pakistan, India and Afghanistan) are classified as South Asia in the dataset. Even though India is third (with 9,940), I focused on Afghanistan and Pakistan because of the connection between them, and I wanted to estimate whether the two populations differ.
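
The model itself is not shown here, so the following is only a sketch of the general idea: summarize the 1970–2000 regional data as a prior, then update it with each country's 2001–2015 data. It swaps in a simple conjugate normal update (normal prior on the mean, observation variance treated as known via a plug-in estimate), which may differ from the model actually used. All yearly counts below are made-up placeholders, not GTD values.

```python
from statistics import mean, stdev


def normal_posterior(prior_mean, prior_sd, observations):
    """Conjugate update for the mean of a normal likelihood.

    Assumes a Normal(prior_mean, prior_sd) prior on the mean and treats the
    sample variance of the observations as the known observation variance.
    Returns (posterior mean, posterior standard deviation).
    """
    n = len(observations)
    obs_mean = mean(observations)
    obs_var = stdev(observations) ** 2  # plug-in estimate, an assumption
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / obs_var
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * obs_mean)
    return post_mean, post_var ** 0.5


# Placeholder yearly counts (illustrative only).
prior_years = [2, 5, 3, 8, 12, 7, 10, 15]      # stands in for South Asia, 1970-2000
pakistan = [20, 35, 60, 150, 300, 420]         # stands in for Pakistan, 2001-2015
afghanistan = [15, 30, 70, 140, 280, 400]      # stands in for Afghanistan, 2001-2015

prior_mean, prior_sd = mean(prior_years), stdev(prior_years)
pak_post = normal_posterior(prior_mean, prior_sd, pakistan)
afg_post = normal_posterior(prior_mean, prior_sd, afghanistan)

print("Pakistan posterior (mean, sd):", pak_post)
print("Afghanistan posterior (mean, sd):", afg_post)
```

Both countries share the same prior, so any difference between the two posteriors comes from the 2001–2015 data alone.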

Given the model results, I looked at an alternative analysis. The means of both populations were high and close together. Based on the graphs above, the mean standard deviation of the two countries was very similar, differing by 0.21. The difference between Pakistan and Afghanistan, given the prior I created, was not significant.
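
One common way to read "insignificant" in a Bayesian setting is that a credible interval for the difference between the two means includes zero. The sketch below shows that check under the assumption that each country's posterior mean is approximately normal and the two are independent. The posterior summaries passed in are hypothetical numbers, not results from this analysis.

```python
from math import hypot
from statistics import NormalDist


def difference_interval(mean_a, sd_a, mean_b, sd_b, level=0.95):
    """Credible interval for (a - b), assuming independent normal posteriors."""
    diff = NormalDist(mean_a - mean_b, hypot(sd_a, sd_b))
    tail = (1.0 - level) / 2.0
    return diff.inv_cdf(tail), diff.inv_cdf(1.0 - tail)


def credibly_different(mean_a, sd_a, mean_b, sd_b, level=0.95):
    """True when the interval for the difference excludes zero."""
    low, high = difference_interval(mean_a, sd_a, mean_b, sd_b, level)
    return not (low <= 0.0 <= high)


# Hypothetical posterior summaries as (mean, sd) for each country.
pakistan = (150.0, 40.0)
afghanistan = (140.0, 45.0)

low, high = difference_interval(*pakistan, *afghanistan)
print(f"95% interval for the difference: ({low:.1f}, {high:.1f})")
print("Credibly different:", credibly_different(*pakistan, *afghanistan))
```

When the two posterior means are close relative to their spread, as in this made-up example, the interval straddles zero and the difference would be called insignificant.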

Next Steps:

This analysis is the first part of a larger analysis that will include other countries in the region. I want to compare India with Pakistan, but over a longer timeline. In the analysis above, the two countries were compared over 2001–2015; I would like to expand that range for India and Pakistan. I also want to use the data for 2016, both in my initial Bayesian analysis and in the India and Pakistan comparison. Will the number of attacks in 2016 affect the results? Another step is to use Iraq and Afghanistan as the two populations to see if the Bayesian approach finds a more significant difference.

Najiyah Khan

Budding Data Scientist — the answers are in the data. #Python #WomeninTech #machinelearning. When coding errors threaten my mood I drink tea in my panda socks.