Getting insights from political violence and protests around the world — The geospatial ramp-up

Jan Tschada
Geospatial Intelligence
Aug 27, 2021
Aggregated non-US ACLED events 2000 — 2020

We have seen people and governments use political violence to achieve political goals. It includes violence between governments, e.g. war as an intense armed conflict, and violence used against non-state actors, e.g. police brutality. A rebellion is politically motivated violence by non-state actors against a government. The storming of the United States Capitol was a riot by non-state actors against the United States Congress.

Political violence is a major public health problem, not only because of its direct consequences but also because of its long-term effects on the health and well-being of our communities. As a geospatial intelligence engineer, you need access to high-quality open source intelligence (OSINT) data collections for various geospatial tasks. We are going to inspect one high-quality OSINT data collection and use its geospatial information for spatial data science and political violence mapping.

Professor Clionadh Raleigh from the University of Sussex and her co-authors introduced the Armed Conflict Location & Event Data (ACLED) Project in a paper published in the Journal of Peace Research in 2010. The paper shows the importance of disaggregated data collections that record locally specific, high-quality information on where and when events take place. In the past, researchers often used only national-level data, which can lead to erroneous conclusions. As the ACLED webpage claims:

“ACLED is the highest quality and most widely used real-time data and analysis source on political violence and protest around the world.”

Ahead of the 2020 US election, the ACLED data collection started covering the United States, and it has recently expanded its coverage to at least 37 European countries. We implemented a geospatial intelligence toolset that offers spatial aggregations, such as spatial binning, over all ACLED events tagged as politically violent. The US Crisis Monitor is a dedicated project that collects US-related event data and finally extended the ACLED coverage in 2021. The geospatial ramp-up proposed here uses only the non-US ACLED events.

Important note from the ACLED webpage:

“ACLED’s real-time data updates are paused for all regions of coverage through the end of August 2021. Data for the period of 31 July to 3 September will be released on 6 September, at which point real-time data publication will resume.”

The GEOINT toolset walkthrough

The Living Atlas hosts a web map showing a historical layer and a near real-time layer of the ACLED events. Using the official ACLED API, you can consume the raw event data directly and update your layer or geospatial dataset of choice. For offline processing, you can download the event data as a CSV file using the ACLED Data Export Tool.

ACLED Data Export Tool
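If you prefer the API over the export tool, paging through the events could look roughly like the following sketch, which only builds request URLs with the standard library. The endpoint `https://api.acleddata.com/acled/read` and the parameter names (`key`, `email`, `event_date`, `event_date_where`, `limit`, `page`) are our assumptions modelled on the ACLED API as documented around 2021; check them against the current ACLED API documentation. `build_acled_query` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumption: endpoint of the ACLED API as documented around 2021;
# verify it against the current ACLED API documentation before use.
ACLED_READ_URL = "https://api.acleddata.com/acled/read"


def build_acled_query(key: str, email: str, start: str, end: str,
                      page: int = 1, limit: int = 5000) -> str:
    """Build a query URL for one page of events between two ISO dates.

    Hypothetical helper; the parameter names below are assumptions, too.
    """
    params = {
        "key": key,                      # personal access key
        "email": email,                  # registered e-mail address
        "event_date": f"{start}|{end}",  # date range, pipe-separated
        "event_date_where": "BETWEEN",   # range operator for event_date
        "limit": limit,                  # rows per page
        "page": page,                    # 1-based page number
    }
    return f"{ACLED_READ_URL}?{urlencode(params)}"


print(build_acled_query("MY_KEY", "me@example.org", "2020-01-01", "2020-12-31"))
```

Incrementing `page` until a response returns no more rows would cover the full date range; the HTTP call and the response parsing are deliberately left out.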

The walkthrough starts by downloading the raw CSV file to analyse the geospatial and temporal coverage of the non-US ACLED events. By inspecting the raw data, we quickly noticed the variety of collecting sources and the impact of different data acquisition and data processing methods. For example, the spatial resolution of the event data varies significantly as a consequence of these differences in data collection and data processing techniques.
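One quick way to quantify the varying spatial resolution is the share of events per spatial precision code. The sketch below assumes that the export contains a `geo_precision` column with codes 1 to 3, where a lower code means a more precise location (our reading of the ACLED codebook); the rows are made up for illustration.

```python
import pandas as pd

# Made-up rows; the geo_precision column and its codes 1 to 3 are assumed
# to follow the ACLED export (lower code = more precise location).
events = pd.DataFrame({
    "country": ["Burundi", "Burundi", "Sierra Leone", "Sierra Leone", "Sierra Leone"],
    "geo_precision": [1, 2, 1, 3, 2],
})

# Row-normalized cross table: share of each precision code per country.
precision_share = pd.crosstab(
    events["country"], events["geo_precision"], normalize="index"
)
print(precision_share.round(2))
```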

We used Python for prototyping, especially for preparing the data for mapping. Using spatial binning, we aggregated the events to get a better understanding of their spatial patterns. During this preprocessing phase, we used a combination of Python modules and Jupyter notebooks.

Spatial binning of point data is a kind of point-in-polygon aggregation. We implemented a special case that creates only rectangular bins and generates them column by column. Whenever we access the spatial bins by index, we must take this internal design decision into account.
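The consequence of a column-wise design for index access can be illustrated with a small sketch. `RectGrid` is a hypothetical class, not the GEOINT module's code: it creates square cells column by column as closed, clockwise rings (the orientation Esri polygon JSON expects for outer rings) and computes the flat index of the cell containing a point. Because the columns are generated first, the index is `col * n_rows + row`; a row-major formula would silently address the wrong cell.

```python
from dataclasses import dataclass
from typing import List, Tuple

Ring = List[Tuple[float, float]]


@dataclass(frozen=True)
class RectGrid:
    """Hypothetical rectangular grid whose cells are generated column by column."""
    xmin: float
    ymin: float
    cell_size: float
    n_cols: int
    n_rows: int

    def cells(self) -> List[Ring]:
        # Outer loop walks the columns (x), inner loop the rows (y) of each column.
        rings = []
        for col in range(self.n_cols):
            x0 = self.xmin + col * self.cell_size
            for row in range(self.n_rows):
                y0 = self.ymin + row * self.cell_size
                x1, y1 = x0 + self.cell_size, y0 + self.cell_size
                # Closed ring in clockwise order: up, right, down, back to start.
                rings.append([(x0, y0), (x0, y1), (x1, y1), (x1, y0), (x0, y0)])
        return rings

    def index_of(self, x: float, y: float) -> int:
        """Return the flat cell index of a point, or -1 if it lies outside."""
        col = int((x - self.xmin) // self.cell_size)
        row = int((y - self.ymin) // self.cell_size)
        if not (0 <= col < self.n_cols and 0 <= row < self.n_rows):
            return -1
        # Column-major order: all rows of column 0 come first.
        return col * self.n_rows + row


grid = RectGrid(xmin=0.0, ymin=0.0, cell_size=10.0, n_cols=3, n_rows=2)
cells = grid.cells()
print(grid.index_of(25.0, 5.0), cells[grid.index_of(25.0, 5.0)][0])
```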

ACLED as the single source of truth

The ACLED project offers an easy-to-use web-based dashboard linking to the regularly updated event data collection. This dashboard allows filtering the events by:

  • Event Date
  • Event Type
  • Region
  • Fatalities
  • Actor Type
  • Interaction

The dashboard contains a map widget for visualizing the events geographically. The user can switch between different maps and choose how the events are represented: at their original locations as a point layer, or aggregated by country as a polygon layer.

ACLED Full Dashboard
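The dashboard filters map directly onto boolean masks over a data frame. The sketch below reproduces a few of them and then aggregates the remaining events per country, similar to the dashboard's polygon layer. The column names (`event_date`, `event_type`, `region`, `country`, `inter1` as the actor type of the first actor, `fatalities`) are assumed to match the ACLED export, and the rows are made up.

```python
import pandas as pd

# Made-up events; column names are assumed to match the ACLED export.
events = pd.DataFrame({
    "event_date": pd.to_datetime(["2020-09-27", "2020-10-02", "2020-10-05", "2019-05-01"]),
    "event_type": ["Battles", "Battles", "Protests", "Battles"],
    "region": ["Caucasus and Central Asia", "Caucasus and Central Asia",
               "Middle East", "Middle East"],
    "country": ["Azerbaijan", "Armenia", "Iraq", "Syria"],
    "inter1": [1, 1, 6, 2],  # assumed actor type code of the first actor
    "fatalities": [12, 0, 0, 3],
})

# Combine dashboard-like filters: event date, event type, region, actor type.
mask = (
    (events["event_date"].dt.year == 2020)
    & (events["event_type"] == "Battles")
    & (events["region"] == "Caucasus and Central Asia")
    & (events["inter1"] == 1)
)
battles = events[mask]

# Aggregate the filtered events per country, like the dashboard's polygon layer.
per_country = battles.groupby("country").agg(
    events=("event_type", "count"),
    fatalities=("fatalities", "sum"),
)
print(per_country)
```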

Reading the ACLED events using a data analysis library

The data export tool supports CSV files and Excel spreadsheets. We saved the exported CSV files directly to the file system. The CSV file containing the non-US ACLED events from 2000 to 2020 takes up 316 MB of disk space and holds 657 296 events.

We read the CSV file using pandas, an open source data analysis library. Pandas reads files into a two-dimensional, labeled data frame. A data frame is a tabular data structure whose columns are series, each holding a specific known data type such as strings, integers, floats, dates, or well-known complex types.
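For a file of this size, it can pay off to load only the columns needed for mapping and to choose compact data types. The sketch below uses a made-up two-row excerpt in place of the exported file; the column names and the day-month-year date format (e.g. "31 December 2020") are assumptions about the ACLED export.

```python
import io

import pandas as pd

# Made-up excerpt; in practice, pass the path of the exported CSV file instead.
SAMPLE_CSV = """data_id,event_date,year,event_type,country,latitude,longitude,fatalities,notes
1,31 December 2020,2020,Protests,Iraq,33.3406,44.4009,0,Example note
2,30 December 2020,2020,Battles,Syria,36.2021,37.1343,2,Example note
"""

# Load only the columns needed for mapping; 'category' suits low-cardinality text.
events = pd.read_csv(
    io.StringIO(SAMPLE_CSV),
    usecols=["event_date", "event_type", "country", "latitude", "longitude", "fatalities"],
    dtype={"event_type": "category", "country": "category", "fatalities": "int32"},
)
# Assumption: the export writes dates as "day month-name year".
events["event_date"] = pd.to_datetime(events["event_date"], format="%d %B %Y")
print(events.dtypes)
```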

Usually, we implement custom complex types and register them in the pandas ecosystem by decorating the underlying data access. This time, we did not implement a custom complex type; instead, we used an already defined geospatial type. This geospatial type manages different geometry representations like point, line, or polygon. When you implement such a type, you should also implement conversion functions between it and the already known data types. At runtime, the data engineer and/or the workflow apply these conversion functions to existing data frames. This extension mechanism is a really neat way to address the needs of new domain-specific areas.
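The registration mechanism behind this is the pandas accessor API. The following sketch registers a hypothetical `geo` accessor that exposes the extent of the latitude and longitude columns and converts them into WKT point strings. As far as we know, the ArcGIS API for Python registers its `spatial` accessor through the same pandas extension API, but `CoordinateAccessor` is purely illustrative and unrelated to it.

```python
import pandas as pd


# Hypothetical accessor, registered under the name "geo" for illustration only.
@pd.api.extensions.register_dataframe_accessor("geo")
class CoordinateAccessor:
    def __init__(self, frame: pd.DataFrame) -> None:
        # Fail early if the frame lacks the expected coordinate columns.
        missing = {"latitude", "longitude"} - set(frame.columns)
        if missing:
            raise AttributeError(f"missing coordinate columns: {sorted(missing)}")
        self._frame = frame

    @property
    def extent(self) -> tuple:
        """Bounding box as (xmin, ymin, xmax, ymax) in WGS84 degrees."""
        lon, lat = self._frame["longitude"], self._frame["latitude"]
        return (lon.min(), lat.min(), lon.max(), lat.max())

    def to_wkt(self) -> pd.Series:
        """Convert each coordinate pair into a WKT point string."""
        lon = self._frame["longitude"].astype(str)
        lat = self._frame["latitude"].astype(str)
        return "POINT (" + lon + " " + lat + ")"


events = pd.DataFrame({"latitude": [10.0, -5.5], "longitude": [20.0, 30.25]})
print(events.geo.extent)
print(events.geo.to_wkt().tolist())
```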

Understanding the structure of the ACLED event data collection

The ACLED project records how disorder occurs using event-based information. Disorder covers a range of activities, from serious political violence, such as targeted attacks on civilians and military clashes, to spontaneous demonstrations, mass arrests and deliberate destruction of property. In principle, the procedure systematically captures disorder event by event. The taxonomy gives data analysts a lot of flexibility when comparing the recorded events and when engineering domain-specific features for further analysis.

Reviews by geospatial analysts are a useful part of geospatial analysis, especially in OSINT-related workflows. As part of the preprocessing step, we recommend that geospatial analysts focus on the features as the geographic representation of the underlying entities and their internal relationships. Two of the simplest, but not the most common, tools are event identification and event classification. A geospatial analyst can also create, edit or transform the collected event data by spatially joining it with geospatial data from multiple sources.

Understanding the ACLED data structure

The ACLED data collection models six main event types and 25 sub-event types. The main event types are briefly explained below.
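A cross table gives a compact overview of how often each main event type occurs per year. The sketch assumes that the `event_type` column contains the six main types verbatim as listed in `MAIN_EVENT_TYPES` (our assumption based on the ACLED codebook); the rows are made up, and types without events in the sample still get a column filled with zeros.

```python
import pandas as pd

# Assumed spelling of ACLED's six main event types.
MAIN_EVENT_TYPES = [
    "Battles", "Explosions/Remote violence", "Violence against civilians",
    "Riots", "Protests", "Strategic developments",
]

# Made-up events for illustration.
events = pd.DataFrame({
    "year": [2000, 2000, 2020, 2020, 2020, 2020],
    "event_type": ["Battles", "Riots", "Battles", "Protests", "Protests",
                   "Explosions/Remote violence"],
})

# One row per year, one column per main event type, cells hold event counts.
counts = pd.crosstab(events["year"], events["event_type"]).reindex(
    columns=MAIN_EVENT_TYPES, fill_value=0
)
print(counts)
```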

Battle Events
A battle is a violent confrontation between at least two organized armed groups. The battle subtypes differ in their outcome: whether control of the contested region remained unchanged, whether a non-state group has taken control of a region, or whether a government has regained control of a region.

Battle Events 2000 having one Hot-Spot: Burundi region
Battle Events 2020 having two Hot-Spots: Crimea and Nagorno-Karabakh

Remote Violence Events
Remote violence refers to events in which an actor uses an explosion, a bomb, a landmine, an improvised explosive device, a mortar or a drone strike. Irregular forces often carry out these types of violent events. Remote violence events represent a one-sided violent action: usually the target has no means of defending itself.

Remote Violence Events 2000 having two Hot-Spots along the border of south Sudan
Remote Violence Events 2020 having two Hot-Spots: Crimea and Syria

Violent Events against civilians
Violent events against unarmed civilians include events in which an actor hurts or punishes someone, for example because that person has hurt them or someone else, so that revenge is a plausible motive for the attack. Another type is kidnapping, in which an actor carries someone away by force and/or detains a person against his or her will. Sexual violence in any form is a separate subtype of this category.

Violent Events against civilians 2000 having two Hot-Spots: Sierra Leone and Burundi
Violent Events against civilians 2020 having three Hot-Spots: Nigeria/Cameroon, Rwanda/Burundi and Syria/Iraq

Riot Events
Riots are situations in which a crowd engages in violence and/or destruction on the streets or in other public areas. Usually two or more groups fight each other, or a single group acts alone and deliberately causes destruction. Violence often erupts when an unorganized mob turns to spontaneous violent actions during a demonstration.

Riots 2000 having one Hot-Spot located in Kenya
Riots 2020 having one Hot-Spot: Israel/Syria

Protest Events
Protests are non-violent forms of opposition to a particular situation. Protesters try to influence public opinion or to enforce the desired change through direct action. A protest can itself become the subject of a counter-protest, whose participants oppose the positions of the original protest. Besides peaceful protests, the data collection represents two other types of protest in this category: protests in which an actor intervenes, for example by detaining some protesters, and protests in which an actor uses excessive force against the protesters.

Protests 2000 having one Hot-Spot located in Senegal
Protests 2020 having two Hot-Spots: Syria and Pakistan/India

Strategic Developments
Strategic developments capture contextually important activities of violent groups that are not themselves political violence, such as agreements, arrests or looting. The category therefore encompasses many uncommon events that are not obviously related.

Strategic Developments 2000 having one Hot-Spot located in Sierra Leone
Strategic Developments 2020 having one Hot-Spot located in Yemen

Mapping the ACLED events using the Spatial Data Frame

A Spatial Data Frame or the enhanced version named Spatially Enabled Data Frame extends a Data Frame with spatial capabilities. One of the most important capabilities is mapping the underlying data by using the plot function in a Jupyter Notebook environment. The plot function needs a ready-to-use map widget and adds the data as a new operational layer.

When using the latitude and longitude coordinates directly, the plot function adds the events as point features into the newly created operational layer. A feature is the geographic representation of a real-world entity at a specific point in time. Thus, we recommend using one feature per event. Loading more than half a million features into a web-based Jupyter map widget is not the best course of action. Therefore, we used the ArcGIS Runtime for Qt and rendered the point features using the Qt Quick based map control.

ACLED Event Features rendered using ArcGIS Runtime for Qt

Aggregating the ACLED events using Spatial Binning

Spatial binning aggregates point geometries into a predefined spatial grid. The spatial grid comprises distinct cells with a two-dimensional geometry (e.g. a rectangle or another polygon). We created a simple rectangular grid factory supporting the WGS84 and Web Mercator spatial references. The spatial grid offers a point-in-grid aggregation: the result stores the number of points intersecting each grid cell in an attribute called hit count.
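For a regular, axis-aligned grid, the point-in-grid test reduces to arithmetic: the column and row of a point follow from integer division by the cell size, so no general point-in-polygon test is needed. The following sketch counts the hits per cell; `hit_counts` is a hypothetical helper that assumes square cells in the same spatial reference as the points.

```python
from collections import Counter
from typing import Iterable, Tuple


def hit_counts(points: Iterable[Tuple[float, float]], xmin: float, ymin: float,
               cell_size: float, n_cols: int, n_rows: int) -> Counter:
    """Count points per rectangular cell, keyed by (col, row).

    Assumes an axis-aligned grid with square cells in the same spatial
    reference as the points (WGS84 degrees or Web Mercator meters).
    """
    counts: Counter = Counter()
    for x, y in points:
        col = int((x - xmin) // cell_size)
        row = int((y - ymin) // cell_size)
        # Points outside the grid extent are ignored.
        if 0 <= col < n_cols and 0 <= row < n_rows:
            counts[(col, row)] += 1
    return counts


# 30-degree cells covering the WGS84 world extent (12 columns x 6 rows).
events = [(29.36, -3.38), (29.9, -3.4), (34.0, 44.9), (46.75, 39.8), (200.0, 0.0)]
counts = hit_counts(events, xmin=-180.0, ymin=-90.0, cell_size=30.0, n_cols=12, n_rows=6)
print(counts)
```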

We used the calculated hit count of the aggregated events and rendered the rectangular cells with a class-breaks renderer having five classes. The map widget offers different classification methods for creating the renderer. We created the five classes using the natural breaks algorithm and the cool-warm color map. A grid allows highly efficient region queries and is an efficient way to aggregate the ACLED events.

ACLED Map Widget hosted in a Jupyter Environment
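The map widget computes the natural breaks for you. To show what the method optimizes, the sketch below implements the exact Fisher/Jenks formulation with dynamic programming: it sorts the values and chooses the class boundaries that minimize the total within-class sum of squared deviations. This is our illustration, not the widget's implementation, which may use a faster approximation and return slightly different breaks.

```python
from typing import List, Sequence


def natural_breaks(values: Sequence[float], n_classes: int) -> List[float]:
    """Return the upper bound of each class (exact Fisher/Jenks optimization).

    Minimizes the total within-class sum of squared deviations with dynamic
    programming in O(k * n^2), which is fine for a few thousand grid cells.
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and the number of values")
    # Prefix sums give the squared deviation of any slice in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(data):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def ssd(i: int, j: int) -> float:
        """Sum of squared deviations of data[i:j] (j exclusive)."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: minimal cost of splitting data[:j] into k non-empty classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j], split[k][j] = c, i
    # Walk back through the stored split points to collect the class upper bounds.
    bounds, j = [], n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = split[k][j]
    return bounds[::-1]


print(natural_breaks([1, 2, 3, 10, 11, 12, 20, 21, 22], 3))  # [3, 12, 22]
```

For the hit counts, you would call `natural_breaks(hit_count_values, 5)` and use the returned upper bounds as the class breaks of the renderer.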

Implementation details worth mentioning

The GEOINT Python module wraps the ArcGIS API for Python and addresses some special cases we discovered during the implementation. When you implement the workflows in Python only, you hit performance issues that are not easy to fix. We usually implement the performance-critical parts of our internal geospatial core utility libraries in C++ with pybind11 and, more recently, in Rust with PyO3. We did not follow our own best practices here mainly because we wanted to show the challenges of a Python-only approach.

We constructed the spatial grid as a Feature Set containing rectangular polygon geometries. The constructed geometries use the Web Mercator spatial reference, the most common spatial reference for cloud-based basemaps. Before aggregating the ACLED events with the spatial grid, we needed to project the event locations from WGS84 into Web Mercator. We implemented the projection of the point geometries directly in Python. You could also use the pay-as-you-go, cloud-based Aggregate Points location-based service, but as a prerequisite you have to publish one hosted feature layer for the events and another one for the spatial grid.
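Projecting WGS84 coordinates into Web Mercator (EPSG:3857) only needs the spherical Mercator formulas x = R·λ and y = R·ln(tan(π/4 + φ/2)) with R = 6 378 137 m and λ, φ in radians. The following pure Python sketch is our own illustration, not the GEOINT module's code; it clamps latitudes to about ±85.0511°, where the projected world becomes square. For more than half a million events, a vectorized NumPy version of the same formulas would likely be considerably faster.

```python
import math
from typing import Tuple

# Web Mercator uses a sphere whose radius equals the WGS84 semi-major axis.
EARTH_RADIUS = 6378137.0
# Latitude limit at which the projected world extent becomes square.
MAX_LATITUDE = 85.0511287798066


def to_web_mercator(lon: float, lat: float) -> Tuple[float, float]:
    """Project a WGS84 longitude/latitude pair (degrees) to Web Mercator meters."""
    lat = max(-MAX_LATITUDE, min(MAX_LATITUDE, lat))  # clamp polar latitudes
    x = EARTH_RADIUS * math.radians(lon)
    y = EARTH_RADIUS * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


print(to_web_mercator(29.36, -3.38))
```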

Web dashboards for getting insights and where to go from there

We implemented our own custom ACLED Web Maps and neat-looking, interactive ACLED Web Dashboards. With this simple geospatial approach, we can geographically recognize historical events, such as the Burundian Civil War and the Nagorno-Karabakh War, without being intelligence experts.

Interactive ACLED Web Dashboard

We know that a Python-only approach is an unusual and tough requirement. But we have seen many data scientists heading in this direction lately. Whenever someone moves in this uncomfortable direction, we advise them to take a step back and listen to what Wes McKinney wrote in “Apache Arrow and the ‘10 Things I Hate About pandas’”. By the way, refactoring the Python-only implementation was a pain in the neck. You can inspect the details of our Python refactoring by taking a closer look at the following issue:
Efficient usage of the Geometry Services

Using cloud-based geospatial services works fine as long as you prepare your geospatial data as hosted feature layers. Whenever you only want to process raw geometries, you should not send bulk requests transferring more than 1000 features per call; otherwise, you will easily hit the service limits. Keep in mind that the geometry services processing raw geometries are designed to scale out and stay responsive. In contrast, the location-based services operate on hosted feature layers and are optimized for long-running geospatial workflows: each call creates a unique job, and you can query the job status and the job result asynchronously. We implemented the GEOINT Python module mainly to give non-spatial experts a better understanding of the implementation details, and also because there is always a need for an offline fallback that supports low-bandwidth and disconnected environments.
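Splitting the raw geometries into chunks keeps each request below the per-call limit. The generator below is a minimal, service-agnostic sketch; the chunk size of 1000 mirrors the limit mentioned above and is an assumption to check against the service you actually call. Each chunk would then be sent in its own request. Python 3.12 ships `itertools.batched`, which does the same but yields tuples instead of lists.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Assumption: maximum number of geometries per request; verify for your service.
MAX_PER_REQUEST = 1000


def batched(items: Iterable[T], size: int = MAX_PER_REQUEST) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


print([len(chunk) for chunk in batched(range(2500))])  # [1000, 1000, 500]
```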

Because the ACLED data is not always updated in near real time, we will introduce other OSINT-related news sources, enrich the events of interest, and combine them into high-level knowledge graphs. We will try to explain this in future blog posts.

What do you think?

Do we need more of these publicly available GEOINT OSINT tools, and why do we need them?
