Using Satellite Images to Assess the Environmental Impact of Forced Displacement and Conflict in Somalia

Random forest classifiers with a 99% accuracy score for detecting environmental changes following forced displacement.

Fred N. Kiwanuka
Omdena
7 min read · Nov 13, 2019

This work is part of Omdena’s AI challenge in partnership with the UN Refugee Agency. The objective was to understand whether there is any interrelation between climate anomalies, displacement and human conflict in Somalia using satellite images.

Results summary

Using Landsat 8 images with a random forest classifier, we achieved a 99% accuracy score for detecting environmental changes resulting from forced displacement in Somalia. We show that, using satellite imagery of Somalia, image classification models can depict the effect that forced displacement caused by violent conflict has on the environment through the activity of internally displaced people.

Monitoring the progress of wars and conflicts, the damage they cause to the environment and infrastructure, and the extent of the humanitarian crises they lead to is a complicated task, given the danger of being on the ground to collect data.

Satellite imagery is a key source of information to both monitor and map the progress and impact of conflicts without the risks and costs associated with having people on the ground.

Somalia is one such place where satellite imagery is handy for assessing the impact of conflict over time and verifying potential damage. For the last couple of decades, Somalia has been ravaged by conflict.

The data

The collected Landsat 8 images of the Banadir region each have 8 bands. The Banadir region had the largest number of arrivals between 2016 and 2017 as a result of conflict and drought. Landsat 8 images are readily available from several sources, such as USGS EarthExplorer.

All 8 bands were used for feature engineering. The training image, LC08-L1TP-163058–20150118–20170414–01, was taken on January 18, 2015. It was chosen because of its low cloud cover, since clouds make satellite image classification problematic. Indeed, even with a cloud cover threshold of 10%, only about 23 images of Banadir, Somalia were returned for the period from 2008 to 2019. Besides its low cloud cover, the image was chosen for training because 2015 was a year of relative stability and less conflict in Somalia.

The test image, LC08-L1TP-163058–20170107–20170312–01-T1, was taken on January 7, 2017. It was chosen because 2017 was the second most conflict-affected year in Somalia in the last decade, after 2013.

Supervised classification

Supervised classification requires ground truth, or labels. The labels for training came from OpenStreetMap, which provides crowdsourced land use data as shapefiles. We used the shapefile of Mogadishu, the capital city of Somalia, which lies within the Banadir region. The labels were troublesome and required preprocessing. The OpenStreetMap data for Mogadishu contained 7 classes: buildings, land use, waterways, railways, points, nature, and roads. The shapefile data consisted mostly of buildings, which made up about 93% of the class data, with land use contributing 6% and the rest less than 1%. Such an imbalance could lead to pixels of the other classes being over-classified as buildings.

Classes Distribution

During the classification, to address the class imbalance, land use and the other labels were merged and treated as a single class: area not labeled as a building. Training was carried out on a small fraction of the image clipped from the whole scene, and the model was cross-validated against the rest.
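
A minimal sketch of this relabeling, assuming the labels have already been rasterized to one class name per pixel. The array `pixel_classes`, the helper `to_binary_labels`, and the ten example pixels are illustrative and not taken from the original project.

```python
import numpy as np

def to_binary_labels(pixel_classes):
    """Map per-pixel class names to 1 (building) or 0 (not labeled as a building)."""
    pixel_classes = np.asarray(pixel_classes)
    return (pixel_classes == "buildings").astype(np.uint8)

# Hypothetical labels for ten pixels, for illustration only.
pixel_classes = ["buildings", "land use", "buildings", "roads", "buildings",
                 "nature", "buildings", "waterways", "buildings", "buildings"]
labels = to_binary_labels(pixel_classes)

# Share of pixels in each of the two merged classes: [not building, building].
share = np.bincount(labels, minlength=2) / labels.size
print(labels.tolist())
print(share.tolist())
```

Checking the class shares after merging is a quick way to see how much imbalance remains before training.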

Classification with random forests

An exploratory analysis revealed that the spectral signatures of the classes are distinct and easily separable. This was the basis for choosing a conventional machine learning model like random forests.

Classes Separation

Basic steps for data preparation and modeling

· Create a dataset object containing all bands, create a mask, and extract pixel values to obtain geospatial polygons.

· Clip the image to reduce the computational burden. The actual images without clipping were 7321 by 7431 pixels in size.

· Extract the pixels from the raster that fall within the shapes outlined in the shapefile.

· Reproject the shapefiles to the coordinate system of the raster data.

· Train the model on the clipped data using random forests.

· Predict the labels on the clipped data.

· Predict for the rest of the image.
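
The train-on-a-clip, predict-on-the-scene steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: it uses scikit-learn's `RandomForestClassifier`, replaces the Landsat scene and the OpenStreetMap labels with a small synthetic array, and the helper `to_samples`, the window position, and `n_estimators=50` are all arbitrary choices. Reading the raster and reprojecting the shapefile are left out.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for a scene: 8 bands, 60 x 80 pixels (not real Landsat data).
n_bands, rows, cols = 8, 60, 80
labels = np.zeros((rows, cols), dtype=np.uint8)
labels[:, 40:] = 1                      # right half plays the role of "building"
image = rng.normal(0.2, 0.02, size=(n_bands, rows, cols))
image[:, labels == 1] += 0.3            # give class 1 a distinct spectral signature

def to_samples(bands):
    """Reshape (bands, rows, cols) into (n_pixels, bands) for scikit-learn."""
    return bands.reshape(bands.shape[0], -1).T

# Clip a small window that contains both classes and train on it only.
window = (slice(20, 40), slice(30, 50))
X_train = to_samples(image[:, window[0], window[1]])
y_train = labels[window].ravel()

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Predict for the whole scene and reshape back to the image grid.
predicted = model.predict(to_samples(image)).reshape(rows, cols)
accuracy = (predicted == labels).mean()
print(predicted.shape, accuracy)
```

Each pixel becomes one sample with one feature per band, which is why well-separated spectral signatures make a per-pixel classifier like this a reasonable choice.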

The results

We achieved 99% accuracy on the prediction for both classes. The confusion matrix here reveals that only a handful of pixels were misclassified.

Confusion Matrix
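
For readers who want to reproduce this kind of check, here is a small sketch of how a two-class confusion matrix and the accuracy score are computed. The eight example pixels are made up for illustration and are not the project's results.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=2):
    """Count pixels per (true class, predicted class) pair.

    Rows are true classes, columns are predicted classes.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    matrix = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(matrix, (y_true, y_pred), 1)
    return matrix

def accuracy(matrix):
    """Share of pixels on the diagonal, i.e. correctly classified."""
    return np.trace(matrix) / matrix.sum()

# Hypothetical labels for eight pixels: 0 = not a building, 1 = building.
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred)
print(cm)
print(accuracy(cm))
```

Reading the rows separately shows how each class fares, which the overall accuracy alone can hide when one class dominates.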

Visualization of the classification

We show here the classification result next to the RGB image of the scene. The left column shows the classification of the training image, while the right column shows the classification of the test image. This comparison is necessary to see whether there are changes over time.

The brown color code represents buildings while the green represents land use.

Classification results visualized

It’s clear that during the conflict the pressure to use land for housing increased. This makes sense because, in times of conflict, people move to urban centers like Mogadishu. According to United Nations data, the majority of internally displaced Somalis end up in Mogadishu (Banadir region).

Arrivals by region

The effect on the green belts of the city is clear to see. It can also be argued that drought is a cause of the many arrivals in Banadir. Exploratory data analysis from another task group in the challenge showed that most people who arrived in Banadir came from Lower Shabelle.

Departures by region

The main reasons given for departures from Shabelle are drought and conflict.

Reasons for leaving

Further analysis using the Normalized Difference Water Index (NDWI) and the Normalized Difference Vegetation Index (NDVI) revealed even more interesting patterns. NDWI is similar to NDVI but identifies water. Areas with high NDWI values are generally classified as water, and areas with high NDVI values as vegetation. There is some decrease in NDWI between 2015 and 2017. This could mean either an increased demand for water resources as a result of the internally displaced people arriving, or rivers drying up.

NDWI and NDVI: training image and result
NDWI and NDVI: Test Image
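
Both indices are normalized band differences, which the sketch below computes. It assumes the common definitions NDVI = (NIR - Red) / (NIR + Red) and, for NDWI, the green and near-infrared variant (Green - NIR) / (Green + NIR); the article does not say which NDWI variant was used. On Landsat 8, green, red, and near-infrared are bands 3, 4, and 5. The reflectance values are invented for illustration.

```python
import numpy as np

def normalized_difference(a, b):
    """Compute (a - b) / (a + b), returning 0 where the denominator is 0."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = a + b
    out = np.zeros_like(denom)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

def ndvi(nir, red):
    """Vegetation index: high where near-infrared is much brighter than red."""
    return normalized_difference(nir, red)

def ndwi(green, nir):
    """Water index (green/NIR variant): high where green is brighter than NIR."""
    return normalized_difference(green, nir)

# Hypothetical reflectance values for three pixels: vegetation, water, bare soil.
green = np.array([0.08, 0.10, 0.20])
red = np.array([0.05, 0.08, 0.25])
nir = np.array([0.45, 0.02, 0.30])
print(ndvi(nir, red))
print(ndwi(green, nir))
```

Computing the same index on the 2015 and 2017 scenes and subtracting the two arrays gives a per-pixel change map of the kind compared here.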

Testing unsupervised classification

Furthermore, we ran an unsupervised classification algorithm, k-means clustering. The choice of the optimal number of clusters is still a research topic, but for this task we chose 5 clusters.

The results below show that k-means clustering can distinguish buildings well, much like the supervised classification approach, although the supervised approach does better with some tuning. Clustering is able to confidently label one or more clusters, and these could be used as labeled pixels to generate a training dataset for supervised classification in areas of Somalia that don't have ground truth. Indeed, less than 10% of Somalia has been mapped on OpenStreetMap.

Kmeans Clustering
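
A minimal sketch of per-pixel k-means with 5 clusters, assuming scikit-learn's `KMeans` and a small synthetic 8-band array in place of the Landsat scene. The array sizes, `n_init`, and `random_state` are illustrative choices, not settings from the project.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-in for a scene: 8 bands, 40 x 50 pixels, five spectral groups.
n_bands, rows, cols = 8, 40, 50
group = rng.integers(0, 5, size=(rows, cols))
image = rng.normal(0.0, 0.01, size=(n_bands, rows, cols)) + 0.15 * group

# One row per pixel, one column per band.
pixels = image.reshape(n_bands, -1).T

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
cluster_map = kmeans.fit_predict(pixels).reshape(rows, cols)
print(cluster_map.shape, np.unique(cluster_map).tolist())
```

The cluster numbers carry no meaning on their own. Someone still has to inspect the map and decide which cluster corresponds to buildings before its pixels can serve as training labels.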

Conclusions of this task

In this task, we used classification of Landsat 8 images of Somalia to show the effect that forced displacement caused by violent conflict has on the environment through the activity of internally displaced people. This study could be extended to further understand the interrelations between displacement, climate anomalies, and conflict in other regions. It would be interesting to study the regions with the most departures, like Shabelle. Another area for improvement is labeling.

My Experience with Omdena

I had taken a break from day-to-day data science work at UNICEF to do other things. A friend sent me a link to the Omdena AI challenges, and I was accepted into Omdena's AI for social good program. My first challenge with Omdena, in collaboration with UNHCR, the UN Refugee Agency, was quantifying the influence of climate anomalies on displacement and human conflict using satellite imagery and other sources of data. My background in image processing and analysis was mainly in the health sector, where images do not exceed 100,000 pixels. Now here I was, dealing with images of over 50 million pixels.

It has been a very interesting journey, and I am pleased with my first output. Meeting so many collaborators from diverse backgrounds and locations, working on one challenge as a family, has been so rewarding. No matter how experienced or skilled in AI you are, you will always learn something new from this family of collaborators working for social good.

Want to become an Omdena collaborator and join one of our tough AI for Good challenges? Apply here.

I am a data science enthusiast interested in applying AI and machine learning to solving problems in the developing world.