Every year, millions of men, women and children are forced to leave their homes and seek refuge from wars, human rights violations, persecution, and natural disasters. The number of forcibly displaced people grew at a record rate of 44,400 per day throughout 2017, raising the cumulative total to 68.5 million at the year's end, more than the total population of the United Kingdom.
Currently, extracting information from human-rights-related imagery requires manual labour by human rights analysts and advocates. Such analysis is time-consuming, expensive, and emotionally traumatic, as analysts must focus on images of horrific events.
In this article, we strive to bridge this gap by automating parts of this process: given a single image, we try to classify it as depicting either displaced people or non-displaced people.
A person's level of control over a situation can be a telling difference between the encoded visual content of an image depicting a non-violent situation and that of an image displaying displaced people.
Our hypothesis is that a person's level of control over the situation, ranging from submissive / non-control to dominant / in-control, is a powerful cue that can help our network distinguish between displaced people and non-violent instances. First, we develop an end-to-end model for recognising rich information about people's emotional states by jointly analysing the person and the whole scene. We use the continuous dimensions of the VAD Emotional State Model, which describes emotions using three numerical dimensions: Valence (V), Arousal (A), and Dominance (D). Second, following the estimation of emotional states, we introduce a new method for interpreting the overall dominance level of an entire image sample based on the emotional states of all individuals in the scene. As a final step, we propose to assign weights to image samples according to the image-to-overall-dominance relevance to guide the prediction of the image classifier.
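To make the second step concrete, here is a minimal sketch of how per-person VAD estimates could be collapsed into a single image-level dominance score. The simple averaging rule is an illustrative assumption, not necessarily the exact aggregation used by DisplaceNet:

```python
def overall_dominance(vad_scores):
    """Aggregate per-person (valence, arousal, dominance) estimates into a
    single image-level dominance score.

    Averaging the dominance dimension is an illustrative choice; the
    model's exact aggregation rule may differ."""
    if not vad_scores:
        return None  # no people detected in the scene
    dominance = [d for (_v, _a, d) in vad_scores]
    return sum(dominance) / len(dominance)

# Example: three detected people with hypothetical VAD estimates
people = [(6.2, 4.1, 2.8), (5.0, 3.3, 3.5), (4.4, 5.0, 2.1)]
print(overall_dominance(people))  # mean dominance across the scene
```

A low overall score would then indicate a predominantly submissive / non-control scene, which our hypothesis associates with displaced people.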
- Object Detection Branch: localises the boxes containing a human and the object of interaction using RetinaNet.
- Human-centric Branch: estimates a VAD score for each human box and an overall dominance score that characterises the entire image.
- Displaced People Branch: produces a classification score for the input image and re-adjusts it based on the overall dominance score.
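The re-adjustment performed by the Displaced People Branch can be sketched as follows. The blending rule, the `alpha` factor, and the 1-to-9 dominance range are hypothetical choices for illustration only; the actual re-weighting scheme is defined by the model:

```python
def adjust_score(p_displaced, dominance, d_min=1.0, d_max=9.0, alpha=0.5):
    """Re-adjust the classifier's 'displaced people' probability using the
    overall dominance score.

    Hypothetical scheme: low dominance (submissive scenes) nudges the
    score up, high dominance nudges it down. The blend factor `alpha`
    and the dominance range are illustrative assumptions."""
    # Map dominance to [0, 1], where 1 means fully submissive / non-control.
    submissiveness = (d_max - dominance) / (d_max - d_min)
    # Convex combination of the raw classifier score and the dominance cue.
    return (1 - alpha) * p_displaced + alpha * submissiveness

# A borderline classifier score is pushed up by a low-dominance scene
print(adjust_score(0.55, 2.8))
```

The key design idea is that the dominance cue acts as a prior over the scene rather than replacing the classifier: an ambiguous classification score gets nudged in the direction the emotional-state evidence suggests.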
Getting the Data
The Human Rights Archive is the core dataset used to train DisplaceNet.
The constructed dataset contains 609 images of displaced people and the same number of non-displaced-people counterparts for training, as well as 100 images collected from the web for testing and validation.
Setting up the System
The following dependencies are required to run this project:
- Python 2.7+
- Keras 2.1.5+
- TensorFlow 1.6.0+
- HDF5 and h5py (required if you plan on saving/loading Keras models to disk)
Before installing DisplaceNet, please install one of the Keras backend engines: TensorFlow, Theano, or CNTK. We recommend the TensorFlow backend, as DisplaceNet has not been tested on the Theano or CNTK backend engines.
Then, you can install DisplaceNet itself.
Install DisplaceNet from the GitHub source (recommended):
$ git clone https://github.com/GKalliatakis/DisplaceNet.git
Inference on new data with pretrained models
To make a single image inference using DisplaceNet, run the script below.
$ python run_DisplaceNet.py --img_path test_image.jpg \
                            --hra_model_backend_name VGG16 \
                            --emotic_model_backend_name VGG16
DisplaceNet vs fine-tuned CNNs
Want to know more about DisplaceNet?
- Albert Mehrabian. Framework for a comprehensive description and measurement of emotional states. Genetic, Social, and General Psychology Monographs, 1995.
- Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.