DisplaceNet: Recognising displaced people from images by exploiting their dominance level


Every year, millions of men, women and children are forced to leave their homes and seek refuge from wars, human rights violations, persecution, and natural disasters. In 2017, people were forcibly displaced at a record rate of 44,400 every day, raising the cumulative total to 68.5 million by the year's end, a figure that exceeds the total population of the United Kingdom.

Objective

Currently, extracting information from human-rights-related imagery requires manual labour by human rights analysts and advocates. Such analysis is time-consuming, expensive, and emotionally traumatic, as analysts must scrutinise images of horrific events.

In this article, we strive to bridge this gap by automating parts of the process: given a single image, we try to label it as depicting either displaced people or non-displaced people.

Problem formulation

Main idea

A person’s level of control over a situation can be a telling difference between the encoded visual content of an image depicting a non-violent situation and that of an image displaying displaced people.

Our hypothesis is that the person’s level of control over the situation, ranging from submissive / non-control to dominant / in-control, is a powerful cue that can help our network distinguish displaced people from non-violent instances. First, we develop an end-to-end model for recognising rich information about people’s emotional states by jointly analysing the person and the whole scene. We use the continuous dimensions of the VAD Emotional State Model [1], which describes emotions using three numerical dimensions: Valence (V), Arousal (A) and Dominance (D). Second, following the estimation of emotional states, we introduce a new method for interpreting the overall dominance level of an entire image based on the emotional states of all individuals in the scene. As a final step, we assign weights to image samples according to the image-to-overall-dominance relevance to guide the prediction of the image classifier.
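
To make the idea concrete, here is a minimal sketch of how per-person dominance estimates could be aggregated into an image-level dominance score and turned into a weight for the classifier output. The function names, the averaging strategy and the linear weighting formula are illustrative assumptions, not the exact formulation used in the paper.

```python
import numpy as np

def overall_dominance(vad_scores):
    """Aggregate per-person VAD estimates into an image-level dominance score.

    vad_scores has shape (num_people, 3), holding (valence, arousal, dominance)
    for every detected person, each scaled to [0, 1]. Averaging the dominance
    column is an illustrative choice, not necessarily the paper's exact rule.
    """
    vad_scores = np.asarray(vad_scores, dtype=np.float32)
    return float(vad_scores[:, 2].mean())

def adjust_score(classifier_score, dominance, strength=0.5):
    """Re-weight the 'displaced people' probability with the dominance cue.

    Low overall dominance (people appear submissive, not in control) pushes the
    score up; high dominance pushes it down. The linear blend is hypothetical.
    """
    weight = 1.0 + strength * (0.5 - dominance)  # < 1 when dominant, > 1 when submissive
    return float(np.clip(classifier_score * weight, 0.0, 1.0))

# Two detected people with fairly low dominance raise a borderline score.
vad = [[0.3, 0.6, 0.2],
       [0.4, 0.5, 0.3]]
print(adjust_score(classifier_score=0.55, dominance=overall_dominance(vad)))
```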

Model architecture

Components

  • Object Detection Branch: localises the boxes containing a human and the object of interaction using RetinaNet [2].
  • Human-centric Branch: estimates a VAD score for each human box and an overall dominance score that characterises the entire image.
  • Displaced People Branch: produces a classification score for the input image and re-adjusts it based on the overall dominance score (see the sketch below).
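
The following sketch shows how the three branches could be wired together at inference time. The branch functions are stubs with made-up signatures that stand in for the real detector, VAD estimator and classifier; only the data flow follows the description above.

```python
import numpy as np

def detect_humans(image):
    """Stand-in for the Object Detection Branch (a RetinaNet-style detector).
    Returns bounding boxes as (x1, y1, x2, y2); hard-coded for illustration."""
    return [(10, 20, 110, 220), (150, 30, 240, 210)]

def estimate_vad(image, box):
    """Stand-in for the Human-centric Branch: per-person (V, A, D) in [0, 1]."""
    return np.array([0.4, 0.5, 0.3], dtype=np.float32)

def classify_image(image):
    """Stand-in for the Displaced People Branch: raw 'displaced people' score."""
    return 0.55

def displacenet_inference(image):
    boxes = detect_humans(image)
    raw_score = classify_image(image)
    if not boxes:  # no people detected: fall back to the raw classifier score
        return raw_score
    dominance = float(np.mean([estimate_vad(image, b)[2] for b in boxes]))
    weight = 1.0 + 0.5 * (0.5 - dominance)  # hypothetical re-weighting, as in the earlier sketch
    return float(np.clip(raw_score * weight, 0.0, 1.0))

print(displacenet_inference(np.zeros((480, 640, 3), dtype=np.uint8)))
```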

Getting the Data

The Human Rights Archive dataset forms the core of the data used to train DisplaceNet.

The constructed dataset contains 609 images of displaced people and the same number of non-displaced counterparts for training, as well as 100 images collected from the web for testing and validation.
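
Assuming the images are arranged in the usual Keras directory layout, with one sub-folder per class (the folder names below are placeholders, not the exact layout shipped with the dataset), the training data could be loaded like this:

```python
from keras.preprocessing.image import ImageDataGenerator

# Hypothetical layout:
#   data/train/displaced_people/*.jpg
#   data/train/non_displaced_people/*.jpg
train_datagen = ImageDataGenerator(rescale=1.0 / 255,
                                   horizontal_flip=True)  # light augmentation

train_generator = train_datagen.flow_from_directory(
    'data/train',
    target_size=(224, 224),  # typical input size for ImageNet-style CNNs
    batch_size=32,
    class_mode='binary')     # displaced people vs non-displaced people
```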

Setting up the System

The following dependencies are required to run this project:

  • Python 2.7+
  • Keras 2.1.5+
  • TensorFlow 1.6.0+
  • HDF5 and h5py (required if you plan on saving/loading Keras models to disk)

Before installing DisplaceNet, please install one of the Keras backend engines: TensorFlow, Theano or CNTK. We recommend the TensorFlow backend; DisplaceNet has not been tested on the Theano or CNTK backend engines.

Then, you can install DisplaceNet itself. We recommend installing from the GitHub source: clone the repository and install the dependencies listed above before running any of the scripts.

Getting started

Inference on new data with pretrained models

To classify a single new image with DisplaceNet, you run a short inference script.
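
A minimal sketch of such a script is shown below, assuming a single saved Keras model and placeholder file names; the real repository script also runs the detection and dominance branches described above, which are omitted here for brevity.

```python
import numpy as np
from keras.models import load_model
from keras.preprocessing import image

# Placeholder paths: point these at your own test image and at the pretrained
# DisplaceNet weights once downloaded.
IMG_PATH = 'test_image.jpg'
WEIGHTS_PATH = 'displacenet_weights.h5'

# Load and preprocess the image (224x224 RGB input size is an assumption).
img = image.load_img(IMG_PATH, target_size=(224, 224))
x = np.expand_dims(image.img_to_array(img) / 255.0, axis=0)

# Load the pretrained model and classify.
model = load_model(WEIGHTS_PATH)
score = float(model.predict(x)[0][0])
label = 'displaced people' if score >= 0.5 else 'non-displaced people'
print('%s (score: %.3f)' % (label, score))
```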

DisplaceNet vs fine-tuned CNNs

Want to know more about DisplaceNet?

The entire code is available on our GitHub repo and the full paper is available HERE.

References:

  1. Albert Mehrabian. Framework for a comprehensive description and measurement of emotional states. Genetic, Social, and General Psychology Monographs, 1995.
  2. Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
