Summary: Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction

Pouya Pezeshkpour
2 min read · Oct 18, 2018


Authors: Roei Herzig, Moshiko Raboh, Gal Chechik, Jonathan Berant, Amir Globerson

In this paper, the authors address the problem of predicting scene graphs from images, which can be viewed as a structured prediction task that captures the objects in an image and the relations between them.

One way to solve a structured prediction task is with score-based methods, which look for the labels that best fit the input according to a score function S(X, Y), where X is the input and Y is the labels. To carry out this optimization, we need to define a graph labeling function F: Z → Y, where Z is the ordered set of node and edge features. Because the order in which the nodes are listed is arbitrary, F should be permutation invariant: applying F to a permuted input should give the same result as applying the same permutation to the output of F on the original input. The most important contribution of this work is a necessary and sufficient condition for this property, stated in terms of three component functions.
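
If I am reading the paper's main theorem correctly, the condition can be written as below. The notation is my transcription rather than a verbatim copy: z_k is the feature vector of node k, z_{i,j} is the feature vector of the edge from i to j, and φ, α, ρ are the three component functions.

$$
[F(z)]_k = \rho\left( z_k,\ \sum_{i=1}^{n} \alpha\left( z_i,\ \sum_{j \neq i} \phi\left( z_i, z_{i,j}, z_j \right) \right) \right), \qquad k = 1, \dots, n
$$

Intuitively, φ summarizes each pair of nodes, the inner sum pools over the neighbors of node i, α turns that into a per-node context, the outer sum pools over all nodes, and ρ combines the global result with the features of node k. Both sums ignore ordering, which is where the invariance comes from.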

By representing the three functions in this condition as linear neural layers, the authors introduce a new model for structured prediction that achieves state-of-the-art results on some metrics on the Visual Genome dataset. The paper includes a figure of the overall model.
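
To make the idea concrete, here is a minimal NumPy sketch of a labeling function built from three small layers and composed as F(z)_k = ρ(z_k, Σ_i α(z_i, Σ_{j≠i} φ(z_i, z_ij, z_j))). This is not the authors' implementation: the class name `GPILayer`, the layer sizes, the ReLU nonlinearity, and the random weights are all assumptions made purely for illustration. The point is only to check numerically that permuting the input permutes the output in the same way.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(x):
    return np.maximum(x, 0.0)


class GPILayer:
    """Toy node-labeling function: rho(z_k, sum_i alpha(z_i, sum_{j!=i} phi(z_i, z_ij, z_j))).

    Hypothetical illustration; each of phi, alpha, rho is a single random linear map.
    """

    def __init__(self, d_node, d_edge, d_hidden, d_out):
        self.W_phi = rng.normal(size=(2 * d_node + d_edge, d_hidden))
        self.W_alpha = rng.normal(size=(d_node + d_hidden, d_hidden))
        self.W_rho = rng.normal(size=(d_node + d_hidden, d_out))

    def __call__(self, nodes, edges):
        # nodes: (n, d_node); edges: (n, n, d_edge) with edges[i, j] = z_ij
        n, d_node = nodes.shape
        z_i = np.broadcast_to(nodes[:, None, :], (n, n, d_node))
        z_j = np.broadcast_to(nodes[None, :, :], (n, n, d_node))

        # phi on every ordered pair (i, j): shape (n, n, d_hidden)
        phi = relu(np.concatenate([z_i, edges, z_j], axis=-1) @ self.W_phi)

        # inner sum over j != i (mask out the diagonal): shape (n, d_hidden)
        off_diag = 1.0 - np.eye(n)
        pooled_pairs = (phi * off_diag[:, :, None]).sum(axis=1)

        # alpha on each node, then outer sum over all nodes: shape (d_hidden,)
        alpha = relu(np.concatenate([nodes, pooled_pairs], axis=-1) @ self.W_alpha)
        global_ctx = alpha.sum(axis=0)

        # rho combines each node's own features with the shared global context
        ctx = np.broadcast_to(global_ctx, (n, global_ctx.shape[0]))
        return np.concatenate([nodes, ctx], axis=-1) @ self.W_rho


# Usage: permute the nodes (and the edges consistently) and compare outputs.
layer = GPILayer(d_node=4, d_edge=3, d_hidden=8, d_out=5)
nodes = rng.normal(size=(6, 4))
edges = rng.normal(size=(6, 6, 3))
perm = rng.permutation(6)

out = layer(nodes, edges)
out_permuted = layer(nodes[perm], edges[np.ix_(perm, perm)])
assert np.allclose(out_permuted, out[perm])
```

The comparison uses `np.allclose` rather than exact equality because the sums are evaluated in a different order after permuting, which can change the floating-point result slightly.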
