‘Semantic Generation Pyramid’ for Image Generation and Manipulation

Published in SyncedReview, Mar 17, 2020

Researchers from Google and the Weizmann Institute of Science have proposed a new image generative model that leverages the hierarchical space of deep features learned by pretrained classification networks and provides a unified and versatile framework for image generation and manipulation tasks.

Convolutional Neural Networks (CNNs) are powerful tools for learning meaningful feature spaces in visual classification tasks, and can be trained to capture semantic information ranging from low level to high level. However, there is no one-to-one mapping between deep features and an image, which makes it difficult to invert manipulated deep features back into realistic images. So far, this challenge has been addressed by imposing regularization priors on the generated image. This technique brings problems of its own, as it limits the types of features that can be used, and the reconstructed images can become blurry and unrealistic.
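To make the "no one-to-one mapping" point concrete, here is a minimal NumPy sketch. It is a toy illustration rather than anything from the paper: the helper `max_pool_2x2` and the two 4x4 "images" are hypothetical. It shows two different inputs collapsing to the same pooled feature map, so the features alone cannot say which image produced them.

```python
import numpy as np


def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling on a 2-D array with even side lengths."""
    h, w = x.shape
    # Split each axis into (blocks, 2) and take the max inside every 2x2 block.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


# Two clearly different toy "images" (hypothetical values).
image_a = np.array([[9, 0, 1, 2],
                    [0, 0, 3, 8],
                    [5, 1, 0, 0],
                    [2, 3, 0, 7]], dtype=float)
image_b = np.array([[0, 9, 8, 0],
                    [1, 2, 0, 0],
                    [0, 0, 7, 1],
                    [5, 0, 0, 0]], dtype=float)

features_a = max_pool_2x2(image_a)
features_b = max_pool_2x2(image_b)

# Different images, identical pooled features: the mapping is many-to-one.
same_features = np.array_equal(features_a, features_b)
same_images = np.array_equal(image_a, image_b)
print(same_features, same_images)
```

A real classifier stacks many such lossy steps, which is why inverting deep features requires some form of prior or learned generator to pick a plausible image among the many candidates.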

The Google and Weizmann team applied Generative Adversarial Networks (GANs) to the task of feature inversion. Thanks to their game-theoretic formulation, GANs offer a distinct approach with a proven ability to generate highly realistic images. GANs, however, struggle to utilize the globally coherent semantic information encapsulated in deep features.
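For reference, the game-theoretic formulation is usually written as a two-player minimax game between a generator G and a discriminator D. The objective below is the standard one from the original GAN formulation; this article does not state which loss variant the Semantic Pyramid paper actually uses, so treat it as background rather than the paper's training objective.

```latex
\min_{G} \max_{D} \;
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
```

Here x is a real image, z is a noise vector drawn from a prior p_z, and G(z) is a generated image that D tries to tell apart from real ones.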

The researchers’ proposed “Semantic Generation Pyramid” is a novel generative model: a unified, versatile framework for image generation and manipulation tasks that can leverage the continuum of semantic information encapsulated in deep features.

Given a set of learned features from a reference image, the model generates images with matching features at each semantic level. It can also generate only specified areas of an image. To do this, the model’s feature maps are multiplied by masks. In the training stage, a blocked crop is randomly selected as a “selected layer”. At inference time, the user can set a mask of any shape and determine the “selected layer” according to the original input. The generated images thus keep only the desired parts of the original image.

*Applying spatially varying masks to generate only the desired areas of the image*
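The masking step can be sketched in a few lines of NumPy. This is a rough illustration under stated assumptions, not the authors' code: the pyramid shapes, the helper names `block_mask` and `mask_feature_pyramid`, and the choice to zero out every level other than the selected one are all hypothetical.

```python
import numpy as np


def block_mask(height, width, top, left, size):
    """Binary mask that is 1 inside a square block and 0 elsewhere."""
    mask = np.zeros((height, width))
    mask[top:top + size, left:left + size] = 1.0
    return mask


def mask_feature_pyramid(features, masks):
    """Multiply each (C, H, W) feature map by its (H, W) mask, broadcast over channels."""
    return [f * m[None, :, :] for f, m in zip(features, masks)]


rng = np.random.default_rng(0)

# Hypothetical three-level feature pyramid, shapes are (channels, H, W).
features = [
    rng.standard_normal((8, 16, 16)),
    rng.standard_normal((16, 8, 8)),
    rng.standard_normal((32, 4, 4)),
]

# Assumption: features pass only at the selected level, and only inside a block.
selected = 1
masks = [np.zeros(f.shape[1:]) for f in features]
masks[selected] = block_mask(8, 8, top=2, left=2, size=4)

masked = mask_feature_pyramid(features, masks)
```

At inference time the same multiplication would apply to a user-drawn mask of any shape; only the construction of `masks` changes.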

The generator’s architecture is designed to work in tandem with a pretrained classification model: each generator block corresponds to a single stage of the classification model (2–3 conv layers + pooling).
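The stage-to-block correspondence can be pictured with a small bookkeeping sketch. Everything specific here is an assumption made for illustration: the VGG-style channel widths, the 224-pixel input, the 2x pooling per stage, the 2x upsampling per generator block, and the names `Stage`, `classifier_stages` and `mirrored_generator_blocks`.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    channels: int
    resolution: int  # spatial size of the stage's output after pooling


def classifier_stages(input_resolution, channels):
    """Model each stage as a few conv layers plus 2x pooling (assumed)."""
    stages = []
    resolution = input_resolution
    for index, width in enumerate(channels, start=1):
        resolution //= 2  # pooling halves the spatial size
        stages.append(Stage(f"stage{index}", width, resolution))
    return stages


def mirrored_generator_blocks(stages):
    """One generator block per classifier stage, deepest stage first.

    Each tuple is (block name, input resolution, output resolution),
    assuming every block upsamples by 2x.
    """
    return [
        (f"gen_block_for_{s.name}", s.resolution, s.resolution * 2)
        for s in reversed(stages)
    ]


# Illustrative VGG-style widths; not taken from the article.
stages = classifier_stages(224, channels=(64, 128, 256, 512, 512))
blocks = mirrored_generator_blocks(stages)

for name, res_in, res_out in blocks:
    print(f"{name}: {res_in}x{res_in} -> {res_out}x{res_out}")
```

Under these assumptions the generator walks the classifier's stages in reverse, so the features from each stage arrive at a block operating at the matching spatial resolution.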

*Image Re-painting (left); Image generation from paintings (right)*

*Semantic Image Composition (left); Image re-labelling (right)*

The researchers introduced several applications to test the proposed framework’s versatility and flexibility: re-painting, where an image region is re-generated; semantic image composition, where an object or image crop is implanted inside another image; generation from unnatural reference images, such as converting paintings to realistic photos; and re-labelling, where the class label fed to the generator is changed. All tasks were performed with the same model without further training, and the results demonstrate the potential of utilizing semantic information in generative models.

The paper Semantic Pyramid for Image Generation is on arXiv.

Author: Hecate He | Editor: Michael Sarazen



Written by Synced