Google’s DeepMind develops AI that predicts 3D layouts from partial images

HAMZA ABDULLAH · THE 21st CENTURY · Jun 23, 2018

Google’s DeepMind has created a neural network capable of rendering 3D scenes from just a few raw images.

According to Google’s official DeepMind blog, the goal of its recent AI project is to make neural networks easier and simpler to train. Currently, even highly advanced AI-based recognition systems use large datasets of human-annotated images, which makes the process expensive because every scene has to be annotated by a person.

In its recently published work, Google’s DeepMind introduced the Generative Query Network (GQN), a framework within which machines learn to perceive their surroundings by training only on data they obtain themselves, removing the dependency on humans. Much like a small child, the GQN learns by itself through observation of the world around it: it learns about plausible scenes and their geometrical properties without any human labeling of scene contents.

Figure: The Generative Query Network (GQN). Source: Google DeepMind

The GQN model is composed of two parts: a representation network and a generation network.

The representation network takes the agent’s observations as input and encodes the underlying scene in a compact representation. The generation network then predicts, or renders, the scene from new viewpoints based on that representation. Crucially, the representation network does not know in advance which viewpoints the generation network will be asked to predict, so it must describe the scene as accurately and completely as possible. (A minimal code sketch of this two-network structure follows the list of properties below.) The GQN exhibits the following important properties:

The GQN’s generation network can ‘imagine’ previously unobserved scenes from new viewpoints with remarkable precision.

The GQN’s representation network can learn to count, localise and classify objects without any object-level labels.

The GQN can represent, measure and reduce uncertainty. It is capable of accounting for uncertainty in its beliefs about a scene even when its contents are not fully visible, and it can combine multiple partial views of a scene to form a coherent whole.

The GQN’s representation allows for robust, data-efficient reinforcement learning.

Using the GQN’s representation enables substantially more data-efficient policy learning, reaching convergence-level performance with approximately four times fewer interactions with the environment than a standard method that observes raw pixels.
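To make the two-network structure concrete, here is a minimal sketch in PyTorch. The layer sizes, the 7-dimensional viewpoint vector, and the simple feed-forward generator are illustrative assumptions, not DeepMind's published architecture (the paper uses a recurrent, ConvDRAW-style latent-variable generator). What carries over is the overall flow: encode each (image, viewpoint) observation, sum the per-observation representations into a single scene representation, then render a query viewpoint from it.

```python
# Minimal GQN-style sketch (illustrative sizes; not DeepMind's exact model).
import torch
import torch.nn as nn

class RepresentationNetwork(nn.Module):
    """Encodes one (image, viewpoint) observation into a scene representation."""
    def __init__(self, r_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # 16 -> 8
        )
        # Viewpoint assumed 7-d, e.g. (x, y, z, cos/sin yaw, cos/sin pitch)
        self.fc = nn.Linear(128 * 8 * 8 + 7, r_dim)

    def forward(self, image, viewpoint):
        h = self.conv(image).flatten(1)
        return self.fc(torch.cat([h, viewpoint], dim=1))

class GenerationNetwork(nn.Module):
    """Renders an image for a query viewpoint, given the scene representation."""
    def __init__(self, r_dim=256):
        super().__init__()
        self.fc = nn.Linear(r_dim + 7, 128 * 8 * 8)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, representation, query_viewpoint):
        h = self.fc(torch.cat([representation, query_viewpoint], dim=1))
        return self.deconv(h.view(-1, 128, 8, 8))

repr_net, gen_net = RepresentationNetwork(), GenerationNetwork()

# Aggregate several observations of the same scene by summing their
# representations, then predict the scene from an unseen query viewpoint.
images = torch.rand(3, 3, 64, 64)   # three observations of one scene
viewpoints = torch.rand(3, 7)
r = repr_net(images, viewpoints).sum(dim=0, keepdim=True)
query = torch.rand(1, 7)
predicted_image = gen_net(r, query)  # shape (1, 3, 64, 64)
```

Summing the per-view representations is what lets extra partial views accumulate into a more complete (less uncertain) description of the scene, in line with the properties listed above.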
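As for the reinforcement-learning claim, here is a hypothetical sketch continuing from the block above (it reuses repr_net, images, and viewpoints): a small policy network acts on the compact 256-dimensional scene representation instead of raw pixels. The discrete action space and the frozen, pre-trained encoder are assumptions for illustration, not DeepMind's actual experimental setup.

```python
# Hypothetical: a policy over 4 discrete actions (assumed) that consumes
# the GQN scene representation r rather than raw pixel observations.
policy = nn.Sequential(
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 4),
)

with torch.no_grad():  # the representation network is pre-trained and frozen
    r = repr_net(images, viewpoints).sum(dim=0, keepdim=True)

action_logits = policy(r)  # shape (1, 4); usable with any standard RL method
```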

The performance DeepMind has demonstrated is quite promising and marks a real step towards autonomous scene understanding.

Tesla’s head of AI, Andrej Karpathy, also discussed the challenges involved in training the company’s Autopilot system in a recent talk at TrainAI in May 2018. Elon Musk has also mentioned that Tesla’s upcoming all-electric supercar, the next-generation Tesla Roadster, would feature an “Augmented Mode” to enhance drivers’ ability to operate the high-performance vehicle.

Google DeepMind’s GQN model is a promising approach to autonomous scene understanding. Its performance is expected to improve as larger datasets become available and hardware advances, which should eventually enable the GQN to perform in real-world scenarios as well.

