How to Teach Machines to Think Like Humans

Synced | Published in SyncedReview | Jun 10, 2017

Given the task of sorting rectangular blocks by colour, an 18-month-old child can perform well thanks to relational reasoning, the intellectual process that helps humans find the relationships between separate items. While humans take this ability for granted, researchers are patiently working to instill the same capability in machines.

On June 5, Google’s DeepMind released two new papers exploring how deep neural networks can perform complicated relational reasoning with unstructured data. The London-based artificial intelligence company introduced its Relation Network (RN) module and a Visual Interaction Network (VIN) that achieved superhuman performance in specific tests.

CLEVR is a visual question answering (QA) dataset that requires models to perform tasks such as counting, comparing and querying object attributes. On CLEVR, a neural network equipped with the RN module achieved better performance (95.5 percent) than humans (92.5 percent).

An illustrative example of relational reasoning from the CLEVR dataset. Credit: DeepMind.
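Since the figure itself is not reproduced here, the snippet below sketches what a CLEVR-style sample roughly looks like. The field names, file name and answer are purely illustrative assumptions, not the dataset's actual schema; the point is that the question can only be answered by relating objects to one another, not by recognising any single object.

```python
# Hypothetical CLEVR-style sample (illustrative fields, not the real schema).
sample = {
    "image": "CLEVR_val_000123.png",  # rendered scene of simple 3D shapes
    "question": "How many objects are the same size as the blue sphere?",
    "answer": "3",                    # requires comparing sizes across objects
    "task_type": "counting",          # CLEVR also includes comparing and querying
}
```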

The ability to reason about the relationships between entities and their properties is central to generally intelligent behaviour. It distinguishes humans from most animals and machines, enabling, for example, a detective to piece together evidence from a murder scene and deduce the culprit, or a teenager to buy a friend a birthday gift that suits that friend's style or interests.

In 2008, researchers began studying the contextual relations among entities in scenes and other complex systems. Symbolic approaches, for example, have been widely used to study reasoning: they define the relations between symbols using the language of logic and mathematics, then apply versatile methods such as deduction, arithmetic and algebra to reason about the relations between objects in a physical space.

However, relational reasoning that relies solely on symbolic approaches has proven difficult to develop. This kind of approach suffers from the symbol-grounding problem, which concerns how words acquire their meanings and what those meanings are. Symbolic approaches have also focused mostly on small or synthetic datasets.

In recent years, there has been rising interest in extending neural networks to perform relational reasoning. The College of Information and Computer Sciences at the University of Massachusetts Amherst drew attention last year for a paper that pursued relational reasoning over text and large knowledge bases with neural networks.

DeepMind is taking a different approach with its Relation Network, a specialized, flexible, plug-and-play module built for relational reasoning. DeepMind introduced the Relation Network in Discovering Objects and Their Relations from Entangled Scene Representations, a paper submitted to ICLR 2017 in February. That paper studied how a Relation Network can learn the relationships between objects from entangled scene inputs, a type of input that describes the features of various items such as their size, location, colour and shape.

The new DeepMind paper, A Simple Neural Network Module for Relational Reasoning, follows up on that research. To test whether the Relation Network adapts to different contexts, DeepMind added the RN module to existing neural networks to solve reasoning tasks involving visual question answering, text-based questions and complex physical systems.

“RN requires minimal oversight to produce its input (a set of objects), and can be applied successfully to tasks even when provided with relatively unstructured inputs coming from CNN [Convolutional Neural Networks] and LSTM [Long Short-term Memory],” says the paper.
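At its core, the RN applies a learned function g to every pair of objects (conditioned on the question embedding), sums the results, and passes the sum through a second function f to produce an answer. As a rough illustration of that pattern, here is a minimal PyTorch-style sketch; the layer sizes and two-layer MLPs are assumptions made for readability, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class RelationNetwork(nn.Module):
    """Minimal sketch of the RN pattern: f(sum over pairs of g(o_i, o_j, q))."""

    def __init__(self, object_dim, question_dim, hidden_dim, num_answers):
        super().__init__()
        # g scores every ordered pair of objects, conditioned on the question.
        self.g = nn.Sequential(
            nn.Linear(2 * object_dim + question_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # f maps the summed pair features to an answer distribution.
        self.f = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_answers),
        )

    def forward(self, objects, question):
        # objects: (batch, n, object_dim), e.g. cells of a CNN feature map
        # question: (batch, question_dim), e.g. the final LSTM state
        b, n, d = objects.shape
        o_i = objects.unsqueeze(2).expand(b, n, n, d)
        o_j = objects.unsqueeze(1).expand(b, n, n, d)
        q = question.unsqueeze(1).unsqueeze(1).expand(b, n, n, question.shape[-1])
        pairs = torch.cat([o_i, o_j, q], dim=-1)        # all n*n object pairs
        pair_features = self.g(pairs).sum(dim=(1, 2))   # sum over the pairs
        return self.f(pair_features)
```

Because the "objects" here are simply feature vectors, the same module can sit on top of a CNN for images or an LSTM for text, which is what makes it easy to bolt onto existing networks.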

Meanwhile, the DeepMind paper Visual Interaction Networks proposes a general-purpose network, the VIN, that can predict the future state of a physical scene with dynamic movement, for example what will happen to a basketball after someone throws it into the air.

In contrast to the RN, the VIN performs a different type of reasoning, one concerned with objects and their physical interactions. It can infer the states of multiple physical objects from just a few frames of video and use those states to predict object positions many steps into the future. The paper did not disclose how far ahead the VIN can predict.
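The prediction step can be pictured as a learned pairwise-interaction update applied repeatedly: each object's next state depends on its current state plus the summed "effects" of every other object on it. The sketch below illustrates that rollout idea in PyTorch; the network shapes and the single update core are simplifying assumptions, not DeepMind's actual VIN architecture, which also includes a visual encoder that infers the object states from video frames.

```python
import torch
import torch.nn as nn


class InteractionCore(nn.Module):
    """Sketch of a pairwise-interaction update over a set of object states."""

    def __init__(self, state_dim, hidden_dim):
        super().__init__()
        # Relation function: the effect of object j on object i.
        self.relation = nn.Sequential(
            nn.Linear(2 * state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        # Object function: state update from current state + summed effects.
        self.update = nn.Sequential(
            nn.Linear(state_dim + hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, state_dim),
        )

    def forward(self, states):
        # states: (batch, n_objects, state_dim), e.g. inferred from a few frames
        b, n, d = states.shape
        s_i = states.unsqueeze(2).expand(b, n, n, d)
        s_j = states.unsqueeze(1).expand(b, n, n, d)
        effects = self.relation(torch.cat([s_i, s_j], dim=-1)).sum(dim=2)
        return states + self.update(torch.cat([states, effects], dim=-1))


def rollout(core, states, steps):
    """Apply the core repeatedly to predict object states several steps ahead."""
    trajectory = []
    for _ in range(steps):
        states = core(states)
        trajectory.append(states)
    return torch.stack(trajectory, dim=1)  # (batch, steps, n_objects, state_dim)
```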

Although a machine that can classify items of the same colour or predict where a ball will end up after it hits a wall can hardly be regarded as intelligent, the approach does explore ways of teaching machines to think like humans.

Author: Tony Peng | Localized by Synced Global Team: Michael Sarazen

AI Technology & Industry Review — syncedreview.com | Newsletter: http://bit.ly/2IYL6Y2 | Share My Research http://bit.ly/2TrUPMI | Twitter: @Synced_Global