The future of AI: from siloed data to shared expertise

Philippe Beaudoin
Element AI
Feb 19, 2018

Every corporation that has successfully deployed artificial intelligence solutions has done so by deliberately instilling an internal culture of data sharing. Take Google’s example. In 2011, the Mountain View giant had more than seventy different privacy policies for its various products. The following year, it aggregated most of these into a single main privacy policy explaining what information was collected and how it was used across Google products.

It’s no coincidence that this move took place in 2012, the year deep learning really started picking up. Back then, every AI researcher was discovering the following adage:

The greatest killer of AI is siloed data.

That’s because deep learning, the key technology behind the current AI revolution, is notoriously data-hungry. By keeping the data for each of its products siloed, Google couldn’t unleash deep learning on the aggregate of its rich datasets, and that was a problem.

Diversity matters

Breaking down silos doesn’t only increase the amount of data on which AI can be trained, it also increases the diversity of this data. Different products perceive users in different ways, and these different viewpoints open up a world of possibilities. For example, a clever engineer may combine the datasets from Google Fit and Google Play Music to learn which music is best suited to different activities.
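
To make the idea concrete, here is a rough sketch of how such an engineer might join two siloed datasets by hand. The tables, column names and timestamps are entirely made up for illustration; they are not real Google Fit or Google Play Music data.

```python
import pandas as pd

# Hypothetical, made-up datasets standing in for two product silos.
fitness = pd.DataFrame({
    "user": ["a", "a", "b", "b"],
    "activity": ["run", "yoga", "run", "yoga"],
    "timestamp": pd.to_datetime(["2018-02-01 08:00", "2018-02-01 19:00",
                                 "2018-02-02 08:00", "2018-02-02 19:00"]),
})
listening = pd.DataFrame({
    "user": ["a", "a", "b", "b"],
    "genre": ["electronic", "ambient", "electronic", "classical"],
    "timestamp": pd.to_datetime(["2018-02-01 08:05", "2018-02-01 19:02",
                                 "2018-02-02 08:03", "2018-02-02 19:05"]),
})

# Join the two silos on user and (roughly) time, then count which genres
# co-occur with which activities.
merged = pd.merge_asof(
    listening.sort_values("timestamp"),
    fitness.sort_values("timestamp"),
    on="timestamp", by="user", direction="backward",
)
print(merged.groupby(["activity", "genre"]).size())
```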

AI can do the same, only better. Where humans can find a few connections, an AI can identify thousands of them. A machine learning system let loose on these combined datasets could therefore discover many unforeseen correlations and improve each of these products.

The curse of secrecy

Given that huge and diverse datasets are required to train modern machine learning systems, how can corporations with smaller or more specialized datasets benefit? The prospects are grim. Indeed, an analysis of the Big Data business model, in which corporations allow their data to be collected by huge aggregators, shows that this approach mostly benefits giant tech companies.

Total data secrecy might sound more appealing, yet this is also a recipe for failure. A corporation building a wall between its data and the rest of the world would also be blocking out the many benefits of deep learning.

Representation learning to the rescue

Fortunately, progress in the field of deep learning research hints at a future where every corporation could benefit from AI. This research direction has a name: “representation learning”.

Representation learning is anchored in deep learning’s ability to understand data on different levels. Let’s take a quick look at how this happens.

Deep learning relies on neural networks, which are systems made of layered units. Each of these units, called a perceptron, is a very simple thing. It receives a bunch of input signals, sums them up in a weighted fashion and, if the sum is large enough, fires a signal to the perceptrons on the next layer.
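
To see just how simple, here is a minimal sketch of a single perceptron in Python. The inputs, weights and threshold behaviour are illustrative assumptions, not the exact units used in any particular network.

```python
# A minimal, illustrative perceptron: a weighted sum of inputs,
# then a threshold decides whether the unit "fires".

def perceptron(inputs, weights, bias=0.0):
    # Weighted sum of the incoming signals.
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Fire (output 1.0) only if the sum is large enough.
    return 1.0 if total > 0 else 0.0

# Example: three input signals and hand-picked weights.
signal = perceptron([0.5, 0.2, 0.9], weights=[1.0, -0.5, 0.8])
print(signal)  # 1.0 -> this perceptron fires
```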

Once built, a neural network needs to be trained on a specific task, for example determining whether an image contains a cat. Training ensures that each perceptron learns the weights to apply to the signals coming from its friends in the previous layer.

We’ve become really good at training neural networks in that we can easily create a network where the perceptrons on the final layer say “yes” only when an image contains a cat. What we’re not so good at, however, is understanding and controlling what happens between the input and the output of this network.
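As a rough illustration of what “training” means here, the sketch below nudges a tiny network’s weights until its final layer answers “cat” or “not cat”. The network shape, the random stand-in data and the hyperparameters are made-up placeholders, not the setup of any real cat detector.

```python
import torch
import torch.nn as nn

# Toy stand-in data: 64 random "images" (flattened to 1,024 pixels each)
# with made-up cat / not-cat labels. A real detector would use real photos.
images = torch.rand(64, 1024)
labels = torch.randint(0, 2, (64,)).float()

# A small layered network: each Linear layer is a bank of perceptron-like
# units feeding the next layer.
net = nn.Sequential(
    nn.Linear(1024, 64), nn.ReLU(),
    nn.Linear(64, 16), nn.ReLU(),
    nn.Linear(16, 1), nn.Sigmoid(),  # final unit says "cat" (close to 1) or not
)

loss_fn = nn.BCELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

# Training loop: adjust every weight so the final layer's answer
# gets closer to the labels.
for epoch in range(100):
    optimizer.zero_grad()
    prediction = net(images).squeeze(1)
    loss = loss_fn(prediction, labels)
    loss.backward()
    optimizer.step()
```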

Learning a tangled mess

Let’s look at our cat detector a little more closely. In this type of neural network, the input to the perceptrons on the very first layer is the raw image pixels. After training, an analysis of the perceptrons on that layer shows that they fire when a region of the image contains edges at different angles. The first layer of the neural network has learned to detect object boundaries!

The first layer of AlexNet learns to detect very simple shapes.
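
If you want to peek at this yourself, the sketch below pulls the first convolutional layer out of a pretrained AlexNet using torchvision (assuming a recent version with the `weights=` API). It is just one convenient way to inspect what that layer has learned, not the method used to produce the illustration above.

```python
import torchvision.models as models

# Load an AlexNet whose weights were trained on ImageNet.
alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)

# The very first layer is a bank of 64 small convolutional filters
# operating directly on RGB pixels.
first_layer = alexnet.features[0]
filters = first_layer.weight.detach()
print(filters.shape)  # torch.Size([64, 3, 11, 11])

# Each 11x11 filter responds most strongly to a simple pattern, typically
# an oriented edge or a colour blob; plotting them reproduces the classic
# "first layer" picture.
```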

Good job first layer! Now let’s try to guess what the deeper layers have learned. Maybe the second one is able to identify a combination of two lines (i.e. a corner); perceptrons on the third layer might fire when they spot a frequent shape like a circle or a rectangle; deeper still, we may find perceptrons that are able to identify an eye or a nose.

In practice, though, these deeper layers rarely learn to detect such neatly categorized components. Their outputs correlate with the content of the image in a way that’s seemingly random. It all gets magically sorted out somehow, and the network still performs well on the task it was trained for. However, without some extra digging, the components detected by deeper layers do not seem to carry any meaning for us poor humans.[1]

Worse, components detected by deeper layers are not necessarily meaningful for other neural networks trained on different tasks. In machine learning parlance, we say that perceptrons on deeper layers are “entangled”, hinting at the fact that meaningful components are somehow encoded by the output of multiple perceptrons in a way that cannot be easily teased out.

From a technical standpoint, a deep neural network is not a “black box”, because we can see which perceptrons are firing at any time. Yet our comprehension of these systems is limited, as we don’t really understand what it means for most of the perceptrons to fire. By learning to detect meaningful components, representation learning will therefore help us understand not only what goes on in the first layers (the edge detectors), or on the last layer (the cat detector), but everywhere in between.

It’s still challenging to design networks and algorithms that learn good representations. In fact, it’s a very active field of research, and every year the International Conference on Learning Representations (ICLR) examines new breakthroughs. If you’re interested in the nitty-gritty details, this paper by Bengio et al. is a good starting point.

Better representations mean better understanding

A system that can learn meaningful components at every level is a system that has a better understanding of the task it’s tackling. This pays off in a number of ways:

  • Understandable machine learning. The system is more understandable to a human analyzing its internal behaviour.
  • Domain adaptation. The system can be trained on different but related tasks using a smaller amount of data.
  • Transfer learning. The system can share its understanding with other systems tackling tasks in which similar components are important.

In short, advances in representation learning make it possible for a trained neural network to share its expertise with humans as well as with other deep learning systems.
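
As a concrete flavour of that last point, the sketch below reuses the representations of a pretrained network for a new task by freezing its layers and training only a fresh final layer. The ResNet-18 backbone and the two-class task are arbitrary choices for illustration, not a prescription.

```python
import torch.nn as nn
import torchvision.models as models

# Start from a network that already learned useful visual components
# on a large, diverse dataset (ImageNet).
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze the shared expertise: the pretrained layers are not updated.
for param in backbone.parameters():
    param.requires_grad = False

# Replace only the final layer with one suited to the new, smaller task
# (say, two classes). Only this layer's weights will be trained.
backbone.fc = nn.Linear(backbone.fc.in_features, 2)

# From here, an ordinary training loop on the small specialized dataset
# updates just the new layer, reusing everything the backbone already knows.
```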

Sharing expertise is the secret of efficient systems

The idea of machines that can abstract meaningful components and share this high-level expertise with other systems is quite novel. Yet humans and corporations have been sharing expertise for a very long time.

Take the example of Silicon Valley, arguably the most fruitful technological ecosystem in the world. Its success is largely due to the fact that engineers and researchers bounce around from one great company to another. Each time, these talented individuals get exposed to the inner workings of a new company and improve their skills and knowledge in the process.

NDAs may prevent an ex-employee from sharing data they’ve been exposed to, but the expertise they gained stays with them. As a result, everyone benefits, and no single big “expertise aggregator” comes out on top.

An ecosystem of AI systems

In the same way that successful corporations seek out ecosystems in which they can find talented employees, they should strive to build their AI in an environment that exposes it to as many different tasks as possible.

Representation learning opens the door to AI systems that can improve their expertise over time as they get exposed to different tasks. This, in turn, leads to systems that can perform significantly better on a totally new task, even when this task only comes with a small and specialized dataset.

In time, advances in representation learning will lead to greater exchanges of expertise between AI systems and the old adage will change to:

The greatest killer of AI is siloed expertise.

At this point we will have truly achieved the dream of democratized AI, allowing every corporation — large or small, big tech or not — to benefit from the power of machine learning.

[1] This is a bit of an oversimplification. With some work, it is possible to find a few perceptrons in deeper layers that detect components like “faces” or “wrinkles” or “text in images”. It remains hard to explain what most perceptrons do, though, and why the network still needs them to emit its final answer.

Illustrations by Morgan Guegan


Philippe Beaudoin is SVP Research at Element AI and an ex-Google engineer, now trying to beat the singularity to the finish line.