What Can AI Actually Do?

What’s the difference between a forest fire and a hurricane? What’s the difference between bricklaying and architecture?

Predictions of how quickly AI technologies will advance are as plentiful as they are diverse. Often the most extreme predictions garner the most attention, but there are also many experts who are considerably more conservative in their forecasts. Getting a handle on the direction of travel of these technologies would seem imperative for making both good business decisions and good public policy. How did we get to a position of such uncertainty? How do we figure out what AI can actually do? I’m going to try to sketch some answers to these questions, without resorting to any math, so please bear with me!

Before turning to the second question I’m going to take a stab at the first. Most public discussion of AI actually refers to a much narrower field: deep learning, or deep neural networks. Neural networks have been studied since the 1950s and for most of their history were considered pretty useless. It’s probably fair to say that interest in them revived in earnest in 2012, in large part thanks to a paper by Krizhevsky, Sutskever and Hinton on image recognition (Hinton now moonlights at Google). Two important things happened here:

  1. A neural network containing 650,000 neurons completely smashed all previous performance records at this kind of task.
  2. Researchers really had very little idea as to why.

Since 2012 huge progress has been made, with neural networks being applied very successfully to a vast and growing set of problems that had stumped AI researchers for decades. Suddenly engineers and researchers were able to solve, with relative ease, problems that a few years earlier had been considered intractable. I think this is the source of much of the optimism about the progress of AI over the next few years.

Researchers remained largely ignorant, though, as to why these techniques were so effective. There were no known limits on what could be accomplished using this technology. It seemed that all you needed was a big enough neural network and you would have a program capable of learning to perform any task (strong AI). You will notice that I’m using the past tense here, because the question of what these networks can and cannot do may just have been answered.

In August two physicists, Henry Lin of Harvard and Max Tegmark of MIT, released a paper that goes a long way towards explaining why neural networks perform so well at certain tasks. This work hasn’t been peer reviewed yet but hopefully, given its high profile and accessible reasoning, any problems will be uncovered quickly. The main conclusions are that:

  1. Neural networks are particularly well suited to solving problems related to the material world.
  2. In particular, neural networks are good for modelling systems that consist of layers of simple relationships of the kind that are very common in physics.
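To get a feel for why that second point matters, here’s a rough back-of-the-envelope sketch in Python. The numbers are mine, not the paper’s: it just compares how many parameters a single layer needs when it is allowed to assume locality and symmetry (small, shared filters) against one that has to treat every pixel relationship as unique.

```python
# Rough parameter counting for one layer looking at a 64x64 image.
# Illustrative numbers only -- the point is the orders of magnitude.

pixels = 64 * 64  # 4,096 inputs

# A generic lookup table over binary images needs one entry per possible
# image: 2**4096 of them, which is utterly hopeless.
lookup_table_entries = 2 ** pixels

# A fully connected layer with 256 hidden units learns a separate weight
# for every (pixel, unit) pair -- no locality, no symmetry assumed.
dense_params = pixels * 256 + 256  # weights + biases

# A convolutional layer with 32 filters of size 3x3 assumes the interesting
# structure is local (3x3 neighbourhoods) and the same everywhere in the
# image (shared weights) -- layers of simple, repeated relationships.
conv_params = 32 * (3 * 3) + 32  # weights + biases

print(f"lookup table entries: about 10^{len(str(lookup_table_entries)) - 1}")
print(f"dense layer params  : {dense_params:,}")  # 1,048,832
print(f"conv layer params   : {conv_params:,}")   # 320
```

The convolutional layer is thousands of times smaller than the dense one, and both are unimaginably smaller than a brute-force lookup. That, very roughly, is the sense in which layers of simple relationships make learning cheap.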

Say we want an algorithm that can identify cats in photos. In this case a neural network would be a good choice (I’ll sketch one in code after this list) because:

  • Cats are Gaussian: they’re pretty consistent in their size and shape. Their physiology is constrained by the laws of physics. You simply do not see (and it is not possible for there to be) cats two hundred feet tall.
  • Cats are symmetrical: the left-hand side of a cat looks like the right-hand side, so the neural network only really needs to learn what one side of a cat looks like; the other side is just a mirror image.
  • Cats are local: one part of a cat is going to be near the other parts of the cat (hopefully).
  • Cats are reducible: all of the colours and shapes that make up a cat can be described by rules simpler than those for the whole animal, and many of those parts describe other animals too. Cats have tails, but so do dogs and lemurs, etc.

These properties of cats make it much easier for a neural network to learn to recognise them in photos, and the same is true for the vast majority of things we usually take photos of. We rarely think about how unusual such photos are. They’re unusual because, of all the possible combinations of pixels a photo could consist of, most wouldn’t contain these kinds of patterns. Most possible photos contain no patterns at all; they’re just noise, mess, chaos.

Even among images that do contain learnable patterns, most don’t have the same properties. Consider another kind of image that is intensely analysed all of the time: the stock price chart. Let’s say you want an algorithm that can tell you when the chart of a stock price is from a day when there was some exciting news about the company. Immediately you run into a few problems:

  • Stock prices aren’t Gaussian: the highest possible value of a company is all of the money in the world. We could all decide tomorrow to pour our life savings into a particular company’s stock and the price would shoot through the roof. There are no laws governing the maximum stock price in the same way there are laws governing the maximum size of a cat.
  • Stock prices aren’t symmetrical: a price might rise steadily and then fall quickly, or vice versa. More importantly, there are some limits on how far a company’s price can fall (e.g. the value of that company’s assets) but, as we saw above, there are basically no limits on how high it can rise.
  • Stock prices aren’t local: good news can be followed immediately by bad news, or by no news at all. Some news has effects that last for a long time and other news is forgotten almost instantly.
  • Stock prices aren’t reducible: they reflect information drawn from many different sources, but the way this information combines can’t be reduced to a series of simple relationships. There is no simple relationship between a stock price and the company’s annual profit that holds for all companies at all times, in the way that the rules governing the position of a tail hold for all mammals.

All of these problems make it difficult to train a neural network to perform this task. When human beings engage in this kind of exercise they combine different sources of knowledge in ways that are difficult to codify simply.
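To make the first of those problems concrete, here’s a rough simulation assuming only numpy. The Student-t distribution is just a stand-in for a heavy-tailed, price-like process, not a model of any real market; the point is how differently it behaves from the Gaussian world that cats live in.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000  # a million simulated "daily returns"

gaussian = rng.standard_normal(n)            # the well-behaved, cat-like world
heavy_tailed = rng.standard_t(df=3, size=n)  # a stand-in for price-like data

for name, x in [("gaussian", gaussian), ("heavy-tailed", heavy_tailed)]:
    # How often do we see a move bigger than 5 standard deviations?
    extreme = np.mean(np.abs(x) > 5 * x.std())
    print(f"{name:>12}: {extreme:.6%} of days are 5-sigma events")

# The Gaussian essentially never produces a 5-sigma day; the heavy-tailed
# process produces them thousands of times more often. A network trained
# on typical days has seen nothing that prepares it for the days that
# actually matter to this task.
```

None of this means a neural network can never be trained on price data, only that the structural assumptions that made the cat problem cheap are not there to be exploited.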

We’re now ready to attempt an answer to the question that titles this essay. We should expect neural networks to be extraordinarily useful in the natural sciences and in the disciplines built on applying them (e.g. medicine and mechanical engineering). Even within these disciplines, though, there are problems that don’t have these kinds of properties. Hurricanes, for example, are famously (and tragically) unpredictable, their future paths depending not just on where the weather system is now but on its entire history.

The upshot of this is that neural-network-powered systems capable of predicting and manipulating most aspects of the physical world apparently face no significant barriers. Even very complicated physical tasks can probably be learned using neural networks small enough to be practical in terms of the computing power they need. A robot that can do the ironing is probably just around the corner!

Moving on to social or intellectual pursuits, the picture becomes more complicated. Predicting the monetary value of a building or the success of a military strategy is a problem to which the properties listed above clearly don’t apply. Suggesting that AI would struggle with such tasks isn’t new, but I think we can now begin to understand what makes these cases different.

Learning to physically assemble the Sydney Opera House is a question of learning to adequately manipulate the materials the building is constructed of. It’s a task that can be reduced to a (large) number of simple steps, each of which can itself be reduced to relatively simple rules governing the behaviour of the materials involved, and so on. Predicting that Utzon’s design for the Sydney Opera House would be the winning bid is a qualitatively different kind of task. It requires understanding the history, politics and finance surrounding the bid process, and it is not reducible to a combination of simpler steps in the same way.

Understanding that the effectiveness of neural networks derives from their ability to model the kinds of systems common in physics helps us to predict the kinds of tasks they will be able to learn successfully. We should expect neural networks to take over a huge variety of tasks where the ability to predict the physical environment is key. Ironing, driving and construction are all fair game. Tasks such as finance, architecture and publishing are likely to be much harder to learn. I don’t think we should expect the robots to be coming for the architects anytime soon.