‘food: other’ according to Flickr.

Artificial Intelligence? Perhaps it’s not so intelligent after all

Martin Colebourne
Published in tobiasandtobias
5 min read · Nov 13, 2017


According to the photo site Flickr, the picture at the top of this page belongs in the category ‘Food: Other’. This error can tell us a lot about the nature of Artificial Intelligence today and the reality behind the hype. But before we can dive in, we need to set the scene a little.

There is a lot of discussion in the media at the moment about Artificial Intelligence (AI). A common thread is the idea that AI is approaching human-like levels of intelligence, or that we will reach that point in just a few years.

This, I believe, is a mistake: we are much further from matching human levels of intelligence than the hype suggests. The focus on ‘human-like’ intelligence tends to underestimate what humans can do, whilst underplaying the distinct strengths of AI.

In many ways AI is more powerful than human intelligence: it can react more quickly; it is able to deal with vastly larger quantities of data; and it can work non-stop without rest and without loss of attention. However, in other important ways, AI remains very basic. The current state of research falls far short of the kind of ‘intelligent’ behaviour shown by any average human being.

Picture Classification

One of the most challenging problems in AI is that of vision — deciphering images, whether static or dynamic, is both very hard and very important. Solving it lies at the heart of many applications of AI, from facial recognition for the security services to self-driving cars.

Flickr uses a form of “advanced image-recognition technology” to classify all of your photos into categories on a page called ‘Camera Roll’. Whilst this is probably not at the cutting edge of vision research, it is a useful example because it is accessible and easy to relate to, and it can tell us a lot about the power and limitations of AI.

‘Camera roll’ on Flickr

When you first discover the feature it is deeply impressive. It works on a massive scale — working through thousands of photos in a few seconds. It also achieves something that we have tended to associate with intelligence — ‘looking at’ images and classifying them. How on earth do they recognise that this image shows a bird, whilst that one is a cloud?

Amongst the thousands of images on my Photostream, there are a few odd photos which have been misclassified, but the overall success rate is very high. Let us suppose that the algorithm is correct 99% of the time compared to the classifications given by a group of people. It would be natural at this point to ask how much more power would be needed for it to be perfect and ‘match’ what humans can do.

Given that the performance is so close to 100%, it would be easy to assume that only a small improvement will be required to perfect the system: perhaps a 10% increase in power would be enough. In practice, improving the system is likely to become progressively more difficult the closer we get to 100%, but even allowing for this, the gap seems small — perhaps we need to double the power.

Characteristic Failures

A more detailed look at those few errors, however, reveals a very different story. The errors are interesting because they have a peculiar quality to them. Rather than being just slightly off, some of the misclassifications are wildly, extravagantly, wrong.

For example, the category ‘Food: other’, contains that picture of my infant son playing in a ball pit.

‘Landscape: snow’ includes a picture of some fluffy slippers.

‘People: group shot’ includes a pair of meerkats.

And ‘Vehicle: train’ includes a photograph of graffiti on an electricity substation.

It would be easy to complain that I am being unfair — this is technology in development, after all. But my point here is not that there are a few errors amongst the vast majority of accurate classifications. Rather, it is that these errors are ones that no human, not even a young child, would make. What they reveal is that the algorithm doing the sorting is not interpreting the images, as we do, but simply looking for correlations of shapes, patterns, colours and textures. There is no understanding happening at all.
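To see how classification by surface pattern alone can go so extravagantly wrong, here is a minimal toy sketch — emphatically not Flickr’s actual system, and with invented labels and synthetic ‘images’ — of a nearest-neighbour classifier that compares nothing but colour histograms. A brightly coloured ball pit shares its colour statistics with the ‘food’ reference, so it lands in ‘food: other’:

```python
import numpy as np

def colour_histogram(image, bins=4):
    """Reduce an RGB image to a normalised joint colour histogram."""
    hist, _ = np.histogramdd(image.reshape(-1, 3),
                             bins=(bins, bins, bins),
                             range=((0, 256),) * 3)
    return hist.ravel() / hist.sum()

def classify(image, references):
    """Return the label whose reference histogram is nearest."""
    h = colour_histogram(image)
    return min(references,
               key=lambda label: np.linalg.norm(h - references[label]))

rng = np.random.default_rng(0)
# 'food' reference: bright, saturated pixels (fruit... or plastic balls)
food = rng.integers(150, 256, size=(32, 32, 3))
# 'landscape' reference: muted, darker pixels
scenery = rng.integers(30, 120, size=(32, 32, 3))

references = {
    'food: other': colour_histogram(food),
    'landscape': colour_histogram(scenery),
}

# A ball pit has the colour statistics of food, so it is filed as food.
ball_pit = rng.integers(150, 256, size=(32, 32, 3))
print(classify(ball_pit, references))  # 'food: other'
```

The classifier never represents what the image is *of* — only how its pixel values are distributed — which is exactly why two semantically unrelated scenes with similar textures collide into one category.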

How much more powerful would the system need to be to understand the content of an image and make decisions based on that, rather than simply the visual patterns in the images? Perhaps ten times more powerful would be enough, but I suspect that the answer is much higher — perhaps thousands of times.

Losing Perspective

The problem of correctly classifying images on a website may not seem very important, but there is a wider point at issue. Human intelligence is not equivalent to being able to correctly classify images, win a game of Go, or pilot a car down a highway. Performance on these kinds of tasks drives the current hype around AI, but this focus underestimates the gap between the technology and human performance.

By focusing on specific, measurable tests, we are in danger of redefining what humans can do in terms of the specific test that we have employed. For example, the problem of vision can come to be seen as only the problem of classifying images.

At the same time, we become overconfident in the power of AI, claiming that it is better and more reliable than human intelligence. When an algorithm becomes highly accurate on a given test, this story becomes easy to accept and we are encouraged to rely on the algorithm — assuming that it will always perform correctly. We become complacent and less well-prepared for the rare, but spectacular, errors that will still occur. Consider the impact if the same kind of wild errors are possible in the image recognition software employed by a self-driving car.

AI has huge potential as a tool to make our lives better, but it is not close to matching human intelligence. Instead it offers something different. That’s fine: AI should be there to help us; we don’t need a replacement.



Martin writes thought-provoking essays on science, philosophy, politics and design that nobody reads.