Artificial Intelligence (AI) is a fast-growing field, and the 2018 AI Index report illustrates just how fast. It reports that the number of published AI research papers has increased 7x since 1996, university enrolment in AI courses has increased 5x since 2012, investment in US AI startups has increased 113% since 2012, and mentions of AI and machine learning (ML) in the earnings calls of tech companies have increased more than 100x since 2012. These statistics show that AI is growing not just in academia: the technology is also rapidly being adopted by businesses and commercialised.
Yet AI is an ambiguous term with no well-defined, agreed-upon definition. It’s typically used as an umbrella term covering a variety of techniques that make computers appear to have human-like intelligence. Recent advances in AI have largely been driven by machine learning — a set of algorithms which learn their behaviour from data.
The Beginning of AI
The term ‘artificial intelligence’ was chosen by John McCarthy in a 1955 proposal for a workshop held in the summer of 1956 at Dartmouth College. The workshop brought together several leaders in computing and had ambitious goals:
“An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer” — Dartmouth workshop proposal
Since the Dartmouth workshop, there have been several periods of excitement around AI, each followed by a drop in interest when the technology didn’t live up to the hype. Two notable periods of lower investment in the technology (1974–1980 and 1987–1993) were nicknamed ‘AI winters’. While the workshop didn’t solve the problems it set out to, it set the direction of AI research for the following decades.
By the 1980s, expert systems were the most popular approach to AI. Expert systems encode human knowledge about a domain as rules, and allow inference over that knowledge. For example, suppose we have rules stating that the Shanghai Tower is taller than the Empire State Building and that the Empire State Building is taller than the Eiffel Tower. From these, the expert system can infer that the Shanghai Tower is taller than the Eiffel Tower using logical rules about height.
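The height example above can be sketched as a few lines of Python: a small knowledge base of facts plus an inference loop that repeatedly applies one logical rule (taller-than is transitive). This is an illustrative toy, not the architecture of any real expert-system shell.

```python
# Knowledge base: each fact (a, b) means "a is taller than b".
facts = {
    ("Shanghai Tower", "Empire State Building"),
    ("Empire State Building", "Eiffel Tower"),
}

def infer_taller(facts):
    """Apply the transitivity rule until no new facts can be inferred."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for a, b in list(known):
            for c, d in list(known):
                # Rule: taller(a, b) and taller(b, d) => taller(a, d)
                if b == c and (a, d) not in known:
                    known.add((a, d))
                    changed = True
    return known

inferred = infer_taller(facts)
print(("Shanghai Tower", "Eiffel Tower") in inferred)  # True
```

The inferred fact was never stated explicitly — it follows from the encoded rules, which is exactly the kind of reasoning expert systems performed.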
Expert systems ultimately became hard to maintain, brittle and difficult to work with. However, knowledge bases, like Google’s Knowledge Graph and Amazon’s Alexa knowledge base, still form part of modern AI systems. They encode knowledge about the world that these virtual assistants rely on to answer questions.
Machine learning (ML) refers to a set of algorithms which learn their behaviour from data. It’s been around for decades, but became more popular from the 1990s onwards. A lot of the current discussion about AI concerns technology which is based on supervised machine learning methods, in particular deep learning and neural networks. These have been very successful in the past 5–10 years.
Consider automatic speech recognition (ASR) — the task of automatically transcribing human speech. It’s simply not possible to write down the rules, or knowledge, to explicitly program a computer to do this task. Human speech is highly variable, and even the same person cannot repeatedly speak exactly the same utterance in exactly the same way.
For these reasons, machine learning is used as the basis of ASR systems which learn how to transcribe speech from examples of manually transcribed audio. Modern systems might use thousands of hours of audio and tens of millions of words of text to learn from.
There are three broad sub-categories of machine learning.
Supervised learning is where a computer learns how to do a task from labelled examples:
- Speech recognition systems are trained from labelled audio data
- Object recognition systems are trained from labelled photos — ImageNet is a dataset of labelled images containing more than 14 million examples across more than 20,000 categories.
- Spam email classification uses examples of spam email to learn from
Supervised learning includes both classification (categorising into a set of distinct classes, such as a set of image labels) and regression (predicting a continuous value, such as a financial stock price).
Unsupervised learning learns patterns, groups and categories in unlabelled data:
- Clustering a set of scientific papers to uncover the topics they are written about (topic modelling with LDA)
- Clustering segments of audio to discover how many speakers are represented in the set
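A simple way to see clustering in action is k-means on unlabelled 1-D values: the algorithm alternates between assigning each value to its nearest centroid and moving each centroid to the mean of its cluster. This is a minimal sketch with hand-picked starting centroids; real clustering pipelines handle initialisation and convergence much more carefully.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Cluster 1-D values around the given initial centroids."""
    for _ in range(iterations):
        # Assignment step: group each value with its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            idx = min(range(len(centroids)),
                      key=lambda i: abs(v - centroids[i]))
            clusters[idx].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

values = [1.0, 1.1, 0.9, 10.0, 10.2, 9.8]
centroids, clusters = kmeans_1d(values, centroids=[0.0, 5.0])
print(sorted(round(c, 1) for c in centroids))  # [1.0, 10.0]
```

No labels are involved anywhere: the algorithm discovers that the data falls into two groups purely from the structure of the values themselves.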
Reinforcement learning is used in scenarios where the machine has to take a number of steps in an uncertain environment, towards a goal, before it can know whether those actions were good:
- Having a multi-turn dialogue with a person to complete a task like booking a restaurant table
- A robot traveling to a particular location when it can’t sense its environment perfectly
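The delayed-reward idea above can be sketched with tabular Q-learning on a tiny corridor of five states: the agent starts at one end and only receives a reward at the other, so it must learn that a sequence of ‘right’ moves pays off. The environment and all parameter values here are invented for illustration.

```python
import random

N_STATES = 5          # states 0..4; reward only at state 4
ACTIONS = [-1, +1]    # move left or right
random.seed(0)

# Q-table: expected future reward for each (state, action) pair.
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Apply an action; return (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    if nxt == N_STATES - 1:
        return nxt, 1.0, True
    return nxt, 0.0, False

alpha, gamma, epsilon = 0.5, 0.9, 0.2
for _ in range(500):                      # training episodes
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit, occasionally explore.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        nxt, reward, done = step(state, action)
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next
                                       - q[(state, action)])
        state = nxt

# The learned greedy policy should be "move right" in every state.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

Notice that most individual moves earn no reward at all; the Q-values propagate the final reward backwards through the sequence of steps, which is what lets the agent judge intermediate actions.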
Machine learning has been hugely successful in recent years due to a) increased data, b) increased computation, and c) improved algorithms. With cheaper storage and connectivity, data can be more easily collected and shared. Increases in computing power and the introduction of cloud computing mean we can quickly train on larger and larger amounts of data. Cloud computing and smartphones mean we can offload intensive computation to the cloud while people access AI through their phones. Neural network (deep learning) algorithms have their roots as far back as the 1940s, but with increased data and computational power they have become very powerful machine learning techniques.
Though recent performance improvements have been impressive, no machine learning algorithm is perfect. We always expect some error in the output. A key measure of system performance is its error rate.
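For a classification system, the error rate is simply the fraction of predictions that disagree with the true labels. The spam/ham labels below are invented for illustration.

```python
def error_rate(predictions, truth):
    """Fraction of predictions that do not match the true labels."""
    wrong = sum(p != t for p, t in zip(predictions, truth))
    return wrong / len(truth)

truth       = ["spam", "ham", "spam", "ham", "spam"]
predictions = ["spam", "ham", "ham",  "ham", "spam"]
print(error_rate(predictions, truth))  # 0.2
```

One prediction in five is wrong, giving an error rate of 20% — and some non-zero error rate is what we should always expect from a deployed system.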
A word of caution
Despite recent advances, AI and ML technology still need care and awareness so that systems are built in ways which do not cause harm or discriminate, particularly when deployed at scale.
Not all tasks can be learnt by machine learning. Some are impossible, like identifying criminality or sexuality from a person’s face. A good rule of thumb: if a person cannot do a task with ease in a couple of seconds, you are unlikely to be able to train a machine to do it.
When it is possible to learn from data, machine learning still learns from a finite dataset. If a system is trained on one dataset and then deployed in a scenario that differs from it, it will not perform as well. For example, an object recognition system will not be able to recognise images of fish if it has only ever been trained on images of plants. This becomes much more subtle if, say, your object recognition system is trained on images from the US and used in the real world on images from Asia: technology which works well at identifying objects in the West then works poorly in other regions of the world. Along similar lines, facial recognition can perform poorly for certain segments of the population if it is not trained on representative data.
Another issue with learning from data arises when ML is seeking to automate human decisions which are already biased. In this situation, human bias is embedded in the training data set. Identifying potential reoffenders using data in an unbiased way is tough because any dataset collected to train the system already contains real-world bias. Any AI system would replicate that bias.
Artificial Intelligence is behind many of the products we use today — virtual assistants, spam email filtering, fraud detection, online ad placement and much more. It is being used daily by large numbers of people around the world.
The success of AI has been driven by other technological improvements such as data, connectivity, storage, cloud computing and smartphones, as well as improvements in the core technology. On many tasks, deep learning with neural networks has drastically improved performance in recent years.
Yet, AI is not a silver bullet. It has behaviour and limitations which must be understood by both those building and those using the technology.