A machine learning primer

My colleague, @HaD_XIII, instigated one-hour ad hoc tutorial sessions so we could learn from the diverse experiences and skills here at the Barclays/Techstars accelerator in London. Now it’s my turn to do an introduction to machine learning, and I’ve turned it into this blog post. Hopefully the other guys will do blog posts too (hint, hint).

What is Machine Learning and why might you want to use it?

One widely quoted definition of machine learning:

“Field of study that gives computers the ability to learn without being explicitly programmed”.
Arthur Samuel, 1959
  • E.g. explicit program: conditional logic, ‘if this then that’, a calculator.
  • E.g. machine learning: use data to learn models that can be used to recognise faces, predict stock market prices or make personalised product recommendations.

Both use algorithms, but in machine learning a learning algorithm describes how the computer should search to find the best answer (given the data), whereas an explicit program instructs the computer exactly what to do to get the answer.
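To make the distinction concrete, here’s a toy contrast (a sketch only, assuming NumPy is available; the temperature-conversion task is invented for illustration). The first function is explicitly programmed; the second recovers the same rule from example data.

```python
import numpy as np

# Explicit programming: the programmer states the rule directly.
def fahrenheit(celsius):
    return celsius * 9 / 5 + 32

# Machine learning: the computer recovers the rule from example data.
celsius_examples = np.array([0, 10, 20, 30, 40])
fahrenheit_examples = np.array([32, 50, 68, 86, 104])
slope, intercept = np.polyfit(celsius_examples, fahrenheit_examples, 1)
print(slope, intercept)   # ~1.8 and ~32: the same rule, learned rather than written
```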

Why?

Well, sometimes it’s not feasible to design an explicit algorithm to find the ‘answer’. That might be because no ‘exact’ answer exists, or because it would be too computationally intensive to find one. Machine learning enables us to find good approximations, and to extrapolate and infer in situations that the computer hasn’t seen before.

Categories of machine learning

Machine learning is commonly grouped into three main types:

Supervised Learning: algorithms that ingest labelled training data, which is used to learn a general rule for predicting the labels of unseen data (see the code sketch after the list):

  • Classification: Is it a cat or a dog? Is it a tumour or not?
  • Regression (used with continuous labels): what is this person’s creditworthiness?
(Link to the ImageNet database / classification image from Krizhevsky et al., 2012)
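Here’s a minimal sketch of supervised classification, assuming scikit-learn is installed (the dataset and model choice are illustrative, not a recommendation):

```python
# Supervised learning in miniature: learn from labelled examples,
# then predict labels for data the model has never seen.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)                        # features and labels
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)   # learn a general rule
print(model.score(X_test, y_test))                       # accuracy on unseen data
```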

Unsupervised Learning: algorithms that ingest unlabelled data and discover organising principles/structure in it. Used for:

  • Clustering: group data objects into clusters according to their similarity to one another
  • Dimension reduction: learn which are the important components of the data (without losing too much information)

In a sense, unsupervised learning is about summarising data, either by choosing representative groupings or by identifying distinguishing features. E.g. Google’s PageRank algorithm, or hierarchical clustering of genes to show those most associated with cancer (a short code sketch follows the links below).

(Links to Iris data set / MNIST data; PCA illustration from David Barber/BRML)
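As a hedged sketch of both ideas, again assuming scikit-learn (k-means and PCA are just two common choices among many):

```python
# Unsupervised learning in miniature: find structure without any labels.
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)        # deliberately ignore the labels

clusters = KMeans(n_clusters=3, n_init=10).fit_predict(X)   # clustering
X_2d = PCA(n_components=2).fit_transform(X)                 # dimension reduction
print(clusters[:10])                     # cluster assignments for 10 flowers
print(X_2d[:2])                          # the same data in just 2 dimensions
```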

Reinforcement Learning: algorithms that aim to maximise a reward/goal (such as a high Atari game score or the ability to drive a car safely) by evaluating the results of their actions (even when the reward may only come after numerous actions).

E.g. (the examples are definitely worth clicking on if you haven’t seen them before)
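As a toy sketch of the idea, here is tabular Q-learning on a made-up five-state corridor (the environment and constants are invented purely for illustration): the agent only ever receives a reward at the far right, yet learns that moving right is valuable in every state.

```python
import random

# Q-learning on a 5-state corridor: reward only arrives at the goal state,
# so the value of 'move right' must propagate back through earlier states.
N_STATES = 5
LEFT, RIGHT = 0, 1
Q = [[0.0, 0.0] for _ in range(N_STATES)]   # value estimates per state/action
alpha, gamma, epsilon = 0.1, 0.9, 0.1       # learning rate, discount, exploration

for episode in range(500):
    s = 0
    while s < N_STATES - 1:
        # Mostly act on current estimates, but sometimes explore at random.
        if random.random() < epsilon:
            a = random.choice([LEFT, RIGHT])
        else:
            a = RIGHT if Q[s][RIGHT] >= Q[s][LEFT] else LEFT
        s_next = max(0, s - 1) if a == LEFT else s + 1
        reward = 1.0 if s_next == N_STATES - 1 else 0.0
        # Nudge this state-action value towards reward plus discounted future value.
        Q[s][a] += alpha * (reward + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

print(Q)   # after training, the RIGHT column dominates in every state
```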

Why hot right now?

The world is complex. It turns out that it’s really hard to create expert systems to do difficult tasks, i.e. to create detailed descriptions of the world that machines can follow in order to act intelligently. Capturing all the relevant possibilities is hugely complex, if not impossible. Such systems were expensive, slow, and often simply didn’t work!

(e.g. would you trust a hard-coded self driving car?)

Instead, we want to embrace uncertainty and improve performance by learning from data/experience.

But learning has only recently become possible / efficient in lots of domains, because of the availability of data (you need a lot of data to make good predictions), efficient algorithms, and the cost-effective hardware needed to run them. In fact, according to some commentators (cf. Azeem Azhar), there are actually six primary drivers of machine learning:

  • Moore’s law (cheap hardware)
  • The Data Explosion
  • Internet Collaboration (information and expertise are more accessible)
  • ‘Software eating the world’ (all business problems have started to look like software problems)
  • APIs and microservice architecture
  • The law of AI lock-in (those using AI will win)

What is a neural network?

A neural network is just one type of model used in machine learning. Other examples are decision trees, Bayesian networks, and support vector machines. We’re focusing on neural nets because they’ve been in the news quite a bit lately!

(Image: a cartoon drawing of a biological neuron (left) and its mathematical model (right).)
  • Artificial neurons are inspired by neuroscience. An individual artificial neuron has multiple inputs and one output; it ‘fires’ when the weighted sum of its inputs exceeds a certain threshold (see the sketch after this list).
  • An artificial neural network simulates networks of neurons in the brain by linking lots of neurons together. The resulting structures can learn very complex hypotheses.
  • Convolutional neural networks exploit the 2D structure of images.
  • Recurrent neural networks deal with sequential features like speech or time series data.
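A minimal sketch of that single-neuron picture, assuming NumPy (the weights, bias, and step activation are arbitrary illustrative choices):

```python
import numpy as np

# One artificial neuron: a weighted sum of its inputs plus a bias, passed
# through a step non-linearity, so it 'fires' only above the threshold.
def neuron(inputs, weights, bias):
    activation = np.dot(weights, inputs) + bias
    return 1 if activation > 0 else 0    # 1 = fires, 0 = stays silent

print(neuron(np.array([0.5, 0.9]), np.array([0.8, -0.2]), bias=-0.1))  # -> 1
```

In practice, networks replace the hard step with smooth non-linearities (e.g. sigmoid or ReLU) so the weights can be learned by gradient descent.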

But what does that all mean?

“I like to think of neural nets in terms of the inputs strengthening pathways in the network so that new examples tend to follow these pathways”.
Colleen (@OhmnomData)

This amazing neural net visualisation tool shows how the weights on the paths between the layers of neurons change over time when the neural net is learning from training data, and what each neuron is learning.

Deep Learning

Deep learning is often used to refer specifically to very deep (many-layered) neural nets or several neural nets chained together. More generally, it means the ability to learn very abstract representations:

‘a set of algorithms that attempt to model high-level abstractions in data by using multiple processing layers, with complex structures or otherwise, composed of multiple non-linear transformations.’ (Wikipedia).

Here are some abstract representations:

https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
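To make ‘multiple processing layers’ concrete, here’s a hedged sketch of a small deep network, assuming TensorFlow/Keras is installed (the layer sizes and task are arbitrary):

```python
from tensorflow import keras

# Each Dense layer is one learned non-linear transformation; stacking them
# lets the network build progressively more abstract representations.
model = keras.Sequential([
    keras.Input(shape=(784,)),                     # e.g. a 28x28 image, flattened
    keras.layers.Dense(128, activation="relu"),    # low-level features
    keras.layers.Dense(64, activation="relu"),     # higher-level features
    keras.layers.Dense(10, activation="softmax"),  # class probabilities out
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```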

Deep learning has been fantastically successful in recent years, and is responsible for better-than-human performance in image classification, face recognition and playing Go. Not everyone thinks that deep learning is the bee’s knees — because the conclusions it reaches can’t be explained easily (they’re not ‘interpretable’), and it tends to require a LOT of data and compute power.

Combinations of deep and other learning methods may be far more powerful than one alone.

How does machine learning relate to Artificial Intelligence (and Artificial General Intelligence)?

AI refers to systems that can act intelligently, even in a very narrow scope. Artificial General Intelligence (AGI) is a term coined for AIs that can do multiple tasks, maybe even approach (or exceed?) human-level flexibility (and hence are differentiated from ‘narrow’, task-based AI).

AI/AGI systems may incorporate robotics, knowledge representation, machine learning, vision, and natural language processing. That is, machine learning is one component of AI.

Having said that, you do sometimes see the terms AI and machine learning used interchangeably since so many of the components of AI now rely on machine learning. Many modern vision, language, and complex decision systems would be unthinkable without it. If in doubt, use ‘machine intelligence’!

(Deep Blue, IBM’s chess-playing machine, was a rules-based AI; it didn’t use machine learning. A really interesting current example is ‘Cyc’, an AI that is the result of ‘the culmination of 31 years of hard-coding rules and logic…’)

What else?

Barriers to adoption / entry

  • Machine learning algorithms typically need lots of data to learn useful things. Most people / companies don’t have ready access to such data, creating barriers to entry for those that do — and fears over their power.
  • For current machine learning techniques, even if you have the data, you still need expertise to learn good models (which is what e.g. Seldon facilitates).
  • For some application areas (e.g. healthcare), ‘interpretability’ (the ‘why’ of predictions) is important for adoption, so deep learning methods are sometimes deemed inappropriate. ‘Interpretability’ is a fertile area of research [3].

Progress is being made in many other areas, such as ‘one shot’ learning (learning from one or a few examples, like humans do [4]); unsupervised or semi-supervised methods (avoiding the expense / difficulty of obtaining labelled data); multi-modal learning (synthesising information from e.g. text, audio, images, sensory data); and attention, planning, and memory (being able to focus on what is important, work out context, and ‘remember’ relevant information). These will enable more efficient learning and more complex inference tasks. Progress towards AGI!

At the same time, progress is still required in hardware, both to speed up training time on the one hand, and to embed intelligence in mobile phones, cars, and other devices on the other.

Where to go for further information

Online courses:

High-ish level overviews:

  • Andrew Ng (Stanford, Baidu): Machine Learning (absolutely the best place to start. It appears to be nice and introductory, yet bears repeated viewing)
  • Yaser Abu-Mostafa (Caltech): Learning from Data (tells you when and why machine learning works)

More specific:

Blogs / newsletters / pop science:

Books:

Papers:

The best papers are very well written AND they’re usually free to access. Look for papers on arXiv or use arXiv Sanity Preserver as a handy filter.

Finally, for fun…

If you got this far, reward yourself by using neural nets to create your own art with Deep Art. “Our algorithm is inspired by the human brain. It uses the stylistic elements of one image to draw the content of another. Get your own artwork in just three steps”. (Related paper: http://arxiv.org/abs/1508.06576)

(Thank you to Colleen Smith (@OhmnomData) of Zighra for your input)