The technical details of machine learning can be dizzyingly complex. But at a higher level — the kind of level that you need to grasp in order to be able to understand what machine learning is and does — it is comprehensible without any math or coding background.
The problem with talking about technical things nontechnically, though, is the rampant use and misuse of buzzwords. To combat that, let’s define some of the most widely used concepts related to the field of machine learning.
To do that, let’s first look at the information pyramid (a slightly modified version of the DIKW pyramid):
At the very bottom of the pyramid is data, which could live in a table, in a collection of images, or spread across a bunch of disorganised files, among many other forms.
Data by itself is not useful. The goal of using data in any process is to make better decisions, and as we move up the pyramid we go from raw data, which is noisy and hard to interpret, to an understanding, and finally to something actionable — a decision.
Let’s now try to define a bunch of buzzwords — from big data to deep learning — in terms of this pyramid.
For most people, “big data” means “more data than I’m used to”, at the bottom of the pyramid. If you’re used to making decisions based on hundreds of data points, any amount of data that doesn’t fit into an Excel sheet may feel big to you. If you’re operating at Google scale, big data might start from volumes that don’t fit into a single data center.
A good rule of thumb is that data becomes big data when it doesn’t fit into a single machine’s memory (RAM) anymore, which today means roughly 1TB, or 1000GB. (The largest machines on Amazon Web Services have about 4TB of RAM.)
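The rule of thumb above can be expressed as a couple of lines of code. This is only an illustrative sketch: the helper name and the 1 TB threshold are taken from the paragraph, not from any standard definition.

```python
# Illustrative sketch of the rule of thumb above: data is "big" once it
# no longer fits in a single machine's memory. The threshold and the
# helper name are assumptions for illustration, not a standard API.

RAM_LIMIT_GB = 1000  # roughly 1 TB, about the most RAM a single machine has today


def is_big_data(dataset_size_gb: float) -> bool:
    """Return True if the dataset exceeds a single machine's memory."""
    return dataset_size_gb > RAM_LIMIT_GB


print(is_big_data(50))    # a 50 GB dataset fits in RAM -> False
print(is_big_data(4000))  # a 4 TB dataset does not -> True
```

Of course, real-world thresholds shift every year as hardware grows, so treat the constant as a moving target rather than a law.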
Data science is a broad term that includes any activity that helps us move upwards in the pyramid. This might include
- gruntwork like cleaning and pre-processing data,
- putting together reports and doing one-off analyses,
- building visualisations and dashboards for decision-makers,
- building models that make decisions automatically. (This is where machine learning comes in, but more on that later.)
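To make the first item on the list concrete, here is a tiny sketch of what "cleaning and pre-processing" can look like. The records and the cleaning rules are invented for this example; real pipelines handle far messier data.

```python
# A tiny illustration of the "gruntwork" step above: tidying raw records
# before any analysis. The records and rules are made up for this sketch.

raw_records = ["  Alice, 34 ", "BOB,29", "", "carol , n/a", "Dave,41"]


def clean(records):
    cleaned = []
    for rec in records:
        rec = rec.strip()
        if not rec:
            continue  # drop empty rows
        name, _, age = rec.partition(",")
        name = name.strip().title()  # normalise capitalisation
        age = age.strip()
        if not age.isdigit():
            continue  # drop rows with missing or invalid ages
        cleaned.append((name, int(age)))
    return cleaned


print(clean(raw_records))  # [('Alice', 34), ('Bob', 29), ('Dave', 41)]
```

Unglamorous as it is, this kind of work often takes up the majority of a data scientist's time.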
This definition is very broad, mostly because "data science" is a relatively new term whose meaning has not yet settled.
One data science method is machine learning, whose goal is to go from input data to decisions automatically. This usually means “training a model”, i.e. learning a computer program from examples, as opposed to explicitly defining every step that should be executed (the classical programming approach).
Another way to compare machine learning with classical programming is through test cases. Typically, the quality and completeness of software are evaluated with tests, and for each task there are around 10–100 test cases: examples of input paired with the desired output.
The machine learning approach uses 1,000 or more such "test cases" (in reality, correctly solved examples from the real world), but instead of merely checking the program against these examples, the program is learned from them.
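The contrast between the two approaches can be sketched in a few lines. Everything here is a toy invented for illustration: real machine learning uses far more examples and far richer models, but the core idea of learning a rule from solved examples is the same.

```python
# A toy contrast between the two approaches described above.
# All names, data, and thresholds are made up for illustration.

# Classical programming: a programmer writes the rule by hand.
def is_spam_classical(num_links: int) -> bool:
    return num_links > 3  # the threshold was chosen explicitly by a person


# Machine learning: the rule (here, just a threshold) is learned from
# correctly solved examples — the "test cases" mentioned above.
examples = [(0, False), (1, False), (2, False),
            (5, True), (7, True), (9, True)]


def train_threshold(examples):
    """Pick the threshold that classifies the most examples correctly."""
    def accuracy(t):
        return sum((links > t) == label for links, label in examples)
    return max(range(0, 11), key=accuracy)


threshold = train_threshold(examples)


def is_spam_learned(num_links: int) -> bool:
    return num_links > threshold
```

No one told the learned classifier where to draw the line; it found a boundary that fits the examples. That, in miniature, is "training a model".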
Deep learning is a subset of machine learning. In the past five years, it has seen a lot of progress and has been applied to many problems, more successfully than anyone foresaw.
Deep learning is not a magic wand, though. First, it achieves very good performance only on certain types of data: images, video, audio, and text. Most business problems, by contrast, involve tasks like fraud detection and customer segmentation, and don't require detecting cats, chairs, and bananas in images.
Deep learning also requires huge amounts of data to do well on a task — usually millions of data points. There are ways to overcome this for some tasks (the solution is to piggyback on other people’s work and datasets), but not always. The graph below, originally from Andrew Ng, shows that the benefit of deep neural networks becomes apparent only if your dataset is large enough, and on small datasets, classical approaches usually work better.
Another problem with deep learning is that it is surprisingly difficult to implement. It is pretty easy to get to a first prototype using classical machine learning, but deep learning methods today are often so finicky that just reproducing something other people have published can take months of work.
AI is a very vague term. It can refer to a research field, but it is often used to describe a product or system that is said to "have AI" or "be intelligent".
"AI" does not refer to any specific technical approach: instead, it means the system feels intelligent to the user. If you can write a conventional piece of software that feels intelligent — like ELIZA, the chatbot built in the 1960s — people will happily call it an artificial intelligence.
The vagueness of the term is also a good diagnostic test: if someone uses the term “AI” to explain something to you, they either think you don’t understand anything about the field, or they themselves don’t.
This was a very brief overview of the basic concepts in machine learning. You now know enough to make sense of the kind of discussions we'll be having on this site — take a look at our other articles here and learn about the opportunities that machine learning offers your firm.
Authors: Taivo Pungas, Joonatan Samuel, Karin Kruup, Gary Monro.
Brought to you by DataMob.