What is Machine Learning?

A Non-Technical Approach

João Augusto Leite
birdie.ai
7 min read · Dec 16, 2020

As a Data Scientist, I’ve often faced the challenge of explaining AI solutions to non-technical people. Most of the time, both they and I felt it would be beneficial to dive a little deeper into the technicalities of Machine Learning so that we could design a better solution for our products.

Machine Learning solutions are often seen as “black boxes” that take inputs and spit out outputs. But there’s an issue with this abstraction: AI solutions can be tuned to work best when you have a domain-specialist persona designing them. This specialist knows the domain-specific information that is deeply tied to the final product: which variables impact each other, what matters and what doesn’t, which metric best evaluates the results, what clients care about, what they don’t, and so on.

With that in mind, I’ll do my best to open this “black box” of machine learning for that non-technical persona.

Figure 1: Machine Learning Black Box View

Overview

A Machine Learning model is basically a mathematical function. It takes inputs and maps them to outputs, just like f(x) = y. You can think of x as the input data, f as the machine learning model, and y as the output data. In practical terms, we are often mapping hundreds, thousands, or even millions of variables instead of just one, as in f(x) = y.
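To make that analogy concrete, here is a minimal Python sketch of a model as nothing more than a function; the parameters inside it are picked by hand, purely for illustration:

```python
def f(x):
    # A "model" is just a mapping from input x to output y.
    # The numbers 2.0 and 1.0 are hand-picked toy parameters;
    # learning is the process of finding such values automatically.
    return 2.0 * x + 1.0

y = f(3.0)  # y == 7.0
print(y)
```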

Another interesting point of view is to compare the Traditional Programming Paradigm with the Machine Learning Paradigm. In the first, you must have the input data and a programmatic solution in order to get the output data; in the second, you only need a set of inputs and outputs, and you let the AI find the mapping between them for you.

Figure 2: Traditional Programming vs. Machine Learning
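To sketch the contrast in code (a minimal example using scikit-learn’s LinearRegression, with made-up temperature data standing in for “letting the AI find the mapping”):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Traditional programming: you write the rule yourself.
def fahrenheit(celsius):
    return celsius * 9 / 5 + 32

# Machine Learning: you only supply input/output pairs
# and let the model discover the rule.
celsius = np.array([[0.0], [10.0], [20.0], [30.0]])
fahr = np.array([32.0, 50.0, 68.0, 86.0])

model = LinearRegression().fit(celsius, fahr)
print(fahrenheit(25.0))          # 77.0, from the hand-written rule
print(model.predict([[25.0]]))   # ~77.0, from the learned mapping
```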

Supervised Learning: Regression

Imagine you’re dealing with a somewhat complex problem that you can’t map to a programmatic solution, like predicting the price of a house.

Can you imagine the number of variables that can influence the price of a house? Is this a problem that you can solve algorithmically, always finding the correct price, or at least getting very close to it? What does the data for this problem look like?

The task of mapping this set of variables to a real value is a Supervised Machine Learning task called Regression. We call a task Supervised when we need to know the output data beforehand in order to “teach” the AI. In this case, we need to know the prices of the houses beforehand.

Figure 3: Hypothetical Dataset for House Pricing
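In code, a dataset like the one in Figure 3 might look like this (all values invented for illustration):

```python
import pandas as pd

# Hypothetical house-pricing dataset: each row is one labeled example.
houses = pd.DataFrame({
    "area_sqft": [850, 1200, 1500, 2100],
    "bedrooms":  [2, 3, 3, 4],
    "bathrooms": [1, 2, 2, 3],
    "price_usd": [150_000, 210_000, 255_000, 340_000],  # the known output
})

X = houses.drop(columns="price_usd")  # inputs (attributes)
y = houses["price_usd"]               # outputs we "teach" the model with
```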

What Does Learning Actually Mean?

In a Regression task, we are trying to predict a real number. Let’s say we’ll use only one variable, the area of the house in square feet, to predict the price of the house in dollars. A very simple algorithm called Linear Regression will try to fit a line that minimizes the distances between itself and all the data points.

As you may recall from your math classes, a line is described by the function y = mx + b. In this case, y is the label (price), x is the attribute (size in square feet), and both m and b are parameters that the model will need to figure out by itself. m describes the slope of the line, while b shifts the line up and down on the y axis.

The question we should be asking ourselves is: what values of m and b achieve the lowest error? In this case, the error is the difference between the actual price and the price predicted by the model. If we calculate that, then we’ll be able to measure how well the model is doing and adjust it accordingly. This is what we call learning: adjusting the model’s parameters so that we minimize the error.
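Here is a minimal sketch of that learning loop in Python, using gradient descent on the mean squared error; the data, learning rate, and iteration count are all made up for illustration:

```python
import numpy as np

# Toy data: area in thousands of square feet vs. price in thousands of dollars.
x = np.array([0.85, 1.2, 1.5, 2.1])
y = np.array([150.0, 210.0, 255.0, 340.0])

m, b = 0.0, 0.0   # start from arbitrary parameters
lr = 0.1          # learning rate, chosen by hand

for _ in range(5000):
    pred = m * x + b   # the model's current guesses
    error = pred - y   # difference from the true prices
    # Move m and b against the gradient of the mean squared error:
    m -= lr * 2 * np.mean(error * x)
    b -= lr * 2 * np.mean(error)

print(m, b)  # values that (approximately) minimize the error
```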

Now that you’ve obtained the parameters m and b through the training phase, you can predict a new, unseen example just by computing y = mx + b. This phase, where you predict unseen examples, is called the inference phase.
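Continuing the sketch above, inference is nothing more than evaluating the learned function on a new input:

```python
# Stand-in values for the m and b learned in the previous sketch:
m, b = 151.0, 26.0
new_area = 1.8                       # an unseen house: 1,800 square feet
predicted_price = m * new_area + b   # just evaluate y = mx + b
print(predicted_price)               # ~298, i.e. about $298k
```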

Figure 4: Learning Process of a Linear Regression Model

One thing to keep in mind is that in real-world scenarios, we are often dealing with complex non-linear data, which means a simple line might not be a good representation of it. In that case, we need more complex representations that can fit non-linear shapes, like a curve.

Figure 5: Linear Model vs Polynomial Model
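One common way to fit such curves while keeping the same machinery is polynomial regression: we fit a polynomial instead of a line. A minimal numpy sketch, with made-up non-linear data and an arbitrarily chosen degree:

```python
import numpy as np

x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
y = np.array([1.2, 0.9, 1.4, 2.6, 4.1, 6.8])  # invented non-linear data

line  = np.polyfit(x, y, deg=1)  # straight line: m*x + b
curve = np.polyfit(x, y, deg=2)  # parabola: a*x^2 + b*x + c

# The curve can bend to follow the data; the line cannot.
print(np.polyval(line, 2.25))
print(np.polyval(curve, 2.25))
```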

Recall that in this example, we are only using one attribute (the area in square feet) to predict the price of the house, but is that enough? Could you accurately predict the price of a house if you only knew this information?

Machine Learning models are capable of working with n-dimensional data, meaning we are not limited to only one or two attributes. We can use as many as we want, but that also adds another layer of complexity to the model.

If we used two attributes, a line would no longer be enough to represent the data; we would instead be fitting a plane.

Figure 6: Linear Regression Model with 2 Attributes
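A sketch of that two-attribute case, where the model y = m1*x1 + m2*x2 + b describes a plane; here I use numpy’s least-squares solver on invented data:

```python
import numpy as np

# Two attributes per house: [area in thousands of sqft, number of bedrooms].
X = np.array([[0.85, 2], [1.2, 3], [1.5, 3], [2.1, 4], [2.4, 4]])
y = np.array([150.0, 210.0, 255.0, 340.0, 375.0])  # price in $1000s

# Append a column of ones so the solver also learns the intercept b.
A = np.hstack([X, np.ones((len(X), 1))])
(m1, m2, b), *_ = np.linalg.lstsq(A, y, rcond=None)

print(m1 * 1.8 + m2 * 3 + b)  # predicted price for a new, unseen house
```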

Deep Learning

Deep Learning takes the complexity and computational cost of an AI solution to the next level. In this article, I presented the Linear Regression model as a function f(X) = y that maps the attributes X = [x0, x1, …, xn] to a real value y. A Neural Network can be seen as a combination of such functions: each neuron learns its own function, which is combined with the functions of other neurons as the network gets deeper and deeper, until the last layer, the output layer, where everything is combined into a single function that finally delivers the output.

Figure 7: Basic Architecture of a Neural Network
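A tiny numpy sketch of that idea: each layer applies its own function, and the network is simply their composition. The weights below are random (i.e. untrained) and the layer sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(inputs, weights, biases):
    """One neuron layer: a linear map followed by a non-linearity (ReLU)."""
    return np.maximum(0.0, inputs @ weights + biases)

# 4 input attributes -> 8 hidden neurons -> 8 hidden neurons -> 1 output.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 8)), np.zeros(8)
W3, b3 = rng.normal(size=(8, 1)), np.zeros(1)

x = np.array([1.2, 3.0, 2.0, 1.0])   # one house's attributes
h = layer(layer(x, W1, b1), W2, b2)  # hidden functions, composed
y = h @ W3 + b3                      # output layer combines everything
print(y)                             # the (untrained) network's output
```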

With this approach, we don’t need to worry too much about feature selection. In other words, we don’t need to know whether we should use the attribute “# of bedrooms” or not, or whether we should combine the attributes “# of bedrooms” and “# of bathrooms” into something like “# of rooms”. All of these possibilities are already encoded into the neural network architecture, which can automatically identify which attributes should be combined to achieve a better result, and which should not be considered at all.

There’s no limit to how big a neural network can get, both vertically (increasing the number of neurons per layer) and horizontally (increasing the number of hidden layers). The bigger the architecture, the more complex and powerful the final model will be, at the cost of becoming extremely expensive computationally. For instance, a gigantic state-of-the-art neural network called XLNet took 2.5 days and cost about $245,000 to train.

If you’re interested in understanding the inner workings of deep learning, I’d recommend this playlist.

Conclusion

In this article, you’ve had a very brief overview of Machine Learning. Of course, there are several other topics we could discuss here, but let’s keep it simple. I’d like you to leave this reading with 5 takeaways:

1. Machine Learning works by automatically finding parameters that minimize an error function. It compares the true label provided by you with its own prediction and adjusts its parameters accordingly.

2. Supervised Learning is the set of solutions that require you to provide the AI with paired input and output samples so that you can “teach” it how to map inputs to outputs.

3. The Training Phase is when you provide the AI with examples so that it can find the parameters, while the Inference Phase is when you already have the trained parameters and want to predict unseen data.

4. Deep Learning is a sub-field of Machine Learning that deals with very complex models. It can automatically find combinations of attributes, increasing its performance. It works very well with unstructured data.

5. Machine Learning should not be your go-to solution for every problem. These solutions are usually very data-hungry, computationally expensive, prone to errors, and often impossible to interpret. If there is an optimal programmatic solution to your problem, use it.

Here at Birdie, we use AI on a daily basis. Many types of insights can be gained from the outputs of our AI-powered models, and with those, we aim to bridge the gap between consumer feedback and brands. We are aiming for a transparent culture, sharing content pieces about our daily struggles and lessons: stay tuned for more stories like this one!
