Intro to Machine Learning for the Everyday Person

Rohit Baney · Published in Analytics Vidhya · Jul 10, 2021

Image by Markus Winkler on Unsplash

Technological advancements in the last decade have caused a surge in the collection and use of data. Not a day goes by that we don’t hear buzzwords like ‘Data Science’, ‘Artificial Intelligence’ or ‘Machine Learning’. We hear about companies leveraging the power of data to drive business decisions using a seemingly mystical process known as Artificial Intelligence or Machine Learning, but what do these terms really mean?

Most articles I come across on the subject are either filled with so much technical jargon that they are completely inaccessible to the everyday person, or are so simplistic that they make it seem like sci-fi magic. The purpose of this article is to introduce Machine Learning in a manner that is easily accessible to someone without any technical know-how, demystifying it by breaking it down into its principal components.

Before we dive deep into the nebulous realm of Machine Learning, however, it is important for us to differentiate it from another popular buzzword: Artificial Intelligence. The two terms are used interchangeably across media nowadays, but there is a subtle yet important difference: Machine Learning is a subset of Artificial Intelligence. It is one application of the broader effort to teach computers how to ‘learn’ to do things.

Venn Diagram that shows Machine Learning as a subset of Artificial Intelligence
Image by author

To someone who has only heard of Machine Learning in articles and on social media, it might seem like a magical process through which computers are going to learn how to take over the world, but the reality is far less dramatic. Don’t get me wrong, Machine Learning is definitely exciting, but in essence, it’s simply a really clever label-maker. Without getting too technical, Machine Learning is the process of feeding data into a mathematical model, which then uses trends in that data to perform tasks. At its core, Machine Learning consists of two important parts: the Data and the Algorithm (the model).

The Data

The data we feed into our models so that they can ‘Machine Learn’ is a crucial part of the process. How well our model performs depends largely on how good our data is. This data typically comes in the form of ‘datasets’: rows and columns (think Microsoft Excel) of information. Each row typically represents one item/object, while each column represents information about that item/object. Additionally, we might also have an extra column that gives us the answers, called ‘labels’, to the problem we are trying to solve. To understand this better, let us look at a simple example.

Consider a dataset about cars. Each row in the dataset represents a single car, while each column represents information about that car such as color, model, tire size, mileage, top speed, etc. These columns are called ‘features’. Additionally, there might be another column, called the ‘label’, that tells us whether the car is a sports car or a regular car.

Image by author
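To make this concrete, here is a minimal sketch of what such a dataset might look like in Python, using the pandas library. The cars and numbers below are invented purely for illustration:

```python
import pandas as pd

# A tiny, made-up version of the car dataset described above.
# Each row is one car; each column is a 'feature', except the
# last column, which is the 'label' we want to predict.
cars = pd.DataFrame({
    "make":          ["Audi", "Toyota", "Porsche", "Honda"],
    "top_speed":     [250, 180, 290, 175],  # km/h
    "mileage":       [8, 15, 7, 16],        # km per litre
    "is_sports_car": [1, 0, 1, 0],          # label: 1 = sports car
})

print(cars)
```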

When we feed this data into our model, it figures out which features are important and which are not, relative to the ‘label’ column. In our example, the labels we give our model allow it to look at the data and say, ‘ah, well, in most cases, a car with a high top speed made by a company like Audi, BMW or Porsche is a sports car’. Then, when you show it a different car without the label attached, it can tell you, based on the car’s features like top speed or make, whether the car is most likely a sports car or a regular car.
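If you are curious what this looks like in practice, here is a small sketch using the scikit-learn library. The choice of a decision tree model and the made-up numbers are assumptions for the sake of the example, not the only way to do this:

```python
from sklearn.tree import DecisionTreeClassifier

# Features for each car: [top speed in km/h, mileage in km/l].
X = [[250, 8], [180, 15], [290, 7], [175, 16]]
# Labels: 1 = sports car, 0 = regular car.
y = [1, 0, 1, 0]

model = DecisionTreeClassifier()
model.fit(X, y)  # the model 'figures out' which features matter

# Show it a new, unlabelled car: 270 km/h top speed, 9 km/l mileage.
print(model.predict([[270, 9]]))  # -> [1], i.e. most likely a sports car
```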

How does our model do this though? How does it ‘figure out’ things? This is the point where most articles introduce technical jargon like ‘gradient descent’ and ‘optimization’ that tends to make the layman give up on trying to understand what’s happening and simply declare the process modern magic. Allow me, however, to take a stab at explaining this process in the most non-technical manner possible.

The Algorithm

The algorithm is where Machine Learning actually happens. There are myriad algorithms that perform different kinds of tasks, but most of them have the same skeletal structure. Nearly all of them consist of two main components: the ‘Loss Function’ and the ‘Optimizer’.

Don’t let these terms scare you. Academics like using fancy terms in order to make themselves seem smarter than they are. The essence of the ‘Optimizer’ and the ‘Loss Function’ is easy to understand using the following example. Fair warning: I am going to have to introduce a tiny bit of math, but don’t worry if you don’t understand it right away. The concepts matter a ton more than the calculations do.

Consider a dataset that contains information about houses. Remember that the columns contain the ‘features’ of each house, like the size of the living room, the location and when it was built, while each row represents a different house. Here, the ‘label’ is the price of the house, because that is what we are trying to predict for new houses.

The algorithm we are going to use to solve this problem is called Linear Regression, which is a fancy academic way of saying we are going to use a line. If we draw a graph of ‘size of the house’ vs ‘price’, we will get something like this:

Graph that plots Price of a house vs size of a house. Data points are positively correlated.
Image by author

We can see that as the size of a house increases, the price increases as well. Linear regression is simply drawing a line that best represents this trend of increasing prices with increasing size.

Taking a trip down memory lane, we might remember our high school math teachers saying something about the equation of a line. It looked something like this: y = mx + c. I promise, this is the only equation I will torture you with. In the equation, x and y simply represent the x and y coordinates of points on the line. ‘m’, on the other hand, represents the ‘slope’ of the line, or in other words, how much incline the line has. A small slope gives a nearly horizontal line, while a large slope gives a nearly vertical line. ‘c’ represents the y-intercept of the line, which is nothing more than how high up the line sits on the graph. A higher ‘c’ pushes the entire line directly upwards, while a lower ‘c’ does the exact opposite, pushing it downwards.
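If it helps, here is the same equation written as a few lines of Python. The numbers are arbitrary, chosen only to show how ‘m’ and ‘c’ change the result:

```python
# The line equation from high school: y = m*x + c.
def line(x, m, c):
    return m * x + c

# A steeper slope (larger m) makes y grow faster with x,
# while a larger intercept (c) shifts the whole line upwards.
print(line(100, m=0.5, c=10))  # 60.0
print(line(100, m=2.0, c=10))  # 210.0
print(line(100, m=0.5, c=50))  # 100.0
```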

In order to find the line that ‘best fits’ our data, we need to adjust both ‘m’ and ‘c’ so that our line is as close to as many data points as possible. This is what our algorithm does for us, and this is what we so mysteriously label ‘Machine Learning’. Congratulations! You now know what Machine Learning is. If you would like to dive a little deeper into how Machine Learning works, continue reading. Otherwise, feel free to skip to the conclusion.
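For the curious, here is a minimal sketch of this fitting process using the scikit-learn library. The house sizes and prices are invented for illustration; a library like this one handles the juggling of ‘m’ and ‘c’ for us:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented data: house sizes in square metres, prices in $1000s.
sizes  = np.array([[50], [80], [100], [120], [150]])
prices = np.array([150, 220, 260, 310, 380])

model = LinearRegression()
model.fit(sizes, prices)  # this step is the 'learning'

print(model.coef_[0])          # the learned slope 'm'
print(model.intercept_)        # the learned intercept 'c'
print(model.predict([[110]]))  # predicted price of a 110 m² house
```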

One Step Deeper

What is the ‘best fit’ line? In short, the line that has the least combined ‘loss’ when taking into account every data point is called the best fit line. ‘Loss’ is just a fancy term for ‘error’. The loss can be calculated in many ways, such as with the ‘root-mean-square error’ or the ‘R²’ (R-squared) error. I won’t dive into the intricacies of each of those types of errors in this article; the functions that calculate the loss are simply what we call the ‘loss function’. Put plainly, the ‘loss function’ calculates how much error our current line has compared to all the data points around it, for example by measuring the distance between the line and each data point and then combining all these distances.
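As a sketch, here is one common loss function, the mean squared error, written out in Python with the same invented housing numbers as before. Notice that a guess of ‘m’ and ‘c’ closer to the real trend produces a smaller loss:

```python
import numpy as np

# Mean squared error: one common 'loss function'. It measures how
# far the line's predictions are from the true prices, on average.
def mse_loss(m, c, x, y_true):
    y_pred = m * x + c  # the line's predictions
    return np.mean((y_true - y_pred) ** 2)

x = np.array([50, 80, 100, 120, 150])    # house sizes
y = np.array([150, 220, 260, 310, 380])  # true prices

print(mse_loss(m=2.0, c=50, x=x, y_true=y))  # one guess at m and c
print(mse_loss(m=2.3, c=35, x=x, y_true=y))  # a better guess: lower loss
```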

So, in order to ‘Machine Learn’, that is, to find the best values of ‘m’ and ‘c’, we need to find the line with the least ‘loss’ or ‘error’. Our algorithm first starts with random, albeit reasonable, values of ‘m’ and ‘c’. Using these values, it calculates the ‘loss’. This is where it gets a little tricky. How does our algorithm know whether to increase or decrease ‘m’ and ‘c’ in order to reduce the loss? Essentially, the algorithm takes the ‘derivative’ of the loss function, which tells it whether to increase or decrease ‘m’ and ‘c’ in order to reduce the loss. This process of taking the derivative and changing ‘m’ and ‘c’ accordingly is what is called ‘gradient descent’.

GIF by Adarsh Menon on towardsdatascience

The math behind these processes is something I am going to spare you. The important part to know is that the algorithm first picks random values of ‘m’ and ‘c’, then calculates the loss. It then reduces this loss by slightly changing the values of ‘m’ and ‘c’ using ‘gradient descent’. Repeating this process over and over gets us the ‘best fit’ line, that is, the line with the lowest loss possible, as seen in the GIF above.
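Putting it all together, here is a minimal, illustrative implementation of gradient descent for our line. The data, learning rate and number of steps are all invented for the example (the sizes are rescaled to keep the numbers friendly, which helps gradient descent behave nicely):

```python
import numpy as np

# Same invented housing data; size is now in hundreds of square metres.
x = np.array([0.5, 0.8, 1.0, 1.2, 1.5])  # house sizes
y = np.array([150, 220, 260, 310, 380])  # prices in $1000s

m, c = 0.0, 0.0      # start from an arbitrary guess
learning_rate = 0.1  # how far to nudge m and c on each step

for _ in range(5000):
    y_pred = m * x + c
    # Derivatives of the mean-squared-error loss with respect to m and c;
    # their sign tells us which way to change each value to reduce the loss.
    dm = -2 * np.mean(x * (y - y_pred))
    dc = -2 * np.mean(y - y_pred)
    m -= learning_rate * dm  # nudge m downhill
    c -= learning_rate * dc  # nudge c downhill

print(m, c)         # the learned slope and intercept
print(m * 1.1 + c)  # predicted price of a 110 m² house
```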

We can use this line to predict the prices of houses outside our dataset. Thus, our algorithm has ‘learnt’ how to predict the price of a house using its size. We can utilize more of the features in our dataset, like the number of rooms or the location, to make our algorithm perform better on real-life examples.

Conclusion

The above example we discussed with housing prices is just one application of Machine Learning, known as ‘regression’. The other example we used with the cars was an application known as ‘classification’. Both regression and classification fall under the branch of Machine Learning known as ‘Supervised Learning’. There are two other methods of Machine Learning, ‘Unsupervised Learning’ and ‘Reinforcement Learning’, but they are beyond the scope of this article.

There are various other applications of Machine Learning that use an ever-increasing number of algorithms. Most of them, however, operate on the same principle of reducing a ‘loss function’ using ‘gradient descent’ to achieve various tasks. If you are interested in knowing more about them, I encourage you to look them up.

I hope this article managed to lift the shroud on the realities of what Machine Learning is. If you have any comments, suggestions or questions, please don’t hesitate to reach out to me on my LinkedIn here. Thank you for reading.
