Explain It Like I’m 5 years old — Machine Learning

Bob Deprizio · Published in Analytics Vidhya · 8 min read · Jan 26, 2020
https://www.disruptivestatic.com/wp-content/uploads/2018/05/machine-learning-ecommerce-blog-1.jpg

Machine learning. It’s a buzzword most people have heard, and for good reason: chances are you’ve run across it at some point in your life. Do you like the recommendations that Netflix provides on your account? Have you “spoken” to Amazon’s Alexa? Or maybe you’ve noticed that Facebook’s photo tag suggestions are strangely accurate when you upload a photo to the site. This isn’t magic, of course, but it may appear that way thanks to machine learning.

What Is It Exactly?

Simply put, machine learning is technology that allows a computer to learn. The idea is that by giving a computer enough data and instructing it to process that data in a specific way, it can be “trained” to make increasingly accurate predictions about whatever you’re trying to predict. One of the appeals of this kind of “training” is that there’s no need to write a large, explicit list of instructions for the computer to follow in order to make its predictions. Instead, its guesses should improve as the volume and quality of the data we give it increase.

As a brief example, let’s say you have 1,000 different images of cars. If you look at just one photo, chances are you’ll recognize the car in the picture quickly. Why is that? How are you able to differentiate it from any other object, such as an orange or a train? Would you really need to view the other 999 images to know that what you’ve seen is a car? Probably not, most likely because you’ve seen enough cars in your life to have a general sense of what one looks like: four wheels and some type of frame. But what about the computer? How can it make a similar prediction, or any prediction for that matter? The computer needs some type of direction as to what to do with this data in order to distill an answer. It needs a human to feed it enough data so it, too, can become experienced in recognizing patterns just like us.

Supervised Learning

https://www.cc.gatech.edu/social-machines/projects.html

It’s one thing to have a huge stockpile of data saved on a computer but it’s quite another to be able to pull some type of insight or prediction from it. What’s the bridge between this seemingly large gap of raw data and a generated prediction? Algorithms. Let’s look at a very simple yet still useful algorithm that you probably have seen at some point:

y = mx + b (the simple linear regression equation)

Let’s assume that the m (widely known as the slope) and b (the y-intercept) values in the above equation will not change. Then every pairing of an x value, the input to our equation, with its corresponding output value y falls on a simple line when graphed.

This may be easy to conceptualize, but it can nonetheless provide some insight into the data we have. Perhaps our data consists of the number of hours each student studied for a set of prior exams (the x values) and their respective scores on those exams (the y values). We can then calculate m and b to give us a finalized equation that may be able to predict a student’s performance on a test based on the number of hours they study.
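As a rough sketch, here is how a computer might calculate m and b from such data using the standard least-squares formulas. The hours and scores below are invented purely for illustration:

```python
# Least-squares fit of y = m*x + b on made-up (hours studied, exam score) data.
hours  = [1, 2, 3, 4, 5, 6]          # x values (invented for illustration)
scores = [62, 68, 71, 77, 83, 88]    # y values (invented for illustration)

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

# Slope: covariance of x and y divided by the variance of x
m = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores)) / \
    sum((x - mean_x) ** 2 for x in hours)
# Intercept: the line must pass through the point (mean_x, mean_y)
b = mean_y - m * mean_x

def predict(x):
    return m * x + b

print(round(m, 2), round(b, 2))
print(round(predict(7), 1))  # predicted score after 7 hours of study
```

With this toy data the slope comes out positive, so the fitted line predicts higher scores for more hours of study.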

A line sloping upward could suggest that the more hours a student studies, the higher the score the student will likely receive. But can one variable tell the whole story behind a student’s grade? Are there other factors that may be at play?

What about the number of hours per week a student may work or the average number of hours of sleep per night a student gets? How does this change our prediction of the student’s grade and if so by how much?

Before we implement any algorithm with data the computer can use to produce predictions, we should be deliberate about our choices. We want to ensure the computer is learning from data that makes sense for what we want to predict, and that the algorithm we apply to it makes sense as well.

In these instances, we’re supervising the computer’s learning. Without proper guidance, the computer could easily choose numbers for m and b that may be completely off base for our simple model and thereby end up making wildly inaccurate predictions.

Algorithms

Though our model above is quite simplistic, it can be extended to consider other variables if we feel that’s warranted. That is, we could easily add another x variable along with its own slope coefficient to multiply it by, like so:

https://i0.wp.com/brokerstir.com/wp-content/uploads/2018/04/multiple_linear2.png
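To make the extension concrete, here is a sketch of fitting a model with two inputs at once. The second input (average hours of sleep) and all the numbers are invented for illustration; NumPy’s least-squares solver finds the intercept and both slopes together:

```python
import numpy as np

# Two inputs per student: hours studied (x1) and average hours of sleep (x2).
# All values are invented for illustration.
X = np.array([[1, 6], [2, 7], [3, 5], [4, 8], [5, 7], [6, 8]], dtype=float)
y = np.array([62, 68, 71, 77, 83, 88], dtype=float)

# Prepend a column of ones so the intercept b is fit alongside slopes m1, m2
X1 = np.column_stack([np.ones(len(X)), X])
coeffs, *_ = np.linalg.lstsq(X1, y, rcond=None)
b, m1, m2 = coeffs

def predict(hours, sleep):
    return b + m1 * hours + m2 * sleep

print(round(predict(7, 8), 1))  # predicted score for 7 hours study, 8 hours sleep
```

The same idea scales to as many variables as we think are relevant, though each added variable makes it more important to ask whether it truly belongs in the model.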

But what if the thing we’re trying to model doesn’t follow a linear pattern? Are there better choices available for our predictions? Absolutely. For instance, we can use a logistic regression model, as depicted in the graphic below, to assign a probability to whether an event will happen or not: failing or passing, being alive or dead, winning or losing.

https://uc-r.github.io/public/images/analytics/logistic_regression/plot2-1.png
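A minimal sketch of this idea, using scikit-learn and invented pass/fail data: instead of predicting a score, the model outputs a probability of passing.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hours studied vs. pass (1) / fail (0) -- invented data for illustration
hours = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression().fit(hours, passed)

# predict_proba returns [P(fail), P(pass)] for each input
print(model.predict_proba([[1.0]])[0, 1])  # probability of passing after 1 hour
print(model.predict_proba([[4.0]])[0, 1])  # probability of passing after 4 hours
```

Because the output is squeezed between 0 and 1 by the logistic curve, it reads naturally as a probability rather than a raw score.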

Perhaps one of the most interesting kinds of algorithms is the decision tree. Decision trees are made up of a series of branches and nodes that form, you guessed it, a tree-like structure. Each node contains a classification rule that determines where a piece of data travels next based on the outcome of that “test”. Eventually the data ends its journey down the tree and reaches some type of classification label. The following graphic from The New York Times is a great example of their predictive capabilities:

https://archive.nytimes.com/www.nytimes.com/imagepages/2008/04/16/us/20080416_OBAMA_GRAPHIC.html?scp=5&sq=Decision%2520Obama%2520clinton&st=cse

Here we see a more detailed breakdown of how the votes were split between Barack Obama and Hillary Clinton during the 2008 Democratic presidential primary. The tree is able to provide deeper statistical insights into exactly which segments of the population voted for which candidate; insights that may not have been immediately apparent otherwise.
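Sticking with our student example rather than the election data, a decision tree can be sketched in a few lines with scikit-learn. The rows below (hours studied, hours slept) and their pass/fail labels are invented for illustration:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data: each row is (hours studied, hours slept); labels are invented
X = [[1, 5], [2, 8], [3, 4], [4, 7], [5, 6], [6, 8]]
y = ["fail", "fail", "fail", "pass", "pass", "pass"]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

# Each internal node holds a rule (e.g. "hours_studied <= 3.5") that routes
# a sample left or right until it reaches a leaf with a class label
print(export_text(tree, feature_names=["hours_studied", "hours_slept"]))
print(tree.predict([[5, 5]])[0])
```

The printed rules make the tree easy to read top to bottom, which is a big part of why decision trees are so popular for explaining predictions.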

Unsupervised Learning

Earlier I mentioned the idea of guiding the computer’s learning by making sure to have a thoughtful selection of data for its inputs. Again, this can help ensure that the computer will pick the best values for the algorithm we give it and hopefully output fairly accurate predictions. After all, if it needs to run through thousands of iterations to find the optimal values for our chosen algorithm, we want to make sure it’s “training” on data that makes sense, right?

But suppose we don’t have a good inkling of what kind of data to feed the computer as a reasonable predictor, or don’t know what a “good” prediction might look like. In our prior example with the student test scores, the types of data we might want to use seemed fairly straightforward. Depending on what we want to predict, however, that may not be the case. Fortunately, there exist techniques to help us find hidden patterns or structure in the data we have that might not have been immediately apparent, better known as unsupervised learning.

In unsupervised learning there is no nicely labeled data we can feed into an algorithm for the computer to train on until its predictions seem reasonable. Instead, the computer is essentially left to “figure out” what patterns may exist in the data or what type of structure it holds. One of the most common approaches is called clustering.

https://www.imperva.com/blog/wp-content/uploads/sites/9/2017/07/k-means-clustering-on-spherical-data-1v2.png

Clustering is a means of creating some type of classification for the data being analyzed. The goal is essentially to divide the entire population of data points one might be working with into different groups. The data points in each group share certain similarities with one another, and each group can then be assigned a label. From there one could begin to draw inferences from the newly partitioned data.
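A minimal sketch of clustering with k-means, assuming two invented blobs of 2-D points standing in for, say, customer data. No labels are given; the algorithm discovers the grouping on its own:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two invented groups of points, centered far apart (for illustration only)
group_a = rng.normal(loc=[0, 0], scale=0.5, size=(50, 2))
group_b = rng.normal(loc=[5, 5], scale=0.5, size=(50, 2))
points = np.vstack([group_a, group_b])

# Ask k-means to partition the unlabeled points into two clusters
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.cluster_centers_.round(1))  # one center near each blob
```

Note that we had to tell the algorithm how many clusters to look for; choosing that number well is one of the practical challenges of clustering.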

Some common applications of clustering include being able to characterize and discover different customer segments for marketing purposes or even detecting patterns within customer data that may indicate fraudulent activity.

Machine Learning and Beyond

Machine learning has certainly come a long way since computer scientist Arthur Samuel began using the term in 1959 while at IBM. The field began to grow in the 1980s and 1990s as computing power improved, and it has flourished in the past 10 years given the sheer amount of data that now exists to feed the wide variety of algorithms that have been created.

The applications of machine learning extend further than what movies Netflix may recommend to you. One article published online in July of 2019 for the National Institutes of Health states that machine learning “is having a huge impact in cancer diagnostics”. Another by EliteDataScience reports that one study involving Google’s autonomous vehicle fleet of 55 cars, which together have driven over 1.3 million miles, found they had “surpassed human-driven cars in safety”. Matt Reaney, founder and CEO of Big Cloud, was quoted by learn.g2.com as saying that the integration of quantum computing into machine learning could impact millions of lives, specifically in health care, as complex problems we currently face could potentially be solved in just a fraction of the time.

Where machine learning will take us next is anyone’s guess, but one thing is for sure: I’ll be excited when we get there.
