A Beginner Explains Machine Learning 101

Kerry Benjamin
Published in The Data Logs
Jul 28, 2017 · 4 min read
Image: WOCInTech Chat

Hey all! Hope you’ve all been well. Today I want to talk about something I’ve been waiting to get to for quite a while: machine learning! It is really exciting and really, really deep, so this primer won’t tell you everything you need to know to become a “machine learning jedi ninja”. Also note that since I’m new to this topic, I may make mistakes. I’m open to feedback while I continue to learn. In turn, I’ll try to make this as clear as I possibly can.

Machine Learning: What is it?

Like data science, machine learning gets defined in multiple ways. Here’s the way I define it: machine learning is a kind of program that uses algorithms that learn from data in order to make predictions. You can think of an algorithm as a series of steps used to solve a problem. These algorithms find patterns within the data that help them make predictions.

Types of algorithms

There are a bunch of algorithms used in machine learning, but they typically fall into one of 3 groups.

  1. Regression
  2. Classification
  3. Clustering

Regression

Regression algorithms typically deal with quantitative data. They look for patterns across different attributes of the data in order to estimate or predict a numerical value. For example, if you have a list of employees at a company with attributes like how much they currently make and how many years they have worked, you can predict how much you might make, based on your experience, if you worked for that company.
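To make the salary example concrete, here is a minimal sketch of one regression technique, a straight-line fit by ordinary least squares. The employee numbers are made up for illustration, and `fit_line` and `predict_salary` are hypothetical helper names, not part of any library.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, divided by how much x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: years worked and current salary for five employees.
years_worked = [1, 3, 5, 7, 10]
salaries = [42000, 50000, 61000, 68000, 85000]

slope, intercept = fit_line(years_worked, salaries)


def predict_salary(years):
    """Estimate a salary from years of experience using the fitted line."""
    return slope * years + intercept


print(round(predict_salary(6)))  # estimated salary for 6 years of experience
```

The pattern the algorithm “learns” here is just two numbers: a starting salary (the intercept) and a raise per year of experience (the slope). Real regression problems usually involve more attributes, but the idea is the same.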

Classification

Classification algorithms are similar to regression algorithms, except that they predict a categorical attribute (a label) instead of a number. So think “gender”, “age group”, “nationality”, etc. You can use this to predict interesting things like customer churn, credit card fraud, or whether a company will be interested in an offer or not. Classification can even be used on images to identify things. In the medical field, a classification algorithm might be able to identify healthy and unhealthy tissue in a patient based on the hundreds of images it has data on.
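As a rough illustration of the customer churn idea, here is a small sketch of one classification technique, k-nearest neighbors: a new customer gets the label that is most common among the most similar known customers. The customer data, the two attributes (logins per month, support tickets), and the function names are all assumptions made up for this example.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared straight-line distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(training_data, new_point, k=3):
    """Predict a label by majority vote among the k closest labelled examples."""
    by_distance = sorted(
        training_data, key=lambda row: squared_distance(row[0], new_point)
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up customers: (logins per month, support tickets) and what they did.
customers = [
    ((2, 5), "churn"),
    ((1, 4), "churn"),
    ((3, 6), "churn"),
    ((20, 0), "stay"),
    ((25, 1), "stay"),
    ((18, 1), "stay"),
]

print(knn_predict(customers, (22, 0)))  # an active customer with no tickets
print(knn_predict(customers, (2, 4)))   # a rarely active customer with several tickets
```

Notice that the output is a category (“churn” or “stay”), not a number. That is the difference from regression.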

Clustering

Clustering algorithms group things together by similarity and separate them from other groups, or clusters, that are dissimilar. So with all the customers you have, or the people who liked your Facebook page, you might want to see if they naturally form some sort of groups. Clustering is something you might use for exploratory data analysis. Its ability to group is useful because, unlike with regression, you might not know what you’re looking for in the beginning.
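Here is a minimal sketch of one clustering technique, k-means, to show what “naturally forming groups” can look like in code. The points are made up, and using the first k points as starting centers is a simplifying assumption to keep the example repeatable (real implementations usually pick starting centers more carefully).

```python
def squared_distance(a, b):
    """Squared straight-line distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(point, centroids):
    """Index of the cluster center closest to the point."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, iterations=10):
    """Group points into k clusters; returns (labels, centroids)."""
    centroids = list(points[:k])  # assumption: start from the first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Step 1: assign every point to its closest center.
        labels = [nearest_centroid(p, centroids) for p in points]
        # Step 2: move each center to the average of its assigned points.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Made-up customers described by two numeric attributes.
points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
labels, centroids = kmeans(points, k=2)
print(labels)  # one cluster number per customer
```

Nobody told the algorithm which customers belong together. It only used how close the points are to each other.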

Clustering is very similar to the aptly named similarity matching algorithm. The main difference is that similarity matching tries to identify individuals that are similar to a specific individual, based on data, rather than sorting everyone into groups. You’ve probably interacted with this one before. Have you ever watched something on Netflix or bought something on Amazon? Ever find it interesting how they’re able to make a recommendation of what you should watch or buy next? That’s a similarity matching algorithm at work. Now you know the tech behind one of your favorite things!
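To give a feel for similarity matching, here is a toy sketch that finds the viewer most similar to you and suggests what they rated that you haven’t seen. It uses cosine similarity on made-up ratings and treats a missing rating as 0, which is a simplifying assumption. This is only an illustration of the idea, not how Netflix or Amazon actually build their recommendations.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two rating dicts; missing ratings count as 0 (an assumption)."""
    items = set(a) | set(b)
    dot = sum(a.get(item, 0) * b.get(item, 0) for item in items)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(ratings, user):
    """Suggest items rated by the most similar other user that `user` hasn't rated."""
    others = [name for name in ratings if name != user]
    most_similar = max(
        others, key=lambda name: cosine_similarity(ratings[user], ratings[name])
    )
    return [item for item in ratings[most_similar] if item not in ratings[user]]


# Made-up viewers and ratings on a 1 to 5 scale.
ratings = {
    "ana": {"show_a": 5, "show_b": 4, "show_c": 1},
    "ben": {"show_a": 4, "show_b": 5, "show_d": 5},
    "cho": {"show_a": 1, "show_c": 5, "show_e": 4},
}

print(recommend(ratings, "ana"))  # shows liked by the viewer most similar to ana
```

In this made-up data, ana’s tastes line up with ben’s much more than with cho’s, so the suggestion comes from ben’s list.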

Supervised or Unsupervised

There are 2 kinds of tasks that these 3 types of algorithms can fall into: supervised and unsupervised. A supervised algorithm learns from labelled data, meaning data where the answer for a specific variable is already known, and uses it to predict that answer for new data. Regression and classification algorithms fall into this category. Think of labelled data like working through a math textbook with the answers in the back. By looking at the answers, you get an idea of how to solve similar types of problems.

The reason you’ll be able to predict how much you’ll make at company A is that you have labelled data on current salaries. Likewise, the reason you can identify what kind of mushroom you see in a picture is that a classification algorithm has labelled data on what particular types of mushrooms look like.

When you don’t have labelled data for your algorithm to learn from, you have an unsupervised machine learning task. This is where clustering algorithms fall. Because the answers aren’t “laid out”, you don’t really know in advance what kinds of insights or patterns an unsupervised algorithm might pick up on. Again, this is useful for exploratory data analysis.
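The difference between the two tasks shows up in the shape of the data itself. Here is a small sketch using made-up mushroom cap widths (in centimeters): the supervised version has a name attached to every measurement, while the unsupervised version only has the measurements. The function names and the “split at the widest gap” approach are assumptions chosen to keep the example tiny.

```python
# Labelled data: every measurement comes with its answer.
labelled = [(3.0, "button"), (4.0, "button"), (10.0, "portobello"), (12.0, "portobello")]


def predict_label(labelled_data, value):
    """Supervised: copy the label of the closest known example."""
    closest = min(labelled_data, key=lambda row: abs(row[0] - value))
    return closest[1]


# Unlabelled data: the same kind of measurements, but no answers.
unlabelled = [3.0, 12.0, 4.0, 10.0]


def split_at_largest_gap(values):
    """Unsupervised: sort the values and cut them where the gap is widest."""
    ordered = sorted(values)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]


print(predict_label(labelled, 11.5))     # a name, learned from the labels
print(split_at_largest_gap(unlabelled))  # two groups, but no names for them
```

The supervised function can answer “what is this?” because it was given names to learn from. The unsupervised one can only say “these seem to belong together”, and it is up to you to work out what each group means.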

I hope this write-up was easy to digest. There is a lot more to machine learning than this, but I just wanted to give you a 101. In my next post we’ll actually go through a machine learning algorithm using some housing data. Hope you’re ready!

I’m also including some links on how machine learning is being used today. The last link is a great article by Renee Teate that explains the machine learning process using food and cooking as metaphors.

  1. Language Translation
  2. Medicine and Diagnosis
  3. Disaster Relief
  4. Netflix
  5. Machine Learning Terms with Food

If you liked this and learned something new, hit the recommend button.

Kerry Benjamin
The Data Logs

I'm a Connector, Opportunity Seeker, Learning Data Science and Supporter of STEAM education.