Basics of Supervised Learning: Techniques you’ll need in order to be successful

Kenneth Zhang
5 min read · Dec 31, 2019


Scary data…ahhhh!

At the highest levels of data science and machine learning, data scientists rely on different algorithms to uncover patterns and sequences in a given piece of data. Doing this without the help of an established technique can be quite tricky.

Supervised Learning is the most common approach, and usually the first method taught to machine learning engineers and data scientists. This is largely because it works with existing, labelled input data, so there is no need to create or generate a dataset from scratch. It also lets machine learning engineers focus on developing an algorithm that learns a mapping from the inputs to the outputs.

Basic Techniques of Supervised Learning

The basic techniques of Supervised Learning include linear regression, logistic regression, multi-class classification, decision trees, and support vector machines. All of these techniques require a dataset that is already labelled with the correct answers. Problems solved with these five techniques can be further simplified into classification or regression scenarios.

Classification

The classification method sorts output values or variables into specific categories defined by the dataset. Although it is not always perfectly precise, it predicts which category a given set of input values belongs to. The outcomes do not have to be split into just two categories; a problem can have multiple classes at the same time. Techniques commonly used for classification problems include random forest, multi-layer perceptron, decision trees, and logistic regression.

The most common and simplest form of a classification problem is sorting and categorizing spam emails. There are two distinct categories: spam or normal. The computer analyzes a given input and, based on patterns learned from labelled examples, predicts which category a new email falls into.

Steps to Building a Classification Solution

Step 1.

Import the necessary libraries for the dataset you are going to analyze and predict on. Make sure you have all the proper libraries installed and that they are compatible with the Python version you are utilizing.
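
As a rough sketch, and assuming pandas and scikit-learn (which is what the later steps lean on), the imports might look like this:

```python
# Assumed libraries for this walkthrough: pandas for loading the data,
# scikit-learn for splitting it and for the random forest classifier.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
```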

Step 2.

Import the dataset in either .dat or .csv format. You can either create your own dataset or use a trusted one from a website like GitHub.com. In this case, the dataset was imported from GitHub and already had the petal and sepal lengths and widths sorted into categories. You may also want to print it out to make sure it is in the correct format.
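
Here is a minimal sketch of the loading step, assuming the Iris measurements sit in a CSV file; the URL below is only a placeholder, so substitute the raw link to whichever copy of the dataset you are using:

```python
# Placeholder URL -- replace with the raw link to the CSV you trust.
url = "https://example.com/iris.csv"
df = pd.read_csv(url)

# Print the first few rows to confirm the file loaded in the expected format.
print(df.head())
```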

Step 3.

Separate the column you want to predict (the target) from the rest of the dataset. Initialize a variable for it so the target column is kept apart from the feature columns.
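
Continuing the sketch above, and assuming the label column is called "species" (match it to your CSV's header):

```python
# X holds the feature columns (petal/sepal measurements),
# y holds the predicting (target) column.
X = df.drop(columns=["species"])
y = df["species"]
```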

Step 4.

Split the data into test and training datasets.
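
Using scikit-learn's train_test_split, one possible split looks like this (the 25% test size and fixed random_state are illustrative choices, not requirements):

```python
# Hold out 25% of the rows for testing; the rest is used for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
```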

Step 5.

(optional, but not really)

Use a random forest classifier for the prediction. The classifiers that come with the library imported at the beginning can be used to predict the output values. Then display the predictions.
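
Continuing the same sketch with scikit-learn's RandomForestClassifier (the settings below are just defaults picked for illustration, not tuned values):

```python
# Fit a random forest on the training split, then display its predictions.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

predictions = clf.predict(X_test)
print(predictions)
print("Accuracy:", clf.score(X_test, y_test))
```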

Regression

A regression scenario is one where the output variable is a continuous value that can be graphed and followed. Things such as income, mileage, and many other variables are continuous. The simplest form of regression is linear regression, where the computer tries to fit the data with a hyper-plane (a hyper-plane is a subspace with one fewer dimension than the space that contains it).

Prediction problems that use regression include forecasting one's income in 10 years, or the depreciation of a Mercedes-Benz over 1 year.
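
To make that concrete, here is a minimal linear regression sketch with scikit-learn; the car-age and resale-value numbers are made up purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented data: car age in years vs. resale value in thousands.
age_years = np.array([[1], [2], [3], [4], [5]])
value_k = np.array([48, 41, 35, 30, 26])

model = LinearRegression()
model.fit(age_years, value_k)

# Estimate the value after 6 years by following the fitted line.
print(model.predict([[6]]))
```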

Logistic regression is a different, more statistical model. Its parent function, the logistic function, is used to model a binary dependent variable (an output that can be coded as 0 or 1). Logistic regression estimates the parameters of the logistic model and is used to predict which of the two outcomes a given input belongs to.
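
For reference, the logistic (sigmoid) function itself takes only a couple of lines:

```python
import numpy as np

def logistic(z):
    # Squashes any real number into the range (0, 1), which is why it is
    # used to model a binary (0/1) dependent variable.
    return 1 / (1 + np.exp(-z))

print(logistic(0))   # 0.5 -- right on the decision boundary
print(logistic(4))   # close to 1
print(logistic(-4))  # close to 0
```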

Building a Logistic Regression Solution

Step 1.

Use the matplotlib Python library to visualize the dataset.
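
A one-line sketch of that import:

```python
# matplotlib is used in the later steps to plot the training and testing points.
import matplotlib.pyplot as plt
```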

Step 2.

You can either import and load an existing dataset or create your own. For a full understanding, I have plotted my data and displayed it in a later step. When loading the data, make sure you also split it into training and testing sets so that the model can be evaluated on data it has not seen.
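
Here is a small made-up example to keep the later snippets concrete: hours studied vs. whether a student passed (1) or failed (0). The numbers are invented purely for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Invented data: hours studied vs. pass (1) / fail (0).
hours = np.array([[0.5], [1.0], [1.5], [2.0], [2.5],
                  [3.0], [3.5], [4.0], [4.5], [5.0]])
passed = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

# Split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    hours, passed, test_size=0.3, random_state=0
)
```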

Step 3.

Plot your outputs. Try to give the two sets different colours; for example, I made the testing data red and the training data blue.
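
Continuing from the snippet above, the plot might be drawn like this:

```python
# Training points in blue, testing points in red, as described above.
plt.scatter(X_train.ravel(), y_train, color="blue", label="training data")
plt.scatter(X_test.ravel(), y_test, color="red", label="testing data")
plt.xlabel("hours studied")
plt.ylabel("failed (0) / passed (1)")
plt.legend()
plt.show()
```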

Step 4.

Create a logistic regression object, fit it on the training data, and then display the outputs.
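
A minimal sketch of this step with scikit-learn's LogisticRegression, continuing from the made-up data above:

```python
from sklearn.linear_model import LogisticRegression

# Fit the model on the training split, then display its predictions.
model = LogisticRegression()
model.fit(X_train, y_train)

print(model.predict(X_test))        # predicted class labels (0 or 1)
print(model.predict_proba(X_test))  # predicted probability for each class
```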

Conclusion

Supervised learning is the most common type of learning because of its simplicity and accessibility. Machine learning engineers can focus on developing algorithms that find patterns and sequences in a dataset rather than spending the majority of their time building the dataset itself. The techniques in supervised learning allow the programmer or engineer to split the data into training and test values and predict new values based on those. There are five main techniques in supervised learning, and the problems they solve can be further simplified into classification and regression.

Now that you have an understanding of Supervised Learning, go try some of these programs out and see what you come up with.

Happy learning!
