Machine Learning Algorithms for absolute beginners

11 min read · Mar 27, 2023

Artificial Intelligence is everywhere. If you aren’t familiar with its concepts, you probably think of something out of I, Robot, or Ultron from the Marvel Cinematic Universe (Disclaimer: I love his dry humour). In this article, however, I am going to demystify Machine Learning, a branch of AI, in a way that is easy to digest and doesn’t leave you scratching your head. Machine Learning has many real-life applications, especially in business, such as ad recommendation, self-driving cars, object detection and natural language processing, and a lot of businesses have positioned, or are positioning, themselves to take advantage of it.

To start off, we are going to answer a few questions, starting with: what is Artificial Intelligence (AI)? Put simply, AI is the study of how to make machines imitate intelligent human behaviour by following an algorithm. AI is commonly divided into several branches, including neural networks, deep learning, computer vision, natural language processing, cognitive computing and Machine Learning, which we will be talking about today.

The second question is, ‘What is an algorithm?’ An algorithm is simply a series of steps one takes to perform a task. Think of the steps you take when preparing your favourite soup. That is an algorithm. The steps you take to get ready for work, tie your shoe, solve a mathematical problem, all these are algorithms.

Note: In this article the term machine is used to refer to a computer, computer program or an algorithm and not an actual physical robot. We will also look at popular algorithms for supervised and unsupervised learning.

Machine Learning in a nutshell

Now, say we wanted to create an app that can accurately predict whether an image shows a cat or a dog. In traditional programming, we would have to hand-write a huge number of rules for the application to look out for, such as specific curves in the image, color patterns, tail length, height, body mass etc., to determine whether a picture we show it is a cat or a dog. This program would be incredibly complex to write, if not impossible. If we fed it an image of a dog from an angle we did not anticipate when writing it, the program might crash or give the wrong answer. Even if we managed to write this incredibly complex program and somehow got it to work, extending it in future to identify, say, kangaroos would mean rewriting all these rules. This is not ideal for any programmer. This is where machine learning comes to the rescue.

Machine learning is based on the idea that systems/machines can learn from data, recognize patterns, and make decisions with little or no human intervention. That is, instead of writing out the rules our program should look out for, we feed machines information (pictures of cats and dogs in this case) and allow them to learn and come to their own conclusions. Of course, these conclusions are based on algorithms, AKA math. In this article, I will not be breaking down the math behind the algorithms, only their general underpinnings.

The Types of Machine Learning.

Machines can learn in 4 ways. The first 3 are data driven, i.e. they learn from historic data. The 4 types of learning are:

Supervised Learning
Unsupervised Learning
Semi-supervised Learning
Reinforcement Learning

SUPERVISED LEARNING

Supervised learning is a type of machine learning where machines are trained with labeled data to classify data or predict outcomes accurately. Labeled data here means that for each example we give to the algorithm, we also include the correct answer. The algorithm learns from these correct answers and is then able to predict a value, or identify what something is, for an example (say an image) that was not in the original data set we used to train it.

Supervised learning can perform 2 tasks, and thus has 2 subcategories:

Classification
Regression

Classification.

This is where the output variable is categorical, with 2 or more categories. For example: is this email spam or not spam? Is this a dog, a cat or a kangaroo? Examples of classification algorithms are K-Nearest Neighbour, Decision Trees, Random Forest, SVMs, Naïve Bayes, Logistic Regression etc. Note that some of these algorithms can be used for regression tasks as well.

Regression

This is when the output variable is a real or continuous value (i.e. a number). This value is dependent on other variable(s). E.g. how much tax I will pay depends on how much I made that year. In this case the dependent variable is how much tax I will pay, and the independent variable is my income. A popular regression algorithm is Linear Regression. Decision Trees, Random Forest and SVMs can also perform regression tasks.

Supervised Learning Algorithms

Now let’s look at some popular supervised learning algorithms and how they do what they do.

K-nearest neighbours: The K-nearest neighbours algorithm, also known as KNN, uses proximity to make classifications or predictions about the grouping of an individual data point. While it can be used for either regression or classification problems, it is typically used as a classification algorithm, working off the assumption that similar points can be found near one another.

In the diagram below, the data set is made up of green squares and orange triangles, and a new object, a blue circle, has been introduced into it. K stands for the number of nearest neighbours to include in the voting process. In our example, where K = 5, a K-nearest neighbours algorithm will look at the 5 data points closest to the unidentified object, i.e. the blue circle. Those 5 data points then take a vote, and the new blue circle is assigned to the group with the most votes. So the KNN in the example below will predict that the blue circle is a green square.
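The voting process can be sketched in a few lines of Python. This is only an illustration: the coordinates are made up to resemble squares and triangles on a chart, and knn_predict is a hypothetical helper name, not a function from any library.

```python
import math
from collections import Counter

def knn_predict(training_points, new_point, k=5):
    """Classify new_point by majority vote among its k nearest neighbours.

    training_points is a list of ((x, y), label) pairs.
    """
    # Sort the labelled points by straight-line distance to the new point.
    by_distance = sorted(
        training_points,
        key=lambda item: math.dist(item[0], new_point),
    )
    # Let the k closest points vote with their labels.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Made-up coordinates: squares sit bottom-left, triangles top-right.
training_points = [
    ((1.0, 1.0), "green square"),
    ((1.5, 2.0), "green square"),
    ((2.0, 1.5), "green square"),
    ((2.5, 2.5), "green square"),
    ((5.0, 5.0), "orange triangle"),
    ((5.5, 4.5), "orange triangle"),
    ((6.0, 5.5), "orange triangle"),
]
blue_circle = (2.8, 2.8)
prediction = knn_predict(training_points, blue_circle, k=5)
print(prediction)
```

With these made-up points, 4 of the 5 nearest neighbours are green squares, so the blue circle is predicted to be a green square.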

Decision Tree: Decision trees build classification or regression models in the form of a tree structure. The first node, also called the root node, is chosen using a mathematical formula; a common choice is Gini impurity, which measures how mixed the classes are in the groups produced by a split (the lower the impurity, the purer the split). In the binary trees described here, each node splits into at most 2 child nodes, and each branch leaving a node represents one outcome of the decision learned from the training data. A node with arrows pointing to and away from it is called a branch node. A node with no arrows pointing away from it is a leaf node. The final result is a tree with decision nodes and leaf nodes.
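To make Gini impurity a little more concrete, here is a minimal sketch of how it could be calculated for a group of labels and for a candidate split. The function names and the cat/dog labels are my own illustration; real libraries also search over many candidate splits, which is not shown here.

```python
from collections import Counter

def gini_impurity(labels):
    """Return 1 - sum(p_i ** 2), where p_i is the share of each class.

    0.0 means the group is pure; 0.5 is the worst case for two classes.
    """
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

def split_impurity(left_labels, right_labels):
    """Weighted average impurity of the two child nodes produced by a split."""
    total = len(left_labels) + len(right_labels)
    return (
        len(left_labels) / total * gini_impurity(left_labels)
        + len(right_labels) / total * gini_impurity(right_labels)
    )

pure = gini_impurity(["cat", "cat", "cat", "cat"])
mixed = gini_impurity(["cat", "cat", "dog", "dog"])
good_split = split_impurity(["cat", "cat"], ["dog", "dog"])
bad_split = split_impurity(["cat", "dog"], ["cat", "dog"])
print(pure, mixed, good_split, bad_split)
```

A tree-building algorithm would prefer the split with the lower weighted impurity, which here is the one that separates cats from dogs completely.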

Random Forest: A random forest is basically an ensemble, or group, of decision trees. It is called a random forest because it takes random samples of the data and builds a decision tree from each one. Combining many trees helps reduce overfitting (where the model starts to memorize the data instead of learning from it), which improves performance and prediction accuracy. You can set parameters for the number of trees you want to use, the number of features/columns each tree considers, and even the node size. This flexibility makes random forests very popular for solving a lot of classification problems. After the forest has been built, the algorithm combines the predictions of all its trees: a ‘vote’ for classification, or an average for regression.
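The two ideas in that paragraph, random sampling and voting, can be sketched on their own. This is not a full random forest: no trees are actually trained, the rows are made up, and the three "tree predictions" are simply assumed for the sake of the example.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows at random, with replacement (some rows may repeat)."""
    return [rng.choice(rows) for _ in rows]

def majority_vote(predictions):
    """Return the label predicted by the most trees."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = [("photo1", "cat"), ("photo2", "cat"), ("photo3", "dog"), ("photo4", "dog")]

# Each tree in a forest would be trained on its own random sample like this one.
sample = bootstrap_sample(rows, rng)

# Assume three trained trees each returned a label for the same new photo.
tree_predictions = ["dog", "cat", "dog"]
forest_prediction = majority_vote(tree_predictions)
print(forest_prediction)
```

Two of the three assumed trees say "dog", so the forest as a whole predicts "dog".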

Naïve Bayes: Naïve Bayes uses Bayes’ Theorem to find the probability of an event occurring given that another event has already occurred. Now I know I said I wasn’t going to talk about the math, but Bayes’ Theorem states that the probability of A happening given B is equal to the probability of B given A, multiplied by the probability of A, divided by the probability of B. In symbols, P(A|B) = P(B|A) × P(A) / P(B). This is conditional probability. The ‘naïve’ part is the assumption that the features are independent of one another. It is used to classify things like spam or not spam, that is, the probability of an email being spam given that it contains a particular word seen in earlier spam emails.
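Here is a small worked example of Bayes’ Theorem for the spam case. All three probabilities are invented purely for illustration, and the sketch covers a single word only; a real Naïve Bayes classifier combines many words.

```python
def posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    # P(B) comes from adding up the two ways B can happen: with A and without A.
    evidence = (
        likelihood_b_given_a * prior_a
        + likelihood_b_given_not_a * (1.0 - prior_a)
    )
    return likelihood_b_given_a * prior_a / evidence

p_spam_given_word = posterior(
    prior_a=0.2,                    # assumed: 20% of all emails are spam
    likelihood_b_given_a=0.5,       # assumed: the word appears in 50% of spam
    likelihood_b_given_not_a=0.05,  # assumed: and in 5% of legitimate emails
)
print(round(p_spam_given_word, 3))
```

Under these assumed numbers, seeing the word raises the probability of spam from 20% to roughly 71%.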

Logistic Regression: Despite its name, logistic regression is a linear classifier; the Scikit-Learn documentation makes the same point. It is often used for classification and predictive analytics. Logistic regression estimates the probability of an event occurring, such as voted or didn’t vote, based on a given dataset of independent variables. Since the outcome is a probability, the dependent variable is bounded between 0 and 1. With the usual cut-off of 0.5, data points whose predicted probability is below 0.5 fall in, say, the didn’t vote category and those above 0.5 fall in the voted category.
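The step from a number to a probability to a category can be sketched as follows. The weight and bias are hypothetical values picked by hand; in practice they would be learned from the data, and using age as the only input is just an assumption for this example.

```python
import math

def sigmoid(z):
    """Squash any real number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_voted(age, weight=0.1, bias=-4.0, threshold=0.5):
    """Return (probability of 'voted', predicted category) for one person."""
    # A linear combination of the input, turned into a probability.
    probability = sigmoid(weight * age + bias)
    label = "voted" if probability >= threshold else "didn't vote"
    return probability, label

young_probability, young_label = predict_voted(20)
older_probability, older_label = predict_voted(60)
print(round(young_probability, 2), young_label)
print(round(older_probability, 2), older_label)
```

With these hand-picked numbers the 20-year-old lands below the 0.5 cut-off and the 60-year-old above it.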

Linear Regression: A linear regression model describes the relationship between a dependent variable, y, and one or more independent variables, X. The dependent variable is called the response variable, while the independent variables are called predictor variables. For example, how much tax I will pay depends on how much I made that year. Simple linear regression models are easy to understand because, with a single predictor, they use the formula for a straight line, which we are all familiar with.

Formula for a straight line

y = mx + c

where
y = dependent variable (how much tax I will pay)
m = the slope of the line (calculated from the data set)
x = the independent variable (how much I made this year)
c = y-intercept (the point where the line crosses the y axis)
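As a rough sketch of how m and c could be calculated from a data set, the snippet below uses ordinary least squares, one common way of fitting the line. The income and tax figures are made up, and fit_line is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (m, c) in y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, divided by how much x varies on its own.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    m = covariance / variance
    c = mean_y - m * mean_x
    return m, c

incomes = [20.0, 40.0, 60.0, 80.0]  # made-up incomes, in thousands
taxes = [3.0, 7.0, 11.0, 15.0]      # made-up tax paid, in thousands
m, c = fit_line(incomes, taxes)

# Use the fitted line to predict the tax for an income we have not seen.
predicted_tax = m * 50.0 + c
print(m, c, predicted_tax)
```

For these invented figures the fitted slope is about 0.2 and the intercept about -1, so an income of 50 gives a predicted tax of about 9.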

UNSUPERVISED LEARNING

Unsupervised learning is a type of machine learning that uses unlabeled datasets to train algorithms to discover patterns that help solve clustering or association problems. This means that we do not include the correct answer for each data point, so this type of machine learning cannot predict labeled values or classify things into named groups: it is not aware of any labels, as we didn’t make them available during training.

Unsupervised learning has 3 subcategories, or tasks it can perform:

Clustering
Association
Dimensionality Reduction

Clustering: This is where the algorithm groups similar things together based on their shared features, without actually knowing what those groups are. Examples include customer segmentation and semantic clustering. Examples of clustering algorithms are K-means clustering and Hierarchical clustering.

Association: This is based on the probability of co-occurrence in a data set, where the algorithms look for relations between the data. This is normally used for things like Market Basket Analysis, i.e. which items are bought together. A great example of an association algorithm is the Apriori Algorithm.

Dimensionality Reduction: This is a subcategory of unsupervised learning where the algorithm reduces the number of variables in the data while still preserving as much of the information as possible. It is often used in pre-processing, for example to remove noise from images and audio.

Unsupervised Learning Algorithms

K-Means clustering: K-means is an unsupervised learning algorithm that groups an unlabeled dataset into different clusters. Here K is the number of clusters to create, which we choose before running the algorithm. If K=2, there will be two clusters, for K=3 there will be three clusters, and so on.

In the example below, we are trying to group our data into children and adults. The algorithm is not aware that it is grouping adults and children, because we do not provide that label. All we provide is their height and weight, and the algorithm groups them based on their similar characteristics. The data analyst will then label the groups as adults and children.
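A bare-bones version of K-means for the height and weight example might look like the sketch below. The measurements are invented, and the starting centroids are passed in by hand to keep the result repeatable; real implementations usually pick them randomly and scale the features first.

```python
import math

def k_means(points, initial_centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then move centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(
                    tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
                )
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # nothing moved, so the grouping is stable
        centroids = new_centroids
    return centroids, clusters

# Made-up (height in cm, weight in kg) pairs, with no labels attached.
people = [(110, 20), (120, 25), (130, 30), (165, 60), (175, 75), (180, 80)]
centroids, clusters = k_means(people, initial_centroids=[people[0], people[-1]])
print(clusters)
```

The algorithm only returns two unnamed groups; deciding that one is "children" and the other "adults" is still up to the analyst.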

Apriori Algorithm: The Apriori algorithm is used to find association rules between objects, i.e. how two or more objects are related to one another. In other words, Apriori is an association rule learning algorithm that discovers patterns such as ‘people who bought product A also bought product B’. This kind of analysis is used by a lot of ecommerce websites, such as Amazon and Alibaba.
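The sketch below shows a simplified slice of the idea: it only looks at single items and pairs, whereas the full Apriori algorithm keeps growing to larger item sets. The baskets are made up, and the names frequent_pairs, confidence and min_support are my own choices for this illustration.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_support):
    """Return {pair: support} for pairs found in at least min_support of baskets."""
    n = len(baskets)
    # Apriori's pruning idea: a pair can only be frequent if both items are.
    item_counts = Counter(item for basket in baskets for item in set(basket))
    frequent_items = {
        item for item, count in item_counts.items() if count / n >= min_support
    }
    pair_counts = Counter()
    for basket in baskets:
        kept = sorted(set(basket) & frequent_items)
        pair_counts.update(combinations(kept, 2))
    return {
        pair: count / n
        for pair, count in pair_counts.items()
        if count / n >= min_support
    }

def confidence(baskets, antecedent, consequent):
    """Of the baskets containing antecedent, the share that also has consequent."""
    with_antecedent = [b for b in baskets if antecedent in b]
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
]
pairs = frequent_pairs(baskets, min_support=0.5)
rule_confidence = confidence(baskets, "butter", "bread")
print(pairs, rule_confidence)
```

In these invented baskets, bread and butter appear together in half of all purchases, and everyone who bought butter also bought bread.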

Principal component analysis (PCA): PCA is used to reduce dimensions. What this means is that we reduce the number of dimensions, or features, that a data point has, to make machine learning training easier and more memory efficient. PCA projects data onto a lower-dimensional space while preserving as much of the variation in the data as possible.

In the image below, the object on the left has more dimensions (3: height, width & depth) than the images on the right, which have only 2 (height & width). Check out this awesome video on dimensionality. This algorithm can be used to remove noise from images during pre-processing of your data.
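To give a feel for what "projecting to a lower-dimensional space" means, here is a minimal sketch that reduces 2 dimensions to 1. It relies on a closed-form shortcut that only works for two features; general PCA uses an eigen-decomposition instead. The points and the function name pca_2d_to_1d are made up for illustration.

```python
import math

def pca_2d_to_1d(points):
    """Project 2-D points onto their first principal axis.

    Returns (direction, projected), where direction is a unit vector and
    projected holds one number per original point.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the 2x2 covariance matrix.
    var_x = sum(x * x for x, _ in centered) / (n - 1)
    var_y = sum(y * y for _, y in centered) / (n - 1)
    cov_xy = sum(x * y for x, y in centered) / (n - 1)
    # Angle of the direction along which the data varies the most
    # (valid for a 2x2 symmetric matrix only).
    angle = 0.5 * math.atan2(2.0 * cov_xy, var_x - var_y)
    direction = (math.cos(angle), math.sin(angle))
    # Each point is replaced by its position along that direction.
    projected = [x * direction[0] + y * direction[1] for x, y in centered]
    return direction, projected

# Made-up points that lie close to a diagonal line.
points = [(2.0, 1.9), (4.0, 4.1), (6.0, 6.0), (8.0, 8.2)]
direction, projected = pca_2d_to_1d(points)
print(direction, projected)
```

Each point started with two numbers and ends up with one, while most of the spread in the data is kept because the points were already close to a line.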

Semi-Supervised Learning

Semi-supervised learning is a happy medium between supervised and unsupervised learning. It combines a small amount of labeled data with a large amount of unlabeled data during training. Semi-supervised learning is extremely useful in situations where unlabeled data is abundant and obtaining labeled data is expensive.

So in our earlier example of training a machine learning model to determine whether an image was a cat or dog, if we had a million images of cats and dogs, it would be too time consuming and expensive to hire people to label all this data. We can simply label a few thousand of them and use semi-supervised learning to train the model for our app. One example of a semi-supervised approach is self-training, e.g. wrapping a Support Vector Machine in a Self Training Classifier.
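The self-training idea can be sketched without any library. In this toy version each example is a single made-up number, the "model" is just the average value of each class, and a prediction counts as confident when one class is clearly closer than the other. The margin parameter and the self_train name are assumptions for this illustration, not part of any real API.

```python
def self_train(labeled, unlabeled, margin=1.0, max_rounds=10):
    """Grow a labelled set by pseudo-labelling the unlabelled points we trust.

    labeled is a list of (value, label) pairs with at least two classes;
    unlabeled is a list of values. Returns (labeled, still_unlabeled).
    """
    labeled = list(labeled)
    remaining = list(unlabeled)
    for _ in range(max_rounds):
        # "Train": one centroid (average value) per class seen so far.
        centroids = {}
        for label in {lab for _, lab in labeled}:
            values = [v for v, lab in labeled if lab == label]
            centroids[label] = sum(values) / len(values)
        confident, unsure = [], []
        for value in remaining:
            distances = sorted((abs(value - c), lab) for lab, c in centroids.items())
            # Trust the guess only if the best class clearly beats the runner-up.
            if distances[1][0] - distances[0][0] >= margin:
                confident.append((value, distances[0][1]))
            else:
                unsure.append(value)
        if not confident:
            break  # nothing new to learn from
        labeled.extend(confident)
        remaining = unsure
    return labeled, remaining

# Two hand-labelled examples and five unlabelled ones (all values invented).
hand_labeled = [(1.0, "cat"), (9.0, "dog")]
unlabeled = [2.0, 3.0, 8.0, 7.0, 5.0]
final_labeled, still_unlabeled = self_train(hand_labeled, unlabeled)
print(final_labeled, still_unlabeled)
```

Four of the five unlabelled values pick up a label, while the value sitting exactly between the two classes is left unlabelled rather than guessed.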

Reinforcement Learning

Used a lot in game development, reinforcement learning is the science of decision-making. It is about learning the optimal behavior in an environment to obtain the maximum reward. Unlike supervised or unsupervised machine learning, a historic data set is not part of the input. Instead, the model accumulates its own data through trial and error. Reinforcement learning was a key part of how DeepMind’s AlphaGo was trained to beat the world Go champion.

In reinforcement learning, an agent (our machine learning model) takes an action in an environment (the game we are teaching it, Go for example). After each action, the agent receives feedback that helps it determine whether the choice it made was correct, neutral or incorrect. If the action it took was the right one, we reward the agent so it knows to repeat that action; if the action was wrong, we punish it.

Don’t worry, no one is whipping tiny little robots that will grow to hate us and take over the world. Reinforcement learning aims to find what is known as an optimal policy, a strategy under which the agent takes the actions that maximize its reward. The reward is the positive feedback we give the agent to shape its future actions. Examples of reinforcement learning algorithms are Q-learning, Deep Q Network (DQN), State-Action-Reward-State-Action (SARSA) etc.
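As a taste of how reward and punishment shape an agent, here is a minimal sketch of the Q-learning update rule on a hypothetical two-state game. The states, actions, reward values and the alpha and gamma settings are all assumptions chosen for illustration.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One Q-learning step.

    Q(s, a) += alpha * (reward + gamma * max Q(s', .) - Q(s, a))
    alpha is the learning rate; gamma discounts future rewards.
    """
    best_next = max(q_table[next_state].values())
    target = reward + gamma * best_next
    q_table[state][action] += alpha * (target - q_table[state][action])
    return q_table[state][action]

# Hypothetical game: moving "right" from "start" reaches "goal" and is rewarded.
q_table = {
    "start": {"left": 0.0, "right": 0.0},
    "goal": {"left": 0.0, "right": 0.0},
}
first = q_update(q_table, "start", "right", reward=1.0, next_state="goal")
second = q_update(q_table, "start", "right", reward=1.0, next_state="goal")
# Moving "left" keeps the agent at "start" and is punished.
punished = q_update(q_table, "start", "left", reward=-1.0, next_state="start")
print(q_table["start"])
```

After a couple of rewarded tries the value of moving right climbs, while the punished move ends up with a negative value, so an agent that picks the highest-valued action would go right.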

Conclusion

Machine Learning is a branch of AI that allows us to train algorithms to perform certain tasks by learning from data, without having to program every rule explicitly. Leave a comment, and follow. Also check out the Scikit-Learn documentation here.