Where to start Machine Learning?

Bashir Alam
6 min read · Aug 2, 2022

--

Introduction to machine learning

Are you starting out with machine learning and not sure where to begin, or which key concepts you need to understand as a beginner? You don’t need to worry. This is my first article on Medium, an introduction to machine learning using Python. I want to share some of the key concepts that a beginner should know. In this article, we will discuss the skills you need in order to start with machine learning and the basic concepts of machine learning.

Skills you need to have to start with Machine learning

Let us first discuss the skills that you need to have in order to start with machine learning. As a beginner, you should have a good command of the pandas and NumPy modules. The reason is that you can use pandas and NumPy to explore a dataset, preprocess it, and even visualize it using various useful plots.

Keep in mind that machine learning is all about dealing with different datasets. A machine (computer) can only work with numeric values, so you have to be able to explore and preprocess data using various Python tools.
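To make this concrete, here is a minimal sketch of exploring and preprocessing a tiny dataset with pandas and NumPy. The dataset and its column names (total_marks, obtained_marks, result) are made up for illustration, and filling a missing value with the median is just one possible choice.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; the values and column names are assumptions.
df = pd.DataFrame({
    "total_marks": [100, 100, 100, 100],
    "obtained_marks": [35.0, np.nan, 80.0, 55.0],
    "result": ["fail", "pass", "pass", "pass"],
})

# Explore: size of the dataset, missing values per column, basic statistics.
print(df.shape)
print(df.isna().sum())
print(df.describe())

# Preprocess: fill the missing mark with the median of the known marks.
df["obtained_marks"] = df["obtained_marks"].fillna(df["obtained_marks"].median())

# Preprocess: turn the text labels into numbers the machine can work with.
df["result_code"] = df["result"].map({"fail": 0, "pass": 1})
print(df)
```

After these steps, every column that a model would use holds numeric values and nothing is missing.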

The next important skill is the ability to visualize the data and the results using different plots, and to interpret those plots. You can use various Python modules for visualization, for example plotly, matplotlib, and seaborn.
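As a small illustration, the sketch below draws a histogram with matplotlib. The marks are made-up values, and the non-interactive "Agg" backend is only selected so that the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical marks of six students.
df = pd.DataFrame({"obtained_marks": [35, 62, 80, 41, 90, 28]})

fig, ax = plt.subplots()
# Histogram: how many students fall into each range of marks.
counts, bin_edges, _ = ax.hist(df["obtained_marks"], bins=5)
ax.set_xlabel("Obtained marks")
ax.set_ylabel("Number of students")
ax.set_title("Distribution of obtained marks")
```

With an interactive backend you would call plt.show() at the end, or you can write the figure to a file with fig.savefig.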

Getting started with Machine Learning

Machine learning is simply asking the computer to go through a dataset and find hidden patterns that are difficult for a human to spot. The computer goes through the dataset again and again, finds the trends, generalizes them to other input data, and gives predictions. The main purpose of machine learning is to make predictions. These predictions are based on the dataset that has been provided to the machine. It would make no sense to train the machine on a weather dataset and then ask it for predictions about COVID testing. The point is that the machine can only make predictions relevant to the dataset that has been fed to it.

So, in simple words, we give preprocessed datasets to the machine, the machine uses algorithms (specified by us) to find the hidden patterns, and then it generalizes the results to other input data. Depending on the type of dataset and the training process, there are three main kinds of machine learning algorithms. In this section, we will discuss two of them, supervised and unsupervised learning, because as a beginner you should have a basic understanding of these two types.

Supervised Machine Learning Algorithms

Supervised machine learning is simply training the machine on a dataset that has both input and output values. For example, we can use a dataset that contains images of dogs and cats and provide a label, dog or cat, for each image, so that the machine knows which images belong to the cat category and which belong to the dog category. In other words, supervised learning explicitly trains the machine by providing both the inputs and the corresponding outputs. The following is an example of a dataset that can be used for supervised machine learning.

Supervised learning

As you can see, the dataset contains the total marks and the obtained marks as input values and the corresponding classes (fail or pass) as output categories. This is an example of a dataset that can be used for supervised machine learning tasks.
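A dataset of this kind can be fed to a classifier. The sketch below is one possible way to do it with a decision tree from sklearn. The values are invented for illustration (they are not the ones from the figure), and the rule behind the labels, pass at 40 marks or more, is an assumption.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical labelled dataset: inputs (marks) plus the output (result).
train = pd.DataFrame({
    "total_marks": [100] * 8,
    "obtained_marks": [20, 35, 45, 60, 75, 90, 30, 50],
    "result": ["fail", "fail", "pass", "pass", "pass", "pass", "fail", "pass"],
})

X = train[["total_marks", "obtained_marks"]]  # input values
y = train["result"]                           # output labels

# Train the model on inputs and their known outputs.
model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# Ask for predictions on students the model has not seen.
new_students = pd.DataFrame({"total_marks": [100, 100],
                             "obtained_marks": [10, 85]})
predictions = model.predict(new_students)
print(predictions)
```

The model never sees the pass rule itself. It only sees examples of inputs with their labels and learns a boundary from them.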

Some of the basic supervised machine learning algorithms that you need to know as a beginner are the KNN algorithm, decision trees, random forests, naive Bayes, logistic regression, and support vector machines. These are easy to learn, and you can implement them in Python using the sklearn module.

If you are looking for more advanced supervised machine learning algorithms, you should start learning boosting algorithms such as AdaBoost, gradient boosting, XGBoost, CatBoost, and LightGBM.

Unsupervised Machine Learning algorithms

Unsupervised machine learning is simply providing the machine with a dataset that does not have labels or specified outputs. The machine finds the differences and similarities in the dataset and comes up with different clusters or predictions. For example, providing images of cats and dogs without telling the machine which one is a cat and which one is a dog is a type of unsupervised machine learning.

The following is an example of a dataset that can be used for unsupervised machine learning.

Unsupervised learning

As you can see, the dataset contains information about the total marks and the obtained marks but it doesn’t have any labels. So, the machine will find different clusters and relationships between the input values.
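Here is a minimal sketch of clustering such unlabeled data with KMeans from sklearn. The marks are made-up values, and asking for two clusters is an assumption on my side, since the algorithm needs to be told how many groups to look for.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabeled dataset: [total_marks, obtained_marks] per student.
X = np.array([
    [100, 20], [100, 25], [100, 30],
    [100, 80], [100, 85], [100, 90],
])

# No labels are given; the algorithm groups similar rows on its own.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(labels)                   # cluster index assigned to each student
print(kmeans.cluster_centers_)  # the center of each cluster
```

The cluster indices (0 and 1) carry no meaning such as pass or fail. They only say which students the algorithm considers similar.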

Some of the important unsupervised learning algorithms that you should know are clustering algorithms and dimensionality reduction algorithms (like PCA).

Understanding the working of Machine Learning Step by Step

The general working of machine learning is simple to understand. In this section, we will go through the general machine learning process step by step.

Data preprocessing and preparation

The very first step toward training a model in machine learning is exploring and preprocessing the dataset. It is important to know the dataset and make it suitable for training. This is where you use your visualization, analytical, and preprocessing skills. The better you understand the dataset, the easier it will be to implement the ML model, because you will have a clear idea of which algorithm to use on the dataset.

This is usually the part that takes the most time in the machine learning process. A common rule of thumb says that a machine learning developer spends about 80% of the time on data preprocessing and only 20% on training and evaluating the model. As beginners, however, we mostly get preprocessed datasets from the internet.

After exploring the dataset, you should divide it into a training part and a testing part. You use the training part to train the model and the testing part to evaluate its performance.
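One common way to make this split is train_test_split from sklearn. The sketch below uses the iris dataset that ships with sklearn as a stand-in for your own data. The 80/20 ratio and the random_state value are assumptions, not fixed rules.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Stand-in dataset bundled with sklearn: 150 flowers, 4 features each.
X, y = load_iris(return_X_y=True)

# Keep 20% of the rows aside for testing; stratify keeps the class
# proportions the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```

The model should only ever be trained on X_train and y_train, so that the testing part stays unseen until evaluation.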

Training the Machine Learning model

Training the machine learning model is easy. After exploring the dataset, you should have an idea of which type of algorithm to use on it. Once you have decided which model to use, import it from the required module, initialize it, and then train it using the training dataset.
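The three steps (import, initialize, train) can look like the sketch below. I picked the KNN classifier and the iris dataset bundled with sklearn purely as an example. The choice of n_neighbors=5 is an assumption.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier  # 1. import the model

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = KNeighborsClassifier(n_neighbors=5)  # 2. initialize the model
model.fit(X_train, y_train)                  # 3. train on the training part

# The trained model can now predict labels for the unseen testing part.
y_pred = model.predict(X_test)
print(y_pred[:5])
```

Most sklearn models follow the same fit and predict pattern, so swapping in another algorithm usually means changing only the import and the initialization lines.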

Testing and evaluating the model

Once the training is complete, you can then use the testing data to make predictions.

To know how good the predictions are and how close they are to the actual values, we need to evaluate the model. In machine learning, there are various metrics to evaluate the performance of a model. For example, for a classification problem we can use the accuracy score and the confusion matrix, while for a regression problem we can use the mean squared error, the mean absolute error, and the R-squared score.
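Here is a short sketch of how these metrics can be computed with sklearn.metrics. The actual and predicted values are made up by hand so that the numbers are easy to check.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             mean_absolute_error, mean_squared_error,
                             r2_score)

# Classification: hypothetical actual and predicted labels (1 = pass, 0 = fail).
y_true_cls = [1, 0, 1, 1, 0, 1]
y_pred_cls = [1, 0, 0, 1, 0, 1]

acc = accuracy_score(y_true_cls, y_pred_cls)   # share of correct predictions
cm = confusion_matrix(y_true_cls, y_pred_cls)  # rows: actual, columns: predicted
print(acc)
print(cm)

# Regression: hypothetical actual and predicted numeric values.
y_true_reg = [3.0, 5.0, 7.0, 9.0]
y_pred_reg = [2.5, 5.0, 8.0, 9.5]

mse = mean_squared_error(y_true_reg, y_pred_reg)   # average squared error
mae = mean_absolute_error(y_true_reg, y_pred_reg)  # average absolute error
r2 = r2_score(y_true_reg, y_pred_reg)              # 1.0 would be a perfect fit
print(mse, mae, r2)
```

Here five of the six labels are predicted correctly, and the confusion matrix shows that the single mistake is a pass that was predicted as a fail.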

Visualizing the results

You can again use your visualization skills to show the results of the Machine learning model and understand the outcomes visually.

What if the model fails to make accurate results?

It is often the case that machine learning models fail to make accurate predictions and perform poorly. In such cases, we can use hyperparameter tuning. Hyperparameter tuning is the process of finding the optimum hyperparameter values for the model. There are various hyperparameter tuning methods available, and you can use any of them depending on your problem. One of the most commonly used is GridSearchCV, which is also easy to understand.
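As a rough sketch, GridSearchCV tries every combination from a grid of candidate values and keeps the one with the best cross-validated score. The model (KNN), the dataset (iris, bundled with sklearn), and the candidate values below are all assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Candidate values to try for the n_neighbors hyperparameter.
param_grid = {"n_neighbors": [1, 3, 5, 7]}

# Evaluate each candidate with 5-fold cross-validation on accuracy.
search = GridSearchCV(KNeighborsClassifier(), param_grid,
                      cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)  # the best value found in the grid
print(search.best_score_)   # its mean cross-validated accuracy
```

After fitting, search.best_estimator_ holds a model retrained with the best parameters, ready to make predictions.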

NOTE: You can get access to the implementation of the source code of various ML and ANN algorithms from my GitHub link. Please don’t forget to follow and give me a star.

Summary

Machine learning is the process of training a machine (computer) on a training dataset and then generalizing the results to incoming data. In this article, we discussed a general introduction to machine learning for beginners.
