Introduction to Machine Learning

Published in Machine Learning with Python

Apr 30, 2021

Machine Learning is a subset of Artificial Intelligence (AI) focused on building applications that learn from data and improve their accuracy over time without being explicitly programmed to do so. Machine learning algorithms are designed to keep learning and improving as they are exposed to new data, which enables computers to make data-driven decisions.

Example 1: Based on a patient’s medical history, a doctor can predict whether the patient is suffering from an illness. In the same way, machines learn by gaining experience from data, without being explicitly programmed. The more experience a machine gains, the better its accuracy tends to be.

Example 2: As a kid, you might have come across a picture of a tree, and your parents or teachers would have told you that this is a tree and that it has specific features such as leaves, a trunk, roots, and stems. Now, whenever your brain comes across such features, it automatically recognizes a tree because it has already learned what a tree looks like. Similarly, we keep feeding images of a tree to a computer with the tag “tree” until the machine learns all the features associated with a tree.

Let’s understand this better with the images below.


Once the machine has learned the features associated with a tree, we feed it new data to determine how much it has learned.

In short, training data is given to the machine so that it learns the features associated with that data. Once the learning is complete, the machine is given test data it has not seen before to determine how well it has learned.
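To make the idea concrete, here is a minimal sketch in plain Python of holding out part of a labeled dataset as test data. The function name `train_test_split` mirrors the scikit-learn helper of the same name, but this version is written from scratch for illustration; the toy data and the 25% test fraction are assumptions, not values from this article.

```python
import random

def train_test_split(samples, labels, test_fraction=0.25, seed=0):
    """Shuffle the examples, then split them into training and test sets."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # a fixed seed makes the split reproducible
    n_test = int(len(samples) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [samples[i] for i in train_idx]
    X_test = [samples[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# Hypothetical labeled data: each sample is a small feature vector
X = [[i, i % 3] for i in range(8)]
y = ["tree" if i % 2 == 0 else "not tree" for i in range(8)]

X_train, X_test, y_train, y_test = train_test_split(X, y)
print(len(X_train), "training examples,", len(X_test), "test examples")
```

The model would learn only from `X_train` and `y_train`; `X_test` and `y_test` stay hidden until it is time to check how well it learned.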

Types of Machine Learning

Supervised Learning

In supervised learning, the machine is trained on labeled data, so the correct output for every training example is already known. The goal is to learn the relationship between the input variables and the output variable.

Example: You have a dataset of houses with features such as location and square footage, along with their prices, and you want to predict the price of a house whose price you don’t know yet. The machine predicts the price of the new house based on what it has learned from the dataset provided.
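A minimal sketch of this example, assuming a single feature (square footage) and a handful of made-up labeled houses: it fits a straight line by ordinary least squares, which is what Simple Linear Regression, one of the regression algorithms in this post, does, and then predicts the price of an unseen house. Real house prices also depend on location and other features, which this toy version ignores.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least-squares fit of ys ≈ slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical labeled training data: square feet (input) -> known sale price (label)
square_feet = [1000, 1500, 2000, 2500]
prices = [200_000, 275_000, 350_000, 425_000]

slope, intercept = fit_simple_linear_regression(square_feet, prices)
new_house = 1800  # a house whose price we don't know yet
predicted = slope * new_house + intercept
print(round(predicted))  # 320000
```

The toy prices were generated as 150 per square foot plus 50,000, so the fitted line recovers that relationship exactly; real data would leave some prediction error.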

Types of Supervised Learning

Classification: Classification is the process of finding a model that separates input data into multiple discrete classes or labels. It predicts discrete values such as True or False, Male or Female, Spam or Not Spam, Cold or Hot, etc.

Classification algorithms:

Logistic Regression
K-Nearest Neighbors (KNN) for Classification
Support Vector Machine: SVC (Support Vector Classifier)
Decision Tree Classification
Random Forest Classification
Naive Bayes
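As one illustration of the algorithms in this list, here is a minimal from-scratch sketch of K-Nearest Neighbors classification: a new point gets the label that is most common among its k closest training points. The temperature/humidity data, the Cold/Hot labels, and k = 3 are hypothetical choices for the example; in practice you would usually scale the features first and use a library implementation.

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, query, k=3):
    """Predict a discrete label for `query` by majority vote among its k nearest neighbors."""
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])  # closest training points first
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical training data: (temperature in °C, humidity in %) with a known label
points = [(2, 80), (5, 70), (8, 65), (28, 40), (31, 35), (35, 30)]
labels = ["Cold", "Cold", "Cold", "Hot", "Hot", "Hot"]

print(knn_classify(points, labels, (30, 38)))  # Hot
print(knn_classify(points, labels, (4, 75)))   # Cold
```

An odd k is used here so that a vote between two classes cannot end in a tie.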

Regression: Regression is the process of finding a model that predicts a continuous value based on its input variables. It predicts continuous values such as temperature, price, salary, age, etc.

Regression algorithms:

Simple Linear Regression
Multiple Linear Regression
Polynomial Regression
KNN for Regression
Support Vector Machine: SVR (Support Vector Regressor)
Decision Tree Regression
Random Forest Regression

Difference between Classification and Regression — Image source: https://images.app.goo.gl/QhYg5hHD2YAcJEeG7
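To see the difference between classification and regression in code, the nearest-neighbor idea can predict a continuous value instead of a label: rather than taking a majority vote, it averages the targets of the k nearest points. This is a from-scratch sketch of KNN for Regression using hypothetical hour-of-day and temperature data.

```python
import math

def knn_regress(train_points, train_targets, query, k=3):
    """Predict a continuous value for `query` as the mean target of its k nearest neighbors."""
    ranked = sorted(
        zip(train_points, train_targets),
        key=lambda pair: math.dist(pair[0], query),  # closest training points first
    )
    nearest_targets = [target for _, target in ranked[:k]]
    return sum(nearest_targets) / len(nearest_targets)

# Hypothetical training data: (hour of day,) -> temperature in °C
hours = [(6,), (9,), (12,), (15,), (18,)]
temperatures = [12.0, 18.0, 24.0, 22.0, 16.0]

print(round(knn_regress(hours, temperatures, (13,)), 2))  # 21.33
```

The prediction (about 21.33) is not one of the training temperatures, which is what distinguishes a continuous regression output from a discrete class label.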

Unsupervised Learning

In unsupervised learning, the data is unlabeled, so the machine doesn’t know what the correct output is. This allows us to approach problems with little or no idea of what our results should look like. Instead, we derive structure from the data itself, for example by clustering the data based on relationships among the variables in the dataset.

Example: You have a dataset of customer reviews for a particular product, and you want to infer relationships and group similar reviews together. These clusters can help you develop strategies to keep customers happy.

Types of Unsupervised Learning

Clustering: Clustering is the process of grouping objects into clusters. Objects that are most similar to each other are placed in the same group, while objects with few or no similarities are placed in different groups.

Clustering algorithms:

K-Means Clustering
K-Modes Clustering
Hierarchical Clustering
DBSCAN
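Here is a minimal from-scratch sketch of K-Means Clustering (Lloyd’s algorithm): start with k centroids, assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until nothing moves. The 2-D points are hypothetical, for example two numeric scores derived from each customer review; turning real review text into numbers is a separate step not shown here.

```python
import math
import random

def assign(points, centroids):
    """Assignment step: put each point in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters

def k_means(points, k, max_iterations=100, seed=0):
    """Cluster `points` into k groups with Lloyd's K-Means algorithm."""
    centroids = random.Random(seed).sample(points, k)  # start from k randomly chosen data points
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        # Update step: move each centroid to the mean of its cluster (keep it if the cluster is empty)
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)

# Hypothetical 2-D feature vectors, e.g. two numeric scores per customer review
reviews = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = k_means(reviews, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

No labels are supplied anywhere: the two groups emerge from the distances between the points alone. On less cleanly separated data the random starting centroids (fixed here by `seed`) can change the result, which is why library implementations typically run several initializations.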

Association: Association is used to find relationships between variables in a large database, that is, to discover sets of items that frequently occur together in the dataset. For example, people who buy item A also tend to buy item B.
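A minimal sketch of association analysis on hypothetical shopping baskets: it counts how often pairs of items occur together (support) and how often buying A is accompanied by buying B (confidence), keeping the rules that pass both thresholds. This is a simplified, pairs-only version of what association rule algorithms such as Apriori do; the thresholds 0.4 and 0.6 are arbitrary choices for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets: one set of items per customer transaction
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
]

def association_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Find rules 'A -> B' between single items, scored by support and confidence."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(pair for t in transactions for pair in combinations(sorted(t), 2))
    rules = []
    for (a, b), together in pair_counts.items():
        support = together / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = together / item_counts[lhs]  # share of baskets with lhs that also have rhs
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

for lhs, rhs, support, confidence in association_rules(transactions):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

On this data the sketch reports bread -> butter, butter -> bread, and jam -> bread; bread -> jam is dropped because only half of the baskets containing bread also contain jam.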

Machine Learning Steps

1. Data Acquisition: Data Acquisition is the process of collecting data and populating the dataset with accurate and relevant features. When acquiring the data, we need enough relevant features to train the model properly.

2. Data Cleaning and Pre-Processing: Data Cleaning is critical to the success of any machine learning model. It is the process of detecting inaccurate, incomplete, or unreasonable data and then correcting or removing it to improve data quality. Data pre-processing transforms the raw data into a useful and efficient format. Datasets may contain characters, strings, and other non-numeric values, which most Machine Learning algorithms cannot use directly. Hence, these values need to be converted into numerical values.
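A minimal sketch of these two steps in plain Python, using hypothetical records: rows with impossible values are dropped, missing numbers are filled with the column mean (mean imputation), and the text column `city` is converted into numeric 0/1 columns (one-hot encoding). These are common choices rather than the only ones; libraries such as pandas and scikit-learn provide equivalent tools.

```python
# Hypothetical raw records: one missing age, one missing income, one impossible (negative) age
raw_rows = [
    {"age": 25, "city": "Delhi", "income": 40000},
    {"age": None, "city": "Mumbai", "income": 52000},
    {"age": -3, "city": "Delhi", "income": 38000},
    {"age": 41, "city": "Pune", "income": None},
]

def clean(rows, numeric_columns=("age", "income")):
    """Drop rows with impossible ages, then fill missing numbers with the column mean."""
    # Copy each kept row so the raw data is left untouched
    valid = [dict(r) for r in rows if r["age"] is None or r["age"] >= 0]
    for column in numeric_columns:
        known = [r[column] for r in valid if r[column] is not None]
        column_mean = sum(known) / len(known)
        for r in valid:
            if r[column] is None:
                r[column] = column_mean
    return valid

def one_hot_encode(rows, column):
    """Replace a text column with one numeric 0/1 column per distinct value."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {key: value for key, value in r.items() if key != column}
        for category in categories:
            new_row[f"{column}={category}"] = 1 if r[column] == category else 0
        encoded.append(new_row)
    return encoded

prepared = one_hot_encode(clean(raw_rows), "city")
for row in prepared:
    print(row)
```

After these steps every value in `prepared` is numeric, so it can be passed to a learning algorithm.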

3. Exploratory Data Analysis: Exploratory Data Analysis (EDA) is the process of analyzing datasets to summarize their main characteristics, often with visual or graphical methods. The goal is to understand the data well enough to be confident that it is ready to be used by a machine learning algorithm.
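A small sketch of a first EDA pass using only Python’s `statistics` module, on hypothetical house prices (in thousands): it computes summary statistics and flags outliers with the common 1.5 × IQR rule. Real EDA would usually add plots such as histograms and scatter plots.

```python
import statistics

# Hypothetical house prices in thousands, including one suspiciously large value
prices = [200, 215, 230, 250, 260, 275, 290, 310, 1200]

def summarize(values):
    """Basic descriptive statistics plus outliers flagged by the 1.5 * IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # first and third quartiles
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
        "outliers": [v for v in values if v < low or v > high],
    }

for name, value in summarize(prices).items():
    print(f"{name}: {value}")
```

Here the single 1200 value is flagged as an outlier, and it pulls the mean (about 358.9) well above the median (260), which is exactly the kind of issue EDA is meant to surface before training.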

4. Training the Model: Training a Machine Learning model involves providing a learning algorithm with training data to learn from. In supervised learning, the training data must contain the correct answer, which is known as the target variable. The learning algorithm finds patterns in the training data that map the input data attributes to the target (the answer you want to predict) and then outputs a Machine Learning model that captures these patterns.
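To show what finding patterns that map inputs to the target can look like, here is a minimal sketch that trains a one-feature linear model with gradient descent on the mean squared error. The data (years of experience as the input, salary in thousands as the target variable), the learning rate, and the number of epochs are hypothetical choices; in practice a library usually handles this for you.

```python
def train_linear_model(xs, ys, learning_rate=0.05, epochs=2000):
    """Learn w and b in y ≈ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors on the training data with the current parameters
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to w and b
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Step against the gradient to reduce the error
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical training data: input attribute and the known target variable
experience = [1, 2, 3, 4, 5]
salary = [35, 40, 45, 50, 55]

w, b = train_linear_model(experience, salary)
print(f"learned model: salary ≈ {w:.2f} * experience + {b:.2f}")
```

Because the toy data lies exactly on salary = 5 * experience + 30, the learned `w` and `b` end up very close to 5 and 30.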

5. Testing the Model: Testing a Machine Learning model involves feeding new data points to the trained model and comparing its predictions with the actual outputs. A test dataset is independent of the training dataset but follows the same probability distribution. A model that fits the training dataset well should also fit the test dataset well; if it performs much worse on the test data, it has likely overfit, memorizing the training examples instead of learning patterns that generalize.
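A minimal sketch of why training performance alone is not enough: the hypothetical `MemorizingModel` simply stores every training example, so it scores perfectly on the training data but cannot generalize to new inputs. Comparing the error on training and test data reveals the overfitting. The data, drawn from y = 2x, is made up for the example.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

class MemorizingModel:
    """A deliberately overfit, hypothetical 'model' that stores training examples verbatim."""

    def fit(self, xs, ys):
        self.table = dict(zip(xs, ys))
        self.fallback = sum(ys) / len(ys)  # used for any input it has never seen
        return self

    def predict(self, xs):
        return [self.table.get(x, self.fallback) for x in xs]

# Hypothetical data following y = 2x; the test inputs were not seen during training
train_x, train_y = [1, 3, 5, 7], [2, 6, 10, 14]
test_x, test_y = [2, 6], [4, 12]

model = MemorizingModel().fit(train_x, train_y)
train_mse = mean_squared_error(train_y, model.predict(train_x))
test_mse = mean_squared_error(test_y, model.predict(test_x))
print(train_mse, test_mse)  # 0.0 16.0
```

The memorizer scores a perfect 0.0 on the training data but 16.0 on the test data; a model that had actually learned the pattern y = 2x would do well on both.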

6. Evaluating the Model: Evaluating a Machine Learning model is the process of estimating its accuracy and performance on unseen data, using metrics such as accuracy for classification or mean squared error for regression. Evaluation tells you how good your model is and whether it is ready to be used.
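A minimal sketch of evaluating a classifier on test data, with hypothetical Spam/Not Spam labels and predictions: besides accuracy, it computes precision and recall, two other common classification metrics. For regression models, an error measure such as the mean squared error plays the same role.

```python
def evaluate_classifier(y_true, y_pred, positive="Spam"):
    """Compare predictions with the true labels of unseen test data."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # spam correctly flagged
    fp = sum(t != positive and p == positive for t, p in pairs)  # normal mail wrongly flagged
    fn = sum(t == positive and p != positive for t, p in pairs)  # spam that slipped through
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # of predicted spam, how much was spam
        "recall": tp / (tp + fn) if tp + fn else 0.0,     # of real spam, how much was caught
    }

# Hypothetical test labels and a model's predictions for six unseen e-mails
y_true = ["Spam", "Spam", "Not Spam", "Not Spam", "Spam", "Not Spam"]
y_pred = ["Spam", "Not Spam", "Not Spam", "Spam", "Spam", "Not Spam"]
print(evaluate_classifier(y_true, y_pred))
```

Here 4 of the 6 test e-mails are classified correctly (accuracy of about 0.67), and precision and recall are both 2/3.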

Next blog: Simple Linear Regression Implementation in Python

Hey guys! I’m Harshita, a Data Science student trying to contribute a bit to the community by sharing my knowledge. Please share this with anyone you know who is trying to learn Machine Learning. I would appreciate your comments, suggestions, or feedback. Thank you.
Email Id: harshita.1128@gmail.com
LinkedIn: www.linkedin.com/in/harshita-11