Machine Learning 101

Part 1: Introduction

Bzubeda
4 min read · Sep 28, 2022

Ever wonder how Amazon chooses the product recommendations you see on its website or app, how stock performance is tracked, how weather forecasting works, or how a company predicts its product’s future performance or sales? Follow me on this journey to learn about one of today’s most popular topics, “Machine Learning”, through real-life examples.

Whatever your background, technical or non-technical, or even if you are a high schooler enthusiastic about new technologies, this journey aims to satisfy your curiosity and answer as many of your questions as possible.

Let's start by understanding the basic terminologies of Machine Learning.

So, what is “Machine Learning”?

As the name implies, in simple words, the “machine” learns from data. For example, in weather forecasting, a machine learning algorithm learns from historical weather data, finds patterns in it, and uses those patterns to give us an estimate of future weather.
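To make "learning from past data" a little more concrete, here is a minimal sketch in Python. It assumes a made-up week of daily temperatures and uses a simple straight-line fit (least squares) as the learning step. Real weather forecasting relies on far richer data and models; the names past_days, past_temps, and fit_line are purely illustrative.

```python
# Hypothetical daily high temperatures (in °C) for the past 7 days.
past_days = [1, 2, 3, 4, 5, 6, 7]
past_temps = [20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0]


def fit_line(xs, ys):
    """Learn the slope and intercept of the best-fit straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Learning": find the pattern (a trend line) in the historical data.
slope, intercept = fit_line(past_days, past_temps)

# "Predicting": use the learned pattern to estimate day 8.
estimate_day_8 = slope * 8 + intercept
print(round(estimate_day_8, 1))
```

The pattern found here is just a steady upward trend, so the estimate for day 8 continues that trend.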

Note: An “algorithm” is simply a step-by-step procedure used to solve a problem. For example, some of the steps for growing a tree or plant are:
1) Check the climate conditions.
2) Put soil and fertilizer in a container.
3) Dig a hole to a specific depth and place the seeds in it.
4) Water it well.
5) Repeat the watering step daily, depending on the soil, until the tree or plant is fully grown.

When we talk about “Data” in Machine Learning, what is it?

In Machine Learning, data can come from several sources, such as Excel or CSV (Comma-Separated Values) files, or a database.

It can be in various formats, such as a simple tabular format where data is stored in rows and columns, or images, videos, text, etc.
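As a small illustration of tabular data, the sketch below reads a few rows of CSV using Python's built-in csv module. The column names and values are made up, and the CSV text is kept in a string so the example runs on its own; in practice you would open a file instead.

```python
import csv
import io

# Hypothetical CSV content: the first line holds the column names,
# every following line is one row of data.
csv_text = """age,income,city
25,30000,Pune
40,85000,Mumbai
"""

# DictReader turns each row into a dictionary keyed by column name.
rows = list(csv.DictReader(io.StringIO(csv_text)))

print(len(rows))         # number of data rows
print(rows[0]["age"])    # values are read as text, so this is the string "25"
print(rows[1]["city"])
```

Note that everything read from a CSV file arrives as text, so numeric columns usually need converting (for example with int or float) before a model can use them.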

Types of Data

1) Labeled Data
For example, when a hospital tests whether patients are infected with Covid and records the results in an Excel sheet, that sheet can be considered the data. The various symptoms used to determine whether a patient is infected act as columns/fields in the sheet.

The column/field containing the results, i.e. whether each patient tested positive or negative, is called the Target. If the data contains a target field, it is known as Labeled data.

Note: The fields or columns that are used to determine the target are also called “features”.
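Here is a minimal sketch of how labeled data is commonly separated into features and a target. The patient records and column names (fever, cough, loss_of_smell, result) are hypothetical, and the X and y names are just a widely used convention.

```python
# Hypothetical patient records: symptom columns are the features,
# and "result" is the target column.
patients = [
    {"fever": 1, "cough": 1, "loss_of_smell": 1, "result": "positive"},
    {"fever": 0, "cough": 1, "loss_of_smell": 0, "result": "negative"},
    {"fever": 1, "cough": 0, "loss_of_smell": 1, "result": "positive"},
]

target_name = "result"

# Every column except the target is treated as a feature.
feature_names = [name for name in patients[0] if name != target_name]

# X holds the feature values for each patient, y holds the matching labels.
X = [[row[name] for name in feature_names] for row in patients]
y = [row[target_name] for row in patients]

print(feature_names)
print(X)
print(y)
```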

2) Unlabeled Data
Unlike Labeled data, if there is no target field and we simply want to group or segment the data based on the patterns found in it, such data is called Unlabeled data. For example, a mall segments its customers into groups based on income and age in order to provide offers and discounts.
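The mall example can be sketched with a very small version of k-means clustering, one common way to group unlabeled data. This is only an illustration under stated assumptions: the customer values are made up, income is given in thousands so both columns are on a similar scale, and the starting centroids are picked by hand rather than at random.

```python
import math

# Hypothetical customers as (age, yearly income in thousands).
# There is no target column: we only want to find groups.
customers = [(22, 25), (25, 30), (27, 28), (45, 90), (50, 95), (48, 88)]


def k_means(points, centroids, iterations=10):
    """Group points around centroids by repeating assign-then-update steps."""
    for _ in range(iterations):
        # Step 1: assign every point to its nearest centroid.
        groups = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            groups[nearest].append(point)
        # Step 2: move each centroid to the average of its group
        # (an empty group keeps its previous centroid).
        centroids = [
            tuple(sum(values) / len(group) for values in zip(*group))
            if group
            else centroids[i]
            for i, group in enumerate(groups)
        ]
    return groups, centroids


# Assumption: start from the first and last customer as initial centroids.
groups, centroids = k_means(customers, [customers[0], customers[-1]])

for number, group in enumerate(groups, start=1):
    print(f"Segment {number}: {group}")
```

With this data the customers fall into a younger, lower-income segment and an older, higher-income segment, without anyone having labeled them in advance.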

“Phases” of Machine Learning

1) Get Data: Collect data from various sources, such as Excel or CSV files, or a database.

2) Data Cleaning and Feature Engineering: Remove or replace null values, handle outliers (extremely high or low values that fall far outside the typical range), replace incorrect values (e.g. misspelled brand names), transform the data, and reduce it to the features most relevant to the target (feature engineering).
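A rough sketch of a few of these cleaning steps is shown below. Everything in it is an assumption made for illustration: the records, the brand_fixes lookup, filling missing prices with the median, and the rule that a price more than three times the median counts as an outlier. Real projects choose these rules based on the data at hand.

```python
from statistics import median

# Hypothetical raw records with a missing value, misspellings, and an outlier.
raw = [
    {"brand": "Samsung", "price": 300.0},
    {"brand": "Samsng", "price": None},      # misspelled brand, missing price
    {"brand": "Apple", "price": 320.0},
    {"brand": "Aple", "price": 310.0},       # misspelled brand
    {"brand": "Apple", "price": 99999.0},    # extreme value (outlier)
]

# Hypothetical lookup of known misspellings and their corrections.
brand_fixes = {"Samsng": "Samsung", "Aple": "Apple"}

# The median is used as the "typical" price because, unlike the average,
# it is barely affected by the extreme value.
known_prices = [row["price"] for row in raw if row["price"] is not None]
typical_price = median(known_prices)

cleaned = []
for row in raw:
    # Replace incorrect values: fix misspelled brand names.
    brand = brand_fixes.get(row["brand"], row["brand"])
    price = row["price"]
    # Replace null values and handle outliers (assumed rule: > 3x the median).
    if price is None or price > 3 * typical_price:
        price = typical_price
    cleaned.append({"brand": brand, "price": price})

for row in cleaned:
    print(row)
```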

3) Model Building: Split the data into training and testing datasets (a common split is 70% for training and 30% for testing). Train different models using the training data.
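The 70/30 split can be sketched in a few lines of plain Python. The function name train_test_split and the seed value are choices made for this example; shuffling first is a common precaution so that neither part ends up with only one kind of row.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows, then cut it into training and testing parts."""
    shuffled = rows[:]                      # copy, so the original order is kept
    random.Random(seed).shuffle(shuffled)   # fixed seed makes the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Ten placeholder rows standing in for real records.
data = list(range(10))
train, test = train_test_split(data)

print(len(train), "rows for training")
print(len(test), "rows for testing")
```

The test rows are set aside and never shown to the model during training, so they can later be used to check how well it handles data it has not seen.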

Note: A machine learning “model” is a representation of what was learned by the machine learning algorithm from analyzing the data.

4) Evaluate Model: Evaluate how well each model makes correct predictions on the test data using various metrics (e.g. accuracy, the fraction of predictions that are correct). Select the model that gives the best performance.
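As an illustration, accuracy can be computed by counting how many predictions match the actual results. The two "models" below are only made-up lists of predictions (model_a and model_b are hypothetical names), which is enough to show how the comparison works.

```python
def accuracy(predictions, actual):
    """Return the fraction of predictions that match the actual values."""
    correct = sum(p == a for p, a in zip(predictions, actual))
    return correct / len(actual)


# Hypothetical true results for five test patients.
actual = ["positive", "negative", "positive", "negative", "positive"]

# Hypothetical predictions from two different models on the same test data.
model_predictions = {
    "model_a": ["positive", "negative", "negative", "negative", "positive"],
    "model_b": ["positive", "positive", "negative", "negative", "positive"],
}

scores = {
    name: accuracy(predictions, actual)
    for name, predictions in model_predictions.items()
}

# Select the model with the highest accuracy on the test data.
best_model = max(scores, key=scores.get)

print(scores)
print("Best model:", best_model)
```

Accuracy is only one possible metric; depending on the problem, other metrics may describe performance better.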

5) Model Deployment: Deploy the selected model into the real world to make real-time predictions.

Stay tuned for the next part of the journey, where we will dive deep into data patterns, types of variables, and algorithms. Meanwhile, can you think of some real-life examples of Labeled and Unlabeled data? Go ahead and share your thoughts and views in the comments.
