Demystifying the Buzzwords: AI, Machine Learning, and Data Science Explained

Anju Reddy K
10 min read · Jun 14, 2024


In this blog I am not just going to explain the basics or the highlights of machine learning; I will try to cover everything a person should know about machine learning and its algorithms, along with the math intuition behind them. First, let's understand the difference between AI, ML, and DS, and then dive into the world of machine learning.

Topics Covered:

  1. Difference between AI, ML and DS
  2. Machine Learning
  3. Difference between Supervised and Unsupervised ML and their types
  4. Linear Regression
  5. Performance Metrics

1. Difference between AI, ML and DS

Definition

Artificial Intelligence (AI): AI is like a smart robot that can act and learn just like a human does. It can play games, recognize voices, or even drive cars. In short, it is about making machines act smart.

Machine Learning (ML): ML is a special part of AI in which machines learn from data, just like we learn from our homework. Imagine training a machine to recognize cats by showing it lots of cat photos. The more it sees, the more it learns.

Data Science (DS): DS is like detective work on data. Data scientists collect lots of data and use computers to find hidden patterns and secrets in it. This helps companies make better decisions, like what video game to create next or how to build safer cars.

Differences: Scope

  • AI: The big umbrella that includes anything that makes machines smart.
  • ML: A part of AI focused on making machines learn from data.
  • DS: Uses data to find patterns and solve problems, often using ML tools.

Purpose:

  • AI: To create smart systems that can mimic human intelligence.
  • ML: To enable machines to improve their performance based on their experience.
  • DS: To analyse data and extract meaningful information to draw conclusions.

Example:

  • AI: Siri on your phone, which understands and responds to your questions.
  • ML: Netflix recommending shows based on what you have watched.
  • DS: Scientists analysing weather patterns based on temperature data.

2. Machine Learning:

Machine learning is a way of teaching a computer to learn from data and make decisions or predictions without being explicitly programmed for a specific task.

Imagine you have a set of pictures, and you want the computer to recognize whether each one shows a cat or a dog. Here's how it learns to do that using machine learning.

i) Show Examples: You provide the computer with a large set of data, with some images labeled as "cat" and others labeled as "dog".

ii) Finding Patterns: The computer analyzes these pictures and identifies patterns. It might find that cats have pointy ears and whiskers, while dogs have different kinds of features.

iii) Learning Process: By analyzing these patterns, the computer builds a model. This model captures which features are associated with cats and which are associated with dogs.

iv) Testing: You then show the computer a new picture it has never seen during training. It uses the model it has built to recognize whether the picture shows a cat or a dog.

v) Improvement: When the computer makes mistakes, it learns from them. You correct its mistakes, and it adjusts its model to improve accuracy.
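To make this concrete, here is a toy version of the same workflow in Python using scikit-learn. The two numeric features standing in for image data are invented purely for illustration; real image recognition needs far more machinery.

```python
# A toy cat-vs-dog classifier illustrating the five steps above.
# The two features (say, ear pointiness and snout length) are made up.
from sklearn.tree import DecisionTreeClassifier

# i) Show examples: labeled training data
X_train = [[0.9, 0.2], [0.8, 0.3], [0.2, 0.9], [0.1, 0.8]]
y_train = ["cat", "cat", "dog", "dog"]

# ii)-iii) Finding patterns / learning: fit a model to the examples
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# iv) Testing: classify a picture the model has never seen
print(model.predict([[0.85, 0.25]]))  # expected: ['cat']

# v) Improvement: add corrected examples and retrain to boost accuracy
```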

3. Difference between Supervised and Unsupervised Machine Learning:

Supervised learning is like a school class where a teacher gives you clear instructions and the answers to practice with.

Here’s how it works:

  • Training data: You have a set of data with labels. For example, you have pictures of animals with labels that say "cat" or "dog".
  • Learning: The computer looks at the pictures and their labels to learn the difference between cats and dogs.
  • Predicting: After learning, it can look at a new picture and predict whether it contains a cat or a dog.

Types of Supervised Learning:

  • Classification: Sorting things into categories. For example, emails being labeled as "spam" or "not spam".
  • Regression: Predicting numbers. For example, predicting the price of a house based on its size and location.
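Here is a minimal sketch of both task types with scikit-learn; all the numbers are invented for illustration.

```python
# Classification vs. regression on tiny made-up datasets.
from sklearn.linear_model import LogisticRegression, LinearRegression

# Classification: label emails as spam (1) or not spam (0)
# from one invented feature: the count of suspicious words.
spam_X, spam_y = [[0], [1], [7], [9]], [0, 0, 1, 1]
clf = LogisticRegression().fit(spam_X, spam_y)
print(clf.predict([[8]]))  # expected: [1] (spam)

# Regression: predict a house price (in $1000s) from its size (sq ft).
size_X, price_y = [[1000], [1500], [2000], [2500]], [200, 260, 330, 400]
reg = LinearRegression().fit(size_X, price_y)
print(reg.predict([[1800]]))  # a number near 300
```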

Unsupervised learning is like exploring things on your own. You find patterns and group things together by yourself.

Here’s how it works:

  • Training data: You have a set of data without any labels, for example, pictures of animals with no labels at all.
  • Learning: The computer looks at the pictures and groups similar ones together based on the patterns it finds.
  • Discovery: It can discover new groups or categories within the data, like putting all the cats in one group and all the dogs in another.

Types of Unsupervised Learning:

  • Clustering: Grouping similar things together, like organizing a mixed pile of toys into separate piles of cars, blocks, and dolls.
  • Association: Finding rules in large amounts of data. For example, if people in a grocery store often buy bread and butter together, the computer will notice this pattern.
  • Dimensionality Reduction: Reducing the number of features while keeping the important aspects intact. Say you have a coloring book with hundreds of different colors; dimensionality reduction is like redrawing it with just the main colors while keeping the picture recognizable.
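A short sketch of clustering and dimensionality reduction with scikit-learn, using randomly generated points in place of real animal pictures:

```python
# Clustering and dimensionality reduction on made-up data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Two blobs of 5-feature points standing in for unlabeled cat/dog photos.
data = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(5, 1, (20, 5))])

# Clustering: group similar points together without any labels.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(data)
print(labels)  # two discovered groups, e.g. 0s then 1s

# Dimensionality reduction: compress 5 features down to 2
# while keeping as much of the variation as possible.
reduced = PCA(n_components=2).fit_transform(data)
print(reduced.shape)  # (40, 2)
```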

4. Linear Regression: Linear regression is a statistical technique used to model the relationship between one dependent variable (target) and one or more independent variables (features). It is a way to predict the value of a target variable based on the values of the features.

Basic concept
Imagine you have data points on a graph, and you want to draw a line that best fits all the points. This line can then be used to predict new values.

Simple Linear Regression
For simple linear regression, which involves one independent variable (x) and one dependent variable (y), the relationship is modeled using a straight line.

y = mx + c

Where,

  • y: is the dependent variable (what you want to predict).
  • x: is the independent variable (the feature you use for prediction).
  • m: is the slope of the line (how much y changes for a 1-unit change in x).
  • c: is the y-intercept (the value of y when x = 0).
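To see the line equation in action, here is a tiny sketch; the slope and intercept values are just illustrative:

```python
# Predicting with a fixed line y = m*x + c.
def predict(x, m=0.2, c=3.0):
    """Return the point on the line y = m*x + c."""
    return m * x + c

print(predict(10))  # 0.2 * 10 + 3.0 = 5.0
print(predict(0))   # at x = 0 we get the intercept c = 3.0
```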

Mathematical form
The linear regression formula can be generalized as

y = β0 + β1x

Where,

  • β0: is the intercept.
  • β1: is the slope.

In the context of machine learning, we often use the notation

hθ(x) = θ0 + θ1x

Where,

  • hθ(x): (the hypothesis) is the predicted value of y for a given x.
  • θ0: is the intercept.
  • θ1: is the slope.

Mathematical Intuition:

The goal of linear regression is to find the best-fitting line that minimizes the difference between the actual data points and the predicted values on the line. This difference is measured using the residual sum of squares (RSS):

RSS = Σᵢ₌₁ⁿ (yi − ŷi)²

Where,

  • yi: is the actual value.
  • ŷi: is the predicted value.
  • n: is the number of data points.

The predicted value ŷi can be written as

ŷi = θ0 + θ1xi

Finding the best parameters

To find the best θ0 and θ1, we use the method of least squares, which minimizes the RSS. The formulas for the optimal parameters are

θ1 = Σᵢ₌₁ⁿ (xi − x̄)(yi − ȳ) / Σᵢ₌₁ⁿ (xi − x̄)²

θ0 = ȳ − θ1x̄

Where,

  • x̄: is the mean of the x values.
  • ȳ: is the mean of the y values.

Example
Suppose we have the following data:

  1. Calculate the means x̄ and ȳ.
  2. Compute θ1 using the slope formula above.
  3. Compute θ0 = ȳ − θ1x̄.

So, the best-fitting line is

y = 3.0 + 0.2x
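As a sketch, the same least-squares formulas take only a few lines of NumPy. The data below is made up for illustration and happens to land near the example line:

```python
# Closed-form least squares for simple linear regression.
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])  # made-up feature values
y = np.array([5.0, 7.2, 8.9, 11.1, 13.0])     # made-up targets

x_bar, y_bar = x.mean(), y.mean()

# theta1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
theta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
# theta0 = y_bar - theta1 * x_bar
theta0 = y_bar - theta1 * x_bar

print(f"y = {theta0:.2f} + {theta1:.2f}x")  # y = 3.07 + 0.20x
```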

Conclusion:

Linear regression is a simple yet powerful tool for predicting values based on a linear relationship between variables. By understanding the formulas and the math behind them, you can see how the best-fitting line is determined to make accurate predictions.

What do we need to solve to identify the best-fitting line?

To find the best-fit line in linear regression, we need to find the parameters θ0 (intercept) and θ1 (slope) that minimize the error between the predicted values and the actual data points. The main approach involves setting up a cost function and then optimizing it.

Cost function:

The cost function, often called the Mean Squared Error (MSE) in linear regression, measures the average squared difference between the actual values (yi) and the predicted values (ŷi).

The cost function J(θ0, θ1) is defined as

J(θ0, θ1) = (1/2m) Σᵢ₌₁ᵐ (hθ(xi) − yi)²

(The extra factor of 1/2 is a convention that makes the derivatives cleaner.)

Where,

  • m: is the number of data points.
  • hθ(xi) = θ0 + θ1xi: is the predicted value for the i-th data point.
  • yi: is the actual value for the i-th data point.
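Here is that cost function as a small Python helper. The data points are invented and lie exactly on y = 3.0 + 0.2x, so the cost at the true parameters is zero:

```python
# The cost function J(theta0, theta1) for linear regression.
import numpy as np

def cost(theta0, theta1, x, y):
    """J = (1 / (2m)) * sum((h(x_i) - y_i)^2)."""
    m = len(x)
    predictions = theta0 + theta1 * x   # h_theta(x_i) for every point
    return np.sum((predictions - y) ** 2) / (2 * m)

x = np.array([1.0, 2.0, 3.0])
y = np.array([3.2, 3.4, 3.6])     # points on y = 3.0 + 0.2x
print(cost(3.0, 0.2, x, y))       # 0.0 -- a perfect fit costs nothing
print(cost(0.0, 0.0, x, y))       # much larger for a bad guess
```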

Optimization: Minimizing the cost function

To find the best-fit line, we need to minimize the cost function J(θ0, θ1). This can be done using various optimization techniques, with gradient descent being the most common.

Gradient Descent: It is an iterative optimization algorithm used to find the minimum of a function. In the context of linear regression, it helps us find the optimal values of θ0 and θ1 that minimize the cost function.

i) Initialize parameters: Start with initial guesses for θ0 and θ1 (usually set to 0).

ii) Compute Gradient: Calculate the partial derivatives of the cost function with respect to θ0 and θ1:

∂J/∂θ0 = (1/m) Σᵢ₌₁ᵐ (hθ(xi) − yi)

∂J/∂θ1 = (1/m) Σᵢ₌₁ᵐ (hθ(xi) − yi) xi

iii) Update parameters: Adjust the parameters in the opposite direction of the gradient to reduce the cost function:

θ0 := θ0 − α ∂J/∂θ0
θ1 := θ1 − α ∂J/∂θ1

Where α is the learning rate, which decides the size of the steps taken toward the minimum.

iv) Repeat: Continue computing the gradient and updating the parameters until the cost function converges to a minimum (i.e., it changes very little with each iteration).
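Putting the four steps together, here is a minimal gradient descent loop in NumPy; the data and the learning rate are illustrative choices, not tuned values:

```python
# Gradient descent for simple linear regression.
import numpy as np

def gradient_descent(x, y, alpha=0.05, iterations=5000):
    theta0, theta1 = 0.0, 0.0               # i) initialize parameters
    m = len(x)
    for _ in range(iterations):
        error = theta0 + theta1 * x - y     # h_theta(x_i) - y_i
        grad0 = error.sum() / m             # ii) dJ/dtheta0
        grad1 = (error * x).sum() / m       #     dJ/dtheta1
        theta0 -= alpha * grad0             # iii) step against the gradient
        theta1 -= alpha * grad1
    return theta0, theta1                   # iv) repeat until converged

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 3.0 + 0.2 * x                           # made-up points on y = 3 + 0.2x
print(gradient_descent(x, y))               # approaches (3.0, 0.2)
```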

Analytical Solution: Normal Equation

For simple linear regression, there is also an analytical (closed-form) solution, known as the normal equation. This method directly computes the optimal values of θ0 and θ1 without iterative updates.

i) Matrix Form: Express the linear regression model in matrix form:

y = Xθ

Where,

  • y: is the vector of actual values.
  • X: is the matrix of input features (with a column of ones for the intercept).
  • θ: is the vector of parameters [θ0, θ1].

ii) Optimal parameters: The optimal values of θ can be computed using

θ = (XᵀX)⁻¹Xᵀy
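In NumPy, the normal equation is essentially one line. The data is again made up; in practice np.linalg.lstsq is preferred over forming the inverse explicitly:

```python
# The normal equation: theta = (X^T X)^(-1) X^T y.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 3.0 + 0.2 * x                          # made-up points on y = 3 + 0.2x

X = np.column_stack([np.ones_like(x), x])  # column of ones for the intercept

theta = np.linalg.inv(X.T @ X) @ X.T @ y
print(theta)  # approximately [3.0, 0.2]
```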

5. Performance Metrics

Performance metrics help us evaluate how well our linear regression model, or any machine learning model, is performing. They provide quantitative measures to assess the accuracy and quality of the predictions made by the model. Two common performance metrics for linear regression are R-Squared (R²) and Adjusted R-Squared.

R-Squared (R²)

R-squared, also known as the coefficient of determination, measures the proportion of variance in the dependent variable that is predictable from the independent variable(s). It indicates how well the regression line approximates the real data points.

R² = 1 − SS_res / SS_tot

Where,

  • SS_res = Σᵢ₌₁ⁿ (yi − ŷi)²: is the sum of the squares of residuals (errors).
  • SS_tot = Σᵢ₌₁ⁿ (yi − ȳ)²: is the total sum of squares.
  • yi: is the actual value.
  • ŷi: is the predicted value.
  • ȳ: is the mean of the actual values.
  • n: is the number of data points.

Interpretation:

R² values typically range from 0 to 1.

  • R² = 1: the model perfectly predicts the dependent variable.
  • R² = 0: the model does not explain any of the variance in the dependent variable.
  • A higher R² indicates better model performance.
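Computing R² by hand takes only a few lines; the actual and predicted values below are invented for illustration:

```python
# R^2 = 1 - SS_res / SS_tot, computed from scratch.
import numpy as np

y_actual = np.array([3.0, 4.0, 5.0, 6.0])
y_pred = np.array([2.8, 4.1, 5.2, 5.9])            # hypothetical model output

ss_res = np.sum((y_actual - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_actual - y_actual.mean()) ** 2) # total sum of squares

print(1 - ss_res / ss_tot)  # 0.98 -- the predictions track the data closely
```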

Adjusted R-Squared (Adjusted R²)

Adjusted R-squared adjusts the R² value based on the number of predictors in the model. It accounts for the complexity of the model by penalizing the addition of unnecessary predictors. This metric provides a more accurate measure of model performance, especially when comparing models with different numbers of predictors.

Adjusted R² = 1 − (1 − R²)(n − 1) / (n − k − 1)

Where,

  • n: is the number of data points
  • k: is the number of predictors (independent variables) in the model.

Interpretation:

  • Adjusted R² can be lower than R².
  • It adjusts for the number of predictors, so adding irrelevant predictors can decrease the adjusted R².
  • A higher adjusted R² value indicates a better model, considering the number of predictors.

Mathematical Intuition:

  • The term (1-R²) represents the unexplained variance.
  • The adjustment factor (n − 1) / (n − k − 1) penalizes the addition of predictors by accounting for degrees of freedom.
  • Adjusted R² provides more balanced measure of model performance by considering both the explained variance and the model complexity.

Example: Suppose we have a dataset with 10 data points and a simple linear regression model with just one predictor. We calculate R² and Adjusted R² as follows.

i) Compute R²:

Let’s say R² = 0.8

ii) Compute Adjusted R²:

  • n = 10 (number of data points)
  • k = 1 (number of predictors)

Adjusted R² = 1 − (1 − 0.8) × (10 − 1) / (10 − 1 − 1) = 1 − 0.2 × 1.125 = 0.775
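The same calculation as a small helper function:

```python
# Adjusted R^2 for the example above (R^2 = 0.8, n = 10, k = 1).
def adjusted_r2(r2, n, k):
    """1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(adjusted_r2(0.8, 10, 1))  # 0.775
```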

Conclusion:

  • R² measures the proportion of variance in the dependent variable explained by the model.
  • Adjusted R² provides a more accurate measure by adjusting for the number of predictors, penalizing the addition of unnecessary ones.
  • Both metrics help assess the performance of a linear regression model (or any machine learning model), with Adjusted R² being more reliable when comparing models with different numbers of predictors.

Why is reading better than watching YouTube videos?

  • Reading lets your imagination run wild. You can create pictures in your mind, and that's like having a personal movie playing in your head.
  • When you are reading, you focus on the words, but while watching videos you can be distracted by flashy visuals and ads.
  • Reading encourages you to think. You pause, reflect, and understand things deeply. It's like having a conversation with the author.
  • Reading is a patient game. It's not a race. You learn to enjoy the journey, and that helps you in many areas of life.
