Mastering the Fundamentals of Machine Learning Algorithms 📝 📚 — Part 1

Pavan Saish
6 min read · Jun 30, 2023


Hello readers! Welcome to our comprehensive guide on fundamental machine learning algorithms. In this blog, we will delve deep into the concepts, mathematics, assumptions, and practical implementations of some of the most widely used algorithms in the field. Whether you’re a beginner looking to build a solid foundation or an experienced practitioner seeking a refresher, this guide will provide you with a clear understanding of linear regression, logistic regression, decision trees, random forests, support vector machines (SVM), and principal component analysis (PCA).

We’ll explore the underlying mathematics and discuss the assumptions made by each algorithm. Additionally, I have included interview-based questions with answers to help you prepare for machine learning-related interviews. So, let’s dive in and unravel these powerful machine learning techniques!

Linear Regression:

Linear regression is a supervised learning algorithm used to predict a continuous target variable based on one or more input features. It assumes a linear relationship between the features and the target variable.

(Image credit: Jason Wong, TDS)

Math Formulas

Hypothesis function: hθ(x) = θ₀ + θ₁x₁ + θ₂x₂ + … + θₙxₙ

Cost function: J(θ) = (1/(2N)) * Σ(hθ(xᵢ) - yᵢ)²

Gradient Descent update rule: θⱼ := θⱼ - α * (1/N) * Σ(hθ(xᵢ) - yᵢ) * xⱼᵢ
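
To make these formulas concrete, here is a minimal NumPy sketch of batch gradient descent for linear regression. The feature matrix `X`, target vector `y`, learning rate `alpha`, and iteration count are illustrative placeholders, not values from this article.

```python
import numpy as np

def fit_linear_regression(X, y, alpha=0.1, n_iters=5000):
    """Batch gradient descent for linear regression (minimal sketch)."""
    N = X.shape[0]
    X_b = np.hstack([np.ones((N, 1)), X])      # prepend a column of 1s for the intercept θ₀
    theta = np.zeros(X_b.shape[1])
    for _ in range(n_iters):
        preds = X_b @ theta                    # hθ(x) for every sample
        gradient = (1 / N) * X_b.T @ (preds - y)
        theta -= alpha * gradient              # θⱼ := θⱼ - α * ∂J/∂θⱼ
    return theta

# Illustrative usage on synthetic data: recovered coefficients should be close to [3, 2, -1]
rng = np.random.default_rng(0)
X = rng.random((100, 2))
y = 3 + 2 * X[:, 0] - X[:, 1] + 0.1 * rng.standard_normal(100)
print(fit_linear_regression(X, y))
```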

Assumptions

  1. Linearity: The relationship between the features and the target variable is linear.
  2. Independence: The observations (and hence their residuals) are independent of each other.
  3. Homoscedasticity: The variance of the residuals is constant across all levels of the target variable.

Interview-based Q&A

Q1. What is the objective of linear regression?

Ans: The objective of linear regression is to find the best-fitting line (or hyperplane in higher dimensions) that minimizes the sum of squared differences between predicted and actual target values.

Q2. What happens if the assumptions of linearity and homoscedasticity are violated in linear regression?

Ans: If the assumptions are violated, the model’s predictions may not be accurate, and the estimates of coefficients may not be reliable.

Q3. How to handle multicollinearity in linear regression?

Ans: Multicollinearity occurs when two or more input features are highly correlated. One approach to handle it is to perform feature selection or dimensionality reduction using techniques like PCA.
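
As a rough sketch of the dimensionality-reduction route mentioned above, scikit-learn's PCA can replace correlated features with a smaller set of uncorrelated components. The synthetic data and the choice of two components below are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic features where the third column is almost a copy of the first (strong collinearity)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 0] + 0.01 * rng.normal(size=200)

print(np.corrcoef(X, rowvar=False).round(2))   # pairwise feature correlations flag the problem

# Project onto 2 uncorrelated principal components before fitting the regression
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(pca.explained_variance_ratio_)           # how much variance each component retains
```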

Q4. What is the role of the learning rate (α) in gradient descent for linear regression?

Ans: The learning rate determines the step size in each iteration of gradient descent. A larger learning rate may lead to faster convergence but could overshoot the optimal solution, while a smaller rate may result in slow convergence.

Q5. How do you evaluate the performance of a linear regression model?

Ans: The performance of a linear regression model can be evaluated using metrics such as Mean Squared Error (MSE), R-squared (R²), and Mean Absolute Error (MAE).
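
For reference, all three metrics are available in `sklearn.metrics`; the `y_true` and `y_pred` arrays below are placeholder values, not results from a fitted model.

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = [3.0, -0.5, 2.0, 7.0]   # actual target values (placeholder)
y_pred = [2.5,  0.0, 2.0, 8.0]   # model predictions (placeholder)

print("MSE:", mean_squared_error(y_true, y_pred))
print("MAE:", mean_absolute_error(y_true, y_pred))
print("R²:", r2_score(y_true, y_pred))
```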

Logistic Regression:

Logistic regression is a classification algorithm used for binary or multi-class classification problems. It models the probability of a binary outcome using a logistic function.

In logistic regression, the goal is to find the coefficient values that maximize the likelihood of the observed data. Multiplying the coefficients by the input variables and summing them produces the log-odds of the positive class; passing these log-odds through the logistic (sigmoid) function converts them into the probability of the positive class.

Math Formulas

Hypothesis function: hθ(x) = 1 / (1 + e^(-θᵀx))

Cost function (log loss or binary cross-entropy):

J(θ) = (-1/N) * Σ[ yᵢ log(hθ(xᵢ)) + (1 - yᵢ) log(1 - hθ(xᵢ)) ]
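
A minimal NumPy sketch of this hypothesis and cost function is shown below; `theta`, `X`, and `y` are placeholders, `X` is assumed to already include a bias column of 1s, and a small `eps` keeps the logarithms numerically stable.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(theta, X):
    # hθ(x) = sigmoid(θᵀx)
    return sigmoid(X @ theta)

def log_loss(theta, X, y, eps=1e-12):
    # J(θ) = -(1/N) * Σ[ yᵢ log(hθ(xᵢ)) + (1 - yᵢ) log(1 - hθ(xᵢ)) ]
    h = np.clip(predict_proba(theta, X), eps, 1 - eps)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
```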

Assumptions

  1. Linearity: The relationship between the features and the log-odds of the binary outcome is linear.
  2. Independence: The observations are independent of each other.
  3. No multicollinearity: There should be no multicollinearity among the input features.

Interview-based Q&A

Q1. What is the difference between linear regression and logistic regression?

Ans: Linear regression is used for predicting continuous numeric values, while logistic regression is used for binary or multi-class classification problems.

Q2. How does logistic regression handle multi-class classification problems?

Ans: Logistic regression can handle multi-class problems through extensions such as one-vs-rest or softmax (multinomial) regression.
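
In softmax (multinomial) regression, the sigmoid is replaced by a softmax over one score per class; the sketch below shows only the softmax step, with illustrative logits.

```python
import numpy as np

def softmax(scores):
    # Subtract the row-wise maximum for numerical stability before exponentiating
    z = scores - scores.max(axis=1, keepdims=True)
    exp_z = np.exp(z)
    return exp_z / exp_z.sum(axis=1, keepdims=True)

# Logits for 2 samples and 3 classes (illustrative values)
scores = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
print(softmax(scores))   # each row sums to 1 and can be read as class probabilities
```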

Q3. What is the logistic function used for in logistic regression?

Ans: The logistic function maps the linear combination of input features and model parameters to a probability value between 0 and 1, representing the likelihood of the binary outcome.

Q4. How do you interpret the coefficients in logistic regression?

Ans: The coefficients in logistic regression represent the change in the log-odds of the binary outcome associated with a unit change in the corresponding input feature, assuming all other features are constant.

Q5. What is the purpose of regularization in logistic regression?

Ans: Regularization in logistic regression helps prevent overfitting by adding a penalty term to the cost function, encouraging smaller coefficients and reducing model complexity.
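
As a self-contained illustration, the sketch below adds an L2 (ridge) penalty to the binary cross-entropy; `lam` is a hypothetical regularization strength, and the bias term θ₀ is conventionally left unpenalized.

```python
import numpy as np

def l2_log_loss(theta, X, y, lam=0.1, eps=1e-12):
    """Binary cross-entropy plus an L2 penalty (λ / 2N) * Σ θⱼ², excluding the bias θ₀."""
    N = X.shape[0]
    h = np.clip(1.0 / (1.0 + np.exp(-(X @ theta))), eps, 1 - eps)
    data_term = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    penalty = (lam / (2 * N)) * np.sum(theta[1:] ** 2)
    return data_term + penalty
```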

Drawbacks

  1. Linear Decision Boundary: Logistic regression assumes a linear relationship between the input variables and the log-odds of the positive class. This means it can only model linear decision boundaries. If the relationship is highly nonlinear, logistic regression may not capture the complex patterns in the data accurately.
  2. Assumption of Independence: Logistic regression assumes that the input variables are independent of each other. In real-world scenarios, variables can often be correlated, violating this assumption. Correlated variables can lead to biased and unreliable coefficient estimates.
  3. Limited to Binary Classification: Logistic regression is designed for binary classification tasks, where the target variable has two classes. It cannot directly handle multi-class classification problems without extensions or modifications such as one-vs-rest or multinomial logistic regression.
  4. Sensitivity to Outliers: Logistic regression is sensitive to outliers in the input data. Outliers can significantly affect the coefficients and the resulting decision boundary. Extreme values can skew the estimated probabilities and impact the model’s performance.

Decision Trees

Decision trees are versatile supervised learning algorithms used for classification and regression. They partition the feature space into regions based on feature values to make predictions.
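
As a quick illustration of how a tree partitions the feature space, here is a minimal scikit-learn sketch; the iris dataset and the depth limit are placeholders chosen only to keep the printed tree small.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

print(export_text(tree))    # the learned splits carve the feature space into axis-aligned regions
print(tree.predict(X[:5]))  # class predictions for the first five samples
```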

Assumptions

  1. Nonlinear relationships: Decision trees can capture nonlinear relationships between features and the target variable.
  2. Feature relevance: Decision trees assume that the input features are relevant for predicting the target variable.

Interview-based Q&A

Q1. How does a decision tree decide which feature to split on?

Ans: A decision tree selects the feature (and threshold) that provides the best split according to a criterion such as entropy or Gini impurity. It aims to maximize information gain, i.e. the reduction in impurity after the split.
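
A minimal sketch of Gini impurity and the gain from a candidate split is shown below; the label arrays are illustrative.

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - Σ pₖ², where pₖ are the class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_gain(parent, left, right):
    """Reduction in weighted child impurity relative to the parent node."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted

# Illustrative split: both children are pure, so the gain equals the parent impurity (0.5)
parent = np.array([0, 0, 0, 1, 1, 1])
print(gini(parent), gini_gain(parent, parent[:3], parent[3:]))
```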

Q2. How do decision trees handle missing values in the dataset?

Ans: Decision trees handle missing values by either ignoring the samples with missing values or imputing the missing values based on the majority class or mean/median of the feature.

Q3. What is pruning in decision trees?

Ans: Pruning is a technique used to reduce the complexity and overfitting of decision trees by removing unnecessary branches or merging similar nodes based on validation set performance.
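
In scikit-learn, one common form of post-pruning is cost-complexity pruning via the `ccp_alpha` parameter; the sketch below assumes the iris dataset and picks the pruning strength that scores best on a held-out validation split.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Candidate pruning strengths computed from the training data
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# Refit one tree per alpha and keep the one with the best validation accuracy
# (the filter guards against tiny negative alphas caused by floating-point error)
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
     for a in path.ccp_alphas if a >= 0),
    key=lambda t: t.score(X_val, y_val),
)
print(best.ccp_alpha, best.score(X_val, y_val))
```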

Q4. Can decision trees handle categorical variables?

Ans: Yes, decision trees can handle categorical variables, either through multi-way splits where each category gets its own branch, or through binary splits that group the categories into two subsets.

Q5. How can decision trees be sensitive to small changes in the data?

Ans: Decision trees can be sensitive to small changes in the data because a small change in the training data can lead to a different tree structure and potentially different predictions.

Drawbacks

  1. Overfitting: Decision trees are prone to overfitting, especially when they are allowed to grow deep and complex. Overfitting occurs when the tree captures noise or irrelevant patterns in the training data, leading to poor generalization and reduced performance on unseen data. Techniques such as pruning and setting constraints on tree depth can help mitigate overfitting.
  2. High Variance: Decision trees are known to have high variance. Small changes in the training data can result in significantly different tree structures, leading to high instability. Ensemble methods like random forests help to reduce variance by averaging predictions from multiple trees.
  3. Sensitivity to Data Imbalance: Decision trees can be biased towards the majority class in imbalanced datasets. If one class dominates the training data, the tree may prioritize correctly classifying that class at the expense of the minority class. Techniques such as balancing class weights or resampling techniques can help address this issue.

End of Part 1… Check out the continuation in Part 2!


Pavan Saish

Dedicated to contributing to the ever-evolving landscape of AI | ML-DL writer | VIT’24. https://www.linkedin.com/in/pavansaish/