Detailed Maths Topics and Their Direct Use In Machine Learning

Published in EnjoyAlgorithms · 13 min read · Feb 13, 2024

Machine Learning is becoming popular because of the growing support of libraries and frameworks, which have made it easier to apply AI and ML across all domains. However, using AI only through libraries and frameworks does not make one an expert or professional in this field. It's good to have the support of coding frameworks to increase usability, but to land a role in the AI industry, we must know the logic behind that code.

If we remove the frameworks' support in AI, it becomes essential to know the mathematical details to write the line-by-line logic and make machines learn the complex hidden patterns in the data. Specifically, we need an understanding of Probability, Statistics, Linear Algebra, Calculus, and Graphs. In this article, we will dive deeper into why maths is needed and see precisely where each topic is used in machine learning.

How does Maths help in becoming an expert in Machine Learning?

With the concrete knowledge of Mathematics involved, one can utilize the full potential of ML while building interesting applications across all possible domains. For example:

With knowledge of the maths underlying the algorithms, one can select the best algorithm for the available dataset.
Mathematical knowledge behind the working of regularizers can help solve overfitting or high variance issues.
One can analyze more complex relationships among data features with the knowledge of Graphs.
One can design proper cost functions with the knowledge of the mathematics behind optimizers.

There can be many such examples where maths can help make one an expert in Machine Learning. But what precisely in Maths? How much maths would be required? Let's find the concrete answer to this and learn the use of all these topics with some examples.

What level of maths is required for ML?

The answer to this question is subjective and depends on individual requirements. For example, someone doing research in machine learning may need a profound knowledge of maths, as research demands in-depth understanding. At the other extreme, a person who only wants to use applications built with AI may not require any mathematics at all.

In this article, we will specifically discuss the minimum depth required for a person starting the ML journey or seeking a mathematical starting point to strengthen their foundation in the ML domain.

What minimal topics in Maths are required for Machine Learning?

In Machine Learning, these five maths topics are very frequently used:

Linear Algebra
Probability
Statistics
Calculus
Graphs

In the rest of this blog, we will see what we need to know about these topics and where they are used in Machine Learning.

Linear Algebra

Linear Algebra is the most used maths topic in ML, ranging from classical Machine Learning to the most recent and advanced LLMs. One can easily find the usability of Linear Algebra in all ML algorithms, like Linear Regression, SVM, KNN, Random Forest, or any other algorithm.

N-dimensional vector: We generally have a large number of samples for every feature present in our dataset. If we consider one feature vector having n data samples, it will be an n-dimensional vector. As data is present everywhere in Machine Learning, we need to deal with n-dimensional vectors everywhere; hence, knowledge of vector operations like the dot product, addition, and subtraction (and the cross product, which is defined for three-dimensional vectors) is crucial.
Distance between Vectors: In ML, every feature is considered one dimension, and generally, the dataset contains numerous features, so each sample is a point in n-dimensional space. To measure how similar two samples are, we calculate the distance between them. Hence, knowing how to calculate the distance between two n-dimensional vectors is crucial. The direct use-case of distance calculation can be found in two ML algorithms: K-Nearest Neighbors and K-Means.
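As a minimal sketch of the vector operations described above, the following pure-Python snippet computes a dot product and the Euclidean distance between two vectors. The function names `dot` and `euclidean_distance` and the sample values are illustrative assumptions, not taken from any particular library.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))

def euclidean_distance(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Two hypothetical samples with three features each
a = [1.0, 2.0, 3.0]
b = [4.0, 6.0, 3.0]

print(dot(a, b))                 # 1*4 + 2*6 + 3*3 = 25.0
print(euclidean_distance(a, b))  # sqrt(9 + 16 + 0) = 5.0
```

Algorithms such as K-Nearest Neighbors repeat this kind of distance calculation between a query sample and every stored sample.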

Projection onto a plane: In SVM, we try to find the distance between a sample of n-dimensions and a plane existing in n-dimensions. In that case, we try to project that sample onto the hyperplane. Hence, the concepts of projection onto a plane and the knowledge of Hyperplanes are essential. The details of this can be found in the SVM blog.
Matrix: In the case of multi-dimensional features and a large number of parameters to learn, we take the help of matrices. In Deep Learning, the number of parameters can reach billions, and it becomes impractical to process each parameter individually. These parameters are stored as learnings in the form of Weight and Bias matrices. If one has used any ML application, that application internally uses these weight matrices to compute its predictions. The Matrix concept makes Machine Learning and Deep Learning sustainable; otherwise, it would take ages to train the model and store the learnings. One can visit the blog How Machine Learns in Machine Learning to observe Matrices' basic needs.
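The projection idea mentioned above for SVM can be sketched in a few lines. This is an illustrative example under assumed names: the hyperplane is written as normal · x + offset = 0, and `signed_distance`, `project_onto_hyperplane`, and the sample values are hypothetical.

```python
import math

def signed_distance(point, normal, offset):
    """Signed distance from `point` to the hyperplane normal . x + offset = 0."""
    norm = math.sqrt(sum(w * w for w in normal))
    return (sum(w * x for w, x in zip(normal, point)) + offset) / norm

def project_onto_hyperplane(point, normal, offset):
    """Closest point on the hyperplane to `point`."""
    norm_sq = sum(w * w for w in normal)
    scale = (sum(w * x for w, x in zip(normal, point)) + offset) / norm_sq
    return [x - scale * w for x, w in zip(point, normal)]

normal = [3.0, 4.0]   # hypothetical normal vector of the hyperplane
offset = -5.0         # hypothetical offset (bias) term
sample = [3.0, 4.0]   # hypothetical data sample

distance = signed_distance(sample, normal, offset)
projected = project_onto_hyperplane(sample, normal, offset)

print(distance)   # (9 + 16 - 5) / 5 = 4.0
print(projected)  # approximately [0.6, 0.8]
```

The sign of the distance tells which side of the hyperplane the sample lies on, which is how a linear classifier assigns a class.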

Matrix multiplication, addition, subtraction, and transpose: Basic properties of Matrix, like multiplication, addition, and subtraction, are also present in all ML algorithms. We pass input data in the form of a matrix, multiply it by the weight matrix, and finally add it to the Bias matrix to form the final prediction. Hence, knowledge of these mathematical computations is essential to observe the transformation of the input features into the final predicted output.

Y_predicted = (Weight).Transpose * X_input + Bias
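The formula above can be sketched in plain Python for a small batch of inputs. The names `predict`, `weights`, `bias`, and `X` and all values are assumptions chosen for illustration; real frameworks perform the same computation with optimized matrix routines.

```python
def predict(weights, bias, X):
    """Return one prediction per row of X, computed as weights^T * x + bias."""
    return [sum(w * x for w, x in zip(weights, row)) + bias for row in X]

weights = [2.0, -1.0]   # hypothetical learned weights, one per feature
bias = 0.5              # hypothetical learned bias
X = [
    [1.0, 1.0],         # first input sample
    [3.0, 2.0],         # second input sample
]

print(predict(weights, bias, X))  # [1.5, 4.5]
```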

Orthogonality: The complete dataset can be considered a matrix where rows correspond to samples and columns correspond to the features. A simple sufficient check for whether the features are linearly independent of one another is to check whether the columns are orthogonal: if every column is perpendicular to every other column (their dot product is zero), no feature can be expressed in terms of the others. These concepts are crucial and used in popular algorithms like Principal Component Analysis (PCA) and Support Vector Machines (SVM).
EigenValues and EigenVectors: As the number of dimensions in datasets increased rapidly, dimensionality reduction techniques were needed to inspect a dataset and plot it in two or three dimensions. Algorithms like PCA use EigenValues and EigenVectors to find the directions that carry the most information (variance) in the data and preserve them in the reduced dataset. To understand these algorithms, we need to know the EigenValue and EigenVector decomposition of matrices.
Singular Value Decomposition: Matrix sizes are becoming huge these days with the advancement of computing power. In that case, we need a more straightforward way to extract the most essential information in the Matrix. That's where SVD comes into the picture in ML, where we factorize the Matrix into three different matrices. It is beneficial in the case of building applications involving a very high number of dimensions like Image Compression or even t-SNE Algorithm.
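To make the eigenvalue idea concrete, here is a small sketch that uses power iteration, one simple technique for approximating the dominant eigenvalue and eigenvector of a symmetric matrix. It is not how PCA libraries necessarily compute decompositions; the function names and the 2x2 matrix (standing in for a covariance matrix) are assumptions for illustration.

```python
import math

def mat_vec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def power_iteration(A, iterations=100):
    """Approximate the dominant eigenvalue and eigenvector of a symmetric matrix."""
    # Arbitrary non-zero start; it must not be orthogonal to the dominant eigenvector.
    v = [1.0] + [0.0] * (len(A) - 1)
    for _ in range(iterations):
        w = mat_vec(A, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v gives the eigenvalue for the unit vector v
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(A, v)))
    return eigenvalue, v

cov = [[2.0, 1.0],
       [1.0, 2.0]]   # hypothetical covariance matrix of two features

value, vector = power_iteration(cov)
print(value)   # close to 3.0
print(vector)  # close to [0.7071, 0.7071]
```

In PCA terms, the resulting vector is the direction of greatest variance, and the eigenvalue measures how much variance lies along it.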

Probability and Probability Distribution Functions

Probability is among the most used topics in almost all computer science fields. In Machine Learning, a complete algorithm, Naive Bayes, is built primarily on Probability, which shows how valuable maths can be in understanding the workings of ML thoroughly. Also, the predictions given by classification models are presented as probabilities of the various classes present in the dataset. Hence, understanding Probability is of utmost necessity. These are a few topics under Probability that one needs to understand clearly:

Simple Probability: In ML classification problems, we can easily find uses of simple probability theory, such as deciding which class is dominant. Probability tells us about the chances of the presence/occurrence of any class based on the given input data. Along with simple Probability, one should also know basic rules like the sum and product rules of probabilities to gain firmer knowledge. The softmax activation function in ANNs uses simple probability theory to express the output in terms of probabilities and predict the most probable class.
Conditional Probability and Bayes Theorem: Conditional Probability states the Probability of an event's occurrence given another event's occurrence. This conditional Probability is used in Bayes' theorem as well. These concepts are extremely important in Machine Learning as they hint at how input features can affect the predictions from an ML algorithm. The Naive Bayes Classifier algorithm is entirely based on Bayes' theorem itself, so one can sense the importance of these topics in ML.
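As a small worked example of Bayes' theorem, the sketch below computes the probability that an email is spam given that it contains a certain word. The function name `posterior` and all the probabilities are made-up values for illustration only.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """P(A | B) from P(A), P(B | A), and P(B | not A) using Bayes' theorem."""
    # Total probability of observing B
    evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / evidence

p_spam = 0.2                 # assumed P(spam)
p_word_given_spam = 0.6      # assumed P(word | spam)
p_word_given_ham = 0.05      # assumed P(word | not spam)

p_spam_given_word = posterior(p_spam, p_word_given_spam, p_word_given_ham)
print(p_spam_given_word)  # about 0.75, i.e. 0.12 / (0.12 + 0.04)
```

A Naive Bayes classifier applies the same rule once per feature and combines the results under the assumption that features are independent given the class.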

Random Variables: Continuous and Discrete random variables differ from algebraic variables and represent the outcomes of random experiments. When assigning initial values to the parameters in Machine Learning, we use random variables, which act as a starting point for our ML training process. Hence, the knowledge of random variables is required for ML or Data Science engineers/scientists.
Probability Distribution: In the case of classification problems, loss functions like MSE and MAE are not well suited. Hence, we instead match the probability distributions (PDFs) of the predicted and actual labels. One can learn about this PDF matching in these two blogs: Classification and Regression in Machine Learning and Loss or Cost Functions in Machine Learning.
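One common way to compare a predicted distribution with the actual one is cross-entropy applied to softmax outputs. The sketch below is a minimal illustration of that idea; the function names, the raw scores, and the one-hot label are assumptions.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(true_dist, predicted_dist):
    """Cross-entropy between the actual and predicted distributions."""
    return -sum(t * math.log(p)
                for t, p in zip(true_dist, predicted_dist) if t > 0)

scores = [2.0, 1.0, 0.1]       # hypothetical raw model outputs for 3 classes
actual = [1.0, 0.0, 0.0]       # one-hot label: the true class is class 0

probs = softmax(scores)
loss = cross_entropy(actual, probs)

print(probs)  # the largest probability belongs to class 0
print(loss)   # smaller when probs puts more mass on the true class
```

The loss is zero only when the predicted distribution puts all of its probability on the true class.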

Figure: Classification problems in machine learning

Continuous and Discrete Distribution: Sometimes, the variable of interest is not continuous (for example, class labels or counts), and hence, we need discrete distributions described by probability mass functions instead of density functions.

Popular Distribution Functions: Gaussian, Skewed, non-skewed: Probability Density Functions, popularly known as PDFs, are essential for seeing how the data is distributed. For example, the Gaussian distribution is one such PDF, and algorithms like Linear regression tend to work well when the data is approximately Gaussian.
Maximum Likelihood Estimation (MLE): The logistic regression algorithm uses a particular type of cost function based on MLE methods.
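To illustrate the MLE idea in its simplest form, the sketch below finds the Bernoulli parameter that maximizes the log-likelihood of some binary outcomes with a plain grid search. The data and names are hypothetical. Logistic regression builds on the same log-likelihood, with the probability produced by a sigmoid of the inputs, and minimizes its negative.

```python
import math

def bernoulli_log_likelihood(p, outcomes):
    """Log-likelihood of binary outcomes if each is 1 with probability p."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p) for y in outcomes)

outcomes = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]   # hypothetical labels: 7 ones out of 10

# Try p = 0.01, 0.02, ..., 0.99 and keep the value with the highest likelihood
candidates = [i / 100 for i in range(1, 100)]
best_p = max(candidates, key=lambda p: bernoulli_log_likelihood(p, outcomes))

print(best_p)                         # 0.7
print(sum(outcomes) / len(outcomes))  # 0.7, the fraction of ones
```

The maximum-likelihood estimate matches the observed fraction of ones, which is the closed-form answer for a Bernoulli variable.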

Statistics

Machine Learning can be seen as another form of statistics where we estimate a summary of the data. This summary can be anything, ranging from the average or mean of the data to a more complex summarization obtained by predicting hidden patterns. For example, suppose we give you the salaries of 10 persons working in a company in the same position and ask you to predict the salary for the 11th person; what would be your guess? Average!! Right? Yes. This average can be far away from the actual figure, and using ML, we try to reduce this gap between predicted and actual values. Hence, ML can be viewed as another form of statistics.

The topics we can focus on in Statistics are mainly Data Summarization techniques like:

Mean: Mean is the average of the data values. The use of mean in Machine Learning can be found in the case of normalizing the features, calculating R² values, and many more. The reference to the blogs for normalization and evaluation metrics for regression models where the mean is used can be found in the respective links.
Median: The median represents the middle element in the data in ascending or descending order. It is beneficial when dividing data samples into equal intervals or inter-quartile ranges (IQR). One of the direct use cases can be found in the Box plot used for Data Analysis.
Mode: Mode is the most frequent value present in the data sample. It summarizes the data in terms of its most frequently occurring elements, which can be used in machine learning to find which value dominates the dataset. In the case of classification problems, if the mode shows that one class is heavily dominant, ML models may end up always predicting that class.
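These three summaries can be computed with Python's built-in `statistics` module. The salary figures below are invented for illustration, echoing the ten-salary example above.

```python
import statistics

# Hypothetical salaries (in thousands) of 10 people in the same position
salaries = [40, 42, 45, 45, 47, 50, 52, 55, 60, 64]

mean_salary = statistics.mean(salaries)      # sum is 500, so the mean is 50
median_salary = statistics.median(salaries)  # average of the two middle values, 47 and 50
mode_salary = statistics.mode(salaries)      # 45 is the only value that appears twice

print(mean_salary, median_salary, mode_salary)
```

Using the mean as the guess for an 11th salary is the simplest possible "model"; ML methods try to do better by using the input features.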

Calculus

Calculus is also one of the most crucial math topics in Machine Learning. Most of its use can be found when training ML models, as it is a constant part of almost all optimization algorithms. For example, in the Gradient Descent algorithm, we use derivatives of the cost function to sense the direction (+ve or -ve) for updating the parameters. Let's list down some concepts under calculus that would be required.

Basics of Function: Functions are a core part of machine learning that is even present in the definition of ML. In ML, we try to map a function between input and output data. For example, we try to fit polynomials as per the mentioned degree in linear or polynomial regression algorithms.

Degree 1: θ1*X + θ0

Degree 2: θ2*X^2 + θ1*X + θ0

Continuous and Discrete Functions: In Machine Learning, we generally take derivatives of functions. A function must be continuous to be differentiable (although continuity alone does not guarantee differentiability), so checking continuity is the first step. Hence, knowing the properties of continuous and discrete functions helps decide whether a function is suitable for our use case. For example, continuity is a crucial property for any activation function in Artificial Neural Networks. Please check the blog on activation functions for more details.
Basics of Differentiation: Differentiation of functions is critical in Machine Learning, as most training algorithms estimate the direction in which to increase or decrease the parameter values from the derivative of the cost function. Today, Python libraries can provide these values instantly, but knowing these mathematical concepts can help design or debug complex methodologies.
Composite function & Chain rule: While doing backpropagation in neural networks, we need to use the chain rule to update all the parameters. This happens because the cost function at the output layer is a composite function of the parameters in the earlier layers, and the chain rule is the standard way to differentiate it. You can find the use of chain rule in ANNs here.
Partial derivatives: In ML, there can be more than one parameter that machines need to learn. But to check the effect of one parameter on the overall cost function, we need to consider the partial derivative of the cost function w.r.t. all the individual parameters. In partial derivative, we only consider one parameter as a variable, and the rest are kept constant. A more detailed use of partial derivatives can be found in these blogs: Gradient Descent and Backpropagation in ANN.
Fourier Series: A Fourier series is an expansion of a periodic function in terms of sine and cosine functions. We can find its direct use in data analysis, for example in plotting Andrews curves.
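The chain rule described above for composite functions can be checked numerically on a tiny example. The sketch below assumes a single-neuron "network" with one weight, a sigmoid activation, and a squared-error loss; all names and values are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, x, y):
    """Squared error of a one-weight, sigmoid-activated neuron."""
    return (sigmoid(w * x) - y) ** 2

def loss_gradient(w, x, y):
    """d(loss)/dw via the chain rule: dL/da * da/dz * dz/dw."""
    a = sigmoid(w * x)
    dloss_da = 2.0 * (a - y)    # derivative of (a - y)^2 with respect to a
    da_dz = a * (1.0 - a)       # derivative of sigmoid(z) with respect to z
    dz_dw = x                   # derivative of w * x with respect to w
    return dloss_da * da_dz * dz_dw

w, x, y = 0.5, 2.0, 1.0         # hypothetical weight, input, and target

analytic = loss_gradient(w, x, y)

# Central finite difference as an independent check
h = 1e-6
numeric = (loss(w + h, x, y) - loss(w - h, x, y)) / (2 * h)

print(analytic, numeric)  # the two values agree to several decimal places
```

Backpropagation applies this same multiplication of local derivatives layer by layer, from the output back to the first layer.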

Matrix Differentiation: In Deep Learning models, the number of parameters can reach up to billions. In that case, it becomes impractical to calculate the partial derivatives with respect to each individual parameter one at a time. Hence, we use matrix-based differentiation to compute the derivatives layer-wise rather than element by element.
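To tie the calculus concepts together, here is a minimal gradient descent sketch that fits θ1*X + θ0 by following the partial derivatives of the mean squared error. The function name, learning rate, step count, and data are assumptions chosen so the example converges; it is not a production training loop.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = theta1 * x + theta0 by minimizing the mean squared error."""
    theta0, theta1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [theta1 * x + theta0 - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to each parameter
        grad_theta0 = (2.0 / n) * sum(errors)
        grad_theta1 = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        # Move each parameter against its gradient
        theta0 -= learning_rate * grad_theta0
        theta1 -= learning_rate * grad_theta1
    return theta0, theta1

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # hypothetical data generated by y = 2x + 1

theta0, theta1 = gradient_descent(xs, ys)
print(theta0, theta1)  # close to 1.0 and 2.0
```

The sign of each partial derivative decides whether the corresponding parameter is increased or decreased at every step.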

Graphs

The concept of graphs is fundamental in Machine Learning. When training ML models, we plot curves to check whether the loss decreases over subsequent epochs/iterations. We also use graphical representations to showcase different types of analysis of the data and extract more meaningful information. Hence, graphs are another concept present in all stages of the ML pipeline. Some topics to cover under graphs are:

Linear Function and Equations: Knowledge about the linear equations and functions is crucial when we learn ML, as most of the algorithms talk about the slopes/gradients and the equations like θ1*X + θ0 where we need to know values for θ1 and θ0.

Non-linear and discrete graph formation: In the case of Polynomial fitting on the dataset, or even piece-wise learning in Machine Learning, we can find the use-case of non-linear and discrete functions. To check the fineness of the fitting, we need to plot the curves to check whether the predicted values of Y overlap the Actual values.
Parabolic equations: Cost functions like MSE (Mean Squared Error) are designed to be parabolic in the prediction error, as finding the minimum then becomes straightforward. For a linear model, this makes the cost a bowl-shaped (convex) function of the parameters with a single minimum, so optimizers can easily find the parameters corresponding to it. With this example, we can sense how useful it can be to know graphs.

MSE = (1/n) * Σ (Y' - Y)^2, # Y' = predicted value of Y, Y = actual value of Y, n = number of samples
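The parabolic shape can be seen by evaluating the mean of the squared errors for several candidate slopes. The sketch below uses made-up data generated by Y = 2X and hypothetical helper names.

```python
def mse(predicted, actual):
    """Mean of the squared differences between predicted and actual values."""
    n = len(actual)
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # hypothetical data generated by Y = 2 * X

def mse_for_slope(theta1):
    """MSE of the model Y' = theta1 * X on the data above."""
    return mse([theta1 * x for x in xs], ys)

for theta1 in [0.0, 1.0, 2.0, 3.0, 4.0]:
    print(theta1, mse_for_slope(theta1))
# The error falls as theta1 approaches 2.0, is 0.0 at theta1 = 2.0,
# and rises symmetrically on the other side.
```

Plotting these values against theta1 would trace out a parabola whose lowest point sits at the best slope.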

Higher-order polynomials and exponentials: We can find higher-order Polynomial fitting on datasets in numerous places. Exponentials appear in the inner workings of the Sigmoid activation function and the basic ML algorithm Logistic Regression. To understand these algorithms thoroughly, we must understand how the logit function works.
Functions like Tanh, Sigmoid, and custom graphical forms: The knowledge of curves like Tanh, Sigmoid, ReLU, and other exponential curves can help decide the proper activation function in ANNs. Properties such as being bounded or zero-centered are often desirable in an activation function, although not every popular choice has them (ReLU, for example, is neither). With graphical knowledge, we can easily sense whether a function holds these properties.
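A quick way to get a feel for these curves is to evaluate them at a few points. The sketch below defines the three activation functions with the standard library only; the sample inputs are arbitrary.

```python
import math

def sigmoid(z):
    """Squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Squashes any real number into the range (-1, 1), centered at zero."""
    return math.tanh(z)

def relu(z):
    """Passes positive values through and clips negatives to zero."""
    return max(0.0, z)

for z in [-5.0, 0.0, 5.0]:
    print(z, sigmoid(z), tanh(z), relu(z))
# At z = 0.0: sigmoid gives 0.5, tanh gives 0.0, relu gives 0.0
```

Sigmoid and Tanh flatten out for large inputs, while ReLU keeps growing for positive inputs, which is the kind of property a plot makes obvious at a glance.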

Do we need to learn all these concepts first and then start the ML journey?

While studying Machine Learning, learners will encounter these mathematical concepts very frequently, but knowing them all beforehand is not a prerequisite for starting. Generally, a learner starts by being a user of ML and then dives deeper into the domain as interest grows. Hence, mathematics is definitely not a prerequisite for ML, but learners who want to build a career in this field should know the bare minimum concepts mentioned in this blog.

Indeed, all the knowledge is not required to take your first step in the journey of Machine Learning, but it is advisable to learn all these concepts in parallel and identify their usage in ML to explore the full potential of this field.

Conclusion

Knowledge of Mathematics can transform simple ML engineers into advanced ones. It enables them to understand the underlying principles of ML algorithms and even modify them as per their needs. In this article, we have summarized exactly which topics in maths a machine learning beginner needs to excel in this domain. We hope you enjoyed the article.

Enjoy Learning!