Role of Mathematics in Machine Learning

Ritayan Dhara
11 min read · Mar 12, 2020


Machine Learning theory is a field at the intersection of statistics, probability, computer science, and algorithms, concerned with learning iteratively from data and finding hidden insights that can be used to build intelligent applications. Despite the immense possibilities of Machine and Deep Learning, a thorough mathematical understanding of many of these techniques is necessary for a good grasp of the inner workings of the algorithms and for getting good results.

Why Worry About The Maths?

There are many reasons why the mathematics of Machine Learning is important, and I’ll highlight some of them below:

1. Selecting the right algorithm, which includes giving consideration to accuracy, training time, model complexity, the number of parameters, and the number of features.

2. Choosing parameter settings and validation strategies.

3. Identifying underfitting and overfitting by understanding the Bias-Variance tradeoff.

4. Estimating the right confidence interval and uncertainty.

What Level of Maths Do You Need?

The main question when trying to understand an interdisciplinary field such as Machine Learning is how much maths is necessary, and at what level, to understand these techniques. The answer is multidimensional and depends on the level and interest of the individual. Research into the mathematical formulation and theoretical advancement of Machine Learning is ongoing, and some researchers work with considerably more advanced techniques. The following are generally considered the minimum mathematics needed to be a Machine Learning Scientist/Engineer, and the importance of each concept is discussed below.

· Linear Algebra

· Probability

· Statistics

· Calculus

Mathematical Concepts used in Machine Learning

1. Linear Algebra:

In ML, Linear Algebra comes up everywhere. Its concepts are a crucial prerequisite for understanding the theory behind Machine Learning. This will help you to make better decisions during a Machine Learning system’s development.

What is Linear Algebra?

Linear algebra is a field of mathematics that is universally agreed to be a prerequisite to a deeper understanding of machine learning.

Although linear algebra is a large field with many esoteric theories and findings, the nuts and bolts tools and notations taken from the field are practical for machine learning.

  • Linear algebra is the mathematics of data.
  • Linear algebra has had a marked impact on the field of statistics.
  • Linear algebra underlies many practical mathematical tools, such as Fourier series and computer graphics.

Reasons To Improve Your Linear Algebra

Of course, I don’t want you to stop at the minimum. I want you to go deeper.

If your need to know more and get better doesn’t motivate you down the path, here are five reasons that might give you that push.

  • Building Block: Let me state it again. Linear algebra is an absolute key to understanding the calculus and statistics you need in machine learning. Better linear algebra will lift your game across the board. Seriously.
  • Deeper Intuition: If you can understand machine learning methods at the level of vectors and matrices you will improve your intuition for how and when they work.
  • Get More From Algorithms: A deeper understanding of the algorithm and its constraints will allow you to customize its application and better understand the impact of tuning parameters on the results.
  • Implement Algorithms From Scratch: You require an understanding of linear algebra to implement machine learning algorithms from scratch. At the very least to read the algorithm descriptions and at best to effectively use the libraries that provide the vector and matrix operations.
  • Devise New Algorithms: The notation and tools of linear algebra can be used directly in environments like Octave and MATLAB allowing you to prototype modifications to existing algorithms and entirely new approaches very quickly.

Linear Algebra will feature heavily in your machine learning journey.

The topics you need to be familiar with are:

  • Notation: Knowing the notation will let you read algorithm descriptions in papers, books and websites to get an idea of what is going on. Even if you use for-loops rather than matrix operations, at least you will be able to piece things together.
  • Operations: Working at the next level of abstraction, in vectors and matrices, can make things clearer. This applies to descriptions, to code, and even to thinking. Learn how to apply simple operations such as adding, multiplying, inverting, and transposing matrices and vectors.
  • Matrix Factorization: If there was one deeper area I would recommend diving into over any other, it would be matrix factorization, specifically matrix decomposition methods like SVD and QR. The numerical precision of computers is limited, and working with decomposed matrices lets you sidestep a lot of the overflow/underflow madness that can otherwise result; see the sketch below.
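As a small illustration, here is a minimal NumPy sketch (the matrix values are made up) that factorizes a matrix with SVD and QR and verifies that the factors reconstruct the original:

```python
import numpy as np

# A small made-up data matrix: rows are samples, columns are features.
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Singular Value Decomposition: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.allclose(A, U @ np.diag(s) @ Vt))  # True: the factors rebuild A

# QR decomposition: A = Q @ R, with Q orthonormal and R upper triangular.
Q, R = np.linalg.qr(A)
print(np.allclose(A, Q @ R))  # True
```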

2. Probability:

Most people have an intuitive understanding of degrees of probability and know that the probability of an event is some value between 0 and 1 which indicates how likely the event is to occur.

What is Probability?
…it’s about handling uncertainty.

Uncertainty involves making decisions with incomplete information, and this is the way we generally operate in the world.

Handling uncertainty is typically described using everyday words like chance, luck, and risk.

Probability is a field of mathematics that gives us the language and tools to quantify the uncertainty of events and reason in a principled manner.

We can assign and quantify the likelihood of things we care about, such as outcomes, events, or numerical values.

Probability is a numerical description of how likely an event is to occur or how likely it is that a proposition is true.

Why is Probability Important to Machine Learning?

Machine learning is about developing predictive models from uncertain data. Uncertainty means working with imperfect or incomplete information. It would be fair to say that probability is required to effectively work through a machine learning predictive modelling project.

Uncertainty is fundamental to the field of machine learning, yet it is one of the aspects that causes the most difficulty for beginners, especially those coming from a developer background.

There are three main sources of uncertainty in machine learning: noisy data, incomplete coverage of the problem domain, and imperfect models.

As machine learning practitioners, we must have an understanding of probability to manage the uncertainty we see in each project.

Probability is the Bedrock of Machine Learning

  • Classification models must predict the probability of class membership
  • Algorithms are designed using probability (e.g. Naive Bayes)
  • Learning algorithms will make decisions using probability (e.g. information gain)
  • Sub-fields of study are built on probability (e.g. Bayesian networks)
  • Algorithms are trained under probability frameworks (e.g. maximum likelihood)
  • Models are fit using probabilistic loss functions (e.g. log loss and cross-entropy)
  • Model hyperparameters are configured with probability (e.g. Bayesian optimization)
  • Probabilistic measures are used to evaluate model skill (e.g. brier score, ROC)
  • …the list could go on
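To make one of these concrete, below is a minimal sketch of log loss (binary cross-entropy), the kind of probabilistic loss function mentioned above; the labels and predicted probabilities are made up for illustration:

```python
import numpy as np

def log_loss(y_true, y_prob, eps=1e-15):
    """Binary cross-entropy: a probabilistic loss function."""
    y_prob = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

# Made-up labels and predicted class-membership probabilities.
y_true = np.array([1, 0, 1, 1])
y_prob = np.array([0.9, 0.2, 0.7, 0.6])
print(log_loss(y_true, y_prob))  # lower is better
```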

Let’s consider an example that uses Bayes’ theorem:

P(A|B) = P(B|A) × P(A) / P(B)

This is our friend Bob. As his classmates, we think he is an introverted guy who often keeps to himself. We believe he doesn’t like making friends.

So, P(A) is called the prior. In this case, we will call it our assumption that Bob rarely likes to make new friends.

Now, he meets Ed in his college.


Unlike Bob, Ed is a laid-back guy who is eager to make new friends.

P(B) in this case is the probability that Ed is friendly. After spending the day together, Bob realises that he and Ed are like two peas in a pod. As a result, they become friends.

Them becoming friends represents P(B|A): the probability of observing this evidence (the friendship) given our prior assumption about Bob.

Now, looking at the right-hand side and the example we established above, the numerator is the prior probability of our assumption about Bob, P(A), multiplied by the probability that he befriends Ed given that assumption, P(B|A). Dividing by P(B) gives the result on the left-hand side: the posterior, P(A|B).

Most people who claim they know Bayes’ theorem would invariably get stuck here.

This new value, P(A|B), is nothing but our updated belief about Bob. In other words, the posterior becomes the new value of the prior P(A) the next time we gather evidence.

If I were to extract the nectar of this example, it would be something like this:

We made an assumption about Bob and the evidence we found was that he actually made a new friend!
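Here is a minimal sketch of that update in code. The probabilities assigned to Bob are hypothetical numbers chosen only for illustration, not values from the story:

```python
def bayes_update(prior_a, p_b_given_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * prior_a / p_b

# Hypothetical numbers, chosen only for illustration:
prior = 0.8        # P(A): our prior belief that Bob rarely makes friends
likelihood = 0.1   # P(B|A): chance Bob befriends Ed if he rarely makes friends
evidence = 0.5     # P(B): overall chance of someone befriending Ed

posterior = bayes_update(prior, likelihood, evidence)
print(posterior)  # 0.16 -- the friendship evidence weakens our old belief about Bob
```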

3. Statistics:

Statistics is a field of mathematics that is universally agreed to be a prerequisite for a deeper understanding of machine learning.

What is Statistics?

Statistics is a subfield of mathematics. It refers to a collection of methods for working with data and using data to answer questions.

Statistics is the art of making numerical conjectures about puzzling questions.

Although statistics is a large field with many esoteric theories and findings, the nuts-and-bolts tools and notations taken from the field are required for machine learning practitioners. With a solid foundation of what statistics is, it is possible to focus on just the good or relevant parts.

Why Learn Statistics?

Raw observations alone are data, but they are not information or knowledge.

Data raises questions, such as:

  • What is the most common or expected observation?
  • What are the limits on the observations?
  • What does the data look like?

Although they appear simple, these questions must be answered to turn raw observations into information that we can use and share.

Beyond raw data, we may design experiments to collect observations. From these experimental results, we may have more sophisticated questions, such as:

  • What variables are most relevant?
  • What is the difference in an outcome between two experiments?
  • Are the differences real or the result of noise in the data?

Questions of this type are important. The results matter to the project, to stakeholders, and effective decision making. Statistical methods are required to find answers to the questions that we have about data.

We can see that statistical methods are required both to understand the data used to train a machine learning model and to interpret the results of testing different machine learning models.

This is just the tip of the iceberg as each step in a predictive modelling project will require the use of a statistical method.

Statistics and Machine Learning

Reasons why a machine learning practitioner should deepen their understanding of statistics.

1. Statistics in Data Preparation

Statistical methods are required in the preparation of train and test data for your machine learning model.


This includes techniques for:

  • Outlier detection.
  • Missing value imputation.
  • Data sampling.
  • Data scaling.
  • Variable encoding.
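As a small illustration of two of these, here is a sketch (with a made-up feature vector) of simple outlier detection and data scaling using z-scores; the 2-standard-deviation threshold is just a common rule of thumb:

```python
import numpy as np

# Made-up feature values with one obvious outlier.
x = np.array([4.2, 5.1, 3.9, 4.8, 25.0, 5.3])

# Outlier detection: flag points more than 2 standard deviations from the mean.
z = (x - x.mean()) / x.std()
outliers = x[np.abs(z) > 2]
print(outliers)  # [25.]

# Data scaling: standardize the remaining values to zero mean, unit variance.
clean = x[np.abs(z) <= 2]
scaled = (clean - clean.mean()) / clean.std()
print(scaled.mean().round(6), scaled.std().round(6))  # ~0.0, ~1.0
```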

2. Statistics in Model Evaluation

Statistical methods are required when evaluating the skill of a machine learning model on data not seen during training.

This includes techniques for:

  • Data sampling.
  • Data resampling.
  • Experimental design.
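For instance, k-fold cross-validation is a standard resampling technique for this; below is a minimal sketch of the index-splitting logic in plain NumPy (the sample count and fold count are arbitrary):

```python
import numpy as np

def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index splits for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

# Each round, the model would be fit on `train` and scored on the
# unseen `test` fold.
for train_idx, test_idx in kfold_indices(20, k=4):
    print(len(train_idx), len(test_idx))  # 15 train / 5 test per fold
```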

3. Statistics in Model Selection

Statistical methods are required when selecting a final model or model configuration to use for a predictive modelling problem.

These include techniques for:

  • Checking for a significant difference between results.
  • Quantifying the size of the difference between results.
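A common way to do both, assuming you have paired per-fold scores for two models on the same splits, is a paired t-test; here is a sketch with made-up scores:

```python
from scipy import stats

# Made-up, paired per-fold accuracy scores for two candidate models
# evaluated on the same cross-validation splits.
model_a = [0.81, 0.79, 0.84, 0.80, 0.82]
model_b = [0.78, 0.77, 0.80, 0.79, 0.78]

# Checking for a significant difference between results (paired t-test).
t_stat, p_value = stats.ttest_rel(model_a, model_b)

# Quantifying the size of the difference (mean gap in accuracy).
mean_diff = sum(model_a) / len(model_a) - sum(model_b) / len(model_b)
print(p_value, mean_diff)  # a small p-value suggests the gap is not just noise
```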

4. Statistics in Model Presentation

Statistical methods are required when presenting the skill of a final model to stakeholders.

This includes techniques for:

  • Summarizing the expected skill of the model on average.
  • Quantifying the expected variability of the skill of the model in practice.
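One simple sketch of both, assuming the skill scores are approximately Gaussian (the scores below are made up):

```python
import numpy as np

# Made-up skill scores from repeated evaluation (e.g. cross-validation folds).
scores = np.array([0.81, 0.79, 0.84, 0.80, 0.82])

# Expected skill on average.
mean = scores.mean()

# Expected variability: a rough 95% interval, mean +/- 1.96 standard errors,
# which assumes the scores are approximately Gaussian.
stderr = scores.std(ddof=1) / np.sqrt(len(scores))
low, high = mean - 1.96 * stderr, mean + 1.96 * stderr
print(f"expected skill {mean:.3f}, 95% interval ({low:.3f}, {high:.3f})")
```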

5. Statistics in Prediction

Statistical methods are required when predicting with a finalized model on new data.

This includes techniques for:

  • Quantifying the expected variability for the prediction.
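One common sketch of this, assuming an ensemble of models trained on bootstrap resamples of the training data (the predictions below are made up), is a percentile interval across the ensemble’s predictions:

```python
import numpy as np

# Made-up predictions for one new sample from an ensemble of models
# trained on different bootstrap resamples of the training data.
predictions = np.array([12.1, 11.8, 12.6, 12.0, 12.3, 11.9])

# The spread across the ensemble quantifies the expected variability
# of the prediction; report it as a percentile interval.
point = predictions.mean()
low, high = np.percentile(predictions, [2.5, 97.5])
print(f"prediction {point:.2f}, 95% interval ({low:.2f}, {high:.2f})")
```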

Some of the fundamental statistics and probability topics needed for ML include Variance and Expectation, Conditional and Joint Distributions, Standard Distributions (Bernoulli, Binomial, Multinomial, Uniform and Gaussian), Moment Generating Functions, Priors and Posteriors, and Sampling Methods.
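As a quick illustration of a few of these, the sketch below draws samples from some of the standard distributions and checks the sample expectation and variance against their known closed forms:

```python
import numpy as np

rng = np.random.default_rng(42)

# Samples from some of the standard distributions listed above.
bernoulli = rng.binomial(n=1, p=0.3, size=10_000)       # Bernoulli(p)
binomial = rng.binomial(n=10, p=0.3, size=10_000)       # Binomial(n, p)
gaussian = rng.normal(loc=0.0, scale=2.0, size=10_000)  # Gaussian(mu, sigma)

# Sample estimates of expectation and variance versus their known values.
print(bernoulli.mean(), 0.3)            # E[X] = p
print(binomial.var(), 10 * 0.3 * 0.7)   # Var[X] = n * p * (1 - p)
print(gaussian.var(), 2.0 ** 2)         # Var[X] = sigma^2
```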

4. Calculus:

Calculus is a set of tools for analyzing the relationship between functions and their inputs. In Multivariate Calculus, we can take a function with multiple inputs and determine the influence of each of them separately.

Differential Calculus

Multivariate Calculus (also known as multivariable calculus) extends calculus in one variable to functions of several variables: the differentiation and integration of functions involving multiple variables, rather than just one.
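For example, here is a minimal SymPy sketch that takes a function of two variables and determines the influence of each input separately via partial derivatives (the function itself is an arbitrary example):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y + sp.sin(y)  # an arbitrary function of two variables

# Partial derivatives measure the influence of each input separately.
df_dx = sp.diff(f, x)  # 2*x*y
df_dy = sp.diff(f, y)  # x**2 + cos(y)

# The gradient collects the partials into one vector.
print([df_dx, df_dy])
```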

In a machine learning model, our goal is to minimize the cost function, which monitors the error in the predictions of an ML model. Minimizing it means getting to the lowest error value possible, or equivalently increasing the accuracy of the model. In short, we increase the accuracy by iterating over a training dataset while tweaking the parameters (the weights and biases) of our model.

Let us consider a dataset of students with their marks in some subjects and their occupations. Our goal is to predict the occupation of a person by considering their marks.

Dataset of subject marks of students

In this dataset, we have data on John and Eve. Using John’s and Eve’s data as a reference, we have to predict Adam’s profession.

Now think of the marks in the subjects as a gradient to descend and the profession as the target at the bottom. You have to optimise your model so that the result it predicts at the bottom is accurate. Using John’s and Eve’s data we will run gradient descent and tune our model so that if we enter John’s marks it predicts Doctor at the bottom of the gradient, and the same for Eve. This is our trained model. Now, if we give a student’s marks to our model, we can easily predict the profession.

In theory, this is it for gradient descent, but calculating and modelling gradient descent requires calculus, and now we can see the importance of calculus in machine learning.
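To make that concrete, here is a minimal gradient descent sketch on a toy one-dimensional cost function (the cost function and learning rate are arbitrary choices for illustration); the derivative computed by calculus is exactly what drives each parameter update:

```python
# Gradient descent on a toy cost function J(w) = (w - 3)^2,
# whose derivative dJ/dw = 2 * (w - 3) comes straight from calculus.

def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0    # initial parameter guess
lr = 0.1   # learning rate (step size), chosen arbitrarily
for _ in range(100):
    w -= lr * grad(w)  # step against the gradient to reduce the cost

print(w)  # converges toward 3.0, the minimum of the cost
```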

Some of the necessary topics include Differential and Integral Calculus, Partial Derivatives, Vector-Valued Functions, Directional Gradients, the Hessian, the Jacobian, the Laplacian, and the Lagrangian.

5. Miscellaneous:

This comprises other maths topics not covered in the four major areas described above. They include Real and Complex Analysis (Sets and Sequences, Topology, Metric Spaces, Single-Valued and Continuous Functions, Limits), Information Theory (Entropy, Information Gain), Function Spaces and Manifolds.

End Notes

Mathematics for machine learning is an essential facet that is often overlooked or approached with the wrong perspective. This article gave a gentle overview of the role of mathematics in Machine Learning, with some pointers on why and where these concepts are required to build or improve a machine learning model.

I hope you have recognised the significance of mathematics in Machine Learning. If you have anything to add or any issue, just post it in the comments below. I will be back with another interesting blog.

Till then…. Happy coding :)

And don’t forget to clap clap clap…
