Mathematics for Deep Learning (Part 1)

Hevans Vinicius Pereira
3 min read · Feb 15, 2022

I started my journey to learn Data Science and Machine Learning after my PhD. I have a PhD in Pure Mathematics, and I became interested in Machine Learning because of its many interesting applications.

Photo by Dan-Cristian Pădureț on Unsplash

As a pure mathematician, I wasn't sure how to use my knowledge to understand the algorithms, but I also think it's very important not to stay only in theory, so I took a lot of courses. Along the way I noticed that I retain knowledge better when I study from books.

I tried MOOCs, tutorial videos, and blog tutorials, but I feel that books let me progress at my own pace and cover a substantial part of the subject. Of course, it's important not to rely only on books, because the field evolves fast and books can't keep up with it quickly.

There are a few books on mathematics for machine learning (I'll review those too). To start, I chose a book that goes straight to the point.

In this post I review the first section of the book “Hands-On Mathematics for Deep Learning” by Jay Dawani, published by Packt. I'll approach it from the perspective of someone who already knows the mathematics, but I'll also comment on how I think someone without that background might feel.

Photo by Sharon McCutcheon on Unsplash

In a nutshell, the first section of the book is just a review of definitions. Those who already have some background in mathematics can go through it without difficulty, or even skip it entirely. Readers with less background can pick up some of the ideas, but this is not a book for learning these topics from scratch. Below I detail each of the first five chapters.

The first section has the following chapters:

  • Chapter 1: Linear Algebra
  • Chapter 2: Vector Calculus
  • Chapter 3: Probability and Statistics
  • Chapter 4: Optimization
  • Chapter 5: Graph Theory

Chapter 1: Linear Algebra

If you have some knowledge of Linear Algebra, you can skip this chapter without fear. Chapter 1 focuses primarily on operations on matrices, vector spaces, and eigenvalues and eigenvectors. The only topics that may be unfamiliar are the matrix decompositions shown: Singular Value Decomposition and Cholesky Decomposition. These decompositions are presented directly, with no intuition about them provided.
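The book only states the definitions; as a quick illustration of my own (not from the book), both decompositions can be computed and verified in a few lines of NumPy:

```python
import numpy as np

# A small symmetric positive-definite matrix (so Cholesky applies).
A = np.array([[4.0, 2.0],
              [2.0, 3.0]])

# Singular Value Decomposition: A = U @ diag(S) @ Vt.
U, S, Vt = np.linalg.svd(A)
print(np.allclose(U @ np.diag(S) @ Vt, A))  # True

# Cholesky Decomposition: A = L @ L.T, with L lower triangular.
L = np.linalg.cholesky(A)
print(np.allclose(L @ L.T, A))  # True
```

Playing with small examples like this is a good way to build the intuition the chapter leaves out.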

Chapter 2: Vector Calculus

Chapter 2 focuses on Single Variable Calculus, Multivariate Calculus, and Vector Calculus, and again, if you have some knowledge of Calculus, you can skip this chapter. I think it would be difficult for someone who has never studied Calculus to learn so much in a single chapter. Those who have never seen Calculus before can understand the ideas, but mastering Calculus takes time and demands a lot of exercises. So if you've never seen Calculus before, don't feel bad if you don't understand this chapter.
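One habit that helps when self-studying Calculus is checking derivatives numerically. As a small sketch of my own (the function here is just an example, not one from the book), a central-difference approximation should agree with the analytic derivative:

```python
import math

# f(x) = x^2 * sin(x); by the product rule:
# f'(x) = 2x*sin(x) + x^2*cos(x)
def f(x):
    return x**2 * math.sin(x)

def f_prime(x):
    return 2 * x * math.sin(x) + x**2 * math.cos(x)

# Central-difference approximation of the derivative.
def numerical_derivative(func, x, h=1e-6):
    return (func(x + h) - func(x - h)) / (2 * h)

x = 1.3
print(abs(numerical_derivative(f, x) - f_prime(x)) < 1e-6)  # True
```

This kind of self-check makes the exercise grind a bit less error-prone.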

Chapter 3: Probability and Statistics

Chapter 3 focuses on probability and statistics, and again, those who have already studied these topics can skip the chapter. Those who have never studied them may have difficulty understanding multivariate models and Hypothesis Testing, especially the multivariate part, which is not so intuitive.

Chapter 4: Optimization

Chapter 4 presents some basic definitions and ideas such as constrained optimization, unconstrained optimization, and convex optimization, along with some optimization methods such as Newton's Method, Least Squares, and Lagrange Multipliers.
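To give a feel for one of these methods, here is a minimal sketch of Newton's Method for root finding (my own illustration, not the book's code), using the update rule x_{n+1} = x_n − f(x_n)/f'(x_n):

```python
# Newton's Method: iterate x <- x - f(x)/f'(x) until the step is tiny.
def newton(f, f_prime, x0, tol=1e-10, max_iter=50):
    x = x0
    for _ in range(max_iter):
        step = f(x) / f_prime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Find sqrt(2) as the positive root of f(x) = x^2 - 2.
root = newton(lambda x: x**2 - 2, lambda x: 2 * x, x0=1.0)
print(round(root, 6))  # 1.414214
```

The quadratic convergence near the root is what makes the method attractive, and it reappears later in optimization as a second-order method.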

Chapter 5: Graph Theory

Chapter 5 presents some basic definitions about graphs, such as weighted graphs, directed graphs, and the adjacency matrix. This chapter is very illustrative and pleasant to read.
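The adjacency matrix in particular is easy to experiment with. As a small example of my own (the graph below is made up for illustration), entry [i][j] stores the weight of the edge from node i to node j:

```python
import numpy as np

# Adjacency matrix of a small directed, weighted graph with nodes 0, 1, 2.
# Entry [i][j] is the weight of the edge i -> j (0 means no edge).
adjacency = np.array([
    [0, 2, 0],   # 0 -> 1 with weight 2
    [0, 0, 3],   # 1 -> 2 with weight 3
    [1, 0, 0],   # 2 -> 0 with weight 1
])

# Out-degree of each node: number of nonzero entries in its row.
out_degree = (adjacency > 0).sum(axis=1)
print(out_degree.tolist())  # [1, 1, 1]
```

This matrix view of a graph is exactly what makes graphs convenient for the neural-network material later in the book.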

Next part…

I will post more in the future about the other sections of this book, where the author intends to put all of this together in the Deep Learning context. Spoiler alert: the next section has five chapters on different architectures for neural networks. Stay tuned and follow me to receive the next post.
