Math foundations for deep learning

Andrea Santomauro · Published in CodeX · Jun 16, 2022

If you want to master advanced topics in the field of deep learning (DL), you have to know the basic math behind the models you’ll use. That’s fundamental if you want to really understand what’s going on when you are training or testing your models: why models overfit or underfit, how to increase performance, how to prepare your data, and so on.

That’s why I decided to start writing some posts about math and DL. I don’t know how far it will go, but if you like this tutorial and you want to dig deeper into some topics, just send me an email or contact me on LinkedIn (you can find my contacts in my bio). Ok, let’s start with the practical stuff.

Functions

Let’s take the definition of function from Wikipedia:

In mathematics, a function from a set X to a set Y assigns to each element of X exactly one element of Y. The set X is called the domain of the function and the set Y is called the codomain of the function.

We can see this definition graphically, as follows:

A function that associates each of the four colored shapes with its color.

By now, the concept of a function should be clear enough. Let’s look at an example to clarify the idea further.

Consider a function f that takes an integer x as input and is defined as f(x) = x + 1. This means that f, given an integer x, returns the next integer as output. For example, if x = 5 then f(5) = 5 + 1 = 6. Graphically:

The graph of f(x) = x + 1
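To make this concrete, here is the same function as a minimal Python sketch:

```python
def f(x: int) -> int:
    return x + 1   # f maps every integer to the next one

print(f(5))        # 6
```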

I’m not going to go deeper into this topic, because everyone reading this article is probably already familiar with functions. It’s just a brief reminder for you. Anyway, if you are struggling with functions, just send me a message or an e-mail.

Derivative

The derivative is another extremely important concept for understanding deep learning and how optimization algorithms work. I know most of you are already familiar with it, but let’s briefly review what a derivative is.

At a high level, the derivative of a function at a point is the “rate of change” of the function’s output with respect to its input at that point. We can see this graphically.

If you are more comfortable with the mathematical definition, here you go:

f′(x) = lim_{Δ → 0} [ f(x + Δ) − f(x) ] / Δ

This limit can be approximated numerically by choosing a very small value for delta, such as 0.0001, and computing the ratio. The result is a number: the slope of the tangent line to the function at that point (as shown in the animation above).
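If you want to see this in code, here is a minimal Python sketch of the numerical approximation (the helper name `derivative` and the default `delta` are my own choices for illustration):

```python
def derivative(f, x, delta=1e-4):
    """Approximate f'(x) with the forward difference
    (f(x + delta) - f(x)) / delta, for a small delta."""
    return (f(x + delta) - f(x)) / delta

# Example: the derivative of x**2 is 2x, so at x = 3 we expect ~6.
print(derivative(lambda x: x ** 2, 3.0))  # ~6.0001
```

A central difference, (f(x + delta) − f(x − delta)) / (2 · delta), is usually more accurate, but the forward difference matches the limit definition above.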

Nested Functions

Another important concept for understanding how neural networks work is that functions can be nested to create composite functions. Nesting means that if we have two functions, f₁ and f₂, the output of one function becomes the input of the next. Mathematically:

y = f₂(f₁(x))
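In code, nesting is just calling one function on the output of another. A minimal Python sketch (the concrete choices of f₁ and f₂ are mine, picked to match the example in the next section):

```python
def f1(x):
    return x ** 2 + 1   # inner function: u = x**2 + 1

def f2(u):
    return u ** 3       # outer function: y = u**3

def composite(x):
    return f2(f1(x))    # the output of f1 becomes the input of f2

print(composite(2))     # f2(f1(2)) = f2(5) = 125
```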

A nested function is itself just another function, so we can compute its derivative, and being able to compute derivatives of nested functions is essential for neural networks. Let’s see how we can compute the derivative of a nested function.

The Chain Rule

The chain rule is a mathematical theorem that lets us compute derivatives of composite functions. Deep learning models are, mathematically, composite functions, and reasoning about their derivatives is essential to training them [1].

In order to compute the derivative of the above composite function y, for any given value of x we have:

dy/dx = dy/du · du/dx, where u = f₁(x)

where du means we are differentiating with respect to the intermediate variable u.

Let’s work through an example to clear up any doubts. Consider the composite function f(x) = (x² + 1)³. We can treat this as a nested function by defining u(x) = x² + 1 and f(x) = u(x)³. Then we can apply the chain rule as follows:

df/dx = df/du · du/dx = 3u(x)² · 2x = 6x(x² + 1)²

Notice that every step is simple calculus, and it’s really easy to implement this in a Python routine. If you are wondering: yes, the chain rule can be applied to longer chains of nested functions (3, 4, or even more).
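As a sketch of such a routine, here is the chain rule for our example f(x) = (x² + 1)³ in Python, checked against the forward-difference approximation from the derivative section (all names are my own):

```python
def u(x):
    return x ** 2 + 1             # inner function

def du(x):
    return 2 * x                  # u'(x)

def f(x):
    return u(x) ** 3              # composite function f(x) = u(x)**3

def df(x):
    return 3 * u(x) ** 2 * du(x)  # chain rule: f'(x) = 3u(x)**2 * u'(x)

x = 2.0
print(df(x))                        # exact: 6x(x**2 + 1)**2 = 300.0
print((f(x + 1e-4) - f(x)) / 1e-4)  # numerical check: ~300.02
```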

Functions with multiple inputs

Most of the time in deep learning we have to deal with functions with multiple inputs, and we can think of these inputs as the features describing the entries of a dataset.

Let’s see a small example with two inputs: f(x, y) = s(x + y), where s is another continuous function. As you can imagine, we are now interested in differentiating this function. But how can we do that? We can simply apply the chain rule and differentiate with respect to x and with respect to y separately.

We can see the previous function as a nested function f(x, y) = s(a(x, y)), where the function a performs the addition x + y. Then, since ∂a/∂x = 1, we have:

∂f/∂x = ∂s/∂a · ∂a/∂x = s′(x + y) · 1 = s′(x + y)

and differentiating with respect to y works exactly the same way. These are called partial derivatives.
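As a sketch, we can approximate both partial derivatives numerically by nudging one input at a time. The article doesn’t fix a particular s, so I’ll assume s(z) = z² purely for illustration:

```python
def s(z):
    return z ** 2        # assumed outer function (illustrative choice)

def f(x, y):
    return s(x + y)      # f(x, y) = s(a(x, y)) with a(x, y) = x + y

def partial_x(f, x, y, delta=1e-4):
    """Approximate df/dx: nudge x, hold y fixed."""
    return (f(x + delta, y) - f(x, y)) / delta

def partial_y(f, x, y, delta=1e-4):
    """Approximate df/dy: nudge y, hold x fixed."""
    return (f(x, y + delta) - f(x, y)) / delta

# With s(z) = z**2, both partials equal s'(x + y) = 2(x + y) = 6 at (1, 2).
print(partial_x(f, 1.0, 2.0))  # ~6.0
print(partial_y(f, 1.0, 2.0))  # ~6.0
```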

Conclusion

In this brief article we’ve covered the basic mathematical concepts needed to understand what’s behind deep learning and neural networks. In the next article we’ll see how to implement these concepts in Python using the pandas library, and then we’ll look at the vectorized form and the reasons for using vectors and matrices.

Hope you enjoyed this article 💪🏻

Bibliography

[1] Seth Weidman, Deep Learning from Scratch, O’Reilly Media, 2019.
