Deep Learning from Scratch, Episode 1

Derivatives! Don’t be scared

Divya Chandana
The AI Guide
Apr 4, 2022


It’s useful!

Differential calculus is crucial to the operation of machine learning algorithms. It gives us exact mathematical tools for optimizing complex objective functions over multidimensional inputs.

A derivative is nothing but the rate at which something changes.

Introduction

Composite functions, that is, functions built by combining several simpler functions, abound in deep learning models. Understanding the chain rule, which computes the derivatives of these composite functions, is critical.

Objective

The goal is to study the chain rule in depth using mathematical functions, to code the chain rule from scratch, and to examine the results in graphs, comparing the original functions with their derivatives to observe how the trend changes.

Chain Rule

The chain rule is a mathematical theorem for calculating the derivative of composite functions, i.e., functions created by nesting one function inside another.

Figure 1 Nested Functions

Consider f1 and f2 to be two functions, where f1 receives x as input and produces an output that is then passed to f2 to produce the final output y. This composition is written f1f2 in the diagram, i.e., y = f2(f1(x)).

The outcome of applying derivatives to the composite function is as follows:

dy/dx = (df2/du) * (df1/dx), where u = f1(x)

Here, x is the input, and u is a dummy variable that represents f2's input, i.e., the output of f1.

In this situation, the functions f1 and f2 each take one input and produce one output, so the derivative is written with respect to a single variable, such as du. When a function has multiple inputs, such as x and y, we use the partial derivatives df/dx and df/dy, respectively.

The composite function’s derivative is the product of the individual functions’ derivatives.

Code

When applying the chain rule here, we need two functions that each take one input and produce one output; the two functions f1 and f2 are square and sigmoid, respectively.
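The article's original code snippets are embedded as images; a minimal sketch of the two functions in NumPy (the exact names and implementation details here are assumptions, not the author's code):

```python
import numpy as np

def square(x):
    # f1: element-wise square of the input
    return np.power(x, 2)

def sigmoid(x):
    # f2: logistic sigmoid, squashes any real value into (0, 1)
    return 1 / (1 + np.exp(-x))
```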

f2(f1(x)) is the implementing function: it computes the mathematical equivalent of the composition f1f2.
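A sketch of that composition as a small helper (the name `chain` is an assumption for illustration):

```python
import numpy as np

def square(x):
    return np.power(x, 2)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def chain(f1, f2, x):
    # Apply f1 first, then feed its output into f2: f2(f1(x))
    return f2(f1(x))
```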

Computing the chain rule for the functions f1 and f2:

(f2(f1(x)))’ = f2'(f1(x)) * f1'(x)
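This formula can be sketched in code. Here the individual derivatives are approximated numerically with a central difference; the helper names `deriv` and `chain_deriv` are assumptions, not the article's originals:

```python
import numpy as np

def square(x):
    return np.power(x, 2)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def deriv(f, x, h=1e-3):
    # Central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

def chain_deriv(f1, f2, x):
    # Chain rule: (f2(f1(x)))' = f2'(f1(x)) * f1'(x)
    return deriv(f2, f1(x)) * deriv(f1, x)
```

For example, at x = 1 with f1 = square and f2 = sigmoid, the exact answer is sigmoid'(1) * 2 = sigmoid(1) * (1 - sigmoid(1)) * 2 ≈ 0.3932, and `chain_deriv(square, sigmoid, 1.0)` matches it closely.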

Creating evenly spaced values from -5 to +5 with a step of 0.01

Calculating the output values for f(x) = sigmoid(square(x)) and f(x) = square(sigmoid(x))
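A minimal NumPy sketch of these two steps (the arrays `y1` and `y2` are what the figures plot; plotting code is omitted, and the variable names are assumptions):

```python
import numpy as np

def square(x):
    return np.power(x, 2)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Evenly spaced inputs from -5 to +5 with a step of 0.01
x = np.arange(-5, 5, 0.01)

# Both orders of composition
y1 = sigmoid(square(x))   # f(x) = sigmoid(square(x))
y2 = square(sigmoid(x))   # f(x) = square(sigmoid(x))
```

Plotting `x` against `y1` and `y2` side by side (e.g., with matplotlib) reproduces the two panels described below.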

Fig. left: f(x) = sigmoid(square(x)) | Fig. right: f(x) = square(sigmoid(x))

As the graphs above show, where the function is upward-sloping, its derivative is positive. Where the function is flat, the derivative is zero. Where the function slopes downward, the derivative is negative.

We created simple functions with only one input and one output. Deep learning models, on the other hand, are typically composed of long chains of nested differentiable functions, which makes the calculations considerably more involved.

Try it yourself

Apply the same chain rule to a slightly longer composition, f1f2f3. Post the derivative of f1f2f3 in the comment section below.
