Math for Data Science: Calculus. Step by Step Study Guide + Code. Part 1

5 min read · Nov 29, 2023

In the realm of data science, a foundational understanding of calculus serves as the compass guiding us through the intricate landscapes of numerical analysis.

Beyond the formulas and symbols lies a toolset that empowers data scientists to unravel patterns, make predictions, and extract meaningful insights. Join me as we delve into the essential calculus concepts that fuel the engine of data science.

Here is my article with a complete, clear, step-by-step roadmap for becoming a data scientist in 2023:

Data Science Study Roadmap — 2023 Update

medium.com

The word calculus comes from the Latin for “small stone”, so calculus is about understanding something by looking at its small pieces.

Differential Calculus cuts something into small pieces to find how it changes.

Integral Calculus joins (integrates) the small pieces together to find how much there is.

Calculus for data science can be divided into the following parts:

1. Limits:

Intro to Limits
Limits and Infinity
Evaluating Limits
L’Hopital’s Rule
Continuous Functions

2. Derivatives (Differential Calculus): the derivative is the “rate of change” or slope of a function.

The Delta Method
The Differentiation Equation
Derivative Notation
The Power Rule
The Constant Multiple Rule
The Sum Rule
The Product Rule
The Quotient Rule
The Chain Rule

3. Integral Calculus: integration can be used to find areas, volumes, central points, and many other useful quantities.

Integration Rules
Integration by Parts
Integration by Substitution
Definite Integrals
Integral Approximations

4. Differential Equations:

Introduction to Differential Equations
Separation of Variables
First Order Linear Differential Equations
Homogeneous Differential Equations/Functions
The Bernoulli Differential Equation
Second Order Differential Equations
The Method of Undetermined Coefficients
The Method of Variation of Parameters

This article covers only the first two parts. As usual, I will go through a bit of theory, then link to useful resources so you can study on your own, and then point to a Python implementation so you can play around with it.

Data-Science-Basics/Calculus at main · aussiekom/Data-Science-Basics

github.com

So What Are Limits

In calculus, a limit is the value that a function approaches as its input approaches a certain value. Limits describe how a function behaves near a specific point, even when the function is not defined at that point itself.

For example, consider the function:

f(x) = (x² - 1) / (x - 1)

If we evaluate this function at x = 1, the result is undefined, because both the numerator and the denominator become zero. However, for every x ≠ 1 the expression simplifies to x + 1, since x² - 1 = (x - 1)(x + 1), so as x approaches 1 the function approaches the value 2. In other words, the limit of the function as x approaches 1 is 2.
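The behaviour near x = 1 can also be checked numerically. The snippet below is a minimal sketch in plain Python; the function name `f` and the step sizes are illustrative choices of mine, not taken from the linked repository. It evaluates the function at inputs that close in on 1 from both sides, and then shows that evaluating exactly at 1 fails.

```python
def f(x):
    """f(x) = (x^2 - 1) / (x - 1); undefined at x = 1."""
    return (x**2 - 1) / (x - 1)

# Step sizes shrinking towards zero (illustrative choice).
steps = [0.1, 0.01, 0.001, 0.0001]

for h in steps:
    left = f(1 - h)   # approach 1 from below
    right = f(1 + h)  # approach 1 from above
    print(f"h={h:<7} f(1-h)={left:.6f}  f(1+h)={right:.6f}")

# Evaluating exactly at x = 1 fails, because the denominator is zero.
try:
    f(1)
except ZeroDivisionError:
    print("f(1) is undefined (division by zero)")
```

Both columns move towards 2 as h shrinks, which is what the limit statement says.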

The notation used to represent limits is as follows: if f(x) approaches L as x approaches a, we write:

lim_{x → a} f(x) = L

Here:

lim represents the limit
f(x) is the function being evaluated
L is the limit value
x is the input variable
a is the point that the input variable approaches.

The concepts of limits and continuity play a crucial role in data science, especially in machine learning. There, we frequently fit a model by finding the parameter values that minimise a loss function. This optimisation process relies on evaluating derivatives, and a derivative is itself defined as a limit, which is where limits and continuity come into play.

Derivatives

A derivative is the rate of change of a function with respect to a variable.

The derivative expresses the instantaneous rate of change: how fast the function's value is changing at one given point. For functions of a real variable, it is the slope of the tangent line to the graph at that point.
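To connect this with limits, the sketch below approximates a tangent slope by the slope of a secant line over a shrinking interval h. It is a minimal illustration in plain Python; the helper names `difference_quotient` and `square` are hypothetical and chosen only for this example.

```python
def difference_quotient(f, x, h):
    """Slope of the secant line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h

def square(x):
    return x**2

# The exact derivative of x^2 is 2x, so the slope at x = 3 should approach 6.
for h in [1.0, 0.1, 0.01, 0.001]:
    slope = difference_quotient(square, 3.0, h)
    print(f"h={h:<6} secant slope={slope:.4f}")
```

As h gets smaller, the secant slope settles towards the tangent slope, which is the derivative at that point.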

Derivatives of functions:

Linear functions: The derivative of a linear function mx + c is the constant m. For example, when y = x the slope of the line is 1 everywhere, regardless of the value of x.
Power functions: The derivative of x^n is n·x^(n-1) (the power rule), so the slope depends on both the exponent and the point x.
Exponential functions: An exponential has the form a·b^f(x), where a and b are constants and f(x) is a function of x. The difference between an exponential and a polynomial is that in a polynomial x is raised to some power, whereas in an exponential x appears in the exponent. The exponential e^x is its own derivative.
Logarithmic functions: The derivative of the natural logarithm ln(x) is the reciprocal 1/x.
Trigonometric functions: The derivative of sine is cosine, while the derivative of cosine is negative sine.
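A quick way to build trust in these rules is to compare them with a numerical estimate. The sketch below uses only the standard `math` module and a central-difference helper, `numerical_derivative`, which is my own illustrative name. The closed-form derivatives it compares against are the standard textbook results (for example, d/dx x^3 = 3x^2 and d/dx 2^x = 2^x · ln 2), and the test point x = 2 is arbitrary.

```python
import math

def numerical_derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 2.0  # arbitrary test point

# Each entry: (label, function, derivative predicted by the rule).
checks = [
    ("linear 3x + 5",   lambda t: 3 * t + 5,  lambda t: 3.0),
    ("power x^3",       lambda t: t**3,       lambda t: 3 * t**2),
    ("exponential e^x", math.exp,             math.exp),
    ("exponential 2^x", lambda t: 2**t,       lambda t: 2**t * math.log(2)),
    ("logarithm ln(x)", math.log,             lambda t: 1 / t),
    ("sine",            math.sin,             math.cos),
    ("cosine",          math.cos,             lambda t: -math.sin(t)),
]

for label, func, rule in checks:
    estimate = numerical_derivative(func, x)
    print(f"{label:<16} numerical={estimate:.6f}  rule={rule(x):.6f}")
```

The two columns should agree to several decimal places; small differences come from the finite step size and floating-point rounding.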

Derivative Types:

Derivatives can be classified by their order, for example into first-order and second-order derivatives.

First-Order Derivative:

The first-order derivative describes the direction of the function: whether it is increasing or decreasing. It can be interpreted as an instantaneous rate of change, and it equals the slope of the tangent line.

Second-Order Derivative:

The second-order derivative gives an idea of the shape of the graph of a function. Functions can be classified in terms of concavity: the graph is concave up where the second derivative is positive and concave down where it is negative.
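Both ideas can be seen on one small example. The sketch below estimates first and second derivatives with central differences and reports the trend and concavity at a few points. The example function f(x) = x^3 - 3x, the helper names, and the sample points are all my own illustrative choices; the sign conventions used are the standard ones (positive first derivative means increasing, positive second derivative means concave up).

```python
def first_derivative(f, x, h=1e-4):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def second_derivative(f, x, h=1e-4):
    """Central-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

def cubic(x):
    # Example function: f(x) = x^3 - 3x, so f'(x) = 3x^2 - 3 and f''(x) = 6x.
    return x**3 - 3 * x

for x in [-2.0, -0.5, 0.5, 2.0]:
    d1 = first_derivative(cubic, x)
    d2 = second_derivative(cubic, x)
    trend = "increasing" if d1 > 0 else "decreasing"
    shape = "concave up" if d2 > 0 else "concave down"
    print(f"x={x:+.1f}  f'={d1:+.3f} ({trend})  f''={d2:+.3f} ({shape})")
```

Note that the trend and the concavity are independent: at x = -2 this function is increasing while its graph is concave down.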

Importance of derivatives in data science:

The significance of derivatives in data science lies in their application to optimisation problems, particularly in machine learning. Optimisation algorithms like gradient descent use derivatives to determine whether to adjust weights upward or downward in order to maximise or minimise an objective function.

Calculus underpins many models in data science, and gradient descent is a prime example of its basic yet impactful role in machine learning.
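To make the idea concrete, here is a minimal sketch of gradient descent on a one-parameter toy problem. Everything in it is an assumption made for illustration: the objective (w - 3)^2, the learning rate of 0.1, the 50 steps, and the function names are mine, not from the linked repository or from any particular library.

```python
def loss(w):
    """Toy objective with its minimum at w = 3."""
    return (w - 3) ** 2

def loss_gradient(w):
    """Derivative of the toy objective: d/dw (w - 3)^2 = 2(w - 3)."""
    return 2 * (w - 3)

def gradient_descent(start, learning_rate=0.1, steps=50):
    """Repeatedly step against the derivative to reduce the loss."""
    w = start
    for _ in range(steps):
        # Positive derivative -> decrease w; negative derivative -> increase w.
        w = w - learning_rate * loss_gradient(w)
    return w

w_final = gradient_descent(start=0.0)
print(f"w after descent: {w_final:.4f}, loss: {loss(w_final):.6f}")
```

The sign of the derivative decides the direction of each update, and its size (scaled by the learning rate) decides how far to move. Real models have many weights, but each weight is updated in the same way using its own partial derivative.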