Know the math

Nonso
Machine Intelligence Team
4 min read · Feb 7, 2019

What goes on in the background?

Essential mathematical functions you should know as a data scientist.

Knowing the essentials of mathematics is very important in data science. It gives you insight into what exactly goes on while an algorithm is working and which techniques a task calls for. The science behind data science is simply the mathematical functions and techniques being applied when those operations run.

Sometimes, as a data scientist (or even as a junior analyst on the team), you have to learn the foundational mathematics by heart to apply the techniques properly. Other times you can get by using an API or an out-of-the-box algorithm.

However, having a solid understanding of the math behind the cool algorithm you are using to create meaningful product recommendations for your users will never hurt you. More often than not, it should give you an edge among your peers and make you more confident. It always pays to know the machinery under the hood (even at a high level) rather than being just the guy behind the wheel with no knowledge of the car. I will run through the areas and functions every data scientist should know.

Statistics

Statistics is widely used in data analysis, so it is important to know it thoroughly. Some of the statistical concepts you need include:

1. mean
2. median
3. standard deviation
4. probability distribution
5. variance
6. random number generation
7. dimensionality reduction

The mean, median and mode are used for simple pre-processing operations, like checking the average of the values in a column, the middle value and the most frequent value. Alongside them you will often check the maximum and minimum values, the variance (how spread out the data is) and the standard deviation (the square root of the variance, which measures how far values typically deviate from the mean). Statistics is also relevant for visualization, which gives you insight into what the data looks like. Visualization tools like:

1. bar chart
2. box plot
3. pie chart
4. histogram

are all built on statistics. So we can then say statistics is used for analysis and data visualization.
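The summary statistics mentioned above can be computed in a few lines. This is a minimal sketch using Python's standard `statistics` module; the `ages` column is made-up data for illustration, and the population versions of variance and standard deviation are used here as an assumption (use `statistics.variance` and `statistics.stdev` for a sample).

```python
import statistics

# Hypothetical column of values, e.g. ages in a small dataset.
ages = [23, 25, 25, 31, 36]

mean_age = statistics.mean(ages)        # arithmetic average
median_age = statistics.median(ages)    # middle value once sorted
mode_age = statistics.mode(ages)        # most frequent value
variance = statistics.pvariance(ages)   # population variance: mean squared distance from the mean
std_dev = statistics.pstdev(ages)       # population standard deviation: square root of the variance

print("mean:", mean_age, "median:", median_age, "mode:", mode_age)
print("min:", min(ages), "max:", max(ages))
print("variance:", variance, "standard deviation:", std_dev)
```

Notice that the mean (28) sits above the median (25) here because the two largest values pull the average up, which is exactly the kind of thing a histogram or box plot makes visible.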

Linear Algebra

Matrices, vector and scalar multiplication, inner and outer products, and special matrices are all important concepts from linear algebra. Instances of its application in machine learning are:

1. Friend suggestions on Facebook. 
2. Song recommendation in Spotify.
3. Transferring your selfie using deep transfer learning.

These and other matching techniques make use of matrix operations. Dimensionality reduction also relies on special matrices working in the background.
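As a rough illustration of how matching can reduce to vector operations, the sketch below scores how alike two users are with cosine similarity, which is built from inner products. The user names, the ratings and the choice of cosine similarity are all assumptions made for this example; it is not a description of how Facebook or Spotify actually build their systems.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

# Hypothetical ratings matrix: one row per user, one column per song.
ratings = {
    "ada":   [5, 4, 0, 1],
    "bola":  [4, 5, 0, 0],
    "chidi": [0, 1, 5, 4],
}

def most_similar(user):
    """Return the other user whose rating vector points in the closest direction."""
    others = (name for name in ratings if name != user)
    return max(others, key=lambda name: cosine_similarity(ratings[user], ratings[name]))

print(most_similar("ada"))  # bola: both rate the first two songs highly
```

A recommender along these lines could then suggest to "ada" the songs that "bola" rated highly but "ada" has not heard yet.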

Here are the essential linear algebra topics to learn:

1. Basic properties of matrices and vectors: scalar multiplication, linear transformation, transpose, conjugate, rank, determinant.
2. Inner and outer products, matrix multiplication rule and various algorithms, matrix inverse.
3. Special matrices: square matrix, identity matrix, triangular matrix, idea about sparse and dense matrices, unit vectors, symmetric matrix, Hermitian, skew-Hermitian and unitary matrices.
4. Matrix factorization concept/LU decomposition, Gaussian/Gauss-Jordan elimination, solving the Ax=b linear system of equations.
5. Vector space, basis, span, orthogonality, orthonormality, linear least squares.
6. Eigenvalues, eigenvectors, and diagonalization, singular value decomposition (SVD).
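To make one of these topics concrete, here is a minimal sketch of solving Ax=b by Gaussian elimination with partial pivoting, written in plain Python. The function name `solve_linear_system` and the singularity tolerance are choices made for this example; in practice you would normally reach for a library routine instead.

```python
def solve_linear_system(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting.

    A is a list of n rows (each a list of n numbers); b is a list of n numbers.
    Raises ValueError if A is singular (to within a small tolerance).
    """
    n = len(A)
    # Build the augmented matrix [A | b] as floats so the inputs are not modified.
    M = [[float(x) for x in row] + [float(rhs)] for row, rhs in zip(A, b)]

    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        M[col], M[pivot] = M[pivot], M[col]

        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]

    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return x

# The system 2x + y = 3 and x + 3y = 5 has the solution x = 0.8, y = 1.4.
print(solve_linear_system([[2, 1], [1, 3]], [3, 5]))
```

The printed values may differ from 0.8 and 1.4 in the last decimal places because of floating-point rounding.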

Calculus

I know many of us have a certain fear of calculus, but trust me, you need it for data science and machine learning. It is used extensively to formulate the functions that algorithms are trained to optimize. Here are some topics to pay attention to:

1. Functions of a single variable, limits, continuity and differentiation.
2. Mean value theorems, indeterminate forms and L’Hôpital’s rule.
3. Maxima and minima.
4. Product and chain rule.
5. Taylor series.
6. Infinite series, summation/integration concepts.
7. Fundamental and mean value theorems of integral calculus, evaluation of definite and improper integrals.

They come into play in algorithms like logistic regression, which is commonly trained with an optimization technique called gradient descent.
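To show where the derivatives actually appear, here is a minimal sketch of logistic regression with one feature, fitted by plain batch gradient descent. The data (hours studied versus pass/fail), the learning rate, the step count and all function names are assumptions chosen for illustration, not a reference implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(w, b, xs, ys):
    """Average cross-entropy loss of the model p = sigmoid(w*x + b)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(xs)

def fit_logistic(xs, ys, learning_rate=0.1, steps=2000):
    """Fit w and b by repeatedly stepping against the gradient of the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # The chain rule gives d(loss)/dw = mean((p - y) * x) and d(loss)/db = mean(p - y).
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical data: hours studied -> passed (1) or failed (0).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

# Centering the feature on its mean helps gradient descent converge faster.
mean_hours = sum(hours) / len(hours)
centered = [h - mean_hours for h in hours]

w, b = fit_logistic(centered, passed)

def predict_pass_probability(hours_studied):
    return sigmoid(w * (hours_studied - mean_hours) + b)

print("loss before:", log_loss(0.0, 0.0, centered, passed))
print("loss after: ", log_loss(w, b, centered, passed))
print("P(pass | 2 hours):", predict_pass_probability(2.0))
print("P(pass | 3 hours):", predict_pass_probability(3.0))
```

The two gradient lines are the whole calculus content: differentiate the loss with the chain rule, then move the parameters a small step in the opposite direction.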

Discrete Math

As easy as it may seem, you need to master discrete math. We all do.

Topics to pay attention to are:

1. Sets, subsets, power sets.
2. Counting functions, combinatorics.
3. Basic proof techniques: induction, proof by contradiction.
4. Basics of inductive, deductive, and propositional logic.
5. Basic data structures: stacks, queues, graphs, arrays, hash tables, trees.
6. Graph properties: connected components, degree, maximum flow/minimum cut concepts, graph coloring.
7. Recurrence relations and equations.
8. Growth of functions and big-O notation.

How can it be applied, you might ask?

It underpins certain search algorithms and helps you understand the time and space complexity (big-O notation) of algorithms.
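A small sketch makes the difference in growth rates tangible. The two functions below (names chosen for this example) search the same sorted list and also count how much work they do: linear search is O(n), while binary search is O(log n).

```python
def linear_search(items, target):
    """Scan left to right. Returns (index or -1, number of comparisons)."""
    comparisons = 0
    for i, value in enumerate(items):
        comparisons += 1
        if value == target:
            return i, comparisons
    return -1, comparisons

def binary_search(sorted_items, target):
    """Halve the search range each step. Returns (index or -1, number of loop iterations)."""
    lo, hi = 0, len(sorted_items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, steps

data = list(range(1_000_000))  # already sorted

print(linear_search(data, 999_999))  # needs 1,000,000 comparisons
print(binary_search(data, 999_999))  # needs at most 20 iterations
```

Doubling the size of the list doubles the worst-case work for linear search but adds only one more iteration to binary search.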

I attached some links you can access to learn more about these concepts and improve your mathematics.

Statistics

Calculus

Linear algebra

Discrete math

We appreciate you taking the time to read our very first article. Until next time, happy coding :)
