Math, Stats and NLP for Machine Learning: As Fast As Possible

Souman Roy
Published in MetaInsights
Feb 9, 2018 · 5 min read

Machine Learning: Episode 0

Have you heard people talking about machine learning but only have a vague idea of what it means? Are you a student who wants to keep up with this technology? Are you intimidated by the sheer number of machine learning resources online?

Don’t worry: we present Machine Learning Zero To One, a series that will take you from beginner to expert level. Let’s get on board!

This guide is for anyone who is curious about machine learning but has no idea where to start. I imagine a lot of people have searched the internet for resources, got frustrated and gave up, wishing someone would just give them an introduction and tell them where to start.
The goal of this series is to teach you machine learning from a beginner’s perspective.

OK, Why Is Math Necessary?

Machine learning is built on fundamental principles of mathematics like Calculus, Linear Algebra, Probability, Statistics, and Optimisation. This article aims to help you learn some essential foundational concepts and provides a hands-on approach using the Python programming language in Jupyter Notebook.

Source: http://edx.org

Step 1 Learn: Linear Algebra

Linear algebra is how optimisation algorithms are framed on a computer: much of the work comes down to solving systems of linear equations and constraints.
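To make "solving a linear system" concrete, here is a minimal sketch in Python with NumPy (assumed to be installed, for example in a Jupyter Notebook). The three equations are a made-up textbook example, not taken from any particular algorithm.

```python
import numpy as np

# A made-up system of three linear equations in three unknowns:
#    2x +  y -  z =   8
#   -3x -  y + 2z = -11
#   -2x +  y + 2z =  -3
A = np.array([[2.0, 1.0, -1.0],
              [-3.0, -1.0, 2.0],
              [-2.0, 1.0, 2.0]])
b = np.array([8.0, -11.0, -3.0])

# np.linalg.solve finds x such that A @ x == b (A must be square and invertible)
x = np.linalg.solve(A, b)
print(x)  # approximately [ 2.  3. -1.]

# Check the solution by plugging it back into the equations
print(np.allclose(A @ x, b))  # True
```

For large or badly conditioned systems, solving directly like this is generally preferred over computing the inverse of `A` explicitly.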

Concepts you need to know in Linear Algebra.

#1 Rank of a Matrix.
#2 Matrix-Vector Products.
#3 Column Spaces and Null Spaces of a Matrix.
#4 Eigenvalues and Eigenvectors.
#5 Singular Value Decomposition (SVD) of a Matrix.

Here is a quick cheat sheet to help you understand linear algebra concepts faster:

https://minireference.com/static/tutorials/linear_algebra_in_4_pages.pdf
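The five concepts above can be tried out in a few lines of NumPy. This is a minimal sketch, assuming NumPy is installed. The matrices `A` and `S` are made-up examples, and reading the column space and null space off the SVD is one common approach among several.

```python
import numpy as np

# A made-up 3x3 matrix whose third row is the sum of the first two,
# so its rows are linearly dependent.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [5.0, 7.0, 9.0]])

# 1. Rank: the number of linearly independent rows (or columns)
rank = np.linalg.matrix_rank(A)
print("rank:", rank)  # rank: 2

# 2. Matrix-vector product: a linear combination of A's columns
v = np.array([1.0, 0.0, -1.0])
print("A @ v:", A @ v)  # 1*col1 + 0*col2 - 1*col3 = [-2. -2. -4.]

# 3. Column space and null space, read off the SVD (item 5).
#    Columns of U paired with non-zero singular values span the column space;
#    rows of Vt paired with (numerically) zero singular values span the null space.
#    Indexing Vt with the mask works here because A is square.
U, s, Vt = np.linalg.svd(A)
tol = 1e-10
column_space = U[:, s > tol]
null_space = Vt[s <= tol].T
print("column space dimension:", column_space.shape[1])  # 2
print("A @ null-space basis is ~0:", np.allclose(A @ null_space, 0))

# 4. Eigenvalues and eigenvectors of a symmetric matrix: S w = lambda w.
#    np.linalg.eigh is meant for symmetric matrices and returns the
#    eigenvalues in ascending order.
S = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eigh(S)
print("eigenvalues:", eigvals)  # approximately 1 and 3
print("S w = lambda w:", np.allclose(S @ eigvecs, eigvecs * eigvals))

# 5. SVD factorization: A = U @ diag(s) @ Vt
print("SVD reconstructs A:", np.allclose(U @ np.diag(s) @ Vt, A))
```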

Step 2 Learn: Probability Theory & Statistics

Probability theory is the branch of mathematics that deals with quantities having random distributions.

Concepts you need to know in Probability Theory & Statistics.

Probability Theory:
#1 Counting and Combinatorial Methods.
#2 Bayes’ Theorem.
#3 Random Variables.
#4 Expectation.
#5 Variance.
#6 Conditional and Joint Distributions.
#7 Moment Generating Functions.
#8 Exponential Family of Distributions.
Statistics:
#1 Maximum Likelihood Estimation (MLE).
#2 Maximum A Posteriori (MAP) Estimation.
#3 Prior and Posterior Distributions.
#4 Sampling Methods.
#5 Gibbs Sampling.
#6 Mean, Median, Mode, Variance.
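Several of the probability ideas above (counting, Bayes’ theorem, expectation and variance) and the descriptive statistics at the end of the list can be computed with Python’s standard library alone. This is a sketch with hypothetical numbers: the diagnostic-test probabilities and the sample `data` are invented for illustration.

```python
import math
import statistics

# Counting: number of ways to choose 2 items from 5 when order doesn't matter
print(math.comb(5, 2))  # 10

# Bayes' theorem: P(D | +) = P(+ | D) * P(D) / P(+)
# Hypothetical numbers for a diagnostic test (illustration only):
p_disease = 0.01            # prior P(D)
p_pos_given_d = 0.99        # sensitivity P(+ | D)
p_pos_given_not_d = 0.05    # false-positive rate P(+ | not D)
p_pos = p_pos_given_d * p_disease + p_pos_given_not_d * (1 - p_disease)
p_d_given_pos = p_pos_given_d * p_disease / p_pos
print(round(p_d_given_pos, 3))  # 0.167: a positive test is far from certain

# Expectation and variance of a discrete random variable: a fair six-sided die
outcomes = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6
expectation = sum(x * p for x, p in zip(outcomes, probs))
variance = sum((x - expectation) ** 2 * p for x, p in zip(outcomes, probs))
print(expectation, variance)  # approximately 3.5 and 35/12 (about 2.917)

# Mean, median, mode and (population) variance of a small made-up sample
data = [2, 3, 3, 5, 7, 10]
print(statistics.mean(data), statistics.median(data),
      statistics.mode(data), statistics.pvariance(data))
# mean = 5, median = 4.0, mode = 3, variance = 46/6 (about 7.67)
```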

Here is a quick cheat sheet to help you understand probability theory and statistics concepts faster:

Source: https://static1.squarespace.com/static/54bf3241e4b0f0d81bf7ff36/t/55e9494fe4b011aed10e48e5/1441352015658/probability_cheatsheet.pdf
Source: http://web.mit.edu/~csvoss/Public/usabo/stats_handout.pdf
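The estimation and sampling items from the statistics list can be illustrated with a coin-flipping example and a tiny Gibbs sampler. This sketch assumes NumPy is installed. The data (7 heads in 10 flips), the Beta(2, 2) prior and the bivariate normal target are all illustrative choices, and `gibbs_bivariate_normal` is a hypothetical helper written for this example.

```python
import numpy as np

# Made-up data: 7 heads in 10 coin flips
n, k = 10, 7

# Maximum Likelihood Estimation: maximising p^k (1 - p)^(n - k) gives p = k / n
p_mle = k / n

# Prior and posterior: with a Beta(a, b) prior on p, the posterior is
# Beta(a + k, b + n - k). The prior parameters are an arbitrary choice.
a, b = 2.0, 2.0
post_a, post_b = a + k, b + n - k

# MAP estimate: the mode of the Beta posterior, pulled toward the prior's 0.5
p_map = (post_a - 1) / (post_a + post_b - 2)
print(p_mle, round(p_map, 3))  # 0.7 0.667

# Gibbs sampling from a standard bivariate normal with correlation rho:
# alternately draw each variable from its conditional given the other,
#   x | y ~ N(rho * y, 1 - rho^2),   y | x ~ N(rho * x, 1 - rho^2)
def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    rng = np.random.default_rng(seed)
    sd = np.sqrt(1.0 - rho ** 2)  # conditional standard deviation
    x, y = 0.0, 0.0
    samples = []
    for i in range(n_samples + burn_in):
        x = rng.normal(rho * y, sd)
        y = rng.normal(rho * x, sd)
        if i >= burn_in:  # discard early draws before the chain settles
            samples.append((x, y))
    return np.array(samples)

samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
print(np.corrcoef(samples[:, 0], samples[:, 1])[0, 1])  # close to 0.8
```

Gibbs sampling is useful when the joint distribution is hard to sample directly but each conditional distribution is easy, as in this example.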

Step 3 Learn: Multi-variable Calculus

Calculus is classically the study of the relationship between variables and their rates of change. In machine learning, however, we use it more specifically: differential calculus as a method for finding the extrema (minima and maxima) of functions, such as a model’s error, and integral calculus as a method for probabilistic modeling.
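Both uses of calculus fit in a few lines of plain Python. This is a minimal sketch: the function being minimised, the starting point, the learning rate and the integration range are arbitrary choices made for illustration.

```python
import math

# Differential calculus: find the minimum of f(x) = (x - 3)^2 + 1
# by repeatedly stepping against the derivative f'(x) = 2 (x - 3)
# (gradient descent).
def f_prime(x):
    return 2.0 * (x - 3.0)

x = 0.0               # starting point (arbitrary)
learning_rate = 0.1   # step size (arbitrary choice)
for _ in range(100):
    x -= learning_rate * f_prime(x)
print(round(x, 4))  # 3.0, the point where f'(x) = 0

# Integral calculus: a probability density must integrate to 1.
# Approximate the integral of the standard normal density over [-10, 10]
# with the trapezoidal rule.
def normal_pdf(z):
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

steps = 10000
lo, hi = -10.0, 10.0
h = (hi - lo) / steps
interior = sum(normal_pdf(lo + i * h) for i in range(1, steps))
area = h * (0.5 * normal_pdf(lo) + interior + 0.5 * normal_pdf(hi))
print(round(area, 6))  # 1.0
```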

Concepts you need to know in Multi-variable Calculus

#1 Vector-valued functions
#2 Partial Derivatives
#3 Gradient
#4 Directional Derivative
#5 Hessian
#6 Jacobian
#7 Laplacian
#8 Lagrange Multipliers
Source: http://tutorial.math.lamar.edu/getfile.aspx?file=B,41,N
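Gradients, directional derivatives, Hessians, Jacobians and Laplacians can all be approximated numerically with finite differences. The sketch below assumes NumPy; the functions `f` and `F` and the helpers `gradient`, `jacobian` and `hessian` are hypothetical examples written for this illustration, chosen so the exact answers are easy to check by hand. In practice, libraries with automatic differentiation are usually preferred over finite differences.

```python
import numpy as np

# Hypothetical scalar function f(x, y) = x^2 y + 3 y^2
def f(p):
    x, y = p
    return x ** 2 * y + 3 * y ** 2

# Hypothetical vector-valued function F(x, y) = (x y, x + y^2)
def F(p):
    x, y = p
    return np.array([x * y, x + y ** 2])

def gradient(func, p, h=1e-5):
    """Vector of partial derivatives of a scalar function (central differences)."""
    p = np.asarray(p, dtype=float)
    grad = np.zeros_like(p)
    for i in range(p.size):
        e = np.zeros_like(p)
        e[i] = h
        grad[i] = (func(p + e) - func(p - e)) / (2 * h)
    return grad

def jacobian(func, p, h=1e-5):
    """Matrix J[j, i] = d(output j) / d(input i) (central differences)."""
    p = np.asarray(p, dtype=float)
    cols = []
    for i in range(p.size):
        e = np.zeros_like(p)
        e[i] = h
        cols.append((func(p + e) - func(p - e)) / (2 * h))
    return np.column_stack(cols)

def hessian(func, p, h=1e-4):
    """Matrix of second partial derivatives: the Jacobian of the gradient."""
    return jacobian(lambda q: gradient(func, q, h), p, h)

p = np.array([1.0, 2.0])
g = gradient(f, p)                 # exact: [2xy, x^2 + 6y] = [4, 13]
u = np.array([3.0, 4.0]) / 5.0     # unit direction vector
directional = g @ u                # exact: 4*0.6 + 13*0.8 = 12.8
H = hessian(f, p)                  # exact: [[2y, 2x], [2x, 6]] = [[4, 2], [2, 6]]
laplacian = np.trace(H)            # sum of unmixed second derivatives: 10
J = jacobian(F, p)                 # exact: [[y, x], [1, 2y]] = [[2, 1], [1, 4]]
print(g, directional, laplacian)
print(H)
print(J)
```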

Step 4 A Little Bit of: Information Theory

Information theory is the branch of applied mathematics that studies how to quantify information.

Concepts you need to know in Information Theory

#1 Entropy
#2 Mutual Information
#3 Information Gain
#4 KL Divergence
Source: http://tuvalu.santafe.edu/~simon/cheat_sheet_info.pdf
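The four quantities above are short formulas over probability distributions. This is a minimal sketch assuming NumPy, using base-2 logarithms so results are in bits; the helper names and all the distributions are made up for illustration. Information gain is computed here in its decision-tree sense: the entropy of the labels before a split minus the weighted entropy after it.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p) = -sum p log2 p, in bits (0 log 0 is taken as 0)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def kl_divergence(p, q):
    """KL(p || q) = sum p log2(p / q), in bits; assumes q > 0 wherever p > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def mutual_information(joint):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), from a joint probability table."""
    joint = np.asarray(joint, dtype=float)
    return (entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0))
            - entropy(joint.ravel()))

print(entropy([0.5, 0.5]))                    # 1.0 bit: a fair coin
print(entropy([0.9, 0.1]))                    # about 0.469 bits: a biased coin
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))  # about 0.737 bits

# Two perfectly correlated fair bits share exactly 1 bit of information
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))  # 1.0

# Information gain for a made-up split of 10 labels (5 positive, 5 negative)
parent = entropy([5 / 10, 5 / 10])
left = entropy([4 / 5, 1 / 5])    # left group: 4 positive, 1 negative
right = entropy([1 / 5, 4 / 5])   # right group: 1 positive, 4 negative
info_gain = parent - (5 / 10) * left - (5 / 10) * right
print(round(info_gain, 3))  # 0.278
```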

Step 5 Know About: NLP, a.k.a. Natural Language Processing

NLP is defined as the automatic manipulation of natural language, like speech and text, by software.

As machine learning practitioners interested in working with text data, we are concerned with the tools and methods from the field of Natural Language Processing.

We will take Natural Language Processing — or NLP for short — in a wide sense to cover any kind of computer manipulation of natural language. At one extreme, it could be as simple as counting word frequencies to compare different writing styles. At the other extreme, NLP involves “understanding” complete human utterances, at least to the extent of being able to give useful responses to them.

— Page ix, Natural Language Processing with Python, 2009.
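The simple end of that spectrum, counting word frequencies, needs only Python’s standard library. This is a minimal sketch: the two text snippets are invented, and the regular-expression tokenizer is a deliberately crude assumption (dedicated NLP libraries tokenize far more carefully).

```python
import re
from collections import Counter

def word_frequencies(text):
    """Lower-case the text, split it into words, and count each word."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# Two made-up snippets standing in for two writing styles
text_a = "The cat sat on the mat. The cat was happy."
text_b = "A dog ran in a park, and a dog barked at a bird."

freq_a = word_frequencies(text_a)
freq_b = word_frequencies(text_b)
print(freq_a.most_common(2))  # [('the', 3), ('cat', 2)]
print(freq_b.most_common(2))  # [('a', 4), ('dog', 2)]
```

Comparing which words dominate each text (here "the" versus "a") is a very small version of the style comparison the quote describes.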

Natural language processing (NLP) is a collective term referring to automatic computational processing of human languages. This includes both algorithms that take human-produced text as input, and algorithms that produce natural looking text as outputs.

— Page xvii, Neural Network Methods in Natural Language Processing, 2017.

The aim of a linguistic science is to be able to characterize and explain the multitude of linguistic observations circling around us, in conversations, writing, and other media. Part of that has to do with the cognitive side of how humans acquire, produce and understand language, part of it has to do with understanding the relationship between linguistic utterances and the world, and part of it has to do with understanding the linguistic structures by which language communicates.

— Page 3, Foundations of Statistical Natural Language Processing, 1999.

That’s it: this is the mathematical, statistical and NLP understanding you need. My advice: if you want to explore machine learning in depth, try to learn at least some of the concepts mentioned above.

But Now It’s Time to Celebrate

You might be wondering where the machine learning is, since so far it is all math. Well, this article has given you the list of mathematical concepts you need to know to get started with machine learning or AI. Stay tuned for the next episode, where I will discuss how to turn these mathematical concepts into code using Python as the programming language, and introduce the Python ecosystem.

Soon I will share a GitHub repo for this.

Please share your feedback, comments and opinions in the comments section. Keep learning!

Business Intelligence practitioner | Problem Solver | Founder MetaInsights, Solve for India