Basic Mathematics for Data Science

Kolla Kishan
Published in Analytics Vidhya
4 min read · Aug 26, 2019
Photo by Luke Chesser on Unsplash

"The highest form of pure thought is in mathematics." - Plato

Motivation

Like many of us, I 😔 suffered from a lack of guidance in my early days of learning 📊 data science. Although I had covered the required mathematics in my undergraduate degree, it had faded during four years of working in a different business, which took me away from the daily use of complex mathematics.

Soon after the first week of classes, I understood that I needed to brush up on my mathematical concepts, and so I did. But let me tell you something: “IT WAS HARD”. But hey, what’s success without the struggle, right? 🤗

It took me a solid two weeks to get back on track with the syllabus. During class, I realized that many students faced the same problem. So this story elaborates on the mathematics required for pursuing data science.

Probability

Probability is a measure of how likely an event is to occur. It is expressed as a number between 0 and 1 (picture a number line): a value closer to 0 means the event is unlikely, and a value closer to 1 means it is likely, depending on the circumstances.

Probability is a really old concept, and it is widely used in modern engineering and business to build intelligent systems that can reason somewhat like a human. It mainly helps in making decisions for a given situation, or in predicting what is likely to happen in the future.

The applications of probability in day-to-day life are extensive, but so subtle that we sometimes don’t appreciate them. Life is all about choices, and humans often make them based on emotion. A machine, in order to make the right decision, has to calculate the likely outcome of each possible decision and choose the best fit.

For example, imagine tossing an unbiased coin. Heads and tails are equally likely, so the probability of either outcome is 1/2, where the numerator is the number of desired outcomes and the denominator is the total number of possible outcomes.
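The coin-toss calculation above can be sketched in a few lines of Python. This is only an illustration, not part of the original course material: the function names, the number of trials and the random seed are all assumptions chosen for the example. It computes the exact ratio of desired to total outcomes and then checks it against a simple simulation.

```python
import random
from fractions import Fraction


def theoretical_probability(desired_outcomes, all_outcomes):
    """Desired outcomes / total outcomes, assuming all outcomes are equally likely."""
    return Fraction(len(desired_outcomes), len(all_outcomes))


def simulated_probability(desired_outcomes, all_outcomes, trials=100_000, seed=0):
    """Estimate the same probability by repeating the experiment many times."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    hits = sum(rng.choice(all_outcomes) in desired_outcomes for _ in range(trials))
    return hits / trials


coin = ["heads", "tails"]  # an unbiased coin: two equally likely outcomes

p_heads = theoretical_probability({"heads"}, coin)
estimate = simulated_probability({"heads"}, coin)

print("Theoretical P(heads):", p_heads)   # 1/2
print("Simulated P(heads):  ", estimate)  # close to 0.5, not exactly 0.5
```

The simulated value will not be exactly 0.5, but it should get closer to it as the number of trials grows.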

Some popular resources to learn probability are:

Probability — The Science of Uncertainty and Data by MIT

Data Science: Probability by Harvard University

Probability For Dummies Cheat Sheet

Statistics

Statistics is the discipline that involves planning, gathering, analyzing, interpreting and presenting data.

In real-world scenarios, these steps are performed in sequence. Statistics is most frequently applied to scientific, industrial and public problem solving.

It deals with every phase of data, including planning how the data will be gathered through the design of surveys and experiments.

There are two principal statistical approaches used in data analytics:

1. Descriptive statistics are brief summaries that describe a given data set, which can represent either an entire population or a sample of it.

Descriptive statistics are further classified into:

Measures of central tendency include the mean, median, and mode.

Measures of variability include the variance, the standard deviation, and the minimum and maximum values. Closely related are two measures of shape: kurtosis (which is about the tails of the distribution, and therefore outliers, rather than its peak) and skewness (which describes how asymmetric the distribution is).
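As a rough illustration of these measures, here is a minimal sketch using Python's built-in `statistics` module. The data set is made up for the example, and the helper `standardized_moment` is a hypothetical name introduced here. Skewness and kurtosis are computed with the simple population formulas; statistical packages may apply additional corrections.

```python
import statistics

# Hypothetical data set; the value 25 is a deliberate outlier.
data = [2, 4, 4, 5, 7, 9, 25]

# Measures of central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measures of variability (population versions, dividing by n)
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)
minimum, maximum = min(data), max(data)


def standardized_moment(values, k):
    """k-th standardized moment: the average of ((x - mean) / std) ** k."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return sum(((x - mu) / sigma) ** k for x in values) / len(values)


skewness = standardized_moment(data, 3)  # positive: long tail to the right
kurtosis = standardized_moment(data, 4)  # a normal distribution would give 3

print("mean:", mean, "median:", median, "mode:", mode)
print("variance:", variance, "std dev:", std_dev)
print("min:", minimum, "max:", maximum)
print("skewness:", skewness, "kurtosis:", kurtosis)
```

Notice how the single outlier pulls the mean above the median and produces a positive skewness, which is why it helps to look at several measures together.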

2. Inferential statistics uses a random sample of data from a population to describe and make inferences about that population.

Two forms of inferential statistics:

Estimating parameters takes a statistic from your sample data and uses it to say something about a population parameter, such as the population mean.

Hypothesis tests use sample data to answer research questions. For example, you might want to know how effective a new diabetes drug on the market is, or whether schools with school buses perform better than schools without them.
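Both forms can be sketched with Python's standard library. Everything specific here is an assumption made for illustration: the scores are invented, the function names are hypothetical, the confidence interval uses a normal approximation (for samples this small a t-distribution would usually be preferred), and the hypothesis test is a simple permutation test, which is just one of many possible tests.

```python
import random
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Estimate the population mean with a normal-approximation interval."""
    n = len(sample)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / n ** 0.5  # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return mean, (mean - z * std_err, mean + z * std_err)


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the fraction of random relabelings whose difference in means
    is at least as large as the observed one (an approximate p-value).
    """
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations


# Hypothetical average test scores for schools with and without buses.
with_bus = [72, 75, 78, 80, 69, 74, 77, 73]
without_bus = [70, 68, 74, 71, 66, 72, 69, 73]

sample_mean, (low, high) = mean_confidence_interval(with_bus)
p_value = permutation_test(with_bus, without_bus)

print("Estimated mean:", sample_mean, "95% interval:", (low, high))
print("Approximate p-value:", p_value)
```

A small p-value would suggest that a difference this large is unlikely to come from chance alone. It says nothing about why the groups differ, so it should be read together with how the data was collected.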

Some popular resources to learn statistics are:

Fundamentals of Statistics by MIT

Data Science: Inference and Modeling by Harvard University

Introductory Statistics : Sample Survey and Instruments for Statistical Inference by SNUx

Statistics for Data Science and Business Analysis by Udemy

Ultimately, probability and statistics are the backbone of any data science problem. If you are not a mathematics major, the concepts may not seem intuitive at first, but if you already know them, then it’s just a matter of applying your theoretical knowledge in practice. Learning data science is no exception to the rule that learning takes time, so you might want to give yourself some time to absorb all the required concepts before you perform your first experiment.

Thanks for reading! Feel free to write a response.😊
