Statistics in ML

A guide to Statistics Concepts before you start with Machine Learning

Priyanka gupta
The Pythoneers
3 min read · Dec 14, 2020

Photo by Crissy Jarvis on Unsplash

Learning is a never-ending process. Why say that at the very beginning? Because Machine Learning is an emerging field where you can find endless information on any topic. So it’s best practice to finish a topic from one source before hopping on to another.

Machine Learning is the field that gives computers the ability to make predictions without being explicitly programmed. Many research and development projects today rely on machine learning technologies to simplify their work.

Today, almost every company needs machine learning, which has created many new opportunities for individuals in this field. Despite these opportunities, there are still not enough people with a proper grasp of even the basics of machine learning. This is because, instead of learning the fundamentals of ML from scratch, students start directly with coding and simply import libraries.

Now, what if you are told that ML and statistics are closely related? Yes, they indeed are, and so one should study statistics. This article lists some statistics concepts, each with a video resource, that can be helpful when starting with Machine Learning.

STATISTICS CONCEPTS

1. Basic concepts of Probability: Probability is the branch of mathematics that tells how likely an event is to occur. It is needed for algorithms like Decision Tree and Random Forest, which, for example, use the proportion of each class in a group of samples to measure how mixed that group is.
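
To make this concrete, here is a minimal sketch that estimates class probabilities as relative frequencies and turns them into Gini impurity, one common measure decision trees use to choose splits. The labels and the function names are made up for illustration, and real libraries do this internally in their own way.

```python
from collections import Counter


def class_probabilities(labels):
    """Estimate P(class) as the relative frequency of each label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    probs = class_probabilities(labels)
    return 1.0 - sum(p ** 2 for p in probs.values())


# Hypothetical labels of the samples that reached one node of a tree
labels = ["spam", "spam", "ham", "ham", "ham", "ham", "ham", "ham"]

probs = class_probabilities(labels)
impurity = gini_impurity(labels)

print(probs)     # share of each class at this node
print(impurity)  # 0 would mean the node contains a single class
```

The lower the impurity, the closer the node is to holding a single class, which is what a tree tries to achieve with each split.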

Probability Video:

2. Mean, Median, Mode: When data is fed to a model in ML, it’s mostly in numeric form, because numbers are easy to compute with. During data analysis, properties like the mean (the average), the median (the middle value) and the mode (the most frequent value) help to understand the data better.
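
As a quick illustration, the sketch below computes all three with Python's built-in `statistics` module. The list of ages is invented, and it includes one extreme value to show how the mean and the median can disagree.

```python
import statistics

# Hypothetical ages; 95 is a deliberate outlier
ages = [22, 25, 25, 27, 30, 31, 95]

mean_age = statistics.mean(ages)      # sum of the values divided by their count
median_age = statistics.median(ages)  # middle value once the data is sorted
mode_age = statistics.mode(ages)      # most frequent value

print(mean_age, median_age, mode_age)
```

Here the single outlier pulls the mean well above the median, which is one reason to look at more than one of these numbers before modelling.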

Mean, mode, median Video:

3. Random Variables: A random variable is a variable whose value depends on the outcome of a random event. Neural networks rely on them when predicting possible outcomes, for example when a classifier’s output is read as a probability for each outcome.
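
A die roll is the classic example of a random variable. The sketch below simulates one with Python's `random` module and compares the average of many rolls with the theoretical expected value. The seed and the number of rolls are arbitrary choices made for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable


def roll_die():
    """One realisation of the random variable: an integer from 1 to 6."""
    return rng.randint(1, 6)


rolls = [roll_die() for _ in range(10_000)]

sample_mean = sum(rolls) / len(rolls)
expected_value = sum(range(1, 7)) / 6  # (1 + 2 + ... + 6) / 6 = 3.5

print(sample_mean, expected_value)
```

With enough rolls the sample mean settles close to the expected value, even though no single roll can be predicted.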

Random variables video:

4. Variance and Covariance: Before building any model, it’s always recommended to go through a data analysis first and then apply the model. Understanding the variance (the spread of the data around the mean) and the covariance (how two variables vary together) is an important aspect of Data Analysis.
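
Both quantities are short to compute by hand. The sketch below uses the population formulas (dividing by n) on invented study-hours and exam-score data. Libraries often default to the sample version, which divides by n - 1 instead, so results can differ slightly.

```python
def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Population variance: average squared distance from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def covariance(xs, ys):
    """Population covariance: average product of the paired deviations."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: hours studied and the exam score obtained
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

hours_variance = variance(hours)
hours_scores_cov = covariance(hours, scores)

print(hours_variance)
print(hours_scores_cov)  # positive: scores tend to rise with hours
```

A positive covariance means the two variables tend to move in the same direction, a negative one means they move in opposite directions, and a value near zero suggests no linear relationship.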

Variance and covariance video:

5. Gaussian or Normal Distribution: A probability distribution is a function that describes how likely each value of a random variable is within a given range. The Gaussian, or normal, distribution is the bell-shaped one, defined by its mean and standard deviation. Again, probability distributions play an important role in analysing the data.
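
For a feel of the numbers, the sketch below uses `NormalDist` from Python's `statistics` module (available from Python 3.8) to inspect a standard normal distribution, meaning mean 0 and standard deviation 1. The variable names are chosen for illustration only.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard = NormalDist(mu=0.0, sigma=1.0)

# Height of the bell curve at its centre
peak_density = standard.pdf(0.0)

# Probability of landing within 1 and within 2 standard deviations of the mean
within_one_sigma = standard.cdf(1.0) - standard.cdf(-1.0)
within_two_sigma = standard.cdf(2.0) - standard.cdf(-2.0)

print(peak_density)
print(within_one_sigma)  # roughly 68% of values
print(within_two_sigma)  # roughly 95% of values
```

These two probabilities are the first parts of the familiar 68-95-99.7 rule, which is handy for spotting unusual values during data analysis.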

Probability Distribution Video:


Priyanka gupta
The Pythoneers

Product Analyst @ Amex| Lead MLE @Omdena | Exploring NLP