Series of Blogs : Statistics for ML — Blog 1

Baby Steps towards Machine Learning with Statistics

Why Statistics plays a major role in making machines learn from data


Credit: Alex Knight

“A baby learns to crawl, walk and then run. We are in the crawling stage when it comes to applying machine learning.” ~Dave Waters

A few years ago, few people had heard of Machine Learning (ML). Yet we had been using ML in our daily lives long before the term became a buzzword.

Remember every time Facebook asked you to tag yourself in a photo, or how your emails were sorted into Spam without any action on your part? These are some of the most widely used applications of Machine Learning.

Source: GIPHY

So now you might wonder: why is there a sudden boom in AI & Machine Learning today?

Compared to past decades, the volume of data we handle today is tremendous, and nearly everything is recorded as data, even a product purchase on Amazon. Companies today need Machine Learning to analyze this raw data and make important business decisions.

In this series, let’s assume you are a newborn baby, and let’s build your understanding of Machine Learning from scratch.

“Machine learning provides computers with the ability to learn without being explicitly programmed”

But before discussing Machine Learning, we are going to look into Statistics first. So why statistics?

“Why can’t I learn Machine Learning on its own? I hate mathematics!” If you have these questions, don’t panic.

Source: GIPHY

You can learn and apply Machine Learning directly without statistics, but you’ll run into trouble with some of its algorithms, because they are built on Statistics and Probability. In this series, we’ll learn about statistics and how it is used in Machine Learning.

1. So what is statistics?

Statistics is the discipline that helps us collect and analyze data, and then make decisions and predictions based on that data.

Curious? No problem.

We’ll break down statistics through a scenario.

Assume that you are the new Law and Order Minister of India, and you want to know the state-wise figures for crimes against children in the country for the year 2018–19, using our historical crime data.

The dataset for this is attached below.

If you look at the raw data, it won’t tell you much on its own. It’s just rows of numbers.

But statistical techniques can help us find meaningful information in this data. They can also help us answer open questions like:

· Do states with low literacy rates see more crimes against children?

· What types of crimes are being committed against children?

2. Categories of Statistics

Generally, statistics is divided into two broad categories:

  1. Descriptive Statistics
  2. Inferential Statistics

Before discussing descriptive and inferential statistics, let’s understand a few basic terms in statistics.

Population

As soon as we see the term ‘population’, the first thing that comes to mind is a large group of people. Similarly, in statistics, the population is the complete collection of data points we are interested in.

Example: In our scenario, the population is everyone arrested for crimes against children, across every state.

Sample

It is not always possible to access the whole dataset or make decisions from it. Sometimes there is simply too much data to process. In such cases, we take a small sample from the population, chosen so that it represents the characteristics of the entire population. The sample should not be biased; otherwise, the results will be misleading.

Example: We might take samples based on crime type from each state for any particular year.
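To make the idea of sampling concrete, here is a minimal Python sketch. It is only an illustration: the records, state names and crime types are made-up placeholders (not the real dataset), and `stratified_sample` is a hypothetical helper written for this post. It draws the same fraction of records at random from every state, which is one simple way to keep any single state from dominating the sample.

```python
import random

# Hypothetical population: one record per arrest.
# States and crime types are made-up placeholders, not real crime data.
random.seed(42)  # fixed seed so the sketch is reproducible
states = ["State A", "State B", "State C"]
crime_types = ["Type X", "Type Y"]
population = [
    {"state": random.choice(states), "crime_type": random.choice(crime_types)}
    for _ in range(1000)
]


def stratified_sample(records, key, fraction, rng):
    """Draw the same fraction of records at random from each group defined by `key`."""
    groups = {}
    for record in records:
        groups.setdefault(record[key], []).append(record)
    sample = []
    for group in groups.values():
        k = max(1, round(len(group) * fraction))  # at least one record per group
        sample.extend(rng.sample(group, k))
    return sample


rng = random.Random(0)
sample = stratified_sample(population, key="state", fraction=0.1, rng=rng)
print(len(population), len(sample))
```

The sample is roughly a tenth of the population, yet every state is still represented in it.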

Descriptive Statistics

Descriptive Statistics is used to describe or summarize data in a meaningful way either through numerical calculations or by visual representations like graphs and tables.

For example, below is a comparison of the number of people arrested for infanticide in each state in 2002 vs 2012. As you can see, Uttar Pradesh has the highest number of arrests among all the states, and its figure decreased in 2012 compared to 2002.

Image Generated by Author
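The numerical side of descriptive statistics can be sketched in a few lines of Python. The arrest counts below are invented for illustration (they are not the figures behind the chart), and `describe` is a hypothetical helper. It summarizes each year with a total, mean and median, and finds the state with the highest count.

```python
import statistics

# Hypothetical arrest counts per state for two years (made-up numbers).
arrests_2002 = {"State A": 40, "State B": 12, "State C": 7, "State D": 21}
arrests_2012 = {"State A": 25, "State B": 15, "State C": 5, "State D": 19}


def describe(counts):
    """Summarize a {state: count} mapping with a few descriptive measures."""
    values = list(counts.values())
    return {
        "total": sum(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "highest_state": max(counts, key=counts.get),
    }


summary_2002 = describe(arrests_2002)
summary_2012 = describe(arrests_2012)

# Change per state between the two years (negative means a decrease).
change = {state: arrests_2012[state] - arrests_2002[state] for state in arrests_2002}

print(summary_2002)
print(summary_2012)
print(change)
```

Notice that nothing here goes beyond the data in hand: we only describe what was observed.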

Inferential Statistics

Inferential statistics uses sample data to draw conclusions about the population from which the sample was taken. In other words, inferential statistics draws conclusions from the sample and generalizes them to the population.

For example, we can randomly select samples of each crime type from each state and try to generalize the results to the whole dataset.

Source: National Centre for Education Statistics
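Here is a rough Python sketch of that idea, under stated assumptions: the population is a list of made-up numeric values generated at random (not real crime data), and `mean_confidence_interval` is a hypothetical helper that uses a normal approximation. It estimates the population average from a sample of 200 values and reports a 95% confidence interval around that estimate.

```python
import math
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 made-up numeric values.
population = [rng.gauss(50, 10) for _ in range(10_000)]

# In practice we usually see only a sample, not the whole population.
sample = rng.sample(population, 200)


def mean_confidence_interval(values, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(values)
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err


low, high = mean_confidence_interval(sample)
print(f"Sample mean: {statistics.mean(sample):.2f}")
print(f"95% interval for the population mean: ({low:.2f}, {high:.2f})")
```

The interval is a statement about the whole population, made using only the sample. That step from sample to population is what separates inferential statistics from descriptive statistics.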

Hope this post gave you a proper introduction to statistics. In the next part of the series, we will learn how to answer more questions such as:

What is the average number of crimes against children in India?

Which crime needs to be addressed first?

Let’s dive deeper into more statistics topics in the next blog of this series.

Thanks for reading this article 🤩


See you again next time, have a great day ahead


Edward Praveen
The Hack Weekly — Data & AI Community

Data Engineer & Scientist | Innovating Data Solutions for Real-World Challenges | Datapreneur