Things that a Data Science Pursuer shouldn’t forget. Part-1

Aug 18, 2022


As I said before in “10 Reasons Why a Data Scientist Beginner should learn MS Excel immediately”, Data Science is not a skill that you can study for a month and put on your resume. No, it's not. There is a huge and fast-growing body of research out there on Machine Learning and Statistics. Of course you know what to do when you are starting out, but the human brain is a mystery. It often misses a crucial detail that matters a lot in our lives. Don’t believe me? Ask yourself, did you ever forget where your keys are? Now you get my point? What I mention below may well change your view on how to study Data Science. Before we go on, if you haven't read my previous post, do read it. Trust me, it will be useful.

OK, here are the things you shouldn’t forget when studying Data Science.

1. Excelling in Excel.

If you read my previous post about Excel, you will understand why you should never forget to learn it. I know, I know. It's boring. But trust me, I once visualized and interpreted a Linear Regression model in Excel. Yes, you read that right. I made a regression in Excel.

People often worry that it is a waste of time. OK, let me put it this way. Data Science at large scale is mostly about understanding and visualizing data and building a meaningful statistical model from it.

Huh… Visualizing a given dataset is not as hard as you think. Trust me, it's not, at least compared to understanding the data. People just do everything they have been taught but still can’t explain why they are doing what they are doing. It's not their fault, you know. How the data behaves can be quite unpredictable. Understanding how the data works in Excel will help you more than anything. OK, I think that's enough praise for Excel. If you are still not convinced, CLICK HERE.

2. Statistics

OK, I think at this point I have lost 99% of my viewers. If you are still reading this, please keep up the good work. You are in the 1% of people who are really interested. OK, now Statistics. Do you think it is important for Data Science?

Of course, yes. Frankly speaking, half of Data Science is Statistics. Without Stats, there is no DS at all. The term Data Science can itself be seen as a blend of 2 subjects:
1.) Data Mining: It is the process of sorting through large data sets to identify patterns and relationships that can help solve business problems through data analysis. Data mining techniques and tools enable enterprises to predict future trends and make more-informed business decisions.

2.) Computer Science: Computer Science is the study of computers and computational systems. Unlike electrical and computer engineers, computer scientists deal mostly with software and software systems; this includes their theory, design, development, and application.

The combination of these 2 subjects is called Data Science.

OK, now for the million-dollar question: exactly how much statistics should a Data Scientist know?
I can answer this question with an old joke:
"A data scientist knows more statistics than a computer scientist and more computer science than a statistician."

If you didn’t find the above joke funny, well, I never promised that it was a good joke ;).

Jokes apart, here is what you need to learn about statistics for Data Science.
Descriptive statistics

1. Understanding distributions and plots

2. Univariate statistical plots and usage

3. Bivariate and multivariate statistics

4. Addition and multiplication rule

5. Spam vs. not-spam problem using Bayes' theorem

6. Binomial and normal distribution

7. Poisson probability function

8. Normal distribution function

9. Introduction to probability

10. Bayes' theorem
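
To give a small taste of item 5, here is a minimal Python sketch of the spam vs. not-spam idea using Bayes' theorem. Treat it as an illustration only: the function name posterior_spam is my own, and all three probabilities are numbers I made up, not real email statistics.

```python
def posterior_spam(p_spam, p_word_given_spam, p_word_given_ham):
    """Return P(spam | word) using Bayes' theorem.

    p_spam            -- prior probability that any email is spam
    p_word_given_spam -- probability the word appears in a spam email
    p_word_given_ham  -- probability the word appears in a non-spam email
    """
    p_ham = 1.0 - p_spam
    # Total probability of seeing the word in any email (the evidence).
    p_word = p_word_given_spam * p_spam + p_word_given_ham * p_ham
    # Bayes' theorem: P(spam | word) = P(word | spam) * P(spam) / P(word)
    return p_word_given_spam * p_spam / p_word


# Made-up numbers: 20% of emails are spam, the word "free" shows up
# in 50% of spam emails and in 5% of normal emails.
p = posterior_spam(p_spam=0.2, p_word_given_spam=0.5, p_word_given_ham=0.05)
print(round(p, 3))  # 0.714
```

So with these assumed numbers, an email containing the word has roughly a 71% chance of being spam, even though only 20% of all emails are spam to begin with.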

That's not all. There is more, but let's stop here for now. I'll explain everything I mentioned above in a separate module.
