Statistics Tales 101 for Data Scientists

Ashish Patel
Published in ML Research Lab
5 min read · Oct 17, 2019

Statistics Tutorial Video Guide by Brandon Foltz

References : Linkedin.com

Hi everybody! Drawing on my real-world experience with these resources, I'm back with another post. One day I was thinking about how important statistics is for understanding data better, so I searched the internet and got plenty of results. All of it was good, but something was missing: a structured path to guide me in the right direction. I came across many YouTube channels as well as well-known websites and courses, and I found some useful resources that I am going to share with you today.

#1) Data in terms of Statistics

Data is the gold of the 21st century: we gather information and convert that information into knowledge. That process is trendy nowadays and goes by the name machine learning. To gain better knowledge, the data needs to be understood clearly, and this is where statistics helps.

Source : https://www.youtube.com/channel/UCFrjdcImgcQVyFbK04MBEhA

#2) Descriptive Statistics I & II

Descriptive statistics describe the basic features of the data in a study. They serve two purposes: 1) to provide basic information about the variables in a data set and 2) to highlight potential relationships between variables. The most common descriptive statistics, which can be displayed graphically or pictorially, fall into four groups: graphical/pictorial methods, measures of central tendency, measures of dispersion, and measures of association.

Full Playlist : 1) Descriptive Statistics I 2) Descriptive Statistics II

Topics discussed: standard deviation, variance, normality, covariance, correlation
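
As a rough illustration of the topics above, the following Python sketch computes the mean, sample variance, standard deviation, covariance, and Pearson correlation using only the standard library. The `hours` and `scores` values and the helper names are made up for this example; they do not come from the videos.

```python
import statistics

# Hypothetical paired measurements, invented for illustration.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [2.0, 4.0, 5.0, 4.0, 5.0]


def sample_covariance(xs, ys):
    """Sample covariance (n - 1 in the denominator)."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    products = ((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)


def pearson_correlation(xs, ys):
    """Covariance rescaled by both standard deviations; lies in [-1, 1]."""
    return sample_covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))


print("mean of scores:    ", statistics.mean(scores))
print("median of scores:  ", statistics.median(scores))
print("variance of hours: ", statistics.variance(hours))
print("std dev of hours:  ", statistics.stdev(hours))
print("covariance:        ", sample_covariance(hours, scores))
print("correlation:       ", pearson_correlation(hours, scores))
```

Covariance depends on the units of both variables, which is why the correlation (a unit-free rescaling) is usually easier to interpret.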

#3) Introduction to Probability

Probability is about predicting how likely future events are, whereas statistics is about analyzing how often past events occurred.

Full Playlist : Introduction to Probability

Topics discussed : combinations, permutations, counting, sets, Venn diagrams, subsets, joint probabilities, marginal probabilities
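
To make the counting and probability topics above concrete, here is a small Python sketch. The survey counts and the function names are hypothetical, chosen only so the joint, marginal, and conditional probabilities are easy to check by hand.

```python
import math
from fractions import Fraction

# Counting: ordered vs. unordered selections of 2 items out of 5.
permutations_5_2 = math.perm(5, 2)   # order matters: 5 * 4
combinations_5_2 = math.comb(5, 2)   # order ignored: (5 * 4) / 2

# Hypothetical survey counts for two categorical variables (invented data).
counts = {
    ("city", "owns_car"): 10,
    ("city", "no_car"): 30,
    ("rural", "owns_car"): 20,
    ("rural", "no_car"): 40,
}
total = sum(counts.values())


def joint(area, car):
    """P(area and car): one cell of the table divided by the grand total."""
    return Fraction(counts[(area, car)], total)


def marginal_area(area):
    """P(area): sum the joint probabilities across the other variable."""
    return sum(joint(a, c) for (a, c) in counts if a == area)


def conditional_car_given_area(car, area):
    """P(car | area) = P(area and car) / P(area)."""
    return joint(area, car) / marginal_area(area)


print(permutations_5_2, combinations_5_2)
print(joint("city", "owns_car"), marginal_area("city"))
print(conditional_car_given_area("owns_car", "city"))
```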

#4) Discrete Probability Distribution

A distribution is a function that shows a variable’s possible values and how often they occur. Put another way, a probability distribution is a mathematical function that gives the probabilities of the different possible outcomes of an experiment.

Full Playlist : Discrete Probability Distribution

Topics covered: random variables, expected value, variance, binomial experiments, Poisson distribution
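
The sketch below shows, under simple assumptions, how the expected value and variance of a discrete random variable follow from its probability mass function. The parameters (`n = 10`, `p = 0.3`, and a Poisson rate of 2) are arbitrary example values.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for k successes in n independent trials with success prob p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with average rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


n, p = 10, 0.3  # arbitrary example parameters
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Expected value and variance computed directly from the definition.
expected = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - expected) ** 2 * pk for k, pk in enumerate(pmf))

print("total probability:", sum(pmf))
print("expected value:", expected, "(closed form n*p =", n * p, ")")
print("variance:", variance, "(closed form n*p*(1-p) =", n * p * (1 - p), ")")
print("Poisson P(X = 0) at rate 2:", poisson_pmf(0, 2.0))
```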

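#5) Continuous Probability Distribution

A continuous distribution describes the probabilities of the possible values of a continuous random variable. Because such a variable can take any value in a range, probabilities are given by areas under the distribution's curve rather than by individual points.

Full Playlist : Continuous Probability Distribution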

Topics discussed: curve area, normal curve, probability regions, variance influence on curve shape, z-distribution vs t-distribution
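
A brief Python sketch of the curve-area idea, using `statistics.NormalDist` from the standard library. The exam mean and standard deviation are hypothetical numbers; the t-distribution is not covered here because the standard library does not provide it.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Probability of a region = area under the curve between two z-values.
central_area = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

# z-score for a hypothetical exam score (mean 70, standard deviation 10).
exam = NormalDist(mu=70.0, sigma=10.0)
z_score = (85.0 - exam.mean) / exam.stdev
prob_above_85 = 1.0 - exam.cdf(85.0)

# A larger variance flattens the curve: the peak is lower and the spread wider.
narrow_peak = NormalDist(0.0, 1.0).pdf(0.0)
wide_peak = NormalDist(0.0, 2.0).pdf(0.0)

print("area between z = -1.96 and z = 1.96:", central_area)
print("z-score of 85:", z_score, "P(score > 85):", prob_above_85)
print("peak height, sigma 1 vs sigma 2:", narrow_peak, wide_peak)
```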

#6) Sampling and Sampling Distributions

A sampling distribution is a probability distribution of a statistic obtained through a large number of samples drawn from a specific population.

Full Playlist : Sampling and Sampling Distributions

Topics discussed: point estimation, sampling, standard error, standard error of the mean, and the relationship of sample size to these topics.
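
One way to see a sampling distribution is to simulate it. The sketch below draws many samples from a hypothetical normal population (mean 50, standard deviation 10, both assumed for illustration) and compares the spread of the sample means with the textbook standard error, sigma / sqrt(n).

```python
import math
import random
import statistics

rng = random.Random(42)          # fixed seed so the run is repeatable
POP_MEAN, POP_SD = 50.0, 10.0    # hypothetical population parameters


def simulate_sample_means(sample_size, n_samples=2000):
    """Draw many samples and return the mean of each one."""
    return [
        statistics.fmean(rng.gauss(POP_MEAN, POP_SD) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


def theoretical_standard_error(sample_size):
    """Standard error of the mean: sigma / sqrt(n)."""
    return POP_SD / math.sqrt(sample_size)


means_n25 = simulate_sample_means(25)
means_n100 = simulate_sample_means(100)

# The standard deviation of the sample means estimates the standard error.
empirical_se_n25 = statistics.stdev(means_n25)
empirical_se_n100 = statistics.stdev(means_n100)

print("n = 25: ", empirical_se_n25, "vs", theoretical_standard_error(25))
print("n = 100:", empirical_se_n100, "vs", theoretical_standard_error(100))
```

Quadrupling the sample size should roughly halve the standard error, which is the sample-size relationship mentioned in the topic list.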

#7) Confidence Interval Estimation

A confidence interval is a range of values around a point estimate, constructed so that it contains the true population parameter at a stated confidence level. The confidence level does not represent the “probability of being correct” for one particular interval; instead, it represents the long-run proportion of intervals, built the same way from repeated samples, that would contain the true parameter.

Full Playlist : Confidence Interval Estimation

Topics discussed: interval estimation, confidence intervals, margin of error, and the effect of sample size on all of these topics.
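
The frequency interpretation above can be checked with a small simulation. This is a sketch under simplifying assumptions: the population is normal, its standard deviation is treated as known (so a z-interval applies), and `TRUE_MEAN` and `SIGMA` are made-up values.

```python
import math
import random
from statistics import NormalDist, fmean


def z_confidence_interval(sample, sigma, level=0.95):
    """Interval for the mean when the population sigma is assumed known."""
    z_critical = NormalDist().inv_cdf(0.5 + level / 2.0)
    margin_of_error = z_critical * sigma / math.sqrt(len(sample))
    center = fmean(sample)
    return center - margin_of_error, center + margin_of_error


# Hypothetical population used only for this simulation.
TRUE_MEAN, SIGMA = 100.0, 15.0
rng = random.Random(7)


def coverage(sample_size, n_trials=1000, level=0.95):
    """Fraction of simulated intervals that contain TRUE_MEAN."""
    hits = 0
    for _ in range(n_trials):
        sample = [rng.gauss(TRUE_MEAN, SIGMA) for _ in range(sample_size)]
        low, high = z_confidence_interval(sample, SIGMA, level)
        hits += low <= TRUE_MEAN <= high
    return hits / n_trials


observed_coverage = coverage(sample_size=30)
print("share of 95% intervals containing the true mean:", observed_coverage)
```

Because the margin of error is proportional to 1 / sqrt(n), a larger sample narrows the interval without changing the confidence level.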

#8) Hypothesis Testing

Hypothesis testing is everywhere in data science, so it pays to simplify and deconstruct it. Like a crime-fiction story, hypothesis testing uses data as evidence to lead us from a novel suggestion to a well-supported proposition.

Full Playlist : Hypothesis Testing

Topics discussed: hypothesis formulation, null hypothesis, alternative hypothesis, Type I and II errors, two-tailed tests, one-tailed tests, z-tests, and t-tests.
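
As a minimal sketch of the mechanics, here is a two-tailed one-sample z-test in Python. It assumes the population standard deviation is known; the sample mean, hypothesized mean, and sample size are invented numbers.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed z-test of H0: population mean == mu0, with sigma known."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Two-tailed p-value: probability of a result at least this extreme
    # in either direction, assuming H0 is true.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    # alpha is the Type I error rate we accept (rejecting a true H0).
    reject_null = p_value < alpha
    return z, p_value, reject_null


z, p_value, reject_null = one_sample_z_test(
    sample_mean=103.0, mu0=100.0, sigma=15.0, n=36
)
print("z =", z, "p =", p_value, "reject H0:", reject_null)
```

For a one-tailed test, the p-value would use only one tail, for example `1.0 - NormalDist().cdf(z)` when the alternative is "greater than".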

#9) Z-test and T-test for Two Populations

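Z-tests are statistical tests that compare a sample mean to a population mean (or the means of two samples), typically when the population standard deviation is known or the samples are large. T-tests serve the same purpose when the population standard deviation is unknown and must be estimated from the sample; they are most useful when we need to determine whether there is a statistically significant difference between two independent sample groups.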

Full Playlist : Z-test and T-test for Two Populations

Topics discussed: mean difference, hypothesis formulation, null hypothesis, alternative hypothesis, Type I and II errors, two-tailed tests, one-tailed tests, z-tests, and t-tests.
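
The following sketch computes a pooled two-sample t statistic for two independent groups. It assumes equal population variances, the data are invented, and the critical value 2.306 (two-tailed, alpha = 0.05, 8 degrees of freedom) is taken from a standard t table because the Python standard library has no t-distribution.

```python
import math
import statistics


def pooled_t_statistic(sample_a, sample_b):
    """t statistic and degrees of freedom, assuming equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_error = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / std_error
    return t, df


# Hypothetical measurements for two independent groups.
group_a = [10.0, 12.0, 14.0, 16.0, 18.0]
group_b = [11.0, 13.0, 15.0, 17.0, 19.0]

t_stat, df = pooled_t_statistic(group_a, group_b)

T_CRITICAL_DF8 = 2.306  # table value: two-tailed, alpha = 0.05, df = 8
significant = abs(t_stat) > T_CRITICAL_DF8
print("t =", t_stat, "df =", df, "significant:", significant)
```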

#10) Inference about Population Variance

Inferences can be made about the variance in the same manner as about the mean: confidence intervals, hypothesis tests, and so on. Analyzing variance is very important for quality control and is a central tenet of Six Sigma. It allows us to make sure our processes are on target and within certain tolerances.

Full Playlist: Inference about Population Variance

Topics discussed: variance, chi-square, confidence interval, hypothesis test, standard deviation
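
A sketch of the chi-square test for a single population variance, in a quality-control flavor. The fill weights and the target variance of 4.0 are hypothetical, the test assumes a normal population, and the critical value 9.488 (upper tail, alpha = 0.05, 4 degrees of freedom) comes from a standard chi-square table.

```python
import statistics


def chi_square_variance_statistic(sample, sigma0_squared):
    """Statistic (n - 1) * s^2 / sigma0^2 for H0: variance == sigma0_squared."""
    df = len(sample) - 1
    return df * statistics.variance(sample) / sigma0_squared, df


# Hypothetical fill weights from a production line.
fill_weights = [8.0, 10.0, 12.0, 14.0, 16.0]

chi2_stat, df = chi_square_variance_statistic(fill_weights, sigma0_squared=4.0)

CHI2_CRITICAL_DF4 = 9.488  # table value: upper tail, alpha = 0.05, df = 4
variance_too_high = chi2_stat > CHI2_CRITICAL_DF4
print("chi-square =", chi2_stat, "df =", df, "reject H0:", variance_too_high)
```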

#11) Goodness of Fit and Independence Test

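A goodness-of-fit test checks how well sample data match the distribution expected for a population, that is, how closely a set of observations fits the expected values. A test of independence checks whether two categorical variables are related.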

Full Playlist: Goodness of Fit and Independence Test

Topics covered: chi-square test for independence, goodness of fit for multinomial experiments
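
Both tests rest on the same statistic, the sum of (observed - expected)^2 / expected. The sketch below computes it for a goodness-of-fit case and for a 2x2 independence table. All counts and proportions are invented; the statistics would be compared against table values of 5.991 (df = 2) and 3.841 (df = 1) at alpha = 0.05.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Goodness of fit: do the observed counts match the claimed proportions?
observed_counts = [30, 50, 20]          # hypothetical sample of 100
expected_props = [0.3, 0.4, 0.3]        # hypothetical claimed distribution
n = sum(observed_counts)
expected_counts = [p * n for p in expected_props]
gof_stat = chi_square_statistic(observed_counts, expected_counts)

# Independence: expected cell = row total * column total / grand total.
table = [[10, 20], [30, 40]]            # hypothetical 2x2 contingency table
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)
expected_table = [[r * c / grand_total for c in col_totals] for r in row_totals]
independence_stat = sum(
    chi_square_statistic(obs_row, exp_row)
    for obs_row, exp_row in zip(table, expected_table)
)

print("goodness-of-fit statistic (df = 2):", gof_stat)
print("independence statistic (df = 1):  ", independence_stat)
```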

#13) ANOVA (Analysis of Variance)

Analysis of variance (ANOVA) is a collection of statistical models and their associated procedures (such as the “variation” among and between groups) used to analyze the differences among group means. One-way ANOVA is used to determine whether there are any statistically significant differences between the means of three or more independent (unrelated) groups.

Full Playlist: ANOVA (Analysis of Variance)
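
A minimal sketch of the one-way ANOVA F statistic, which is the ratio of between-group variation to within-group variation. The three groups are invented; the critical value 5.14 (alpha = 0.05, degrees of freedom 2 and 6) is taken from a standard F table.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.fmean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of each observation around its own group mean.
    ss_within = sum((v - statistics.fmean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements for three independent groups.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_stat, df_between, df_within = one_way_anova_f(groups)

F_CRITICAL_2_6 = 5.14  # table value: alpha = 0.05, df = (2, 6)
print("F =", f_stat, "reject H0:", f_stat > F_CRITICAL_2_6)
```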

#14) Simple Linear Regression

Linear regression is a supervised machine learning algorithm that performs a regression task: it models a target prediction value based on independent variables. Simple linear regression predicts the value of a dependent variable (y) from a single given independent variable (x).

Full Playlist: Simple Linear Regression
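
As an illustration, the least-squares line can be fitted with a few lines of Python. The data and function names are hypothetical; the slope formula is the usual sum of cross-deviations divided by the sum of squared x-deviations.

```python
import statistics


def fit_simple_linear_regression(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours studied (x) and quiz score (y).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

slope, intercept = fit_simple_linear_regression(xs, ys)


def predict(x):
    """Predicted y for a new x using the fitted line."""
    return intercept + slope * x


print("slope:", slope, "intercept:", intercept)
print("prediction at x = 6:", predict(6.0))
```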

#15) Multiple Regression

Multiple regression is an extension of simple linear regression. It is used when we want to predict the value of a variable based on the value of two or more other variables. The variable we want to predict is called the dependent variable (or sometimes, the outcome, target or criterion variable).

Full Playlist: Multiple Regression
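
A dependency-free sketch of multiple regression via the normal equations, (X^T X) b = X^T y, solved with Gaussian elimination. The data are synthetic and noise-free, generated from y = 1 + 2*x1 + 3*x2 so the recovered coefficients are easy to verify; in practice a numerical library would normally be used instead of a hand-written solver.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_multiple_regression(features, targets):
    """Return [intercept, b1, b2, ...] from the normal equations."""
    design = [[1.0] + list(row) for row in features]  # leading 1 = intercept
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic, noise-free data generated from y = 1 + 2*x1 + 3*x2.
features = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
targets = [1.0 + 2.0 * x1 + 3.0 * x2 for x1, x2 in features]

coefficients = fit_multiple_regression(features, targets)
print("intercept, b1, b2:", coefficients)
```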

#16) Logistic Regression

Like all regression analyses, logistic regression is a predictive analysis. It is used to describe data and to explain the relationship between one binary dependent variable and one or more nominal, ordinal, interval, or ratio-level independent variables.

Full Playlist: Logistic Regression
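
To show the idea with a single predictor, the sketch below fits a logistic regression by plain gradient descent on the average log-loss. The data, learning rate, and iteration count are illustrative assumptions, not tuned values, and real projects would typically rely on a statistics or machine learning library.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(xs, ys, learning_rate=0.1, n_iterations=5000):
    """Fit P(y = 1 | x) = sigmoid(intercept + slope * x) by gradient descent."""
    intercept, slope = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iterations):
        # Prediction error for each observation under the current parameters.
        errors = [sigmoid(intercept + slope * x) - y for x, y in zip(xs, ys)]
        # Gradient of the average log-loss with respect to each parameter.
        intercept -= learning_rate * sum(errors) / n
        slope -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
    return intercept, slope


# Hypothetical data: hours studied and whether the exam was passed (1) or not (0).
study_hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

intercept, slope = fit_logistic_regression(study_hours, passed)

for hours in (1.0, 4.5, 8.0):
    print(hours, "hours -> P(pass) =", sigmoid(intercept + slope * hours))
```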

#17) Analysis of Covariance

ANCOVA stands for analysis of covariance. ANCOVA is used when the researcher includes one or more covariates alongside the independent variable, so that group means can be compared while controlling for those covariates.

Full Playlist: Analysis of Covariance(Ancova)

#18) Non-Linear Regression

Full Playlist : Non-Linear Regression

References:

  1. https://www.youtube.com/user/BCFoltz/
  2. https://en.wikipedia.org/wiki/Outline_of_statistics
  3. https://www.pitt.edu/~super1/ResearchMethods/StatisticsMatrix.htm
