Statistics Tales 101 for Data Scientists
Statistics Tutorial Video Guide by Brandon Foltz…!!!
Hi everybody! Drawing on my real-world experience with these resources, I’m back with another post. One day I was thinking about how important statistics is for understanding data better, so I searched the internet and found a huge number of results. They were all really good, but I was missing something: a structured path that would guide me in the right direction. I came across many YouTube channels as well as well-known websites and courses, and I found some useful resources that I am going to share with you today.
#1) Data in terms of Statistics
Data is the gold of the 21st century: we extract information from it and convert that information into knowledge. That is trendy nowadays, and it is called machine learning. To gain better knowledge, data needs to be understood clearly, and this is where statistics helps.
#2) Descriptive Statistics I & II
Descriptive statistics are used to describe the basic features of the data in a study. They are useful for two purposes: 1) to provide basic information about the variables in a data set and 2) to highlight potential relationships between variables. The most common descriptive statistics fall into four groups: graphical/pictorial methods, measures of central tendency, measures of dispersion, and measures of association.
Part 1 :
Part 2 :
Full Playlist : 1) Descriptive Statistics I 2) Descriptive Statistics II
Topics discussed: standard deviation, variance, normality, covariance, correlation
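To make these topics concrete, here is a minimal sketch using only Python's standard library. The data (hours studied and exam scores for five students) is made up for illustration and does not come from the videos.

```python
import statistics

# Hypothetical sample: hours studied (x) and exam score (y) for five students.
x = [2, 4, 6, 8, 10]
y = [50, 60, 65, 80, 95]

# Measures of central tendency.
mean_x = statistics.mean(x)
median_x = statistics.median(x)

# Measures of dispersion (sample versions, dividing by n - 1).
var_x = statistics.variance(x)
std_x = statistics.stdev(x)

# Measures of association: sample covariance and Pearson correlation,
# written out by hand so the formulas stay visible.
mean_y = statistics.mean(y)
n = len(x)
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
corr_xy = cov_xy / (std_x * statistics.stdev(y))

print("mean:", mean_x, "median:", median_x)
print("variance:", var_x, "std dev:", round(std_x, 3))
print("covariance:", cov_xy, "correlation:", round(corr_xy, 3))
```

A correlation close to 1 here means the two made-up variables move together almost linearly.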
#3) Introduction to Probability
Probability is about predicting how likely future events are, whereas statistics is about analyzing how often past events occurred.
Full Playlist : Introduction to Probability
Topics discussed : combinations, permutations, counting, sets, Venn diagrams, subsets, joint probabilities, marginal probabilities
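As a small sketch of the counting and probability topics listed above, the snippet below uses `math.comb` and `math.perm` from the standard library. The customer table is a hypothetical example, not data from the playlist.

```python
import math

# Counting: ways to pick 3 items out of 10.
n_combinations = math.comb(10, 3)   # order does not matter
n_permutations = math.perm(10, 3)   # order matters

# Hypothetical joint counts for 100 customers: (group, bought?) -> count.
counts = {
    ("female", "yes"): 30, ("female", "no"): 20,
    ("male", "yes"): 15, ("male", "no"): 35,
}
total = sum(counts.values())

# Joint probability: P(female and bought).
p_female_and_yes = counts[("female", "yes")] / total

# Marginal probability: P(bought), summing the joint counts over both groups.
p_yes = sum(c for (_, bought), c in counts.items() if bought == "yes") / total

print(n_combinations, n_permutations)   # 120 720
print(p_female_and_yes, p_yes)          # 0.3 0.45
```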
#4) Discrete Probability Distribution
A distribution is a function that shows a variable’s possible values and how often they occur. Put another way, a probability distribution is a statistical function that gives the probability of each possible outcome of an experiment.
Full Playlist : Discrete Probability Distribution
Topics covered: random variables, expected value, variance, binomial experiments, Poisson distribution
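Here is a rough sketch of the binomial and Poisson distributions written from their textbook formulas. The function names `binomial_pmf` and `poisson_pmf` are my own, and the coin-flip setup (10 trials, p = 0.5) is just an illustration.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials, each with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) when events occur at an average rate of lam per interval."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Hypothetical binomial experiment: 10 fair coin flips.
n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Expected value and variance computed straight from their definitions.
expected = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - expected) ** 2 * pk for k, pk in enumerate(pmf))

print(expected, variance)            # 5.0 2.5, matching n*p and n*p*(1-p)
print(round(poisson_pmf(0, 2), 4))   # 0.1353, chance of zero events when lam = 2
```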
#5) Continuous Probability Distribution
A continuous distribution describes the probabilities of the possible values of a continuous random variable.
Full Playlist : Continuous Probability Distribution
Topics discussed: curve area, normal curve, probability regions, variance influence on curve shape, z-distribution vs t-distribution
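For the curve area and probability region topics, the standard library's `statistics.NormalDist` is enough for a quick sketch. The height distribution (mean 170, standard deviation 10) is a made-up example.

```python
from statistics import NormalDist

# Standard normal (z) distribution: area under the curve between cut points.
z = NormalDist(mu=0, sigma=1)
within_one_sd = z.cdf(1) - z.cdf(-1)    # about 0.683
within_two_sd = z.cdf(2) - z.cdf(-2)    # about 0.954

# Hypothetical heights in cm: probability of being taller than 185.
heights = NormalDist(mu=170, sigma=10)
p_taller = 1 - heights.cdf(185)         # about 0.067 (z = 1.5)

# A larger variance flattens the curve: the peak at the mean gets lower.
peak_narrow = NormalDist(0, 1).pdf(0)   # about 0.399
peak_wide = NormalDist(0, 2).pdf(0)     # about 0.199

print(round(within_one_sd, 3), round(within_two_sd, 3), round(p_taller, 3))
print(round(peak_narrow, 3), round(peak_wide, 3))
```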
#6) Sampling and Sampling Distributions
A sampling distribution is a probability distribution of a statistic obtained through a large number of samples drawn from a specific population.
Full Playlist : Sampling and Sampling Distributions
Topics discussed: point estimation, sampling, standard error, standard error of the mean, and how sample size relates to each of these.
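The relationship between sample size and the standard error of the mean can be checked with a small simulation. This is only a sketch: the population (10,000 values around 100 with spread 15), the seed, and the helper name `sample_means` are all assumptions made for the example.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 values.
population = [random.gauss(100, 15) for _ in range(10_000)]
sigma = statistics.pstdev(population)

def sample_means(n, repeats=1000):
    """Draw `repeats` samples of size n and return the mean of each one."""
    return [statistics.fmean(random.sample(population, n)) for _ in range(repeats)]

# results[n] = (observed spread of the sample means, sigma / sqrt(n))
results = {}
for n in (4, 25, 100):
    observed = statistics.stdev(sample_means(n))
    theoretical = sigma / n ** 0.5
    results[n] = (observed, theoretical)
    print(n, round(observed, 2), round(theoretical, 2))
```

The observed spread of the sample means should land close to sigma / sqrt(n), and it shrinks as the sample size grows.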
#7) Confidence Interval Estimation
A confidence interval is a range of values, built around a point estimate, that is expected to contain the true population parameter at a chosen confidence level. A confidence level is not the probability that one particular interval is correct; instead, it is the long-run proportion of intervals, each built the same way from a fresh sample, that would contain the true parameter.
Full Playlist : Confidence Interval Estimation
Topics discussed: interval estimation, confidence intervals, margin of error, and the effect of sample size on all of these topics.
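As a sketch of interval estimation and margin of error, the function below builds a z-based confidence interval for a mean. It assumes the population standard deviation is known (sigma = 3 here), which is a simplification; the function name and the sample values are hypothetical.

```python
import math
from statistics import NormalDist, mean

def z_confidence_interval(sample, sigma, confidence=0.95):
    """Confidence interval for the mean, assuming a known population sigma."""
    n = len(sample)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z_star * sigma / math.sqrt(n)                   # margin of error
    center = mean(sample)
    return center - margin, center + margin

# Hypothetical sample of 10 measurements.
sample = [98, 102, 101, 97, 103, 99, 100, 104, 96, 100]
low, high = z_confidence_interval(sample, sigma=3)
print(round(low, 2), round(high, 2))   # roughly 98.14 101.86

# Four times the data halves the margin of error.
low4, high4 = z_confidence_interval(sample * 4, sigma=3)
print(round(high4 - low4, 2), "vs", round(high - low, 2))
```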
#8) Hypothesis Testing
Hypothesis testing is predominant in Data Science. It is imperative to simplify and deconstruct it. Like a crime-fiction story, hypothesis testing, based on data, leads us from a novel suggestion to an effective proposition.
Full Playlist : Hypothesis Testing
Topics discussed: hypothesis formulation, null hypothesis, alternative hypothesis, Type I and II errors, two-tailed tests, one-tailed tests, z-tests, and t-tests.
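A one-sample z-test is probably the simplest way to see the null and alternative hypotheses in action. This sketch assumes a known population sigma; the function name, the sample, and the null value of 50 are invented for illustration.

```python
import math
from statistics import NormalDist, mean

def one_sample_z_test(sample, mu0, sigma):
    """Two-tailed z-test of H0: population mean == mu0, with known sigma."""
    z = (mean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed

# Hypothetical sample of 16 values with mean 52.
sample = [50, 54, 51, 53, 52, 52, 49, 55, 50, 54, 51, 53, 52, 52, 48, 56]

z, p = one_sample_z_test(sample, mu0=50, sigma=4)
alpha = 0.05                 # accepted Type I error rate
reject_null = p < alpha

print(z, round(p, 4), reject_null)   # 2.0 0.0455 True
```

For a one-tailed test you would use `1 - NormalDist().cdf(z)` (or the lower tail) instead of doubling.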
#9) Z-test and T-test for Two Populations
Z-tests are statistical calculations that compare a sample mean to a population mean. T-tests are also used to test hypotheses, and they are most useful when we need to determine whether there is a statistically significant difference between two independent sample groups.
Full Playlist : Z-test and T-test for Two Populations
Topics discussed: mean difference, hypothesis formulation, null hypothesis, alternative hypothesis, Type I and II errors, two-tailed tests, one-tailed tests, z-tests, and t-tests.
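For two independent groups, one common choice is Welch's t-test, which does not assume equal variances. The sketch below only computes the test statistic and the approximate degrees of freedom, because the standard library has no t-distribution; the function name `welch_t` and both samples are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na   # squared standard error of mean A
    vb = variance(sample_b) / nb   # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (na - 1) + vb**2 / (nb - 1))
    return t, df

# Hypothetical scores for two independent groups.
group_a = [12, 15, 14, 10, 13, 14]
group_b = [9, 11, 10, 8, 12, 10]

t, df = welch_t(group_a, group_b)
print(round(t, 2), round(df, 1))   # about 3.22 with about 9.5 degrees of freedom
```

To turn this into a p-value you need a t-distribution. If SciPy is available, `scipy.stats.ttest_ind(group_a, group_b, equal_var=False)` runs the same kind of test.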
#10) Inference about Population Variance
Inferences can be made about variance in the same manner as about the mean: confidence intervals, hypothesis tests, and so on. Analyzing variance is very important for quality control and is a central tenet of Six Sigma. It allows us to make sure our processes are on target and within certain tolerances.
Full Playlist: Inference about Population Variance
Topics discussed: variance, chi-square, confidence interval, hypothesis test, standard deviation
#11) Goodness of Fit and Independence Test
A goodness-of-fit test checks whether sample data fit the distribution expected for a certain population. It measures how well a set of observations matches that distribution.
Full Playlist: Goodness of Fit and Independence Test
Topics covered: chi-square test for independence, goodness of fit for multinomial experiments
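Here is a minimal sketch of a chi-square goodness-of-fit test for a multinomial experiment. The brand-preference counts are invented. The p-value line relies on a special case: with 2 degrees of freedom the chi-square upper tail is exactly exp(-x/2), so this shortcut does not generalize to other degrees of freedom.

```python
import math

def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical: 150 customers choose among three brands.
# Null hypothesis: all three brands are equally preferred.
observed = [60, 50, 40]
expected = [50, 50, 50]

chi2 = chi_square_statistic(observed, expected)   # 100/50 + 0 + 100/50 = 4.0
df = len(observed) - 1                            # 2 degrees of freedom

# Closed form for df = 2 only: P(chi-square > x) = exp(-x / 2).
p_value = math.exp(-chi2 / 2)

print(chi2, df, round(p_value, 4))   # 4.0 2 0.1353
```

With a p-value around 0.135 we would not reject the null hypothesis of equal preference at the usual 0.05 level.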
#13) ANOVA (Analysis of Variance)
Analysis of variance (ANOVA) is a collection of statistical models, and their associated estimation procedures (such as the “variation” among and between groups), used to analyze the differences among group means. The one-way ANOVA is used to determine whether there are any statistically significant differences between the means of three or more independent (unrelated) groups.
Full Playlist: ANOVA (Analysis of Variance)
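The idea of comparing variation between groups with variation within groups can be sketched in a few lines. The function name `one_way_anova_f` and the three tiny groups are made up, and the sketch stops at the F statistic rather than computing a p-value.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F statistic, df between, df within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k

# Hypothetical measurements for three independent groups.
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f_stat, df_between, df_within)   # 12.0 2 6
```

A large F means the group means differ by more than the within-group noise would suggest.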
#14) Simple Linear Regression
Linear Regression is a machine learning algorithm based on supervised learning. It performs a regression task. Regression models a target prediction value based on independent variables. … Linear regression performs the task to predict a dependent variable value (y) based on a given independent variable (x).
Full Playlist: Simple Linear Regression
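To show what predicting y from x looks like in practice, here is a least-squares fit written out by hand. The function name `fit_line` and the hours/score data are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical data: hours studied (independent) and exam score (dependent).
hours = [2, 4, 6, 8, 10]
score = [50, 60, 65, 80, 95]

slope, intercept = fit_line(hours, score)
predicted = intercept + slope * 7   # predicted score after 7 hours

print(slope, intercept, predicted)  # 5.5 37.0 75.5
```

The slope says that, in this made-up data, each extra hour of study goes with about 5.5 more points.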
#15) Multiple Regression
Multiple regression is an extension of simple linear regression. It is used when we want to predict the value of a variable based on the value of two or more other variables. The variable we want to predict is called the dependent variable (or sometimes, the outcome, target or criterion variable).
Full Playlist: Multiple Regression
#16) Logistic Regression
Like all regression analyses, the logistic regression is a predictive analysis. Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.
Full Playlist: Logistic Regression
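The core of logistic regression is the sigmoid function, which turns a linear score into a probability between 0 and 1. This sketch does not fit a model; the intercept and slope are hypothetical coefficients picked only to show how a binary outcome (pass or fail) is predicted.

```python
import math

def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1 / (1 + math.exp(-z))

def predict_probability(x, intercept, slope):
    """P(outcome = 1 | x) under a one-variable logistic model."""
    return sigmoid(intercept + slope * x)

# Hypothetical coefficients: probability of passing given hours studied.
intercept, slope = -4.0, 1.0

p_2h = predict_probability(2, intercept, slope)   # about 0.119
p_4h = predict_probability(4, intercept, slope)   # exactly 0.5 (score is 0)
p_6h = predict_probability(6, intercept, slope)   # about 0.881

# Each extra hour multiplies the odds of passing by exp(slope).
odds_ratio = math.exp(slope)

print(round(p_2h, 3), p_4h, round(p_6h, 3), round(odds_ratio, 3))
```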
#17) Analysis of Covariance
ANCOVA stands for analysis of covariance. ANCOVA is used when the researcher includes one or more covariates alongside the independent variable, so that group means can be compared after adjusting for the covariates’ effect.
Full Playlist: Analysis of Covariance (ANCOVA)
#18) Non-Linear Regression
Full Playlist : Non-Linear Regression
References: