Statistics: Deep dive, Part 1

Pavan Ebbadi
Aug 30, 2020

Statistics is a form of mathematical analysis that uses quantified models, representations and synopses for a given set of experimental data or real-life studies.

Types of Statistics

  1. Descriptive statistics: the collection, organization, summarization, and presentation of data. Here the statistician tries to describe a situation.
  2. Inferential statistics: generalizing from samples to populations, performing estimations and hypothesis tests, determining relationships among variables, and making predictions. Here, the statistician tries to make inferences from samples to populations. Inferential statistics uses probability, i.e., the chance of an event occurring.
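To make the distinction concrete, here is a minimal sketch using only Python's standard library. The salary figures are hypothetical. `describe` only summarizes the data in hand (descriptive statistics), while `mean_confidence_interval` uses probability to say something about the population the sample came from (inferential statistics). The interval relies on a normal approximation, which is an assumption that suits large samples; for a sample this small, a t-based interval would be somewhat wider.

```python
from math import sqrt
from statistics import NormalDist, mean, median, stdev


def describe(data):
    """Descriptive statistics: summarize the observed data only."""
    return {
        "n": len(data),
        "mean": mean(data),
        "median": median(data),
        "stdev": stdev(data),  # sample standard deviation
    }


def mean_confidence_interval(data, confidence=0.95):
    """Inferential statistics: estimate the population mean from a sample.

    Assumes a normal approximation for the sampling distribution of the mean.
    """
    center = mean(data)
    standard_error = stdev(data) / sqrt(len(data))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return center - z * standard_error, center + z * standard_error


# Hypothetical annual salaries (in thousands of dollars) from a small survey
salaries = [52, 61, 48, 75, 58, 66, 49, 70]

print(describe(salaries))  # mean 59.875, median 59.5
print(mean_confidence_interval(salaries))
```

The first call describes only these eight people; the second makes a probabilistic claim about everyone they were drawn from.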

Sample vs Population

Understanding the difference between a sample and a population is the first step in learning both basic and advanced statistics.

Population: all subjects (human or otherwise) that are being studied.

Sample: a group of subjects selected from a population.

Ex: If the statistician’s objective is to estimate the average salary of all people in the state of Massachusetts (MA), then every resident of MA together forms the population.

In reality, the statistician can’t collect data from the entire population, so he randomly surveys 100 people in Boston; these 100 people form the sample. Whether this is a good sample will be discussed later.
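A quick simulation illustrates the idea. This sketch uses a synthetic, hypothetical population of salaries (in practice we rarely have the full population): the population mean is the quantity we actually want, and the mean of a random sample of 100 is what the survey gives us.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 synthetic annual salaries
population = [round(random.gauss(65_000, 12_000)) for _ in range(10_000)]

# Survey 100 people chosen at random from the whole population
sample = random.sample(population, 100)

population_mean = mean(population)  # usually unknown in a real study
sample_mean = mean(sample)          # what the survey actually measures

print(f"population mean: {population_mean:,.0f}")
print(f"sample mean:     {sample_mean:,.0f}")
```

Rerunning with a different seed gives a different sample mean; that sample-to-sample variation is why inference from samples has to be expressed in terms of probability.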

Types of variables

1. Qualitative variables: these variables can be placed into distinct categories according to some characteristic or attribute. Ex: gender, geographic location, religious preference.

2. Quantitative variables: these are numerical and can be ordered or ranked. Ex: age, height, weight, and body temperature. These are subdivided into discrete and continuous variables.

Discrete variables assume values that can be counted.

Examples of discrete variables are the number of children in a family, the number of students in a classroom, and the number of calls received by a switchboard operator each day for a month.

Continuous variables can assume an infinite number of values between any two specific values. They are obtained by measuring. They often include fractions and decimals.

Temperature is a continuous variable, since it can assume an infinite number of values between any two given temperatures.

Variables can also be classified by how they are categorized, counted, or measured. This type of classification uses measurement scales, also called levels of measurement.

1. Nominal level: classifies data into mutually exclusive (non-overlapping), exhaustive categories in which no order or ranking can be imposed on the data. Ex: gender, marital status, residence zip codes.

2. Ordinal level: data measured at this level can be placed into categories, and these categories can be ordered, or ranked. However, precise differences between the ranks do not exist. Ex: letter grades (A, B, C, D, F).

3. Interval level: this level of measurement ranks data, and precise differences between units of measure do exist; however, there is no meaningful zero. Ex: IQ, temperature in degrees Fahrenheit (0 °F does not mean "no temperature").

4. Ratio level: possesses all the characteristics of interval measurement, and there exists a true zero, so ratios such as "twice as much" are meaningful. Ex: height, weight, bank balance, number of children, increase in temperature.
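One practical consequence of the level of measurement is which summary makes sense. The sketch below follows a common rule of thumb (an assumption, not something stated in the source): the mode for nominal data, the median for ordinal data, and the mean for interval and ratio data; statements like "twice as much" only make sense at the ratio level. The `Level` enum and the helper names are hypothetical.

```python
from enum import Enum
from statistics import mean, median_low, mode


class Level(Enum):
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


def typical_value(values, level, order=None):
    """Return a center that respects the measurement level (rule of thumb)."""
    if level is Level.NOMINAL:
        return mode(values)  # only counting categories is meaningful
    if level is Level.ORDINAL:
        # Rank categories by the supplied order, then take the middle rank
        ranks = [order.index(v) for v in values]
        return order[median_low(ranks)]
    return mean(values)  # differences are meaningful at interval/ratio level


def ratios_meaningful(level):
    """'Twice as much' requires a true zero, i.e. the ratio level."""
    return level is Level.RATIO


print(typical_value(["married", "single", "married", "divorced"], Level.NOMINAL))  # married
print(typical_value(["B", "A", "C", "B", "A"], Level.ORDINAL,
                    order=["F", "D", "C", "B", "A"]))  # B
print(typical_value([0, 2, 3, 1], Level.RATIO))  # 1.5 (number of children)
print(ratios_meaningful(Level.INTERVAL))  # False
```

`median_low` is used for ordinal data because it always returns an actual rank, so the result maps back to a real category instead of a value between two grades.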

Types of Sampling techniques

1. Random Sampling: samples are selected by using chance methods or random numbers.

2. Systematic Sampling: samples are obtained by numbering each subject of the population and then selecting every kth subject.

For example, suppose there are 2000 subjects in the population and a sample of 50 subjects is needed. Since 2000/50 = 40, k = 40, and every 40th subject is selected: subjects 40, 80, 120, 160, and so on. In practice, the first subject is usually chosen at random from 1 to k (e.g., starting at 12 gives 12, 52, 92, ...), and every kth subject after it is taken until 50 subjects are obtained.

3. Stratified Sampling: stratified samples are obtained by dividing the population into groups (called strata) according to some characteristic that is important to the study, then sampling from each group. Samples within the strata should be randomly selected.

4. Cluster Sampling: the population is divided into groups called clusters by some means, such as geographic area or schools in a large school district. The researcher then randomly selects some of these clusters and uses all members of the selected clusters as the subjects of the sample.
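The four techniques can be sketched in a few lines of Python. This is illustrative only: the subjects are the hypothetical IDs 1 to 2000 from the systematic-sampling example, and the strata (odd and even IDs) and clusters (blocks of 100 consecutive IDs, standing in for schools or areas) are made up. The systematic sampler picks a random starting point within the first interval.

```python
import random
from collections import defaultdict


def simple_random_sample(population, n, rng):
    """Random sampling: every subject has the same chance of selection."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    """Systematic sampling: every k-th subject after a random start."""
    k = len(population) // n   # e.g. 2000 // 50 = 40
    start = rng.randrange(k)   # random start within the first interval
    return population[start::k][:n]


def group_by(population, key):
    groups = defaultdict(list)
    for subject in population:
        groups[key(subject)].append(subject)
    return groups


def stratified_sample(population, stratum_of, fraction, rng):
    """Stratified sampling: a random sample drawn from each stratum."""
    sample = []
    for members in group_by(population, stratum_of).values():
        sample.extend(rng.sample(members, round(len(members) * fraction)))
    return sample


def cluster_sample(population, cluster_of, n_clusters, rng):
    """Cluster sampling: pick whole clusters at random, keep all their members."""
    clusters = group_by(population, cluster_of)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [subject for c in chosen for subject in clusters[c]]


rng = random.Random(0)           # seeded for repeatability
subjects = list(range(1, 2001))  # hypothetical subject IDs 1..2000

srs = simple_random_sample(subjects, 50, rng)
sys_sample = systematic_sample(subjects, 50, rng)
strat = stratified_sample(subjects, lambda s: s % 2, 0.025, rng)    # 25 odd + 25 even
clust = cluster_sample(subjects, lambda s: (s - 1) // 100, 3, rng)  # 3 clusters of 100

print(len(srs), len(sys_sample), len(strat), len(clust))  # 50 50 50 300
```

The stratified sampler uses proportional allocation (the same fraction from every stratum); that is one design choice among several, assumed here for simplicity.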

Types of Studies:

1. Observational study: the researcher merely observes what is happening or what has happened in the past and tries to draw conclusions based on these observations.

2. Experimental study: the researcher manipulates one of the variables and tries to determine how the manipulation influences other variables.

Categorization of Variables:

1. Independent variable: the variable being manipulated by the researcher, also called the explanatory variable. Ex: the fertilizer used for growing plants.

2. Dependent variable: the resultant variable, also called the outcome variable. Ex: plant growth.

3. Confounding variable: a variable that influences the dependent (outcome) variable but was not separated from the independent variable. Ex: subjects who are put on an exercise program might also improve their diet without the researcher knowing, and so improve their health in ways not due to exercise alone. Diet then becomes a confounding variable.
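A small deterministic sketch shows why a confounder matters. The effect sizes and group counts are hypothetical: exercise is assumed to add 2 units of health gain and an improved diet 3 units, and more of the exercise group than of the control group happen to improve their diet. Comparing the groups directly credits exercise with part of the diet effect; comparing within each diet group separates the two. This assumes diet was recorded, whereas in the exercise example the diet change was unknown to the researcher, which is exactly what makes a confounder hard to deal with.

```python
from statistics import mean

# Hypothetical effect sizes (assumptions, not from the source)
EXERCISE_EFFECT = 2.0
DIET_EFFECT = 3.0


def health_gain(exercise, diet):
    """Outcome (dependent variable) driven by exercise and by diet."""
    return EXERCISE_EFFECT * exercise + DIET_EFFECT * diet


# Hypothetical subjects as (exercise, improved_diet) pairs
subjects = (
    [(1, 1)] * 30 + [(1, 0)] * 10    # exercise group: 30 of 40 improve diet
    + [(0, 1)] * 10 + [(0, 0)] * 30  # control group: 10 of 40 improve diet
)


def mean_gain(exercise, diet=None):
    """Average outcome for one group, optionally restricted to one diet level."""
    return mean(
        health_gain(e, d)
        for e, d in subjects
        if e == exercise and (diet is None or d == diet)
    )


naive = mean_gain(1) - mean_gain(0)
within_diet = [mean_gain(1, d) - mean_gain(0, d) for d in (0, 1)]

print(naive)        # 3.5: exercise gets credit for part of the diet effect
print(within_diet)  # [2.0, 2.0]: the assumed exercise effect
```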

Reference: Elementary Statistics — Bluman


Pavan Ebbadi

Senior Advisor of Analytics at CVS Caremark. Leading a team to build a personalization engine for CVS customers with statistics, machine learning, and deep learning.