Getting Started with Statistics

Descriptive vs Inferential Statistics

Fatih Şen, PhD
teaching statistics
5 min read · Feb 7, 2023

In this article, we’ll provide an overview of statistics. More specifically, we’ll distinguish descriptive statistics from inferential statistics and get some hands-on experience with a statistical software package, MagicStat.

What is Statistics?

Statistics is the science of gathering, organizing, analyzing and drawing conclusions from data. This may sound like a very academic definition, but in practice most statistical studies share the same components. First, there is a population of interest. Since we typically cannot assess the entire population, we take a sample from that population. Then we perform a statistical analysis on that sample, and use the results of that analysis to draw conclusions about the population.

Figure 1: Components of Performing Statistical Analysis
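To make the population-versus-sample idea concrete, here is a minimal sketch in Python. The population below is simulated (10,000 hypothetical body weights), so every number in it is an assumption for illustration, not data from this article.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: body weights (kg) of 10,000 people.
population = [random.gauss(70, 10) for _ in range(10_000)]

# In practice we can rarely measure everyone, so we draw a random sample.
sample = random.sample(population, k=100)

population_mean = statistics.mean(population)  # usually unknown in a real study
sample_mean = statistics.mean(sample)          # what we can actually compute

print(f"Population mean: {population_mean:.2f}")
print(f"Sample mean:     {sample_mean:.2f}")
```

The sample mean will usually be close to, but not exactly equal to, the population mean. Quantifying that gap is what inferential statistics is for.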

Descriptive Statistics vs Inferential Statistics

Statistics is typically divided into descriptive statistics and inferential statistics.

Descriptive statistics are numbers that are used to describe and summarize data. Descriptive statistics can be thought of as “reading the data as it is”. Let’s take a look at the following real-world data set in Table 1.

Table 1: Diet Dataset contributed by Ellen Marshall, University of Sheffield. Gender is either 0 (female) or 1 (male). Weights are measured in kg and heights in cm. There are three types of diets (1, 2, 3).

This dataset contains information on 78 people adhering to one of three diets. Participants self-reported their age and height. Participants’ weights were measured before the study and after being on the diet for six weeks.

Descriptive statistics is about describing and summarizing the dataset at hand. For example, descriptive statistics can be used to answer the following questions: “Which people had the maximum or minimum weight loss?”, “Which diet was the most effective on average?”, “What is the age range of the participants, overall and separately for males and females?”, “What is the median age of people in the study?”.
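As a rough illustration, the questions above could be answered with a few lines of Python. The six records below are made up in the spirit of Table 1 and are not the real Diet dataset. The field names other than “pre.weight” and “weight6weeks” are hypothetical.

```python
import statistics
from collections import defaultdict

# Made-up records (NOT the real Diet dataset). gender: 0 = female, 1 = male.
people = [
    {"person": 1, "gender": 0, "age": 22, "diet": 1, "pre.weight": 58.0, "weight6weeks": 54.0},
    {"person": 2, "gender": 1, "age": 46, "diet": 1, "pre.weight": 80.0, "weight6weeks": 78.0},
    {"person": 3, "gender": 0, "age": 31, "diet": 2, "pre.weight": 65.0, "weight6weeks": 64.0},
    {"person": 4, "gender": 1, "age": 39, "diet": 2, "pre.weight": 72.0, "weight6weeks": 69.0},
    {"person": 5, "gender": 0, "age": 28, "diet": 3, "pre.weight": 60.0, "weight6weeks": 54.0},
    {"person": 6, "gender": 1, "age": 52, "diet": 3, "pre.weight": 90.0, "weight6weeks": 86.0},
]

# Weight loss per person (kg).
for p in people:
    p["loss"] = p["pre.weight"] - p["weight6weeks"]

# Who had the maximum and minimum weight loss?
max_loss = max(people, key=lambda p: p["loss"])
min_loss = min(people, key=lambda p: p["loss"])

# Which diet was the most effective on average (in this sample)?
loss_by_diet = defaultdict(list)
for p in people:
    loss_by_diet[p["diet"]].append(p["loss"])
mean_loss_by_diet = {diet: statistics.mean(losses) for diet, losses in loss_by_diet.items()}
best_diet = max(mean_loss_by_diet, key=mean_loss_by_diet.get)

# What are the age range and the median age?
ages = [p["age"] for p in people]
age_range = (min(ages), max(ages))
median_age = statistics.median(ages)

print("Largest loss: person", max_loss["person"], "| smallest loss: person", min_loss["person"])
print("Mean loss by diet:", mean_loss_by_diet, "| best on average: diet", best_diet)
print("Age range:", age_range, "| median age:", median_age)
```

Every answer here describes only these six records. Nothing is claimed about people outside the dataset.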

Inferential statistics deals with making inferences and generalizing beyond the current dataset. For example, to answer the question “Are there any significant differences in weight between before and after diet?”, we would need to run a paired samples t-test. Another research question such as “Which diet was best for losing weight?” would require us to fit a one-way between-subjects ANOVA model, one of many tests in inferential statistics. We use inferential statistics to answer questions like these.
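To give a feel for what a one-way between-subjects ANOVA computes, here is a minimal sketch of its F statistic in plain Python. The weight-loss values are made up for illustration and are not taken from the Diet dataset.

```python
import statistics

# Made-up weight losses (kg) for three diets; NOT the real Diet dataset.
groups = {
    1: [2.0, 3.0, 4.0],
    2: [1.0, 2.0, 3.0],
    3: [5.0, 6.0, 7.0],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = statistics.mean(all_values)

# Between-group variability: how far each group mean is from the grand mean.
ss_between = sum(
    len(values) * (statistics.mean(values) - grand_mean) ** 2
    for values in groups.values()
)

# Within-group variability: how far each score is from its own group mean.
ss_within = sum(
    (x - statistics.mean(values)) ** 2
    for values in groups.values()
    for x in values
)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

# F is the ratio of the two mean squares.
f_statistic = (ss_between / df_between) / (ss_within / df_within)
print(f"F({df_between}, {df_within}) = {f_statistic:.2f}")
```

A large F value suggests that the group means differ by more than the within-group noise would explain. Turning F into a p value requires the F distribution, which a library routine such as scipy.stats.f_oneway handles.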

Hands-on Statistics Experience using MagicStat

You can use the MagicStat platform to get some quick, hands-on experience using statistics. First, open up any browser and type https://magicstat.co/app in the address bar.

Figure 2: You can either drag and drop a data file or choose a data file by clicking on the rectangular area.
Figure 3: Choosing a data file from a Mac operating system.
Figure 4: Choosing a data file from a Windows operating system.

Select the Diet.xlsx data file. The site uploads the data file and displays the following output:

Figure 5: The initial output of the dataset uploaded to MagicStat

The right side of MagicStat shows the descriptive statistics. Inferential statistics are selected and displayed on the left side. When you import a data file, all the results that describe the data are displayed automatically on the right side of the page. The first section shows the total number of observations in the data. The next section shows the first five and last five observations of the dataset, which gives an overview of what the data looks like. Next, MagicStat displays the variables, automatically categorized into categorical and numerical variables. After identifying the variables, MagicStat provides measures of central tendency (mean, median, mode) and variability (minimum, maximum, standard deviation).

Figure 6: Summary of numeric variables

At the bottom, MagicStat displays bar charts for the categorical variables, and histograms and box plots for the numeric variables.

Figure 7: Bar Charts for categorical variables, histograms and box plots for numeric variables.

On the left side of the page, we see a variety of models that can be used for inferential statistics.

Figure 8: Parametric models options
Figure 9: Nonparametric models options

For example, to answer the research question “Are there any significant differences in weight between before and after diet?”, select the paired samples t-test model. Next, select “pre.weight” and “weight6weeks” as the group 1 and group 2 variables. After you click the “Analyze” button, the results appear as shown in Figure 10. Since the p value is less than .05, we can conclude that there is a statistically significant difference between participants’ weights before and after the diet. We will go into detail about how to conduct and interpret each statistical test in other blog posts.

Figure 10: Applying paired samples t-test to the diet dataset.
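For readers curious about what happens behind the “Analyze” button, the following is a minimal sketch of a paired samples t-test computed by hand. It is not MagicStat's implementation, and the six before/after weights are made up rather than taken from Diet.xlsx. The critical value 2.571 is the standard two-tailed t table entry for 5 degrees of freedom at the .05 level.

```python
import math
import statistics

# Made-up weights (kg) for six people; NOT the real Diet dataset.
pre_weight = [60.0, 72.0, 85.0, 90.0, 66.0, 78.0]
weight6weeks = [57.0, 70.0, 80.0, 88.0, 62.0, 75.0]

# A paired test works on the per-person differences.
differences = [before - after for before, after in zip(pre_weight, weight6weeks)]

n = len(differences)
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)   # sample standard deviation
standard_error = sd_diff / math.sqrt(n)

t_statistic = mean_diff / standard_error
df = n - 1

# Two-tailed critical value for df = 5 at alpha = .05, from a standard t table.
T_CRITICAL = 2.571
is_significant = abs(t_statistic) > T_CRITICAL

print(f"t({df}) = {t_statistic:.3f}, significant at .05: {is_significant}")
```

Comparing the t statistic with a critical value is equivalent to checking whether the p value is below .05. To obtain the p value itself, a library function such as scipy.stats.ttest_rel can be used.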

Take Away

Descriptive statistics is used to describe the data at hand, whereas inferential statistics is used for making inferences about a larger population.

With descriptive statistics, we make mostly quantitative observations (e.g., “the mean score was 75.6”). Although inferential statistics rely on numerical analyses, our observations are more qualitative, focusing on the characteristics and qualities of the variables rather than the numerical value (e.g., “Group 1 is different from Group 2”).

Descriptive statistics provide information on the typical values of a dataset as well as the variability of scores in the dataset. Central tendency metrics such as mean, median, and mode are used to get an idea of the typical score in the dataset. Variability metrics such as standard deviation, min, max, and range show how spread out the scores are. Additionally, shape metrics such as skewness and kurtosis help us understand the shape of the data (e.g., is it a normal distribution?) and identify any unusual patterns in the data (e.g., identify any outliers).
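The metrics above can be sketched with Python's standard library. The scores are made up, with one deliberately extreme value. The skewness formula (third central moment divided by the 1.5 power of the second) and the 1.5 × IQR outlier rule are common conventions chosen here for illustration, not necessarily what MagicStat uses.

```python
import statistics

# Made-up test scores; the last value is deliberately extreme.
scores = [70, 72, 75, 75, 78, 80, 110]

# Central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Variability
std_dev = statistics.stdev(scores)        # sample standard deviation
value_range = max(scores) - min(scores)

# Shape: moment-based skewness (positive means a longer right tail)
n = len(scores)
m2 = sum((x - mean) ** 2 for x in scores) / n
m3 = sum((x - mean) ** 3 for x in scores) / n
skewness = m3 / m2 ** 1.5

# Outliers: the 1.5 * IQR rule used by many box plots
q1, _, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1
outliers = [x for x in scores if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(f"mean={mean}, median={median}, mode={mode}")
print(f"std dev={std_dev:.2f}, range={value_range}, skewness={skewness:.2f}")
print("outliers:", outliers)
```

Note how the single extreme score pulls the mean above the median and produces positive skewness, which is the kind of pattern these metrics are meant to reveal.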

Inferential statistics, on the other hand, uses statistical models like correlation, ANOVA, t-test, chi-square, etc. Note that while we distinguish descriptive statistics from inferential statistics, the two are used together when analyzing data. For example, the mean and standard deviation are often used in inferential parametric tests.

Typically, descriptive statistics are reported using tables, charts, or graphs, in addition to raw numerical output. The results of inferential statistics include the output of the model (e.g., a t statistic) and associated probability values (i.e., p values).

In descriptive statistics, our conclusions are limited to the data we have. With inferential statistics, however, our goal is to go beyond this data and generalize about a larger population.

Table 2 below summarizes the differences between descriptive statistics and inferential statistics.

Table 2: Summary of descriptive vs inferential statistics

You can find a video relevant to this blog post here.

Acknowledgement: This article was reviewed by Dr. Brent Morgan, a subject-matter expert in statistics.

Fatih is a computer scientist, entrepreneur and writer. His current focus is artificial intelligence, data science, startup and personal life experiences.