Univariate Analysis

Rahul Sehrawat
Aug 24, 2021

Understanding the data is the first step in solving any business problem. The more deeply you explore the variables in a dataset, the better you become at finding the hidden patterns in it. The objective is to discover relationships between measures in the data and to gain insight into the trends and patterns among the entities in the dataset.

Univariate analysis is the simplest form of data analysis. It deals with a single variable at a time and explores each variable independently, typically by looking at its mean, median, mode, range, variance, and standard deviation. A simple example would be analyzing the salaries of all the employees in your company.

Types Of Univariate Analysis

Measures of Central Tendency

These consist of the mean, median, and mode. Each is a single value that attempts to describe a set of data by identifying the central position within it. Let’s look at what each of these measures means.

Mean

It is one of the best-known measures of central tendency. It can be used for both discrete and continuous data. It is the sum of all the values divided by the number of values in the data.
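As a minimal sketch of the mean, the snippet below computes it by hand and with Python's standard `statistics` module. The `salaries` list is made-up illustrative data, not taken from any real dataset.

```python
from statistics import mean

# Hypothetical salaries (illustrative values only)
salaries = [30, 35, 40, 45, 100]

# Mean = sum of all values / number of values
manual_mean = sum(salaries) / len(salaries)

# The standard library gives the same result
library_mean = mean(salaries)

print(manual_mean, library_mean)
```

Note that the single high salary of 100 pulls the mean above most of the other values.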

Median

It is the middle value of the data set. If you arrange the values in ascending order, the value in the middle is the median; with an even number of values, it is the average of the two middle values. The median is less affected by outliers and skewed data than the mean.
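A short sketch of the median, again with hypothetical salary values. It also shows, under these assumed numbers, how an extreme value moves the mean but not the median.

```python
from statistics import mean, median

# Hypothetical salaries (illustrative values only)
salaries = [45, 30, 100, 35, 40]

# median() sorts the data internally and picks the middle value
middle = median(salaries)

# With an even number of values, the two middle values are averaged
even_middle = median([30, 35, 40, 45])

# Replace the largest salary with an extreme outlier
with_outlier = [30, 35, 40, 45, 1000]
outlier_median = median(with_outlier)  # unchanged by the outlier
outlier_mean = mean(with_outlier)      # pulled far upward

print(middle, even_middle, outlier_median, outlier_mean)
```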

Mode

The mode is the most frequent value in the data set. It is generally used for categorical data, when you want to find the most common category.
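Here is a small sketch of the mode on categorical data. The `departments` list is an invented example; `multimode` (Python 3.8+) is shown because a data set can have more than one most frequent value.

```python
from statistics import mode, multimode

# Hypothetical department of each employee (illustrative values only)
departments = ["Sales", "IT", "Sales", "HR", "IT", "Sales"]

# The most frequent category
most_common_department = mode(departments)

# When several values tie for most frequent, multimode returns all of them
tied = multimode(["A", "B", "A", "B", "C"])

print(most_common_department, tied)
```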

Measures Of Dispersion

In statistics, measures of dispersion help to interpret the variability of the data, i.e. how homogeneous or heterogeneous it is. In simple terms, they show how squeezed or scattered the values of a variable are.

They consist of:

  • Range
  • Variance
  • Standard Deviation

Range

It is the difference between the maximum and minimum values in your dataset. For example, if the highest-paid employee has a salary of 100 and the lowest-paid has a salary of 1, the range is 99.
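The range needs nothing beyond the built-in `max` and `min`. In this sketch the endpoints 1 and 100 follow the salary example above; the values in between are made up.

```python
# Hypothetical salaries; only the extremes (1 and 100) matter for the range
salaries = [1, 40, 55, 100]

# Range = maximum value - minimum value
salary_range = max(salaries) - min(salaries)

print(salary_range)
```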

Variance

Variance tells you how far, on average, your data points are spread from the mean. It is calculated by taking the difference between each number in the data set and the mean, squaring the differences to make them positive, and finally dividing the sum of the squares by the number of values in the data set. (This is the population variance; when the data is a sample, the sum is usually divided by one less than the number of values.)

A large variance indicates that the numbers in the set are far from the mean and far from each other. A small variance indicates the opposite. A variance of zero indicates that all values in the set are identical. A variance can never be negative, because it is an average of squared differences and a square is never negative.
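The calculation described above can be sketched step by step and checked against the standard library. The `values` list is illustrative; `pvariance` divides by the number of values (population variance), while `variance` divides by one less (sample variance).

```python
from statistics import pvariance, variance

# Illustrative data (not from a real dataset)
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Step 1: the mean
mean_value = sum(values) / len(values)

# Step 2: squared differences from the mean (always >= 0)
squared_diffs = [(x - mean_value) ** 2 for x in values]

# Step 3: divide the sum of squares by the number of values
manual_variance = sum(squared_diffs) / len(values)

# Library equivalents
population_variance = pvariance(values)  # divides by n
sample_variance = variance(values)       # divides by n - 1

# Identical values give a variance of zero
zero_variance = pvariance([3, 3, 3])

print(manual_variance, population_variance, sample_variance, zero_variance)
```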

Standard Deviation

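It is just the square root of the variance, which puts it back in the same units as the data. Variance and standard deviation are closely related concepts; in finance, for example, they are used to gauge the volatility of an investment. If the variance and standard deviation of returns are low, the investment is generally considered less risky, and vice versa.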
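A brief sketch confirming that the standard deviation is the square root of the variance, using the population versions from the `statistics` module on illustrative data.

```python
from math import sqrt
from statistics import pstdev, pvariance

# Illustrative data (not from a real dataset)
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Standard deviation as the square root of the (population) variance
manual_sd = sqrt(pvariance(values))

# The library function computes the same quantity directly
library_sd = pstdev(values)

print(manual_sd, library_sd)
```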

Ways to Visualize Univariate Analysis

  • Frequency distribution tables
  • Bar charts
  • Histograms
  • Pie charts
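As a rough sketch of the first two options, the snippet below builds a frequency distribution table with `collections.Counter` and prints a text-only bar for each category. The `departments` data is invented, and the `#` bars are just a stand-in for a real bar chart from a plotting library. The percentage column is the share a pie chart would display.

```python
from collections import Counter

# Hypothetical department of each employee (illustrative values only)
departments = ["Sales", "IT", "Sales", "HR", "IT", "Sales"]

# Frequency distribution: category -> number of occurrences
frequency = Counter(departments)
total = sum(frequency.values())

# Print the table, most frequent category first, with a simple text bar
rows = []
for category, count in frequency.most_common():
    share = 100 * count / total  # percentage share of this category
    rows.append((category, count, share))
    print(f"{category:<6} {count:>2} {share:5.1f}%  {'#' * count}")
```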

Conclusion

In this blog, I talked about the types of univariate analysis and ways to visualize them. You can use these techniques depending on your business problem and requirements. Univariate analysis is the first step in any analysis and one of the most important, so I recommend spending some time on it.