Statistics 101: Let’s not be ‘mean’ always!

Rohan Bali
Published in Analytics Vidhya
4 min read · Nov 19, 2019

Why do we need to learn statistics?

We all hear today that “data is the new oil”. I believe analyzing the data is the new pipeline that enriches any business. Statistics provides us with the drilling and analyzing machines that make the data available to us even more valuable.

So in this series of blogs, I’ll take up some statistics concepts (hopefully in the right order) and explain them with respect to real-world applications.

Let’s dive into the world of statistics!

The following are some of the basic terms used in statistics:

(i) POPULATION: The population is the collection of ‘items’ being studied. ‘Items’ can be people, objects, etc. For example, suppose a study asks “Which cola drink is preferred by the people living in a city (say Delhi)?”. All the residents of the city are then considered part of the population.

(ii) SAMPLE: A sample is a subset of the population. A good sample is a true representation of the population. The process of extracting a sample from the population is known as sampling (a very important concept, to be discussed further). For example, while conducting the above study, it is practically impossible to include every person living in the city (just a fact: the population of Delhi is 19 million!). So, we need a sample from the above population to conduct our study. [Remember to take a true representation of the population!]

(iii) DESCRIPTIVE STATISTICS: When we gather insights about a particular group and reach a conclusion about that same group only, it is known as descriptive statistics. For example, if we select a group of people above the age of 75 and find that this particular group doesn't consume any soft drinks, then we can only reach a conclusion about this group, and hence we might be able to exclude them from our study.

(iv) INFERENTIAL STATISTICS: When we gather data from a sample, reach a conclusion about it, and then apply that conclusion to the whole population, it is known as inferential statistics. For example, if the sample we took from the population shows that people prefer the ‘COLA-A’ drink, then we can infer that people in the city as a whole prefer the same drink. Inferential statistics is also known as inductive statistics.
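To make the four terms above concrete, here is a minimal Python sketch of the cola study. Everything in it is an assumption made for illustration: the population is simulated, and the brand names other than ‘COLA-A’, the preference weights, and the population and sample sizes are all made up. It draws a simple random sample, describes the sample, and then uses the sample to make a statement about the whole population.

```python
import random
from collections import Counter

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 residents, each with a preferred cola.
# Brand names and weights are invented for illustration only.
BRANDS = ["COLA-A", "COLA-B", "COLA-C"]
population = random.choices(BRANDS, weights=[0.5, 0.3, 0.2], k=100_000)

# Sampling: draw a simple random sample (without replacement).
sample = random.sample(population, k=1_000)

# Descriptive statistics: summarize the sample itself.
sample_counts = Counter(sample)
sample_share = {brand: sample_counts[brand] / len(sample) for brand in BRANDS}

# Inferential statistics: use the sample to say something about the population.
estimated_favourite = max(sample_share, key=sample_share.get)

# We can check the inference only because this population is simulated.
population_counts = Counter(population)
true_favourite = max(population_counts, key=population_counts.get)

print("Sample shares:", sample_share)
print("Estimated favourite (from sample):", estimated_favourite)
print("True favourite (from population):", true_favourite)
```

In a real study we would never see the full population, which is exactly why the quality of the sample matters so much.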

Now, the next question that comes to mind is: how do we measure things in a sample or in a population as a whole?

Every dataset is unique in its own way. Thus, it's our job to identify the category of the data and the type of measurement we need to use.

There are four categories of measurement:

(i) Nominal: It’s the most basic form of measurement. This type of measurement is used to distinguish one data item from another. For example, suppose we have employee record data. The “Employee Id” is a nominal measurement: it separates employees from one another but doesn't hold any statistical information.

(ii) Ordinal: Ordinal measurement provides us with some statistical insight into the data. It gives a distinct sense of ordering or ranking among the data items. For example, ranking employees by performance (1, 2, …, n) creates an order from which we can draw insights and take actions, like giving incentives to the best performers and helping the low performers.

(iii) Interval: This is very useful if we need to summarize the data. On an interval scale, the difference between consecutive values is always the same, so differences between values are statistically meaningful. For example, we can use a Likert scale (commonly treated as interval data in practice) to understand how satisfied employees are at a given firm.

(iv) Ratio: It is considered the highest level of measurement among all those explained above. It holds all the properties of the interval level and adds one more: an absolute zero. The interval level doesn't have an absolute zero [zero on an interval scale is just a convention], whereas an absolute zero on a ratio scale means that the characteristic being studied is absent. For example, the number of hours each employee spends working in the office [less than 4 hours, between 5–6 hours, greater than 6 hours, …] is a ratio level of measurement.
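One way to remember the four levels is to ask which operations make sense at each one. The sketch below uses a tiny, made-up employee table (the field names and values are hypothetical) and applies only the operation that each level supports. Treating the Likert satisfaction score as interval data is itself an assumption, though a common one.

```python
from statistics import mean

# Hypothetical employee records, invented for illustration only.
employees = [
    {"employee_id": "E103", "rank": 2, "satisfaction": 4, "hours": 8.0},
    {"employee_id": "E101", "rank": 1, "satisfaction": 5, "hours": 6.0},
    {"employee_id": "E102", "rank": 3, "satisfaction": 2, "hours": 4.0},
]

# Nominal: only "same or different" makes sense, so we can count distinct ids.
distinct_ids = len({e["employee_id"] for e in employees})

# Ordinal: order makes sense, so we can sort by rank and pick the top performer.
by_rank = sorted(employees, key=lambda e: e["rank"])
top_performer = by_rank[0]["employee_id"]

# Interval: differences make sense, so we can subtract and average scores
# (assuming the Likert scale is treated as interval data).
scores = [e["satisfaction"] for e in employees]
score_gap = max(scores) - min(scores)
mean_score = mean(scores)

# Ratio: there is a true zero, so "twice as many hours" is a meaningful statement.
hours = [e["hours"] for e in employees]
hours_ratio = max(hours) / min(hours)

print("Distinct employees:", distinct_ids)
print("Top performer:", top_performer)
print("Satisfaction gap and mean:", score_gap, round(mean_score, 2))
print("Longest day vs shortest day:", hours_ratio)
```

Note what the sketch avoids: it never averages employee ids or divides one rank by another, because those operations carry no meaning at the nominal and ordinal levels.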

This sums up the basic terminology and some starter concepts of statistics.

Now, wasn't that easy?

Coming up :

Statistics 102: Basics Visualization- Its good to be ‘seen’!

Rohan Bali
Analytics Vidhya

Data Analytics professional with majors in Computer Science Engineering. Enjoys problem-solving and propelling data-driven decisions.