Statistics: As if I know how to read Data

Nawin Raj Kumar S
Published in kgxperience
4 min read · Nov 4, 2022

Data rules the world. “Information is the new Oil,” the words uttered by Mukesh Ambani, became a worldwide sensation. Harvard Business Review called data scientist “the sexiest job of the 21st century.” Wherever we go, we see data’s influence, often without noticing it. We even use data to blame each other in arguments, and no, I am not talking about politics here. But how did it all start? Where did it begin? How did the ordinary activities and occurrences of the world become so-called “Information” and turn into such a powerful factor in our lives? The simple answer is “Statistics”.

My mentor used to say, “When Statistics was born, Artificial Intelligence was also born.” Having dug deeper into the subject, I can very well second his opinion.

Statistics is the birthplace of Artificial Intelligence and Data Science. For a naive and innocent kid who liked Artificial Intelligence and hated Mathematics, this was the first knock on the head I got when I started exploring AI.

Okay, coming to Statistics. In theory, it is defined as “the science concerned with developing and studying methods for collecting, analyzing, interpreting and presenting empirical data.” Basically, it deals with data and makes us deal with it. Sounds confusing, right? Me too.

Okay, let’s apply it to real life. Say you go to the onion shop and walk up to the heap of onions. You need to pick onions of good quality, so what will you do? You’ll look through part of the heap and check the quality of the onions you see. From that you form an assumption about the quality of the whole heap and decide whether to buy or not. That, if done mathematically, is called Statistics.

Statistics is a highly interdisciplinary field. We take data and use it to answer the questions or problems we have. It can be applied to research, science, engineering and many other fields. In developing methods and studying the theory that underlies them, statisticians draw on a variety of mathematical and computational tools.

Two things play a major role in Statistics: uncertainty and variance. Uncertainty can be described as the observer’s lack of knowledge about the answer to a specific question. For example, will it rain today? Will you pass the test? (Although I know my result beforehand.) Variance describes how much the observations differ from one another.
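
The onion-shop example can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the heap of 1000 onions, the 80% share of good ones, the sample size of 50 and the helper name `estimate_good_fraction` are all hypothetical and not taken from any real data.

```python
import random

def estimate_good_fraction(heap, sample_size, seed=0):
    """Estimate the share of good onions by inspecting a random sample."""
    rng = random.Random(seed)              # fixed seed so the run is repeatable
    sample = rng.sample(heap, sample_size) # pick onions without replacement
    return sum(sample) / sample_size       # 1 = good onion, 0 = bad onion

# Hypothetical heap: 800 good onions and 200 bad ones.
heap = [1] * 800 + [0] * 200

estimate = estimate_good_fraction(heap, sample_size=50)
true_share = sum(heap) / len(heap)

print(f"Estimated share of good onions: {estimate:.2f}")
print(f"True share of good onions:      {true_share:.2f}")
```

The estimate will usually be close to the true share but not exactly equal to it, and that gap is the uncertainty we live with when we judge the whole heap from a handful of onions.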

Okay, now there are three measures of central tendency: mean, median, and mode. In statistics, a central tendency is a central or typical value for a probability distribution. Colloquially, measures of central tendency are often called averages. Consider a gang of 5 people who are close friends. We can get a sense of the group’s character by observing them when they are together, right? Individually they may be different people, but together they possess some characteristics which can be called the “Characteristics of the group”. This may be the aggregate of everyone’s characteristics, or it may come from one dominant or influential person in the group. This is what is called the Central Tendency. We are basically stereotyping the data, assuming some shared features for all the observations in it. That is perfectly fine for data, but it is wrong in real life, so don’t do it.

  • Mean: The mean is equal to the sum of all the values in the data set divided by the number of values in the data set. In other words, it is the ratio of the sum of the terms to the number of terms. The mean is the most widely used measure of central tendency. Take cricket as an example, where every batsman’s average is displayed. If a batsman’s average is 50, we can roughly expect him to score around 50 runs per innings.

The mean has one disadvantage: it is highly susceptible to outliers. We’ll look at this in more detail later.

  • Median: The median is the middle score for a set of data that has been arranged in order of magnitude. The median is less affected by outliers and skewed data. But it depends only on the middle position and ignores the actual values of the other observations, so in my opinion it is not a very good measure of central tendency.
  • Mode: The mode is the most frequently repeated value in the data set. A data set can have more than one mode if several values are tied for the most repetitions.
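
The three measures above can be tried out with Python’s built-in `statistics` module. The scores below are made-up numbers for a hypothetical batsman, chosen only to show how a single outlier pulls the mean while barely moving the median.

```python
import statistics

# Hypothetical scores of a batsman over five innings.
scores = [45, 50, 50, 52, 53]

mean_score = statistics.mean(scores)      # sum of values / number of values
median_score = statistics.median(scores)  # middle value of the sorted data
mode_score = statistics.mode(scores)      # most frequently repeated value

print(mean_score, median_score, mode_score)

# Add one unusually big innings (an outlier) and compare again.
with_outlier = scores + [200]

mean_with_outlier = statistics.mean(with_outlier)
median_with_outlier = statistics.median(with_outlier)

print(mean_with_outlier, median_with_outlier)
```

With these numbers the mean jumps from 50 to 75 after the outlier is added, while the median only moves from 50 to 51. That is what “susceptible to outliers” means in practice.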
