Basics of Statistics: Part 1

Shivani Dashore
May 24, 2023


Greetings everyone! I hope you’re all doing well. In today’s article, I have covered the fundamentals of statistics, as it is a crucial skill for various data-driven professions such as Business Analysts, Data Analysts, and Data Scientists, among others. By reading this article, you will gain a comprehensive understanding of the basics of statistics.

Demystifying the Difference Between Data and Information

Data and information are often used interchangeably, causing confusion about their true meaning and significance. Understanding the distinction between these two terms is essential for anyone working with data. Let’s clarify the difference between data and information to shed light on their unique characteristics and roles in decision-making.

Data: The Raw Building Blocks

Data is the raw material that forms the foundation of information. It can take various forms, such as symbols, characters, numbers, images, or music. Data represents unprocessed facts and figures about objects, which can include people, animals, or inanimate things.

Crucially, data lacks inherent meaning. It exists in an unstructured, unorganized state and cannot directly inform decision-making.

For example:

  • Yes, Yes, No, No, Yes, Yes
  • 45,44,22,33,44
  • Red
  • 26012000

What does the number 26012000 mean?

Is it:

  • A birthday (26 Jan 2000)?
  • A bank account number?
  • A telephone number?

Without processing or more information, this data is meaningless.
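To make the ambiguity concrete, here is a minimal Python sketch that reads the same raw value in three different ways. The day-month-year date format and the phone-style grouping are assumptions chosen purely for illustration; nothing in the raw value itself tells us which reading is correct.

```python
from datetime import datetime

raw = "26012000"  # the raw value from the example above

# Reading 1: a date, assuming day-month-year order
as_date = datetime.strptime(raw, "%d%m%Y").date()

# Reading 2: a plain integer, such as an account number
as_number = int(raw)

# Reading 3: a phone-like string (this 4-4 grouping is arbitrary)
as_phone = f"{raw[:4]}-{raw[4:]}"

print(as_date)    # 2000-01-26
print(as_number)  # 26012000
print(as_phone)   # 2601-2000
```

All three readings are valid as far as the code is concerned. Only context, such as a column name or a note about where the value came from, turns it into information.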

Information: The Processed and Meaningful Output

Information is the valuable output derived from data through processing, organization, analysis, and interpretation. It provides meaning, context, and value, allowing us to make informed decisions, gain knowledge, and understand a specific subject.

To become information, raw data goes through several stages:

  1. Organization: Data is structured and organized, often in tabular or visual formats.
  2. Presentation: Organized data is presented using charts, graphs, or other visualizations to enhance understanding.
  3. Analysis: Various analytical techniques and functions are applied to extract insights and patterns from the data.
  4. Interpretation: Analysts interpret the analyzed data, deriving meaning and significance from the patterns and insights discovered.

For example, consider a sales dataset containing customer names, purchase dates, product details, and transaction amounts. By aggregating the sales data, calculating total revenue, identifying top-selling products, and analyzing customer preferences, meaningful information can be derived. Total revenue for a specific period, the top-selling product, or the largest customer segment provides actionable insights for decision-making.
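The sales example above could be sketched in Python roughly as follows. The customer names, dates, products, and amounts are made up for illustration, and the sketch uses only the standard library rather than any particular analysis tool.

```python
from collections import defaultdict

# Hypothetical raw transactions: (customer, purchase date, product, amount)
sales = [
    ("Asha",  "2023-05-01", "Notebook", 120.0),
    ("Ravi",  "2023-05-01", "Pen",       15.0),
    ("Asha",  "2023-05-02", "Pen",       30.0),
    ("Meera", "2023-05-03", "Notebook", 240.0),
    ("Ravi",  "2023-05-03", "Backpack", 900.0),
]

# Aggregate the raw rows into a single figure: total revenue
total_revenue = sum(amount for _, _, _, amount in sales)

# Group revenue by product to find the top seller (by revenue)
revenue_by_product = defaultdict(float)
for _, _, product, amount in sales:
    revenue_by_product[product] += amount

top_product = max(revenue_by_product, key=revenue_by_product.get)

print(f"Total revenue: {total_revenue}")        # Total revenue: 1305.0
print(f"Top-selling product: {top_product}")    # Top-selling product: Backpack
```

The list of tuples is the data; the total revenue and the top-selling product are the information derived from it.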

In essence, information is the processed and organized output that empowers decision-making, while data is the raw material that must undergo transformation to become meaningful information.

Understanding the difference between data and information is vital for practical data analysis, interpretation, and decision-making. By recognizing the value and potential within data, we can leverage knowledge to gain valuable insights and drive positive outcomes.

Statistics

Statistics is the branch of mathematics that involves collecting, organizing, analyzing, interpreting, and presenting data. It provides methods and techniques to make sense of data, identify patterns, and draw meaningful conclusions.

Why Statistics?

In today’s world, data is constantly being generated. However, if we cannot effectively utilize this data, it becomes meaningless and only occupies storage space. Statistics plays a crucial role in transforming raw data into meaningful information. This information enables us to analyze and understand what is happening in the world, ultimately aiding in decision-making.

In a data-driven era, statistics empowers us to extract valuable insights, predict outcomes, and drive evidence-based decision-making.

Types of Statistics

  • Descriptive Statistics
  • Inferential Statistics

Descriptive Statistics

Descriptive statistics involves summarizing and describing the main characteristics of a dataset in a concise and meaningful way. Descriptive statistics do not involve making predictions or drawing inferences beyond the observed data.

Examples of descriptive statistics include:

  1. Measures of central tendency: Mean, median, and mode, which describe the central or typical value of a dataset.
  2. Measures of variability: Range, variance, and standard deviation, which describe how much the data values differ from each other.
  3. Frequency distributions: Histograms, bar charts, and pie charts, which show how the data values are distributed and how often each value occurs.
  4. Box plots: Graphs that summarize the distribution of a dataset by displaying the median, quartiles, and outliers.
  5. Summary statistics: A table or set of numbers that summarizes the main features of a dataset, such as the number of observations, the mean, and the standard deviation.

Let’s say you have collected data on the daily sales of various products over a month. By using descriptive statistics, you can summarize and understand the main characteristics of the sales data.
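As a rough sketch of what that summary might look like, the snippet below computes the measures listed above with Python's built-in statistics module. The ten daily sales figures are invented for illustration, and the variance and standard deviation shown are the sample versions (dividing by n - 1), which is one common choice rather than the only one.

```python
import statistics

# Hypothetical units sold per day (made-up numbers, 10 days for brevity)
daily_sales = [12, 15, 11, 15, 20, 18, 15, 22, 9, 13]

# Measures of central tendency
mean_sales = statistics.mean(daily_sales)
median_sales = statistics.median(daily_sales)
mode_sales = statistics.mode(daily_sales)

# Measures of variability
sales_range = max(daily_sales) - min(daily_sales)
sample_variance = statistics.variance(daily_sales)  # divides by n - 1
sample_std = statistics.stdev(daily_sales)          # square root of the above

print(f"Mean: {mean_sales}, median: {median_sales}, mode: {mode_sales}")
print(f"Range: {sales_range}")
print(f"Variance: {sample_variance:.2f}, standard deviation: {sample_std:.2f}")
```

For this made-up data the mean, median, and mode all equal 15 and the range is 13, so a typical day sells about 15 units with a spread of roughly 4 units either way.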

The goal of descriptive statistics is to provide an overview and concise summary of the data, which can help in decision-making and understanding patterns or trends within the dataset. However, descriptive statistics alone cannot make predictions or draw conclusions about a larger population beyond the observed data. For that, inferential statistics techniques are used.

Inferential Statistics

Inferential statistics is a branch of statistics that helps us make predictions or draw conclusions about a larger group based on a smaller sample. It allows us to take what we learn from the sample and apply it to the entire population.

Here’s a simple real-world example:

Let’s say you want to know the average height of all students in your school, but it’s not practical to measure every student. Instead, you randomly select a sample of 100 students and measure their heights. Using inferential statistics, you can estimate the average height of all students in the school based on the sample.
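The height example can be sketched with simulated data. Everything here is an assumption made for illustration: the school of 2,000 students, the heights generated around 160 cm, and the use of the normal critical value 1.96 for an approximate 95% confidence interval.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical school: 2,000 simulated heights in cm (made-up data)
population = [random.gauss(160, 8) for _ in range(2000)]

# Randomly select 100 students and "measure" them
sample = random.sample(population, 100)

# Point estimate and its standard error
sample_mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

# Approximate 95% confidence interval for the school-wide average
low = sample_mean - 1.96 * standard_error
high = sample_mean + 1.96 * standard_error

print(f"Estimated average height: {sample_mean:.1f} cm")
print(f"Approximate 95% interval: {low:.1f} to {high:.1f} cm")
print(f"Average of the simulated school: {statistics.mean(population):.1f} cm")
```

In a real study the last line would not be available, since the whole point is that we cannot measure everyone. It is printed here only to show how close the sample-based estimate tends to land.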

It’s important to note that inferential statistics does not guarantee absolute certainty, but it provides a framework for drawing meaningful conclusions and making informed decisions based on sample data. We can make valid inferences and predictions about populations beyond the observed sample by employing appropriate sampling techniques, conducting hypothesis tests, and using statistical models.

So now the question arises: what are a population and a sample?

Population

The population is the whole group of people, things, or events that we want to study or learn about. It’s like the big picture of what we’re interested in. For example, the population could include all the students in a school, all the customers of a company, or all the registered voters in a country. It’s the complete set that we want to understand and gather information from.

Sample

A sample is like a smaller group that we choose from the larger population to study. It’s a subset of individuals, things, or events that we select to learn about the whole population. The reason we use a sample is to make conclusions or predictions about the entire population without having to study or observe every single member. It’s important that the sample represents the population well so that our findings can be valid and reliable.
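To illustrate what "represents the population well" can mean, here is a small sketch that draws a simple random sample and compares one characteristic across the two groups. The population of 10,000 customers and the 30% premium share are hypothetical numbers chosen for the example.

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 customers, 30% on a premium plan
population = ["premium"] * 3000 + ["basic"] * 7000

# Simple random sample: every customer has the same chance of selection
sample = random.sample(population, 500)

population_share = population.count("premium") / len(population)
sample_share = sample.count("premium") / len(sample)

print(f"Premium share in population: {population_share:.1%}")
print(f"Premium share in sample:     {sample_share:.1%}")
```

With random selection the two shares should usually be close. A sample picked in a biased way, for example only customers who visited a premium-only page, would not mirror the population and could lead to misleading conclusions.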

Image Source — Internet

In the above picture, we can easily see that the sample represents the whole population.

Connect with me on LinkedIn.

Thank you for taking the time to read the article. I appreciate your attention and feedback.

If you found this article helpful, please consider sharing it with others who might benefit from it. Your support in spreading the word is greatly appreciated.
