Data Science: Statistical Analysis (Part 1)
In data science, statistical data analysis is one of the most important steps. Every data scientist must have a deep understanding of statistics.
The first step of every statistical analysis is to determine whether you are dealing with population data or sample data.
Population: the collection of all items of interest. Its size is denoted by N, and the numbers computed from it are called parameters.
Sample: a subset of the population. Its size is denoted by n, and the numbers computed from it are called statistics.
A point to note: populations are hard to define and hard to observe in real life, whereas a sample is much easier to gather, i.e. it is less time consuming and less costly. Statistical tests will almost always be working with sample data. A good sample has two defining characteristics: randomness and representativeness.
Randomness: a random sample is collected when each member of the sample is chosen from the population strictly by chance.
Representativeness: a representative sample is a subset of the population that accurately reflects the members of the entire population.
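To make the population/sample distinction concrete, here is a minimal sketch using only Python's standard library. The population of the numbers 1 to 1000, the sample size of 50, and the helper name draw_sample are all assumptions made up for illustration. It draws a random sample and compares a parameter (the population mean) with a statistic (the sample mean).

```python
import random
import statistics


def draw_sample(population, n, seed=0):
    """Draw n members strictly by chance, without replacement.

    The seed is fixed only so that the illustration is repeatable.
    """
    rng = random.Random(seed)
    return rng.sample(population, n)


# Hypothetical population: the numbers 1..1000, so N = 1000.
population = list(range(1, 1001))

# Sample of size n = 50, where every member has the same chance of selection.
sample = draw_sample(population, 50)

population_mean = statistics.mean(population)  # a parameter
sample_mean = statistics.mean(sample)          # a statistic

print("N =", len(population), "population mean =", population_mean)
print("n =", len(sample), "sample mean =", sample_mean)
```

The sample mean will usually be close to the population mean but not equal to it, which is why the two kinds of numbers get different names.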
If you want to learn which statistics are appropriate for which test, this may be a stepping stone to a career in data science. So first we need to know the variables, because different types of variables require different types of statistics and visualization approaches.
We can classify data in two main ways: by its type and by its measurement level.
Types of data
Categorical data: It consists of categorical variables, i.e. data sorted into groups or categories.
Numerical data: As its name suggests, it represents numbers, and it is divided into two subsets: discrete data and continuous data.
Discrete Data: Discrete data typically involves counting rather than measuring. It can only take certain values.
Continuous Data: It is not restricted to distinct points on a scale; it can take any value within a range. It involves measuring rather than counting.
Examples: weight, height, area, distance, and time are all continuous.
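As a rough sketch of the types above, the hypothetical helper below guesses the type of a column from the Python values it holds. This is only a heuristic, stated here as an assumption: strings are treated as categorical, whole numbers as discrete, and decimal numbers as continuous. Real data can break the rule, for example a weight that was rounded to whole kilograms is still continuous.

```python
def guess_data_type(values):
    """Guess the data type of a column from its Python values (heuristic only)."""
    def is_int(v):
        # bool is a subclass of int in Python, so it is excluded explicitly.
        return isinstance(v, int) and not isinstance(v, bool)

    def is_number(v):
        return is_int(v) or isinstance(v, float)

    if all(isinstance(v, str) for v in values):
        return "categorical"
    if all(is_int(v) for v in values):
        return "numerical (discrete)"
    if all(is_number(v) for v in values):
        return "numerical (continuous)"
    return "mixed"


# Hypothetical columns, made up for illustration.
print(guess_data_type(["red", "blue", "red"]))   # categorical
print(guess_data_type([2, 0, 3, 1]))             # numerical (discrete), e.g. a count
print(guess_data_type([61.5, 72.25, 58.0]))      # numerical (continuous), e.g. weight
```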
Measurement levels
Qualitative data: Qualitative data describes qualities or characteristics rather than quantities. It is often collected through direct or indirect observation or by asking questions. Qualitative data is further divided into two types:
Nominal: The data is simply named or labeled, with no specific order. The nominal scale is also called a categorical variable scale.
Examples of nominal data include gender and marital status.
Ordinal: The data is grouped into categories that follow a strict order, i.e. ordered categories. It is commonly used in surveys and questionnaires.
An example of ordinal data is a rating for a service, e.g. a rating between 0 and 5.
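The practical difference between nominal and ordinal data shows up in which summaries make sense. The sketch below uses made-up survey answers (both lists are assumptions for illustration): for nominal data we can only count categories and report the most common one, while for ordinal data the order also lets us report a median.

```python
from collections import Counter
import statistics

# Hypothetical survey answers, made up for illustration.
marital_status = ["single", "married", "married", "divorced", "married"]  # nominal
ratings = [0, 3, 5, 4, 3, 3, 1]                                           # ordinal, 0 to 5

# Nominal: the labels have no order, so counting is the only meaningful summary.
status_counts = Counter(marital_status)
most_common_status = status_counts.most_common(1)[0][0]

# Ordinal: the categories are ordered, so the middle value is meaningful too.
median_rating = statistics.median(ratings)

print("Most common marital status:", most_common_status)
print("Median rating:", median_rating)
```

Sorting the marital statuses would only give alphabetical order, which says nothing about the data, so a median is not used for them.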
Quantitative data: It can be measured or counted and is expressed as a number, e.g. the score of an exam or the weight of a person. Quantitative data is further divided into two types: interval and ratio.
Both interval and ratio data are represented by numbers, but ratio data has a true 0 value and interval data does not.
Examples of ratio variables are the number of objects, time, and distance. An example of an interval variable is temperature, because it is usually expressed in Celsius or Fahrenheit, and neither scale has a true 0. Finally, numbers like 2, 3, 10, 10.5, etc. may be interval or ratio; you have to be careful with the context you are operating in.
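A quick way to see why the true 0 matters is to change the unit and check whether "twice as much" survives. In the sketch below, the example values (10 and 20) and the rounded kilometre-to-mile factor are assumptions chosen for illustration. The ratio of two distances stays the same in any unit, but the ratio of two temperatures changes between Celsius and Fahrenheit, so saying "20 degrees is twice as hot as 10 degrees" is not meaningful.

```python
KM_TO_MILES = 0.621371  # approximate conversion factor


def celsius_to_fahrenheit(celsius):
    """Convert a temperature from Celsius to Fahrenheit."""
    return celsius * 9 / 5 + 32


# Ratio variable (distance): the ratio is the same in kilometres and in miles.
ratio_km = 20 / 10
ratio_miles = (20 * KM_TO_MILES) / (10 * KM_TO_MILES)

# Interval variable (temperature): the ratio depends on the unit.
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)  # 68 / 50

print("Distance ratio in km:", ratio_km, "in miles:", round(ratio_miles, 2))
print("Temperature ratio in C:", ratio_celsius, "in F:", round(ratio_fahrenheit, 2))
```

Differences are still meaningful for interval data: a rise from 10 to 20 degrees Celsius is the same size as a rise from 20 to 30.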
Overall Graphical representation
Now that we have seen the different types of data and measurement levels, we are ready to move on to how to visualize data, which allows us to visually represent the data we are working with. It is much easier to visualize data if you know its type and measurement level.
To learn how, check out my next blog post.
Thank you for reading. I hope this article gave you a brief introduction to some data science concepts in statistical analysis.