Analysis of Categorical Variables - Python Data Science
Nov 10, 2021
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.read_csv('titanic.csv')
Categorical variable classes and frequencies
df["Sex"].value_counts()
# We will use value_counts often.
Unique classes of the categorical variable
df["Sex"].unique()
Unique number of classes
df["Sex"].nunique()
2
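The three calls above can be combined into a small summary of counts and percentages. This is a minimal sketch on a hypothetical toy column (the `toy` frame and the `summary` / `ratio_pct` names are illustrative, not from the Titanic file), assuming that the share of each class is also of interest:

```python
import pandas as pd

# Hypothetical toy data standing in for the "Sex" column of titanic.csv.
toy = pd.DataFrame({"Sex": ["male", "female", "male", "male", "female"]})

# Frequency of each class.
counts = toy["Sex"].value_counts()

# Share of each class in percent (normalize=True returns fractions).
ratios = 100 * toy["Sex"].value_counts(normalize=True)

# Both Series share the class labels as index, so they align row by row.
summary = pd.DataFrame({"count": counts, "ratio_pct": ratios})

print(summary)
print("classes:", toy["Sex"].unique())
print("number of classes:", toy["Sex"].nunique())
```

On the real data the same lines should work with `df` in place of `toy`.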
df.head(5)
How many categorical variables are there and what are their names?
cat_cols = [col for col in df.columns if df[col].dtypes == "O"]
['Name', 'Sex', 'Ticket', 'Cabin', 'Embarked']
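The dtype check in the list comprehension can also be written with `select_dtypes`. The sketch below compares the two on a hypothetical toy frame; the string columns are created with object dtype explicitly, which is an assumption made so the example behaves the same regardless of how a given pandas version infers string columns:

```python
import pandas as pd

# Hypothetical toy frame; string columns are given object dtype explicitly.
toy = pd.DataFrame({
    "Name": pd.Series(["Ann", "Bob", "Cid"], dtype="object"),
    "Sex": pd.Series(["female", "male", "male"], dtype="object"),
    "Age": [29.0, 41.0, 35.0],
    "Pclass": [1, 3, 3],
})

# Same idea as in the article: keep columns whose dtype is object ("O").
by_comprehension = [col for col in toy.columns if toy[col].dtype == "O"]

# Alternative: let pandas filter the columns by dtype.
by_select = toy.select_dtypes(include="object").columns.tolist()

print(by_comprehension)
print(by_select)
```

Both approaches only see columns stored as objects, so a numeric-looking column such as `Pclass` is not caught by either.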
Do you think there are other categorical variables besides the ones we caught? Of course there are. Having examined the dataset beforehand, and as most of you can see from the head, Survived and Pclass are categorical variables that look like numeric variables. So how can we distinguish them from truly numeric variables? Is there an easy rule?
How do we catch variables that are numeric but have fewer than 10 classes?
num_but_cat = [col for col in…
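One way to express this rule is sketched below. It is not necessarily the condition used in the snippet above: it assumes "numeric" means any numeric dtype (checked with `pd.api.types.is_numeric_dtype`) and takes the threshold of 10 classes from the heading. The `toy` frame and the `MAX_CLASSES` name are hypothetical.

```python
import pandas as pd

# Hypothetical toy frame that mimics a few Titanic columns.
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1],
    "Pclass": [3, 1, 3, 1, 3, 3, 1, 3, 3, 2, 3, 1],
    "Age": [22, 38, 26, 35, 31, 27, 54, 2, 19, 14, 4, 58],
    "Sex": pd.Series(["male", "female"] * 6, dtype="object"),
})

MAX_CLASSES = 10  # assumed threshold: "fewer than 10 classes"

# Numeric columns with only a handful of distinct values are treated as categorical.
num_but_cat = [
    col for col in toy.columns
    if pd.api.types.is_numeric_dtype(toy[col]) and toy[col].nunique() < MAX_CLASSES
]

print(num_but_cat)
```

The threshold is a judgment call: a genuinely numeric column with few distinct values in a small sample would be flagged as well, so the result is worth checking by eye.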