Analysis of Categorical Variables - Python Data Science

Muhammed Resit Cicekdag
3 min read · Nov 10, 2021
import numpy as np
import pandas as pd
import seaborn as sns

# Load the Titanic dataset (assumes titanic.csv is in the working directory)
df = pd.read_csv("titanic.csv")

Categorical variable classes and frequencies

df["Sex"].value_counts()
# We will use value_counts very often.
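
Since value_counts comes up so often, here is a small sketch of two of its options. The toy DataFrame below is made up for illustration and only stands in for the Titanic "Sex" column; normalize and dropna are real pandas parameters.

```python
import pandas as pd

# Toy data standing in for the "Sex" column (hypothetical values, one missing).
toy = pd.DataFrame({"Sex": ["male", "female", "male", "male", None]})

# Absolute frequency of each class; missing values are skipped by default.
counts = toy["Sex"].value_counts()

# Relative frequency of each class among the non-missing rows.
ratios = toy["Sex"].value_counts(normalize=True)

# Same counts, but with missing values reported as their own class.
with_nan = toy["Sex"].value_counts(dropna=False)

print(counts)
print(ratios)
print(with_nan)
```

Ratios are usually easier to compare across columns than raw counts, and dropna=False is a quick way to spot how many values are missing.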

Unique classes of the categorical variable

df["Sex"].unique()

Number of unique classes

df["Sex"].nunique()

2
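
One detail worth knowing: unique() and nunique() treat missing values differently. A minimal sketch on a made-up column (the values are hypothetical, not taken from titanic.csv):

```python
import pandas as pd

# Toy column with one missing value (hypothetical data).
embarked = pd.Series(["S", "C", None, "Q", "S"])

# unique() returns every distinct entry, including the missing one.
classes = embarked.unique()

# nunique() ignores missing values unless dropna=False is passed.
n_classes = embarked.nunique()
n_with_missing = embarked.nunique(dropna=False)

print(classes)
print(n_classes, n_with_missing)
```

So on a column with missing values, len(df[col].unique()) and df[col].nunique() can differ by one.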

df.head(5)

How many categorical variables are there and what are their names?

cat_cols = [col for col in df.columns if df[col].dtypes == "O"]

['Name', 'Sex', 'Ticket', 'Cabin', 'Embarked']
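
The dtype check above only catches object columns. If a column has already been converted to the pandas category dtype, it would be missed. A possible alternative, sketched on a made-up DataFrame (column values are hypothetical), is select_dtypes:

```python
import pandas as pd

# Toy frame: one object column, one category column, two numeric columns.
toy = pd.DataFrame({
    "Sex": pd.Series(["male", "female", "male"], dtype="object"),
    "Embarked": pd.Series(["S", "C", "S"], dtype="category"),
    "Age": [22.0, 38.0, 26.0],
    "Survived": [0, 1, 1],
})

# Same check as in the article: object columns only.
cat_cols = [col for col in toy.columns if toy[col].dtype == "O"]

# Wider check: object and category columns together.
cat_cols_wide = toy.select_dtypes(include=["object", "category"]).columns.tolist()

print(cat_cols)
print(cat_cols_wide)
```

Neither version catches Survived, which is stored as an integer; that case is handled separately.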

  • Are there other categorical variables besides the ones we caught? Of course there are. Having examined the dataset earlier, and as most of you can see from the head,
    [Survived, Pclass] are categorical variables that look like numeric variables (Embarked is categorical too, but it was already caught above as an object column).
  • So how can we distinguish them from truly numeric variables? Is there an easy rule?

How do we catch variables that are numeric but have fewer than 10 classes?

num_but_cat = [col for col in…
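
The line above is cut off in this copy, so the exact condition is not known. As a rough sketch of the idea in the heading (numeric dtype, fewer than 10 classes), something like the following could work. The toy data, the threshold variable, and the use of pd.api.types.is_numeric_dtype are all assumptions made for this illustration.

```python
import pandas as pd

# Toy frame with Titanic-like columns (hypothetical values).
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0],
    "Pclass": [3, 1, 3, 1, 3, 3, 1, 3, 3, 2, 3, 1],
    "Fare": [7.25, 71.28, 7.92, 53.1, 8.05, 8.46,
             51.86, 21.08, 11.13, 30.07, 16.7, 26.55],
    "Sex": pd.Series(["male", "female"] * 6, dtype="object"),
})

# Assumed cut-off, taken from the "10 classes" wording in the heading.
threshold = 10

# Keep columns that are numeric AND have fewer than `threshold` distinct values.
num_but_cat = [
    col for col in toy.columns
    if pd.api.types.is_numeric_dtype(toy[col]) and toy[col].nunique() < threshold
]

print(num_but_cat)
```

Here Fare has 12 distinct values and is left out, while Survived and Pclass are flagged. The threshold is a judgment call and may need adjusting for other datasets.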
