Stories by Ayushman chaurasia on Medium

Transformation architecture

Ayushman chaurasia — Tue, 30 Dec 2025 18:07:41 GMT

Transformer Networks: A Simple Guide to How AI Understands Language

Transformers have completely changed how artificial intelligence understands and works with language. They are used in translation apps, chatbots, and smart tools like GPT. What makes transformers special is that they try to understand language the way humans do — by focusing on the most important words instead of treating every word the same.

1. What Are Transformer Networks?

Transformers are a type of neural network designed to work with sequences, such as sentences or paragraphs. Older models processed text word by word, but transformers look at the entire sentence at once.

They do this using something called self-attention, which allows the model to understand relationships between words, even if those words are far apart in the sentence.

Transformers were introduced in 2017 in a famous research paper called “Attention Is All You Need”. This paper changed how AI models are built for tasks like translation, summarizing text, and generating sentences.

🔎 Why Are Transformers So Important?

They process text faster by working in parallel
They understand long and complex sentences better
They are the base of modern AI models like GPT and BERT

2. The Main Idea: Attention

The key idea behind transformers is attention.

Attention means the model learns to decide which words matter the most in a sentence.

Example

When you read the sentence:
“I saw a huge dog at the park”,
you naturally focus more on “huge dog” because those words carry the main meaning.

Transformers try to do the same thing — they give more importance to meaningful words.

3. Transformer Architecture (Big Picture)

Transformers are built using two main parts:

Encoder — understands the input sentence
Decoder — creates the output sentence

How It Works:

The input sentence goes into the encoder
The encoder understands the meaning and context
The decoder uses that understanding to generate output step by step

The encoder and decoder communicate using attention layers.

4. Inside the Encoder

🧱 Encoder Structure

Each encoder block has three main parts:

Multi-Head Self-Attention — finds relationships between words
Feed-Forward Neural Network — processes the information
Add & Normalize Layers — keep the model stable and accurate

The original transformer from “Attention Is All You Need” stacks 6 encoder layers, one after another. Each layer refines the model's representation of the sentence.

5. Preparing Text for the Encoder

Before the encoder can work, raw text must be converted into numbers.

✉️ A. Tokenization

The sentence is broken into smaller pieces called tokens.

Example:
“I love reading books”
→ ["I", "love", "reading", "books"]

🔢 B. Word Embeddings

Computers don’t understand words — only numbers.
So, each word is turned into a numeric vector that represents its meaning.

Similar words have similar vectors. These vectors can be large, like 512 or 768 numbers long.

📍 C. Positional Encoding

Because transformers read all words at once, they don’t automatically know word order.

Positional encoding adds information about where each word appears in the sentence, so the model knows which word comes first, second, and so on.
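The post does not say which positional encoding scheme it has in mind. As one possibility, here is a minimal sketch of the sinusoidal scheme from the original paper. The function name and the tiny sizes are assumptions chosen for illustration, and the sketch assumes an even embedding size.

```python
import numpy as np

def sinusoidal_positional_encoding(num_positions, d_model):
    """Return a (num_positions, d_model) array of position codes (d_model assumed even)."""
    positions = np.arange(num_positions)[:, np.newaxis]   # shape (num_positions, 1)
    even_dims = np.arange(0, d_model, 2)[np.newaxis, :]   # 0, 2, 4, ...
    angles = positions / np.power(10000.0, even_dims / d_model)

    encoding = np.zeros((num_positions, d_model))
    encoding[:, 0::2] = np.sin(angles)  # even dimensions use sine
    encoding[:, 1::2] = np.cos(angles)  # odd dimensions use cosine
    return encoding

# Four tokens ("I", "love", "reading", "books") with a tiny embedding size of 8.
pe = sinusoidal_positional_encoding(num_positions=4, d_model=8)
print(pe.shape)
```

Each row is added to the embedding of the token at that position, so two identical words in different places end up with different inputs.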

6. Multi-Head Self-Attention (The Heart of Transformers)

This is the most important part of a transformer.

For every word, the model creates three vectors:

Query (Q) — what the word is looking for
Key (K) — what the word offers
Value (V) — the actual information

The model compares these vectors to decide how much attention one word should give to another.

Example

In “I love reading books”, the word “I” is more related to “love” than “books”, so it gives more attention to “love”.
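The Query, Key, and Value idea can be sketched in a few lines of NumPy. This is a single attention head with random stand-in embeddings and random weight matrices (all assumptions, since a real model learns them), so the printed weights will not show the “I” and “love” relationship described above. It only shows the mechanics.

```python
import numpy as np

def softmax(scores):
    # Subtract the row maximum for numerical stability.
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    q = x @ w_q  # what each token is looking for
    k = x @ w_k  # what each token offers
    v = x @ w_v  # the information each token carries
    scores = q @ k.T / np.sqrt(k.shape[-1])  # scaled dot-product comparison
    weights = softmax(scores)                # one row of attention weights per token
    return weights @ v, weights

rng = np.random.default_rng(0)
tokens = ["I", "love", "reading", "books"]
d_model = 8
x = rng.normal(size=(len(tokens), d_model))  # random stand-ins for embeddings
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

output, weights = self_attention(x, w_q, w_k, w_v)
print(weights.round(2))  # row i: how much token i attends to every token
```

Multi-head attention runs several such heads side by side, each with its own weight matrices, and combines their outputs.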

7. Add & Normalization Layers

After attention and feed-forward steps:

Add keeps original information using shortcut connections
Normalize keeps numbers stable during training

This helps information flow through deep stacks of layers and makes training more stable.

8. Understanding the Decoder

The decoder is similar to the encoder but has one extra job — generating text word by word.

Decoder Layers Include:

Masked Self-Attention — stops the model from seeing future words
Cross-Attention — connects decoder with encoder output
Feed-Forward Network
Add & Normalize layers

Masked attention ensures the model predicts words one at a time, just like humans speak.
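A common way to implement the mask is to set the scores for future positions to negative infinity before the softmax. The sketch below uses random scores as a stand-in for real attention scores.

```python
import numpy as np

num_tokens = 4
rng = np.random.default_rng(0)
scores = rng.normal(size=(num_tokens, num_tokens))  # stand-in attention scores

# True above the diagonal marks "future" positions for each token.
future = np.triu(np.ones((num_tokens, num_tokens), dtype=bool), k=1)
masked_scores = np.where(future, -np.inf, scores)

# Row-wise softmax: masked positions get a weight of exactly 0.
exp = np.exp(masked_scores - masked_scores.max(axis=-1, keepdims=True))
weights = exp / exp.sum(axis=-1, keepdims=True)
print(weights.round(2))
```

The first token can only attend to itself, the second to the first two tokens, and so on.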

9. Final Output: Softmax Layer

At the end, the decoder suggests possible next words.

Softmax converts scores into probabilities
The word with the highest probability is chosen

Example:

“padhna” → 50% chance
“pasand” → 4% chance

The model selects “padhna”.
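As a rough sketch of this last step, the snippet below turns raw scores into probabilities and picks the most likely word. The scores and the "other" entry are made-up values for illustration and are not meant to reproduce the percentages above.

```python
import math

def softmax(scores):
    """Convert a dict of raw scores into probabilities that sum to 1."""
    largest = max(scores.values())  # subtract the max for numerical stability
    exps = {word: math.exp(s - largest) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

# Hypothetical raw scores for three candidate next words.
scores = {"padhna": 2.0, "pasand": -0.5, "other": 1.2}
probs = softmax(scores)

next_word = max(probs, key=probs.get)  # pick the highest-probability word
print(probs)
print(next_word)
```

Always taking the top word is the simplest strategy. Text generators often sample from the probabilities instead, so that the output is less repetitive.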

| Component            | What It Does                  |
| -------------------- | ----------------------------- |
| Attention            | Focuses on important words    |
| Positional Encoding  | Adds word order               |
| Encoder              | Understands input             |
| Decoder              | Generates output              |
| Multi-Head Attention | Finds different relationships |
| Softmax              | Chooses next word             |

Why Transformers Changed AI

Transformers have largely replaced older sequence models such as RNNs and LSTMs for language tasks. They made training faster through parallel processing and improved how models handle long sentences. Today, almost all modern language AI systems are built using transformers.

Hostel Occupancy Data Analysis Using Python (Beginner Project)

Ayushman chaurasia — Tue, 30 Dec 2025 17:12:12 GMT

Data analysis doesn’t always require huge datasets or complex models.
Sometimes, simple data + clear logic can already provide meaningful insights.

In this beginner-friendly project, I analyze hostel room occupancy data using Python.
The goal is to understand:

How many beds are occupied
How many are vacant
Which rooms are underutilized
Overall occupancy trends

This project is ideal for students who are starting with Python, NumPy, Pandas, and Matplotlib.

Tools and Libraries Used

We use three core Python libraries:

NumPy — for numerical and statistical calculations
Pandas — for handling and analyzing tabular data
Matplotlib — for data visualization

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Creating the Hostel Dataset

First, we create a sample hostel dataset.

Each room has:

A fixed capacity
A number of occupied beds

data = {
    "RoomID": ["R101","R102","R103","R104","R105","R106","R107","R108"],
    "Capacity": [4,4,3,2,4,3,2,4],
    "Occupied": [3,4,1,0,2,3,1,4]
}

df = pd.DataFrame(data)
df


RoomID Capacity Occupied
R101     4        3
R102     4        4
R103     3        1
R104     2        0
R105     4        2
R106     3        3
R107     2        1
R108     4        4

Dataset Explanation

RoomID → Unique room number
Capacity → Maximum beds available
Occupied → Students currently staying

The data is stored in a Pandas DataFrame, which makes analysis easier.

Calculating Vacant Beds

To find how many beds are empty in each room, we use a simple formula:

Vacant = Capacity − Occupied

df["Vacant"] = df["Capacity"] - df["Occupied"]
df

OUTPUT:

RoomID Capacity Occupied Vacant
R101     4         3      1
R102     4         4      0
R103     3         1      2
R104     2         0      2
R105     4         2      2
R106     3         3      0
R107     2         1      1
R108     4         4      0

This helps us immediately identify:

Fully occupied rooms
Partially filled rooms
Completely empty rooms

Mean Occupancy Statistics (Using NumPy)

Next, we calculate average values to understand overall trends.

mean_occupied = np.mean(df["Occupied"])
mean_vacant = np.mean(df["Vacant"])

print("Mean Occupancy:", mean_occupied)
print("Mean Vacancy:", mean_vacant)

OUTPUT :

Mean Occupancy: 2.25
Mean Vacancy: 1.0

What This Tells Us

Mean Occupancy → Average number of students per room
Mean Vacancy → Average number of empty beds per room

NumPy is used here because it is fast and optimized for numerical calculations.

Total Occupied vs Vacant Beds

To analyze hostel usage at a global level, we calculate totals.

total_occupied = df["Occupied"].sum()
total_vacant = df["Vacant"].sum()

print(total_occupied, total_vacant)

OUTPUT:

18 8

These values represent:

Total students staying in the hostel
Total beds currently unused

Visualizing Occupancy Using a Pie Chart

Numbers are good — visuals are better.

We use a pie chart to compare occupied and vacant beds.

labels = ["Occupied Beds", "Vacant Beds"]
sizes = [total_occupied, total_vacant]

plt.figure(figsize=(6,6))
plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90)
plt.title("Hostel Occupancy Distribution")
plt.show()

OUTPUT: a pie chart showing 69.2% occupied beds and 30.8% vacant beds.

Why a Pie Chart?

Clearly shows proportion
Easy to understand at a glance
Useful for non-technical audiences

Analyzing Vacancy Patterns

Now we identify rooms that are completely vacant.

vacant_rooms = df[df["Occupied"] == 0]
vacant_rooms

OUTPUT :

 RoomID Capacity Occupied Vacant
  R104     2       0        2

This is important because:

Empty rooms indicate poor space utilization
Management can redistribute students if needed

Calculating Occupancy Percentage

Finally, we calculate occupancy percentage per room.

df["Occupancy_Percentage"] = (df["Occupied"] / df["Capacity"]) * 100
df

OUTPUT :

RoomID  Capacity  Occupied  Vacant  Occupancy_Percentage
R101    4         3         1       75.000000
R102    4         4         0       100.000000
R103    3         1         2       33.333333
R104    2         0         2       0.000000
R105    4         2         2       50.000000
R106    3         3         0       100.000000
R107    2         1         1       50.000000
R108    4         4         0       100.000000

Why This Matters

Normalizes data across rooms of different sizes
Helps compare utilization fairly
Makes insights more meaningful
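With the percentage column in place, underutilized rooms can be listed with a simple filter. The sketch below rebuilds the same dataset so it runs on its own. The 50% cut-off is an assumption picked for illustration, not a rule from the project.

```python
import pandas as pd

df = pd.DataFrame({
    "RoomID": ["R101", "R102", "R103", "R104", "R105", "R106", "R107", "R108"],
    "Capacity": [4, 4, 3, 2, 4, 3, 2, 4],
    "Occupied": [3, 4, 1, 0, 2, 3, 1, 4],
})
df["Occupancy_Percentage"] = df["Occupied"] / df["Capacity"] * 100

threshold = 50  # hypothetical cut-off for "underutilized"
underutilized = df[df["Occupancy_Percentage"] < threshold]

# Overall occupancy: total occupied beds divided by total capacity.
overall = df["Occupied"].sum() / df["Capacity"].sum() * 100

print(underutilized[["RoomID", "Occupancy_Percentage"]])
print(f"Overall occupancy: {overall:.1f}%")
```

With this cut-off, rooms R103 and R104 are flagged, and the hostel as a whole is about 69% full.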

Key Insights from the Project

Some rooms are fully occupied, while others are underutilized
At least one room is completely vacant
Overall, 18 of the 26 beds (about 69%) are occupied
Visualization makes trends easy to interpret

Support Vector Machine (SVM) With Decision Boundary Visualization

Ayushman chaurasia — Fri, 26 Dec 2025 14:09:22 GMT

Definition : Support Vector Machine (SVM) is a supervised machine learning algorithm mainly used for classification.
It works by finding the best separating line (or boundary) between different classes.

CODE STARTED

We import the necessary libraries.

from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
import matplotlib.pyplot as plt
from sklearn.inspection import DecisionBoundaryDisplay

Load the Dataset.

The Breast Cancer dataset contains features related to tumor measurements and a target variable indicating whether the tumor is malignant or benign.

cancer = load_breast_cancer()

Select Features and Target.

Here:

Only the first two features are selected for visualization
y contains the class labels

x = cancer.data[:, :2]
y = cancer.target

Create and train the SVM model.

kernel='linear' creates a straight decision boundary
C=1 controls the trade-off between margin size and misclassification

svm = SVC(kernel='linear', C=1)
svm.fit(x, y)
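The model above is fitted on the full dataset, which is fine for drawing the boundary but says nothing about how well it generalizes. As an optional check, here is a sketch that holds out 20% of the data and measures accuracy on it. The split ratio and random_state are assumptions, not values from the post.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

cancer = load_breast_cancer()
x = cancer.data[:, :2]  # same two features used for the plot
y = cancer.target

# Hold out 20% of the rows for testing (assumed split, fixed seed for repeatability).
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42
)

svm_eval = SVC(kernel="linear", C=1)
svm_eval.fit(x_train, y_train)

accuracy = svm_eval.score(x_test, y_test)  # fraction of correct test predictions
print(f"Test accuracy: {accuracy:.2f}")
print("Support vectors per class:", svm_eval.n_support_)
```

Trying a few values of C here is a quick way to see the trade-off between margin size and misclassification in practice.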

Plot Decision Boundary

This displays the decision boundary learned by the SVM model.

DecisionBoundaryDisplay.from_estimator(
    svm,
    x,
    response_method="predict",
    alpha=0.99,
    cmap="Pastel1",
    xlabel=cancer.feature_names[0],
    ylabel=cancer.feature_names[1],
)

Plot Data Points

This scatter plot shows:

Data points colored by class
Black edges for better visibility
The decision boundary separating the classes

plt.scatter(x[:, 0], x[:, 1], c=y, s=20, edgecolors="k")
plt.show()

Output:

K-Nearest Neighbors(KNN) Model

Ayushman chaurasia — Fri, 26 Dec 2025 12:00:47 GMT

K-Nearest Neighbors (KNN) is a supervised learning algorithm used for classification. It predicts the class of a new (test) data point using the following steps:

Calculate the distance between the test data point and all existing data points (usually using Euclidean distance).
Select the K nearest neighbors, where K is a number chosen by the user.
Count the classes of these K nearest neighbors.
Assign the majority class among them to the test data point.

We use the Euclidean Distance formula to measure how far the target point is from each existing data point.

Code Part

1st option: writing our own KNN function (model).

First, we import the NumPy library, which is used to calculate the Euclidean distance.
Then, we import Counter to count how many times each class appears among the K nearest neighbors.

import numpy as np
from collections import Counter

Then we define a function for the Euclidean distance formula.

def euclidean_distance(point1,point2):
  return np.sqrt(np.sum((np.array(point1)-np.array(point2))**2))

KNN Function Explanation (Simple and Clear)

1. Define a KNN function where the user provides:

training data
training labels
target point
value of K

2. Create an empty list to store distances along with their corresponding labels.

3. Loop through the training data:

Calculate the Euclidean distance between each training point and the target point.
Store both the label and the calculated distance in the list using append().

4. Sort the distance list based on distance (from smallest to largest).

5. Select the labels of the K nearest data points.

6. Find the most common label among those K labels and return it as the predicted class.

def knn_predict(training_data,training_labels,target_point,k):
  distance = []
  for i in range(len(training_data)):
    dist = euclidean_distance(training_data[i],target_point)
    distance.append((training_labels[i],dist))
  distance.sort(key=lambda x:x[1])
  k_nearest_labels = [label for label,_ in distance[:k]]
  return Counter(k_nearest_labels).most_common(1)[0][0]

We define our dataset using the same variable names as the parameters of the KNN function above, for ease of use.

training_data = [[1,2],[2,3],[3,4],[6,7],[7,8]]
training_labels = ['B','A','B','B','A']
target_point = [4,5]
k = 3

Our function is ready. KNN has no separate training step: we pass the dataset and the target point to the function, and it returns the predicted class.

predicted_label = knn_predict(training_data,training_labels,target_point,k)
print(predicted_label)

OUTPUT :- B
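To see why the answer is B, it helps to print the distances themselves. The sketch below uses math.dist from the standard library in place of the NumPy helper. The data is the same as above.

```python
import math

training_data = [[1, 2], [2, 3], [3, 4], [6, 7], [7, 8]]
training_labels = ['B', 'A', 'B', 'B', 'A']
target_point = [4, 5]

# Pair every distance with its label, then sort from nearest to farthest.
distances = sorted(
    ((math.dist(point, target_point), label)
     for point, label in zip(training_data, training_labels)),
    key=lambda pair: pair[0],
)

for dist, label in distances:
    print(f"{label}: {dist:.2f}")

nearest_labels = [label for _, label in distances[:3]]
print(nearest_labels)
```

The three nearest points carry the labels B, A, and B, so B wins the vote two to one. Note that [2,3] and [6,7] are equally far from the target; with other values of K, ties like this can make the result depend on the order of the data.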

#— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — #

2nd option: using the existing KNN model from sklearn directly.

We use the same dataset: training data and labels, target point, and value of k.

training_data = [[1,2],[2,3],[3,4],[6,7],[7,8]]
training_labels = ['B','A','B','B','A']
target_point = [4,5]
k = 3

We import KNeighborsClassifier from sklearn.neighbors, pass the value of K as n_neighbors, and assign the model to a variable called knn.

from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=k)

Then we fit the model on the labeled data and predict the class of the target point.

knn.fit(training_data,training_labels)
predicted_label2 = knn.predict([target_point])
print(predicted_label2)

OUTPUT:- ['B']

Random Forest

Ayushman chaurasia — Fri, 19 Dec 2025 06:14:38 GMT

Random Forest is a supervised machine learning algorithm used for classification and regression problems.
It works by creating multiple decision trees and combining their results to make a final prediction.
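The idea of combining trees can be sketched as a simple majority vote. The tree predictions below are made up for illustration. Note that scikit-learn's RandomForestClassifier averages the trees' predicted class probabilities rather than counting hard votes, so this is a simplified picture and not the library's exact method.

```python
from collections import Counter

# Hypothetical predictions from five trees for one passenger (1 = survived).
tree_predictions = [1, 0, 1, 1, 0]

votes = Counter(tree_predictions)
final_prediction = votes.most_common(1)[0][0]  # class with the most votes

print(votes)
print(final_prediction)
```

Because each tree sees a different random sample of the data, their individual mistakes tend to differ, and the combined answer is usually more reliable than any single tree.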

In this post, we will use Random Forest Classifier to predict passenger survival using the Titanic dataset.

Import Required Libraries

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
import warnings
warnings.filterwarnings('ignore')

Load the Dataset

tt = pd.read_csv("titanic.csv")

Remove rows where the target column is missing:

tt = tt.dropna(subset=["Survived"])

Select Features and Target Variable

Here

x contains the input features
y contains the survival status

x = tt[['Pclass','Sex','Age','SibSp','Parch','Fare']].copy()  # copy so the edits below don't act on a view of tt
y = tt['Survived']

Data processing

Convert categorical values into numerical form and handle missing values.

x['Sex'] = x['Sex'].map({'female': 0, 'male': 1})
x['Age'] = x['Age'].fillna(x['Age'].median())

Split the Dataset

The data is split into training and testing sets to evaluate model performance.

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42
)

Train the Random Forest Model

rf_class = RandomForestClassifier(n_estimators=100, random_state=42)
rf_class.fit(x_train, y_train)

Make Predictions

y_pred = rf_class.predict(x_test)

Model Evaluation

Accuracy shows how many predictions were correct
Classification report provides precision, recall, and F1-score

acc = accuracy_score(y_test, y_pred)
class_rep = classification_report(y_test, y_pred)

print(f"Accuracy: {acc:.2f}")
print("Classification Report\n" + class_rep)

output

Accuracy: 0.80
Classification Report
                 precision   recall  f1-score   support

           0       0.82      0.85      0.83       105
           1       0.77      0.73      0.75        74

    accuracy                           0.80       179
   macro avg       0.79      0.79      0.79       179
weighted avg       0.80      0.80      0.80       179

Predict Survival for a Sample Passenger

This demonstrates how the trained model predicts survival for an individual passenger.

sample = x_test.iloc[0:1]
pred = rf_class.predict(sample)

sample_dict = sample.iloc[0].to_dict()
print("Sample Passenger:", sample_dict)
print(f"Predicted Survival: {'Survived' if pred[0] == 1 else 'Did Not Survive'}")

Sample Passenger: {sample_dict}
Predicted Survival: Did Not Survive

Decision Tree

Ayushman chaurasia — Fri, 19 Dec 2025 05:56:30 GMT

Decision Tree is a supervised machine learning algorithm used for classification and regression.
It works by splitting data into different conditions and forming a tree-like structure to make decisions.
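How does a tree decide which condition to split on? One common measure is Gini impurity, which is the default criterion of scikit-learn's DecisionTreeClassifier. The sketch below computes it by hand on made-up labels. The helper names are assumptions for illustration.

```python
from collections import Counter

def gini_impurity(labels):
    """0.0 means all labels are the same; higher means more mixed."""
    total = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())

def weighted_gini(left, right):
    """Impurity of a split: each side weighted by its share of the samples."""
    total = len(left) + len(right)
    return (len(left) / total) * gini_impurity(left) + \
           (len(right) / total) * gini_impurity(right)

parent = [0, 0, 0, 1, 1, 1]
print(gini_impurity(parent))                      # 0.5 (evenly mixed)

clean_split = weighted_gini([0, 0, 0], [1, 1, 1])  # 0.0 (perfect separation)
mixed_split = weighted_gini([0, 0, 1], [0, 1, 1])  # still mixed on both sides
print(clean_split, mixed_split)
```

The tree prefers the split with the lowest weighted impurity, so it would choose the clean split here.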

In this post, we will build a Decision Tree Classifier using the Iris dataset with Python and scikit-learn.

Import required libraries

from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score,f1_score,confusion_matrix,classification_report
from sklearn.model_selection import GridSearchCV

Load the iris dataset

Convert the dataset into a DataFrame for better understanding

iris = load_iris()

x = iris.data
y = iris.target
iris['feature_names']
data = pd.DataFrame(iris["data"],columns=iris["feature_names"])
data

output

We use MinMaxScaler to normalize the feature values between 0 and 1

scaler=MinMaxScaler()
x_normalized = scaler.fit_transform(x)

Split the dataset into training and testing sets.

x_train,x_test,y_train,y_test = train_test_split(x_normalized,y,test_size=0.2,random_state=42)

Train the Decision Tree model

clf=DecisionTreeClassifier()
clf.fit(x_train,y_train)

Make Predictions

y_pred = clf.predict(x_test)
y_pred

output

array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
       0, 2, 2, 2, 2, 2, 0, 0])

The confusion matrix shows how many predictions were correct and where the model made mistakes.

con = confusion_matrix(y_test,y_pred)
print(con)

output

[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]

This report gives precision, recall, F1-score, and support for each class.

cl= classification_report(y_test,y_pred)
print(cl)

output

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30

Logistic Regression

Ayushman chaurasia — Sun, 14 Dec 2025 18:09:15 GMT

Definition: Logistic regression predicts a binary outcome, such as a yes or no answer, by estimating the probability that a data point belongs to a class.
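Under the hood, the model turns a weighted sum of the features into a probability with the sigmoid function and then applies a threshold. Here is a minimal sketch with made-up weights, bias, and features (a trained model learns the weights and bias from data).

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_label(features, weights, bias, threshold=0.5):
    # Weighted sum of the features plus the bias term.
    z = sum(w * f for w, f in zip(weights, features)) + bias
    probability = sigmoid(z)
    return probability, int(probability >= threshold)

# Hypothetical learned parameters and one data point with two features.
weights = [0.8, -0.4]
bias = -0.1
probability, label = predict_label([2.0, 1.0], weights, bias)

print(f"Probability of class 1: {probability:.2f}")
print("Predicted label:", label)
```

The 0.5 threshold is the usual default; moving it trades false positives against false negatives.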

Code Started

1st line imports the ‘load_breast_cancer’ function, which gives you access to the breast cancer dataset from scikit-learn.
2nd line imports the ‘LogisticRegression’ model from scikit-learn’s linear_model module.
3rd line imports ‘train_test_split’, which divides the dataset into training and testing sets. This helps evaluate your model’s performance on unseen data.
4th line imports the accuracy_score function, which measures how often your model correctly classifies data points.

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

1st line loads the Breast Cancer dataset. The argument return_X_y=True returns the data directly as two separate arrays: x holds the input features (data), and y holds the target labels (answers).
2nd line splits the data. 80% is kept for learning (train) and 20% is kept for the exam (test).
3rd line creates the untrained model. We name it clf. We allow it extra steps (max_iter) so that training can converge.

‘max_iter’ is the maximum number of iterations the solver may run during training.

4th line is the training step. The model studies the training data to find the pattern between x and y.

5th line is the exam step. The model tries to predict the answers for the new test data (x_test).

6th line calculates the score. It compares the actual answers (y_test) with the predicted answers (y_pred).

x, y = load_breast_cancer(return_X_y=True)
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.2,random_state=23)
clf = LogisticRegression(max_iter=10000,random_state=0)
clf.fit(x_train,y_train)
y_pred = clf.predict(x_test)
acc = accuracy_score(y_test,y_pred)

This line prints the predicted labels.

print(y_pred)

Code Ended

output : [1 0 0 1 0 0 0 1 1 0 0 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 0 1 0 1 0 0 1 0 1 1 1 0 0 1 1 1 1 1 1 0 0 0 1 1 1 1 0 1 1 1 1 0 1 1 1 0 1 0 1 0 0 0 0 1 1 0 0 1 1 0 0 1 1 1 1 0 1 1 1 0 1 1 1 0]

Linear Regression

Ayushman chaurasia — Sun, 14 Dec 2025 18:02:17 GMT

Definition: Linear regression helps us find the average relationship between two variables in our data. In our code we fit a linear regression between X and Y to find that relationship.

CODE STARTED

We import the necessary libraries.

We import the NumPy library, which is essential for numerical operations and data manipulation.
Next we import matplotlib.pyplot as plt for visualization.
Finally we import the LinearRegression model from sklearn, which is the main algorithm for linear regression.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

The ‘np.random.seed(42)’ line sets the seed for the random number generator. This is important for reproducibility: it ensures that the same sequence of random numbers is generated every time you run the code.
The line ‘X = np.random.rand(50,1) * 100’ creates a NumPy array ‘X’ containing 50 random values between 0 and 100.
The ‘Y’ variable is created by establishing a linear relationship with ‘X’ (slope 3.5) and adding random noise between 0 and 20.

np.random.seed(42)
X = np.random.rand(50,1) * 100
Y = 3.5 * X + np.random.rand(50,1) * 20

We create an instance of the “LinearRegression” model, named “hello”.
The next line trains the model using the X and Y data.

hello = LinearRegression()
hello.fit(X,Y)

We then use the trained model to make predictions on the X values and store these predictions in Y_pred.

Y_pred  = hello.predict(X)

We set the plot size, create a scatter plot of the data points, and then plot the linear regression line with labels and a grid.
Finally, plt.show() displays the plot.

plt.figure(figsize=(10,6))
plt.scatter(X,Y,color='blue',label='Data points')
plt.plot(X,Y_pred,color='red',linewidth=2,label='Linear Regression')
plt.title("Regression Analysis")
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.grid(True)
plt.show()

Code Completed.

The blue dots show the actual data points.
The red line shows the predictions from the linear regression.

We can calculate Mean Squared Error ( MSE ) :

from sklearn.metrics import mean_squared_error
mse = mean_squared_error(Y,Y_pred)
print(mse)

Output is 36.764638874837246
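To connect the fitted line and the error to the formulas, the sketch below refits the same data with np.polyfit (swapped in for LinearRegression purely for illustration) and computes the mean squared error by hand. The variable names slope, intercept, and Y_fit are assumptions.

```python
import numpy as np

# Same data generation as above.
np.random.seed(42)
X = np.random.rand(50, 1) * 100
Y = 3.5 * X + np.random.rand(50, 1) * 20

# Least-squares straight line: returns [slope, intercept] for degree 1.
slope, intercept = np.polyfit(X.ravel(), Y.ravel(), deg=1)

# MSE is the average of the squared differences between actual and fitted values.
Y_fit = slope * X + intercept
mse_manual = np.mean((Y - Y_fit) ** 2)

print(f"slope: {slope:.3f}, intercept: {intercept:.3f}")
print(f"MSE: {mse_manual:.3f}")
```

The slope should land close to 3.5. The intercept comes out near 10 rather than 0 because the added noise ranges from 0 to 20, so its average of about 10 shifts the whole line upward.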

Lab 1: Importing Data and Creating Visualizations Using Python

Ayushman chaurasia — Sun, 23 Nov 2025 18:35:53 GMT

random → helps us create random data.
pandas (pd) → helps us store and work with data.
matplotlib (plt) → is used to create different types of charts and graphs in Python.

import random
import pandas as pd
import matplotlib.pyplot as plt

This code creates a small sample dataset by generating random ages, genders, and incomes for 100 people. Then it turns that data into a table using pandas and saves it as a CSV file named data.csv. It’s a quick way to make your own dataset when you don’t have real data.

data = {
    'age': [random.randint(20, 60) for _ in range(100)],
    'gender': [random.choice(['Male', 'Female']) for _ in range(100)],
    'income': [random.randint(20000, 100000) for _ in range(100)]
}

# Convert data to a pandas DataFrame and save it to a CSV file

df = pd.DataFrame(data)
df.to_csv('data.csv', index=False)

df is the DataFrame that stores all the generated data in a table format. When you write df, it simply displays the full table so we can see the age, gender, and income values clearly.

df

This code draws a chart that shows how many people fall into each age group. It uses the age data, adds labels, gives the chart a title, and then shows the graph on the screen.

plt.hist(data['age'])
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title('Age Distribution')
plt.show()

Output

This code creates a histogram of the age data using 10 bins and sets the bar color to black.

plt.hist(data['age'],color='black',bins = 10)
# plt.hist([male_age, female_age], color=['black', 'red'], label=['Male', 'Female'])
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title('Age Distribution')
plt.show()

Output

The next snippet raises an error because data is a dictionary of plain lists, not a DataFrame. A list has no unique() or value_counts() method, so data['gender'].unique() fails; df['gender'] has to be used instead.

plt.bar(data['gender'].unique(), data['gender'].value_counts())
plt.xlabel('Gender')
plt.ylabel('Count')
plt.title('Gender Comparison')
plt.show()

output

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipython-input-475948260.py in ()
----> 1 plt.bar(data['gender'].unique(), data['gender'].value_counts())
      2 plt.xlabel('Gender')
      3 plt.ylabel('Count')
      4 plt.title('Gender Comparison')
      5 plt.show()

AttributeError: 'list' object has no attribute 'unique'

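The fix is to use df['gender'] instead of data['gender']. We also take the bar labels from the index of value_counts(), so that each label is paired with its own count (unique() returns labels in order of appearance, which may not match the order of the counts).

counts = df['gender'].value_counts()
plt.bar(counts.index, counts.values)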
plt.xlabel('Gender')
plt.ylabel('Count')
plt.title('Gender Comparison')
plt.show()

Output
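One thing to watch when building a bar chart from a column: unique() returns labels in order of first appearance, while value_counts() sorts by count, so pairing the two by position can put the wrong label on a bar. A small sketch with made-up data shows the difference.

```python
import pandas as pd

# Tiny made-up column: 'Male' appears first, but 'Female' is more frequent.
df_demo = pd.DataFrame({"gender": ["Male", "Female", "Female"]})

labels_by_appearance = list(df_demo["gender"].unique())  # order of first appearance
counts = df_demo["gender"].value_counts()                # sorted by count, descending

# Pairing by position mixes up the labels...
mismatched = dict(zip(labels_by_appearance, counts.values))
# ...while the index of value_counts() always matches its own values.
aligned = counts.to_dict()

print("Paired by position:", mismatched)
print("Paired by index:   ", aligned)
```

Using counts.index for the x-axis and counts.values for the bar heights avoids the problem regardless of how the data is ordered.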