Travel Insurance Prediction

Hazal Gültekin
Dec 3, 2023

In this article, I build a machine learning model that predicts whether an individual will purchase travel insurance.

To do this, I use a Kaggle dataset describing previous customers of a travel insurance company. The task is to train a model that predicts whether an individual will buy the company's insurance policy.

I will explain the code step by step:

# python libraries
import pandas as pd
import numpy as np
import plotly.express as px

# ml libraries
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report

# suppress warnings (note that this also hides warnings that can be useful while debugging)
import warnings
warnings.filterwarnings("ignore")
# read dataset (adjust the path to wherever you saved the Kaggle CSV)
data = pd.read_csv("C:/Users/HAZAL/OneDrive/Masaüstü/Projeler/travel_insurance_prediction/TravelInsurancePrediction.csv")
data.head()
# Drop the "Unnamed: 0" column, which is a saved row index rather than a feature
data.drop(columns=["Unnamed: 0"], inplace=True)
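If, as its name suggests, the "Unnamed: 0" column is just a row index that was saved with the CSV, pandas can consume it at read time with `index_col=0` instead of dropping it afterwards. A minimal sketch on a made-up two-row CSV (the data is hypothetical, not taken from the Kaggle file):

```python
import io

import pandas as pd

# Hypothetical CSV whose first column has an empty header, like a saved index
csv_text = """,Age,TravelInsurance
0,31,0
1,29,1
"""

# Default read: the empty-header column becomes "Unnamed: 0"
default_df = pd.read_csv(io.StringIO(csv_text))
print(list(default_df.columns))  # ['Unnamed: 0', 'Age', 'TravelInsurance']

# index_col=0 turns that column into the row index, so nothing needs dropping
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(indexed_df.columns))  # ['Age', 'TravelInsurance']
```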
# We check if there are any missing values.
data.isnull().sum()

# Drop any rows that contain missing values (this changes nothing if the check above found none)
data = data.dropna()
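A small illustration of what these two calls do, on a hypothetical three-row frame with one missing income value:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the second row has no AnnualIncome value
toy = pd.DataFrame({
    "Age": [31, 29, 34],
    "AnnualIncome": [400000, np.nan, 1250000],
})

# isnull().sum() counts missing values per column: Age -> 0, AnnualIncome -> 1
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# dropna() removes every row that has at least one missing value
cleaned = toy.dropna()
print(len(toy), "->", len(cleaned))  # 3 -> 2
```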
# check the data structure
data.info()
# Select only the text (object dtype) columns and print the unique values of each one.

for col in data.select_dtypes(include='object').columns:
    print(col)  # name of the object-type column
    print(data[col].unique())  # unique values found in that column
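The same check can be collected into a dictionary, which is easier to inspect than printed output. This sketch uses hypothetical rows with the dataset's column names; the text columns are given `dtype=object` explicitly, matching how pandas 2.x stores strings read from a CSV, so `select_dtypes(include="object")` picks them up.

```python
import pandas as pd

# Hypothetical rows; text columns are stored with dtype=object explicitly
toy = pd.DataFrame({
    "Employment Type": pd.Series(
        ["Government Sector", "Private Sector/Self Employed", "Private Sector/Self Employed"],
        dtype=object,
    ),
    "GraduateOrNot": pd.Series(["Yes", "No", "Yes"], dtype=object),
    "Age": [31, 29, 34],  # numeric, so it is skipped
})

# Map each object-type column to its sorted unique values
unique_values = {
    col: sorted(toy[col].unique())
    for col in toy.select_dtypes(include="object").columns
}
print(unique_values)
```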
# Plot a histogram to see how insurance purchases vary with age
figure = px.histogram(data,
                      x="Age",
                      color="TravelInsurance",
                      title="Factors Affecting Purchase of Travel Insurance: Age")
figure.show()
# See how insurance purchases vary with a person's employment type
figure = px.histogram(data,
                      x="Employment Type",
                      color="TravelInsurance",
                      title="Factors Affecting Purchase of Travel Insurance: Employment Type")
figure.show()
# See how insurance purchases vary with a person's annual income
figure = px.histogram(data,
                      x="AnnualIncome",
                      color="TravelInsurance",
                      title="Factors Affecting Purchase of Travel Insurance: Income")
figure.show()
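The histograms show counts; a purchase rate per group can be easier to compare. Because TravelInsurance is 0/1, its mean within a group is the share of that group that bought a policy. A sketch on hypothetical rows (numeric columns such as Age or AnnualIncome can be binned with `pd.cut` first; the bin edges here are arbitrary):

```python
import pandas as pd

# Hypothetical rows using the dataset's column names
toy = pd.DataFrame({
    "Age": [25, 27, 31, 33, 34, 35],
    "Employment Type": ["Government Sector", "Government Sector",
                        "Private Sector/Self Employed", "Private Sector/Self Employed",
                        "Government Sector", "Private Sector/Self Employed"],
    "TravelInsurance": [0, 0, 1, 0, 1, 1],
})

# Mean of a 0/1 column = share of each group that bought insurance
rate_by_employment = toy.groupby("Employment Type")["TravelInsurance"].mean()

# Bin a numeric column first, then compute the same rate per bin
age_bins = pd.cut(toy["Age"], bins=[24, 29, 35])
rate_by_age = toy.groupby(age_bins, observed=True)["TravelInsurance"].mean()

print(rate_by_employment)
print(rate_by_age)
```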
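# Convert the Yes/No and employment-type columns to 0/1 integers so the model can use them.
# This is binary (label) encoding, not one-hot encoding; with only two categories per column,
# a single 0/1 column carries the same information as a pair of one-hot columns.
data["GraduateOrNot"] = data["GraduateOrNot"].map({"No": 0, "Yes": 1})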
data["FrequentFlyer"] = data["FrequentFlyer"].map({"No": 0, "Yes": 1})
data["EverTravelledAbroad"] = data["EverTravelledAbroad"].map({"No": 0, "Yes": 1})
data["Employment Type"] = data["Employment Type"].map({"Government Sector": 0, "Private Sector/Self Employed": 1})
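To make the distinction concrete, the sketch below encodes a hypothetical Employment Type column both ways: the binary `map` used above, and true one-hot encoding with `pd.get_dummies`.

```python
import pandas as pd

# Hypothetical column with the two categories found in the dataset
toy = pd.DataFrame({
    "Employment Type": ["Government Sector", "Private Sector/Self Employed", "Government Sector"],
})

# Binary (label) encoding: one column holding 0 or 1
binary = toy["Employment Type"].map({"Government Sector": 0, "Private Sector/Self Employed": 1})

# One-hot encoding: one indicator column per category, exactly one 1 per row
one_hot = pd.get_dummies(toy["Employment Type"], prefix="Employment", dtype=int)

print(binary.tolist())
print(one_hot)
```

One caveat of `.map`: any value missing from the dictionary (a typo, or an unexpected third category) becomes NaN, so it is worth re-running `data.isnull().sum()` after encoding.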
# We separate dependent and independent variables.
y = np.array(data["TravelInsurance"]) # Dependent variable
x = np.array(data.drop(["TravelInsurance"], axis=1)) # Independent variables
# Split features and target into a training set (90%) and a test set (10%).
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.10, random_state=42)
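If the target classes are imbalanced (check with `data["TravelInsurance"].value_counts()`), a small 10% test set can end up with a different class ratio than the full data. Passing `stratify=y` keeps the ratio the same in both splits; this is an optional change, not what the article ran. A sketch on synthetic labels with 30% positives:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels: 30 positives, 70 negatives; one dummy feature per row
y = np.array([1] * 30 + [0] * 70)
x = np.arange(100).reshape(-1, 1)

# stratify=y preserves the 30/70 class ratio in both splits
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.10, random_state=42, stratify=y
)
print(y_test.sum(), "positives out of", len(y_test))
```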

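# Standardize the features (zero mean, unit variance). Fit the scaler on the training set only
# and reuse its statistics on the test set, so no test-set information leaks into preprocessing.
# Decision trees are not sensitive to feature scale, so this step matters mainly for scale-sensitive models.
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)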
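Why `transform` rather than `fit_transform` on the test set: in this toy example (made-up numbers), the test set has a single row. Refitting the scaler on it makes that row its own mean and squashes it to 0, whereas reusing the training statistics correctly shows it lies far outside the training data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

x_train = np.array([[1.0], [2.0], [3.0]])
x_test = np.array([[10.0]])

scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)  # learns mean 2 and std sqrt(2/3) from training data
x_test_scaled = scaler.transform(x_test)        # applies the training mean/std: (10 - 2) / sqrt(2/3)

# Refitting on the test set uses its own statistics; a single row collapses to 0
refit_test = StandardScaler().fit_transform(x_test)

print(x_test_scaled, refit_test)
```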
# Machine learning
model = DecisionTreeClassifier(max_depth=3) # The model is created.
model.fit(x_train, y_train) # We train the model. The machine will learn y_train by looking at the data in x_train.
predictions = model.predict(x_test) # Predict labels for the held-out test features; they are compared with y_test below.
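Because the features were converted to a NumPy array, the fitted tree no longer knows the column names. scikit-learn's `export_text` can print the learned rules with names supplied explicitly. This sketch uses synthetic data from `make_classification` and placeholder names f0 to f3; with the article's data you would pass `list(data.drop(["TravelInsurance"], axis=1).columns)` instead.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic binary classification data with 4 features
x, y = make_classification(n_samples=200, n_features=4, random_state=0)
feature_names = ["f0", "f1", "f2", "f3"]  # placeholder names

# random_state makes the tree reproducible when candidate splits tie
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(x, y)

rules = export_text(model, feature_names=feature_names)
print(rules)
```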
# Model performance
accuracy = accuracy_score(y_test, predictions) # Calculates the accuracy of the model.
# Build a report comparing the true labels with the model's predictions.
# For each class it lists precision, recall (sensitivity), F1-score and support,
# followed by overall accuracy and the macro and weighted averages.
report = classification_report(y_test, predictions)
print(f'Accuracy: {accuracy}')
print(f'Classification Report:\n{report}')
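To see where the per-class numbers come from, precision and recall for the positive class can be computed directly from the confusion matrix. A sketch on hypothetical labels (not the article's results): here precision = 3 / (3 + 2) = 0.6 and recall = 3 / (3 + 1) = 0.75.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical true labels and predictions
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

# For labels [0, 1] the matrix is [[tn, fp], [fn, tp]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)  # of predicted buyers, how many actually bought
recall = tp / (tp + fn)     # of actual buyers, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(precision, recall, f1)
```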