Travel Insurance Prediction

Dec 3, 2023

In this article, I build a machine learning model that predicts whether an individual will purchase travel insurance.

The dataset comes from Kaggle and describes previous customers of a travel insurance company. The task is to train a model on this data so that it can predict whether a customer will buy the company's insurance policy.

I will explain the code step by step:

# python libraries
import pandas as pd
import numpy as np
import plotly.express as px

# ml libraries
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report

# suppress warning messages
import warnings
warnings.filterwarnings("ignore")

# read dataset
data = pd.read_csv("C:/Users/HAZAL/OneDrive/Masaüstü/Projeler/travel_insurance_prediction/TravelInsurancePrediction.csv")

data.head()

# We remove unnecessary columns
data.drop(columns=["Unnamed: 0"], inplace=True)

# We check if there are any missing values.
data.isnull().sum()

# If there were missing values, we could drop the affected rows like this
data = data.dropna()

# check the data structure
data.info()

# We select only the string (object) columns and print the unique values of each one.
for col in data.select_dtypes(include='object').columns:
    print(col) # name of the object-type column
    print(data[col].unique()) # unique values in that column

# We plot a histogram to see how age affects the purchase of insurance policies
figure = px.histogram(data,
                     x = "Age",
                     color="TravelInsurance",
                     title= "Factors Affecting Purchase of Travel Insurance: Age")
figure.show()

# We see how a person's type of employment affects the purchase of an insurance policy
figure = px.histogram(data,
                      x = "Employment Type",
                      color = "TravelInsurance",
                      title= "Factors Affecting Purchase of Travel Insurance: Employment Type")
figure.show()

# We see how a person's annual income affects the purchase of an insurance policy.
figure = px.histogram(data,
                      x = "AnnualIncome",
                      color = "TravelInsurance",
                      title= "Factors Affecting Purchase of Travel Insurance: Income")
figure.show()

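# We convert the string (object) columns to numeric values. Each of these columns has only two categories, so we map them directly to 0 and 1. This is a binary mapping, not one-hot encoding, which would create a separate indicator column for each category.
data["GraduateOrNot"] = data["GraduateOrNot"].map({"No": 0, "Yes": 1})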
data["FrequentFlyer"] = data["FrequentFlyer"].map({"No": 0, "Yes": 1})
data["EverTravelledAbroad"] = data["EverTravelledAbroad"].map({"No": 0, "Yes": 1})
data["Employment Type"] = data["Employment Type"].map({"Government Sector": 0, "Private Sector/Self Employed": 1})
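To make the difference between a binary mapping and one-hot encoding concrete, here is a minimal sketch on a made-up three-row frame. The column names mirror the dataset, but the rows and the `toy`, `mapped`, and `one_hot` names are hypothetical and only for illustration.

```python
import pandas as pd

# Hypothetical toy frame: column names follow the dataset, values are invented.
toy = pd.DataFrame({
    "FrequentFlyer": ["No", "Yes", "No"],
    "Employment Type": [
        "Government Sector",
        "Private Sector/Self Employed",
        "Government Sector",
    ],
})

# Binary mapping: one string column becomes one 0/1 column.
mapped = toy["FrequentFlyer"].map({"No": 0, "Yes": 1})

# One-hot encoding: one indicator column is created per category.
one_hot = pd.get_dummies(toy["Employment Type"], prefix="Employment", dtype=int)

print(mapped.tolist())
print(one_hot)
```

For two-category columns the mapping is enough. One-hot encoding becomes useful when a column has three or more unordered categories.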

# We separate dependent and independent variables.
y = np.array(data["TravelInsurance"]) # Dependent variable
x = np.array(data.drop(["TravelInsurance"], axis=1)) # Independent variables

# We split the features and the target into training and test sets. 10% of the rows are held out for testing.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.10, random_state=42)
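With a test set this small, the share of buyers in it can drift away from the share in the full data. One option, not used in the code above, is the `stratify` argument of `train_test_split`, which keeps the class proportions the same in both splits. A minimal sketch on made-up data (`X_toy` and `y_toy` are hypothetical):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 rows, 30% of them in the positive class.
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.array([1] * 30 + [0] * 70)

# stratify=y_toy asks for the same class ratio in the train and test splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.10, random_state=42, stratify=y_toy
)

print(len(y_te), y_te.sum())
```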

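# We standardize the features so that each one has zero mean and unit variance. The scaler is fitted on the training set only and then applied to the test set, so that no information from the test set leaks into preprocessing. Decision trees do not depend on feature scale, so this step matters mainly if the model is later swapped for a scale-sensitive one.
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)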
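The usual pattern with `StandardScaler` is to learn the mean and standard deviation from the training split and reuse them on the test split. Here is a minimal sketch on randomly generated numbers; `train`, `test`, and `demo_scaler` are hypothetical names used only for this illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: one numeric feature, drawn at random.
rng = np.random.default_rng(0)
train = rng.normal(loc=50.0, scale=10.0, size=(200, 1))
test = rng.normal(loc=50.0, scale=10.0, size=(20, 1))

demo_scaler = StandardScaler()
train_scaled = demo_scaler.fit_transform(train)  # learns mean and std from train
test_scaled = demo_scaler.transform(test)        # reuses the training statistics

# The test split is scaled with the training mean and std, not its own.
manual = (test - demo_scaler.mean_) / demo_scaler.scale_
print(np.allclose(test_scaled, manual))
```

Calling `fit_transform` on the test split instead would recompute the statistics from the test data, so the two splits would no longer be on the same scale.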

# Machine learning
model = DecisionTreeClassifier(max_depth=3) # We create the model; the tree is limited to 3 levels.
model.fit(x_train, y_train) # We train the model. It learns the relationship between the features in x_train and the labels in y_train.
predictions = model.predict(x_test) # We test what the model has learned by predicting labels for x_test. These are compared with y_test below.
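A shallow tree like this one is small enough to read. As a sketch of how that could look, the example below trains a depth-3 tree on synthetic data and prints its rules with `export_text`. The data from `make_classification` and the feature names `f0` to `f3` are assumptions for illustration, not the insurance dataset.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data: 200 rows, 4 numeric features, 2 classes.
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=0)

# Same depth limit as in the article; random_state fixes tie-breaking.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_demo, y_demo)

# export_text renders the learned splits as indented if/else rules.
rules = export_text(tree, feature_names=["f0", "f1", "f2", "f3"])
print(rules)
```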

# Model performance
accuracy = accuracy_score(y_test, predictions) # Calculates the accuracy of the model.
# Generates a more detailed report that compares the actual labels with the model's predictions.
# The report includes precision, recall, and F1 score for each class, along with the overall accuracy.
report = classification_report(y_test, predictions)
print(f'Accuracy: {accuracy}')
print(f'Classification Report:\n{report}')
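The precision and recall values in the report come from counting correct and incorrect predictions per class. A minimal sketch with six made-up labels shows the arithmetic for the positive class; `y_true` and `y_pred` here are hypothetical, not the model's output.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = bought insurance, 0 = did not.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0]

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)  # of the predicted buyers, how many really bought
recall = tp / (tp + fn)     # of the real buyers, how many were found

print(tn, fp, fn, tp)
print(precision, recall)
```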

Written by Hazal Gültekin