Flight Price Prediction with Deployment

Sameer Kumar
Published in Analytics Vidhya
5 min read · Oct 20, 2021

Introduction

Machine Learning hackathons are one of the best ways to enhance your Machine Learning knowledge, as they help us understand the different approaches to a problem statement.

Machine Hack recently hosted a hackathon on Flight Fare Prediction where we had to predict the fares of flights based on some independent features. This was a unique experience for me as it was my first hackathon ever. There were over 3000 participants and I achieved a rank of 363 in the competition. I got to learn about various approaches and techniques for solving a problem statement, and after submitting my result, I even deployed the model on Heroku.

In this article, I will explain this project step by step in detail. So, fasten your seat belts and let’s begin!!

Steps in the project

  1. Importing libraries and dataset
  2. Exploratory Data Analysis
  3. Feature Engineering
  4. Feature Selection
  5. Hyperparameter Tuning
  6. Model Building
  7. Deployment

A] Importing Libraries and dataset

The above image shows the libraries used and the dataset. Our target variable is the price/fare of the flights, and the rest of the features are independent variables.
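Here is a minimal sketch of this step, assuming pandas. The three rows and their values are made up for illustration and stand in for the hackathon's training file, which would normally be read from disk with `pd.read_excel` or `pd.read_csv`.

```python
import io

import pandas as pd

# Made-up rows in the shape of the flight data (illustrative only).
SAMPLE_CSV = """Airline,Date_of_Journey,Source,Destination,Dep_Time,Arrival_Time,Duration,Total_Stops,Price
IndiGo,24/03/2019,Banglore,New Delhi,22:20,01:10 25 Mar,2h 50m,non-stop,3897
Air India,1/05/2019,Kolkata,Banglore,05:50,13:15,7h 25m,2 stops,7662
Jet Airways,9/06/2019,Delhi,Cochin,09:25,04:25 10 Jun,19h,2 stops,13882
"""

# In the real project this would be a file path instead of a StringIO buffer.
train = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(train.shape)           # (rows, columns)
print(train.isnull().sum())  # missing values per column
train = train.dropna()       # drop incomplete rows before feature engineering
```

Checking the shape and the missing values first makes it easier to spot problems before any columns are transformed.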

B] Feature Engineering and EDA

Now that we have the dataset, it is important to pre-process the independent variables into simpler variables. Let’s deal with the numerical and categorical variables one by one. First, let’s handle the numerical data.

B.1] Date_of_Journey

Our job is to extract the journey day and journey month from the Date_of_Journey feature/column.

Code to extract the day and month
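A minimal sketch of this extraction, assuming pandas and dates stored as day/month/year strings such as 24/03/2019 (the sample values are made up):

```python
import pandas as pd

df = pd.DataFrame({"Date_of_Journey": ["24/03/2019", "1/05/2019", "9/06/2019"]})

# Parse the day-first date strings, then pull out the numeric day and month.
journey_date = pd.to_datetime(df["Date_of_Journey"], format="%d/%m/%Y")
df["Journey_Day"] = journey_date.dt.day
df["Journey_month"] = journey_date.dt.month

print(df)
```

Passing an explicit `format` avoids pandas guessing whether a value like 1/05/2019 is the 1st of May or the 5th of January.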

Below is the image of the dataset after creating the two new columns i.e. Journey_Day and Journey_month.

Drop the Date_of_Journey column now.

B.2] Departure Time

Now let’s deal with the Dep_Time feature. Extract the hours and minutes from Dep_Time.
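As a rough sketch, assuming Dep_Time holds 24-hour HH:MM strings; the new column names `Dep_hour` and `Dep_min` are my own choice for illustration:

```python
import pandas as pd

df = pd.DataFrame({"Dep_Time": ["22:20", "05:50", "09:25"]})

# Parse the HH:MM strings and keep the hour and minute as separate numbers.
dep_time = pd.to_datetime(df["Dep_Time"], format="%H:%M")
df["Dep_hour"] = dep_time.dt.hour      # hypothetical column name
df["Dep_min"] = dep_time.dt.minute     # hypothetical column name

# The original text column is no longer needed once it has been split.
df = df.drop(columns=["Dep_Time"])
print(df)
```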

Below is the image of modified dataset after adding two new columns.

B.3] Arrival_Time

Now let’s deal with Arrival_Time feature. Extract Arrival_Hour and Arrival_min from the feature Arrival_Time.
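A small sketch of one way to do this, under the assumption that some arrival values carry a trailing date (for example "01:10 25 Mar") when the flight lands on a later day. A regular expression keeps only the leading HH:MM part:

```python
import pandas as pd

df = pd.DataFrame({"Arrival_Time": ["01:10 25 Mar", "13:15", "04:25 10 Jun"]})

# Capture the leading hour and minute; any trailing date text is ignored.
parts = df["Arrival_Time"].str.extract(r"^(\d{1,2}):(\d{2})").astype(int)
df["Arrival_Hour"] = parts[0]
df["Arrival_min"] = parts[1]

df = df.drop(columns=["Arrival_Time"])
print(df)
```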

Below is the image after addition of two new columns.

B.4] Duration

Now let’s deal with Duration feature. Extract duration_hours and duration_mins from the feature.
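Duration is the trickiest of the four, because a value may have only hours or only minutes. A minimal sketch, assuming strings such as "2h 50m", "19h" or "5m" (made-up samples):

```python
import pandas as pd

df = pd.DataFrame({"Duration": ["2h 50m", "19h", "5m"]})

# Pull out the number in front of "h" and the number in front of "m".
hours = df["Duration"].str.extract(r"(\d+)\s*h")[0]
mins = df["Duration"].str.extract(r"(\d+)\s*m")[0]

# A missing part (e.g. "19h" has no minutes) comes back as NaN, so treat it as 0.
df["duration_hours"] = pd.to_numeric(hours).fillna(0).astype(int)
df["duration_mins"] = pd.to_numeric(mins).fillna(0).astype(int)

print(df)
```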

Below is the image of the dataset after adding two new columns.

Let us now handle the categorical data.

Airline, Source, Destination and Total_Stops are the categorical variables. Most ML models cannot work with categorical text data directly, so we need to convert these variables into numeric data.

There are two types of categorical variables:

  1. Nominal Variable: Variables which don’t follow any order or level, e.g. Airline, Source and Destination.
  2. Ordinal Variable: Variables which have levels or follow an order, e.g. Total_Stops.

We perform One Hot Encoding for nominal variables and Label Encoding for Ordinal Variables.
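For the nominal variables, one hot encoding can be sketched with `pd.get_dummies`. The sample rows are made up, and `drop_first=True` is an assumption on my part: it drops one redundant column per feature and may or may not be what you want.

```python
import pandas as pd

df = pd.DataFrame({
    "Airline": ["IndiGo", "Air India", "Jet Airways", "IndiGo"],
    "Source": ["Banglore", "Kolkata", "Delhi", "Kolkata"],
})

# One 0/1 column per category; the first category of each feature is dropped
# because it is implied when all the other columns are 0.
airline = pd.get_dummies(df[["Airline"]], drop_first=True, dtype=int)
source = pd.get_dummies(df[["Source"]], drop_first=True, dtype=int)

print(airline)
print(source)
```

Each new column is prefixed with the original column name, so `Airline_IndiGo` is 1 for IndiGo flights and 0 otherwise.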

B.5] Airline

B.6] Source

B.7] Destination

Let’s also drop the Route and Additional_Info features from the dataset, as they do not contribute much.

B.8] Total_Stops
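Since the number of stops has a natural order, each label can be replaced by its rank. The sketch below uses an explicit mapping rather than scikit-learn's `LabelEncoder`, because `LabelEncoder` numbers labels alphabetically and would not keep "non-stop" below "1 stop". The label spellings are assumptions about how the data is written.

```python
import pandas as pd

df = pd.DataFrame({"Total_Stops": ["non-stop", "2 stops", "1 stop", "non-stop"]})

# Assumed label spellings, mapped to the number of stops they describe.
STOP_ORDER = {"non-stop": 0, "1 stop": 1, "2 stops": 2, "3 stops": 3, "4 stops": 4}
df["Total_Stops"] = df["Total_Stops"].map(STOP_ORDER)

print(df)
```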

Now concatenate all these features with the original dataset.
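A small sketch of the concatenation, assuming pandas and made-up rows. The encoded columns are joined side by side and the original text columns are then dropped:

```python
import pandas as pd

df = pd.DataFrame({
    "Airline": ["IndiGo", "Air India"],
    "Source": ["Banglore", "Kolkata"],
    "Destination": ["New Delhi", "Banglore"],
    "Price": [3897, 7662],
})

nominal = ["Airline", "Source", "Destination"]
encoded = pd.get_dummies(df[nominal], dtype=int)

# axis=1 joins column-wise; afterwards the text originals are redundant.
final = pd.concat([df, encoded], axis=1).drop(columns=nominal)

print(final.columns.tolist())
```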

Final image of dataset:

Now repeat the entire process for the test dataset and get the final data for test.

C] Feature Selection

Feature selection is performed in two ways:

  1. Correlation Heatmap
  2. Extra Trees Regressor

First, let’s create the independent and dependent variables.
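In pandas terms this is a one-line split each way. A minimal sketch with made-up rows, assuming the target column is called `Price`:

```python
import pandas as pd

data = pd.DataFrame({
    "Total_Stops": [0, 2, 2, 1],
    "Journey_Day": [24, 1, 9, 12],
    "Price": [3897, 7662, 13882, 6218],
})

X = data.drop(columns=["Price"])  # independent variables
y = data["Price"]                 # dependent variable (target)

print(X.shape, y.shape)
```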

Below is the image of the correlation heatmap.
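A heatmap like this can be sketched as follows, assuming pandas for the correlation matrix and seaborn plus matplotlib for the drawing. The rows, the colour map and the helper name `plot_heatmap` are illustrative choices, not taken from the original notebook.

```python
import pandas as pd

data = pd.DataFrame({
    "Total_Stops": [0, 2, 2, 1, 0, 1],
    "duration_hours": [2, 7, 19, 5, 2, 4],
    "Journey_Day": [24, 1, 9, 12, 1, 24],
    "Price": [3897, 7662, 13882, 6218, 3873, 5200],
})

# Pairwise Pearson correlation between all numeric columns.
corr = data.corr()

# Correlation of every feature with the target, strongest first.
price_corr = corr["Price"].drop("Price").sort_values(ascending=False)
print(price_corr)


def plot_heatmap(corr_matrix):
    """Draw an annotated heatmap (plotting libraries are imported only when called)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    plt.figure(figsize=(8, 6))
    sns.heatmap(corr_matrix, annot=True, cmap="RdYlGn")
    plt.show()
```

Calling `plot_heatmap(corr)` draws the figure. Features with a correlation close to 0 against Price are candidates for removal.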

2) Using Extra Trees Regressor
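A rough sketch of this approach with scikit-learn's `ExtraTreesRegressor`. The data is synthetic: I generate a price that depends on the stops and the duration but not on the journey day, so we can check that the importances reflect that.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "Total_Stops": rng.integers(0, 4, size=200),
    "duration_hours": rng.integers(1, 30, size=200),
    "Journey_Day": rng.integers(1, 28, size=200),
})
# Synthetic target: driven by stops and duration, not by the journey day.
y = 3000 + 2500 * X["Total_Stops"] + 150 * X["duration_hours"] + rng.normal(0, 200, size=200)

selector = ExtraTreesRegressor(n_estimators=100, random_state=0)
selector.fit(X, y)

# feature_importances_ sums to 1; larger values mean more useful features.
importances = pd.Series(selector.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```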

E] Model Building and Hyperparameter Tuning

First let’s build the model using Random Forest.
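A minimal sketch of a Random Forest baseline with scikit-learn, again on synthetic data. The 80/20 split, the number of trees and the use of R² as the score are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(300, 3))
# Synthetic price that depends on the first two features plus noise.
y = 3000 + 1200 * X[:, 0] + 400 * X[:, 1] + rng.normal(0, 500, size=300)

# Hold out 20% of the rows to measure performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

test_r2 = r2_score(y_test, model.predict(X_test))
print(f"R^2 on held-out data: {test_r2:.2f}")
```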

Now let’s use RandomizedSearchCV to improve the accuracy.
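A sketch of how such a search can be set up. The search space, the number of iterations and the scoring metric below are illustrative assumptions, and the data is synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 3))
y = 3000 + 1200 * X[:, 0] + 400 * X[:, 1] + rng.normal(0, 500, size=200)

# Candidate values for each hyperparameter (illustrative, not tuned).
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [5, 10, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 5],
    "max_features": ["sqrt", 1.0],
}

# Try 5 random combinations, each scored with 3-fold cross-validation.
search = RandomizedSearchCV(
    estimator=RandomForestRegressor(random_state=42),
    param_distributions=param_distributions,
    n_iter=5,
    cv=3,
    scoring="neg_mean_squared_error",
    random_state=42,
)
search.fit(X, y)

print(search.best_params_)
best_model = search.best_estimator_  # refit on all of X, y with the best settings
```

Unlike a full grid search, only `n_iter` combinations are tried, which keeps the run time manageable when the search space is large.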

Finally, we got a testing accuracy of 81%.

It is now time to deploy our model on the Heroku cloud.

F] Steps for Deployment

  1. Create your model.py file (the code above) and download the pkl file
  2. Create app.py using Flask
  3. Create a templates folder
  4. Create a static folder
  5. Create a requirements.txt file
  6. Create your Procfile
  7. Create an account on Heroku and connect it with your GitHub.
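To give an idea of step 2, here is a simplified sketch of an app.py. It is not the app from this project: it exposes a JSON endpoint instead of rendering an HTML form from the templates folder, and the route name, the payload shape and the `ConstantModel` stand-in are all hypothetical.

```python
from flask import Flask, jsonify, request


def create_app(model):
    """Build a Flask app around any object with a scikit-learn style predict()."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        # Expects JSON such as {"features": [0, 2, 50]} in the training column order.
        payload = request.get_json(force=True)
        features = [payload["features"]]
        price = model.predict(features)[0]
        return jsonify({"predicted_price": round(float(price), 2)})

    return app


class ConstantModel:
    """Stand-in for the unpickled model so that this sketch runs on its own."""

    def predict(self, rows):
        return [5000.0 for _ in rows]


# In the real app, the model loaded from the pkl file would be passed in here.
app = create_app(ConstantModel())
```

With an app object named `app` inside app.py, a Procfile for Heroku typically contains the single line `web: gunicorn app:app`, and gunicorn then needs to be listed in requirements.txt along with Flask and scikit-learn.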

PKL file
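Saving and reloading a trained model with `pickle` can be sketched like this. The file name `flight_rf.pkl` is hypothetical, the data is synthetic, and a temporary folder is used only so that the example cleans up after itself.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 3))
y = X.sum(axis=1)

model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = Path(tmp) / "flight_rf.pkl"  # hypothetical file name

    # Serialise the fitted model to disk (this is what model.py would do).
    with open(pkl_path, "wb") as f:
        pickle.dump(model, f)

    # Load it back (this is what app.py would do at start-up).
    with open(pkl_path, "rb") as f:
        restored = pickle.load(f)

print(np.allclose(model.predict(X), restored.predict(X)))
```

The pkl file should be loaded with the same scikit-learn version that created it, which is one reason to pin versions in requirements.txt.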

App screenshot

That was all about this Flight Price Prediction project. I am attaching my LinkedIn and GitHub links below. You can connect with me to see more exciting projects. Happy Learning!!

https://www.linkedin.com/in/sameer-kumar-20988b1a6/

AI Research intern at SCAAI || Kaggle 2x Expert || Machine Learning || Deep Learning || NLP || Python