Will your flight be late? — Analysis and Prediction.

Laurent Risser
Published in Analytics Vidhya
6 min read · Mar 5, 2020


Exploratory analysis and regression model

Flight delay — Photo Credit: Pixabay

How many times has your flight been late? How many connections did you miss because of that?

In this project, I used data from the U.S. Department of Transportation’s (DOT) Bureau of Transportation Statistics and tried different models to predict flight delays. The first part of the project was a classic exploratory analysis; I then built a linear regression model to predict the delays.

An important part of a data scientist’s job is communicating findings to people who are not necessarily familiar with the technical details behind them. Plots are probably the most powerful tool for that, so mastering visualization techniques matters.

Let’s get started!

Overview of the data set

First, I imported the Python packages needed for the analysis: pandas, NumPy, seaborn, Matplotlib, and scikit-learn.

To get an overview of the geographical area covered by this dataset, I plotted each airport’s location and indicated the number of flights recorded there in 2015. About 15 airports handled more than 100,000 flights that year. There are more airports in the…
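The counting step behind such a map can be illustrated with a small, dependency-free sketch. It is not the code used in this project: the airport codes, the rounded coordinates, the toy flight list, and the helper names `flights_per_airport` and `busy_airports` are all assumptions made for illustration, and the threshold is scaled down to fit the toy data.

```python
from collections import Counter

# Hypothetical toy data: one origin airport code per recorded flight.
# These values are illustrative and are not taken from the DOT dataset.
flights = ["ATL", "ATL", "ATL", "ORD", "ORD", "DFW"]

# Approximate (latitude, longitude) pairs, rounded, for illustration only.
airports = {
    "ATL": (33.64, -84.43),
    "ORD": (41.98, -87.90),
    "DFW": (32.90, -97.04),
}


def flights_per_airport(origins):
    """Count how many flights were recorded at each origin airport."""
    return Counter(origins)


def busy_airports(counts, threshold):
    """Return, in alphabetical order, the airports with more than `threshold` flights."""
    return sorted(code for code, n in counts.items() if n > threshold)


counts = flights_per_airport(flights)

# The article's cutoff is 100,000 flights per year; 1 is used here
# only because the toy list is tiny.
busy = busy_airports(counts, threshold=1)

# Rows ready for a scatter plot: (latitude, longitude, number of flights).
plot_rows = [(*airports[code], counts[code]) for code in airports]
```

Each row in `plot_rows` could then be drawn as one marker whose size or color encodes the flight count. With pandas, the same count could be obtained by calling `value_counts()` on the column that holds the origin airport, whatever that column is named in the actual file.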
