Will My Flight Arrive On-Time at Seatac? — Part 1 Data Visualization

Dana Lindquist
Jan 24, 2019

This was the third project of my 12-week Metis data science bootcamp. The prompt was to complete a project using supervised learning. As with all Metis projects, it was open-ended, and each student could take an idea and run with it.

I chose to examine whether domestic flights arriving at Seatac Airport, my local airport, were on time or delayed. My data came from the Bureau of Transportation Statistics website (transtats.bts.gov), which offers a wealth of information on the on-time status of flights in the US.

Before getting into modeling, let's visualize the data!

When are flights delayed?

My data consisted of all flights arriving at Seatac from July 2016 to July 2018: 2 years and 320,000 flights. For each flight I had the airline, flight number, departure city, flight distance, and reason for any delay, as well as the planned and actual departure and arrival times.

I predominantly used Tableau Public, a free tool, to examine my data. Tableau is easy to use and makes it easy to modify queries of the data.

First, let's look at the number of flights arriving at Seatac each week. In the chart, the red bars (delayed flights) are stacked on top of the blue bars (on-time flights). There are more flights in the summer, and the total shows an increasing trend over the two years. There is also a bump in winter flights around Christmas. But what is most interesting is that the percentage of delays is relatively even throughout the year. Over the two years, flights were delayed 18% of the time on average.
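Tableau handled the charts, but the weekly tally behind them is easy to sketch in plain Python as well. This is a minimal sketch, not the author's workflow: each flight is assumed to be a hypothetical `(flight_date, arr_delay_min)` pair, and a flight counts as delayed when it arrives 15 or more minutes late, the threshold BTS uses in its on-time statistics. Weeks are ISO weeks from `date.isocalendar()`, and the sample is made up.

```python
from collections import defaultdict
from datetime import date

# Assumption: BTS-style rule, "delayed" means arriving 15+ minutes late.
DELAY_THRESHOLD_MIN = 15


def weekly_delay_summary(flights):
    """Group flights by ISO week.

    flights: iterable of hypothetical (flight_date, arr_delay_min) pairs.
    Returns {(iso_year, iso_week): (on_time, delayed, pct_delayed)}.
    """
    counts = defaultdict(lambda: [0, 0])  # [on_time, delayed]
    for flight_date, arr_delay_min in flights:
        iso_year, iso_week, _ = flight_date.isocalendar()
        is_delayed = arr_delay_min >= DELAY_THRESHOLD_MIN
        counts[(iso_year, iso_week)][int(is_delayed)] += 1
    return {
        week: (on_time, delayed, 100.0 * delayed / (on_time + delayed))
        for week, (on_time, delayed) in sorted(counts.items())
    }


def overall_delay_pct(flights):
    """Percentage of all flights that were delayed."""
    flights = list(flights)
    delayed = sum(1 for _, d in flights if d >= DELAY_THRESHOLD_MIN)
    return 100.0 * delayed / len(flights)


# Made-up sample for illustration only.
sample = [
    (date(2017, 7, 3), 0),
    (date(2017, 7, 4), 22),
    (date(2017, 7, 5), 5),
    (date(2017, 7, 10), 15),
]
print(weekly_delay_summary(sample))
print(overall_delay_pct(sample))
```

Run over two years of real arrivals, `overall_delay_pct` should land near the 18% figure above if the same delay definition applies, while the per-week percentages show how steady that rate is across the year.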

What do we know about these delays?

Six airlines make up 94% of the domestic flights into Seatac, and most of those flights are Alaska Airlines flights. I will concentrate on these six: Alaska, Delta, SkyWest, Southwest, United and American. Looking at the percentage of flights delayed, Delta does have fewer delayed flights than Southwest, but the difference isn't significant.
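The post doesn't say how "significant" was judged. One common way to put a number on it is a two-proportion z-test on two airlines' delay counts, sketched below with only the standard library. The function name is hypothetical, and the test treats every flight as an independent trial, which is a simplifying assumption since delays on the same day tend to cluster.

```python
import math


def two_proportion_z_test(delayed_a, total_a, delayed_b, total_b):
    """Two-sided z-test for a difference between two delay rates.

    Uses the pooled proportion for the standard error; returns (z, p_value).
    """
    p_a = delayed_a / total_a
    p_b = delayed_b / total_b
    pooled = (delayed_a + delayed_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:  # both rates are 0% or both are 100%: no difference to test
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For example, `two_proportion_z_test(150, 1000, 100, 1000)` (made-up counts) gives a z of about 3.4. With tens of thousands of flights per airline, small gaps can test as significant, so the size of the difference is worth reporting alongside the p-value.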

We can also look at the average flight delay for each of these airlines over the two-year period. This chart doesn't tell us anything new, but it's interesting to see: Delta is less likely to be delayed, and the variation is about the same throughout the year.
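To put a number on how much each airline varies through the year, one option is to compute its delay percentage per month, then the mean and standard deviation across months. A minimal sketch, assuming a hypothetical `(airline, flight_date, arr_delay_min)` record layout, the same 15-minute delay threshold, and a made-up sample with placeholder airlines `"A"` and `"B"`:

```python
from collections import defaultdict
from datetime import date
from statistics import mean, pstdev

DELAY_THRESHOLD_MIN = 15  # assumption: BTS-style 15-minute rule


def monthly_delay_pct_by_airline(flights):
    """flights: iterable of (airline, flight_date, arr_delay_min).

    Returns {airline: {(year, month): pct_delayed}}.
    """
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [delayed, total]
    for airline, flight_date, arr_delay_min in flights:
        cell = counts[airline][(flight_date.year, flight_date.month)]
        cell[0] += int(arr_delay_min >= DELAY_THRESHOLD_MIN)
        cell[1] += 1
    return {
        airline: {m: 100.0 * d / t for m, (d, t) in sorted(months.items())}
        for airline, months in counts.items()
    }


def summarize(monthly):
    """(mean, population std dev) of each airline's monthly delay percentage."""
    return {
        airline: (mean(list(pcts.values())), pstdev(list(pcts.values())))
        for airline, pcts in monthly.items()
    }


# Made-up sample for illustration only.
sample = [
    ("A", date(2017, 1, 5), 0),
    ("A", date(2017, 1, 6), 20),
    ("A", date(2017, 2, 1), 0),
    ("A", date(2017, 2, 2), 0),
    ("B", date(2017, 1, 3), 30),
]
print(summarize(monthly_delay_pct_by_airline(sample)))
```

A lower mean says an airline is less likely to be delayed; similar standard deviations across airlines would match the observation that the variation is about the same for each.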

Most delays for flights arriving at Seatac happen in the evening. Notably, this is also when there are fewer total flights. The reason for this deserves more research.
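Delays by time of day can be tallied the same way, bucketing on the scheduled arrival hour. This sketch assumes scheduled arrival times are stored as hhmm integers (for example, `1945` for 7:45 pm), with `2400` possible for midnight; both the layout and the function name are assumptions for illustration.

```python
from collections import Counter

DELAY_THRESHOLD_MIN = 15  # assumption: BTS-style 15-minute rule


def delays_by_hour(flights):
    """flights: iterable of (scheduled_arrival_hhmm, arr_delay_min), e.g. (1945, 32).

    Returns {hour: (total_flights, delayed_flights, pct_delayed)} for each hour
    that has at least one flight, in hour order.
    """
    totals, delayed = Counter(), Counter()
    for hhmm, arr_delay_min in flights:
        hour = (hhmm // 100) % 24  # 1945 -> 19; 2400 (midnight) -> 0
        totals[hour] += 1
        if arr_delay_min >= DELAY_THRESHOLD_MIN:
            delayed[hour] += 1
    return {h: (totals[h], delayed[h], 100.0 * delayed[h] / totals[h]) for h in sorted(totals)}
```

For example, `delays_by_hour([(700, 0), (730, 20), (1945, 32)])` returns `{7: (2, 1, 50.0), 19: (1, 1, 100.0)}`, so the flight count and the delay rate for each hour can be charted side by side.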

Why are flights delayed?

Next, we look at why flights are delayed. Each delayed flight is tagged with a reason code, and these reasons are shown in this plot. My initial instinct was that weather would play a big role in delays. It may still contribute through the Late Arrival category, when weather in the departure city causes the delay. But the National Aviation System (NAS) is the biggest contributing factor to delays, and it is outside the control of the airlines. When you break out the delays by airline, you can see that NAS delays contribute evenly across all airlines. The peak on February 5, 2017 fell on the same day as the Seatac Diamond Robbery. I was not able to find a reason for the similar peak on October 29, 2017.
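The reason breakdown can be sketched as a per-airline tally. The post describes one reason code per delayed flight, so the main function assumes hypothetical `(airline, reason)` pairs. BTS data also reports delay minutes per cause (carrier, weather, NAS, security, and late-arriving aircraft); if your extract is in that form, the hypothetical helper `dominant_cause` labels each flight with its largest cause first. That labeling rule is an assumption, not something the post specifies.

```python
from collections import Counter, defaultdict

# Hypothetical labels modeled on the five BTS delay-cause categories.
CAUSES = ("carrier", "weather", "nas", "security", "late_aircraft")


def dominant_cause(minutes_by_cause):
    """Label a flight with the cause that contributed the most delay minutes.

    Ties go to whichever cause appears first in CAUSES.
    """
    return max(CAUSES, key=lambda cause: minutes_by_cause.get(cause, 0))


def reason_shares_by_airline(delayed_flights):
    """delayed_flights: iterable of (airline, reason) pairs.

    Returns {airline: {reason: percent of that airline's delays}}.
    """
    counts = defaultdict(Counter)
    for airline, reason in delayed_flights:
        counts[airline][reason] += 1
    shares = {}
    for airline, reason_counts in counts.items():
        total = sum(reason_counts.values())
        shares[airline] = {r: 100.0 * n / total for r, n in reason_counts.items()}
    return shares
```

Plotting the output of `reason_shares_by_airline` as one stacked bar per airline is one way to check whether NAS delays are spread evenly across carriers.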

Does Weather play a role in delays?

Yes, it does. The two big Seattle snowstorms in this two-year period do correspond to peaks in delays. But the effect is not as big as I originally expected.
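The storm peaks here, and the February and October peaks in the previous section, come from looking at delays day by day. A minimal sketch of that view, assuming hypothetical `(flight_date, arr_delay_min)` pairs and the 15-minute threshold: `peak_days` ranks days by delay percentage, and `compare_days` contrasts a set of event days you supply (storm dates, say) with all other days. No real storm dates or delay figures are used; the sample is made up.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

DELAY_THRESHOLD_MIN = 15  # assumption: BTS-style 15-minute rule


def daily_delay_pct(flights):
    """flights: iterable of (flight_date, arr_delay_min). Returns {date: pct_delayed}."""
    counts = defaultdict(lambda: [0, 0])  # [delayed, total]
    for flight_date, arr_delay_min in flights:
        counts[flight_date][0] += int(arr_delay_min >= DELAY_THRESHOLD_MIN)
        counts[flight_date][1] += 1
    return {d: 100.0 * delayed / total for d, (delayed, total) in counts.items()}


def peak_days(flights, top_n=5):
    """The top_n days with the highest share of delayed flights."""
    daily = daily_delay_pct(flights)
    return sorted(daily.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


def compare_days(flights, event_days):
    """Mean daily delay % on event_days vs. all other days.

    Both groups must be non-empty, or statistics.mean raises StatisticsError.
    """
    daily = daily_delay_pct(flights)
    on_event = [p for d, p in daily.items() if d in event_days]
    other = [p for d, p in daily.items() if d not in event_days]
    return mean(on_event), mean(other)


# Made-up sample for illustration only.
sample = [
    (date(2018, 1, 1), 30), (date(2018, 1, 1), 40),
    (date(2018, 1, 2), 0), (date(2018, 1, 2), 20),
    (date(2018, 1, 3), 0),
]
print(peak_days(sample, top_n=1))
print(compare_days(sample, {date(2018, 1, 1)}))
```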

Do airlines adjust flight times to account for possible delays?

I believe they do. There seems to be an acceptable delay percentage of around 20% that all airlines aim to meet. Below is an example of one flight from San Jose to Seattle. You can see that the airline is experimenting with the flight length and, as expected, the longer flight time produces fewer delays.
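The San Jose example amounts to grouping one flight's history by its scheduled duration and computing the delay rate for each duration. A sketch under assumed names: records are hypothetical `(airline, flight_number, scheduled_minutes, arr_delay_min)` tuples, and the carrier code `"XX"` and flight number `123` are placeholders, not the real San Jose flight.

```python
from collections import defaultdict

DELAY_THRESHOLD_MIN = 15  # assumption: BTS-style 15-minute rule


def delay_pct_by_scheduled_duration(flights, airline, flight_number):
    """Delay rate for one flight, grouped by its scheduled duration.

    flights: iterable of (airline, flight_number, scheduled_minutes, arr_delay_min).
    Returns {scheduled_minutes: (n_flights, pct_delayed)}, shortest duration first.
    """
    buckets = defaultdict(list)
    for a, num, sched_min, arr_delay_min in flights:
        if a == airline and num == flight_number:
            buckets[sched_min].append(arr_delay_min >= DELAY_THRESHOLD_MIN)
    return {
        sched: (len(flags), 100.0 * sum(flags) / len(flags))
        for sched, flags in sorted(buckets.items())
    }


# Made-up history for the placeholder flight "XX 123".
sample = [
    ("XX", 123, 120, 20),
    ("XX", 123, 120, 0),
    ("XX", 123, 130, 0),
    ("XX", 123, 130, 0),
    ("YY", 5, 120, 60),  # a different flight, filtered out
]
print(delay_pct_by_scheduled_duration(sample, "XX", 123))
```

If the airline's lengthening works as the chart suggests, the delay percentage should fall as the scheduled duration rises.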

If you take all flights in the dataset, group them by airline and flight number, and plot a histogram of the standard deviation of each flight's length, you can see that the airlines are indeed changing flight times. Most flights do not change in length, but a significant number have a standard deviation of up to 30 minutes. Many of the larger variations are on longer flights, such as those from Hawaii to Seatac.
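The histogram described above can be sketched in two steps: group scheduled lengths by `(airline, flight_number)` and take each group's standard deviation, then bin those deviations. The record layout, the choice of population standard deviation (`statistics.pstdev`), and the 5-minute bin width are assumptions; the post does not say which variant or bin size was used.

```python
from collections import Counter, defaultdict
from statistics import pstdev


def scheduled_length_spread(flights):
    """Std dev of scheduled length for each (airline, flight_number).

    flights: iterable of (airline, flight_number, scheduled_minutes).
    """
    lengths = defaultdict(list)
    for airline, number, sched_min in flights:
        lengths[(airline, number)].append(sched_min)
    # Population std dev; a flight seen only once gets 0.0.
    return {key: float(pstdev(vals)) for key, vals in lengths.items()}


def histogram(values, bin_width=5):
    """Count values per bin, keyed by each bin's lower edge: [0, 5) -> 0, [5, 10) -> 5, ..."""
    return Counter(int(v // bin_width) * bin_width for v in values)


# Made-up schedules for two placeholder flights.
sample = [
    ("XX", 1, 120), ("XX", 1, 120), ("XX", 1, 120),
    ("YY", 2, 300), ("YY", 2, 340),
]
spread = scheduled_length_spread(sample)
print(spread)
print(histogram(spread.values()))
```

Most real flights would fall in the 0 bin, with a tail of flights out toward 30 minutes.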

Conclusions

This data exploration was a fascinating dive into airline and airport operations. In my next post, I will use this data to try to predict whether a flight arriving at Seatac will be on time or delayed.
