Exploratory Analysis of Airplane and Flight Data in New York/New Jersey
In this post I will talk about an analysis I did of flight, airport, and airplane data using a flight data API. An API is a set of functions that lets your code request, and sometimes create, data in another system, database, or website. My goal in extracting data from this API was to see how often smaller cities and less populated areas use their airports for flights, as opposed to the bigger, better-known commercial airports. The purpose is to see whether these smaller airports could be located or used more effectively, so that an area is not served by an airport that barely has flights to anywhere else. The data lives in the flight API; specifically, I will be looking at flight destinations, airports used (the flights associated with an airport), types of aircraft, and each airport’s location and airport code. To get this data I used Python libraries such as requests, which lets you send HTTP requests. A request to the API’s web address returns a response containing the data you asked for. Now, let’s take a look at the data.
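To make the request step concrete, here is a minimal sketch of how a call with requests could look. The post does not name the API's address or its query parameters, so API_URL and the "apt" parameter below are placeholders I made up for illustration, not the real API.

```python
from typing import Any

# Hypothetical endpoint: a placeholder, not the address of the real flight API.
API_URL = "https://example.com/flight-api/airports"


def build_params(airport_codes: list[str]) -> dict[str, str]:
    """Build query parameters asking for several airports at once.

    The parameter name "apt" is an assumption made for this sketch.
    """
    return {"apt": ",".join(code.upper() for code in airport_codes)}


def fetch_airports(airport_codes: list[str], timeout: float = 10.0) -> Any:
    """Send one GET request and return the decoded JSON body."""
    import requests  # third-party; imported here so build_params works without it

    response = requests.get(
        API_URL, params=build_params(airport_codes), timeout=timeout
    )
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.json()
```

Calling fetch_airports(["EWR", "JFK", "LGA"]) would send a single request for all three airports. Raising on a bad status code before parsing makes it easier to tell a failed request apart from an airport that simply has no data.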
This graph was made using data I extracted from the aviation API. Since this post is only concerned with airports in the New Jersey and New York area, it includes only those airports, which were filtered for in the code shown at the end. What stands out are the large values for airport codes EWR, JFK, and LGA, more formally known as Newark Liberty International Airport, John F. Kennedy International Airport, and LaGuardia Airport. It is easy to see why these airports handle the most flights in the API data. They are located in Newark, Queens, and Queens, respectively, all densely populated places close to the surrounding counties. Furthermore, two of the three have “International” in their names, which also draws attention to them. These are the larger, more commercial airports mentioned above that attract a lot of people, and they also handle the bigger types of aircraft shown.
The other major insight is that many of the airports did not report any incoming flights. As an example, I am going to pull two specific airports from the data, BUF and IAG: Buffalo Niagara International Airport and Niagara Falls International Airport. These airports are within 16 miles of each other and sit in a less populated area than Queens. They could serve the same purpose for the same population, yet two of them were built, which does not make the best use of money or land for the area.
Furthermore, the other graphic, on aircraft types, shows more of the relationship between bigger airports and the type of aircraft used for flights (which ultimately ties into how popular an airport is). As the graph clearly indicates, the B738/L aircraft type is the most common. B738 is the designator for the Boeing 737-800, an airliner built to carry large numbers of passengers over much longer distances than a small jet typically covers, including some international routes. It is no coincidence that the most used type of plane is associated with the bigger airports and with international travel. This also helps explain why the bigger airports are used more than the smaller ones: they offer more international and longer-distance options. On the other hand, an aircraft like the LJ35 (the Learjet 35) is a small business jet that carries only a handful of passengers, typically on shorter trips. These planes are associated with the smaller airports, since those airports often do not have long-distance flights. My recommendation is that these smaller airports drop large planes like the B738 altogether and market themselves only for shorter flights to nearby destinations. This would make the smaller airports that don’t see much foot traffic more specialized than the bigger ones, and could therefore attract people who live near them to book flights through them.
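The counts behind both graphs can be reproduced with a simple tally. The sketch below is not the code I ran. It uses a handful of made-up sample records, and the field names "airport" and "aircraft" are assumptions about how the API response might be shaped.

```python
from collections import Counter

# Made-up sample records for illustration only; not real API output.
# The field names "airport" and "aircraft" are assumptions.
flights = [
    {"airport": "EWR", "aircraft": "B738/L"},
    {"airport": "JFK", "aircraft": "B738/L"},
    {"airport": "LGA", "aircraft": "B738/L"},
    {"airport": "EWR", "aircraft": "LJ35"},
    {"airport": "JFK", "aircraft": "A320/L"},
]


def tally(records, field):
    """Count how often each value of `field` appears, skipping records without it."""
    return Counter(r[field] for r in records if r.get(field))


flights_per_airport = tally(flights, "airport")
aircraft_counts = tally(flights, "aircraft")

# A Counter returns 0 for keys it never saw, which is how an airport
# with no reported flights (for example IAG in this sample) shows up.
for code in ["EWR", "JFK", "LGA", "BUF", "IAG"]:
    print(code, flights_per_airport[code])

print(aircraft_counts.most_common(1))
```

Looping over a fixed list of airport codes, rather than only the codes present in the data, keeps the airports with zero flights visible in the output instead of silently dropping them.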
Lastly, to conclude the article, I will talk about some bugs and limitations I ran into using this data. The code shown with this post is some of what I used to get the data (taken from this GitHub notebook: https://github.com/cbuntain/umd.inst414/blob/main/Module01/AviationAPI.Example.ipynb). However, getting the data was harder than just clicking a button. The analysis covers only flights for airports in New York and New Jersey, so I had to keep only the records that pertained to those states. While I did not run into any outright bugs while parsing the data, the code often did not print what I wanted, so I had to change print statements and check whether the output was the data I was after. The data was fairly hard to work with until I put it into Excel, which let me create these graphs and analyze them. The data itself came from the API; Excel was merely the tool used to display it. In terms of limitations, the biggest one I can think of is the scraping approach I used. I wanted to include as many airports as possible, but getting that data was harder than expected, and for many airports I was unable to find many useful data points. The results would be stronger if I had data from every airport in those two states, since I could then run the same analysis and plot it for all of them. However, the fact that I could not find all of the data does not mean it does not exist, and the missing data could affect the findings above. Overall, the analysis and conclusions are based on the data I could obtain, and it is important to understand that they could change as more data is brought in.
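For anyone repeating this, the filter-then-export step could be sketched roughly as follows. This is an illustration under assumptions, not the notebook's code: the rows are made up, and the field names "code", "state", and "flights" are hypothetical.

```python
import csv
import io

NY_NJ = {"NY", "NJ"}

# Made-up sample rows for illustration only; the flight counts are not real.
airports = [
    {"code": "EWR", "state": "NJ", "flights": 12},
    {"code": "JFK", "state": "NY", "flights": 15},
    {"code": "BOS", "state": "MA", "flights": 7},
    {"code": "IAG", "state": "NY", "flights": 0},
]


def filter_by_state(rows, states=NY_NJ):
    """Keep only the rows whose "state" field is one of the wanted states."""
    return [row for row in rows if row.get("state") in states]


def to_csv(rows, fieldnames):
    """Render rows as CSV text that a spreadsheet program such as Excel can open."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(filter_by_state(airports), ["code", "state", "flights"])
print(csv_text)
```

Writing csv_text to a file (opened with newline="" so the line endings are not doubled on Windows) gives a table that opens directly in Excel. Checking the filtered rows this way is also less error-prone than adjusting print statements by hand.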
Notable Citations