UA Recent Flights: Undirected Graph — NetworkX

Andrew Dziedzic
Web Mining [IS688, Spring 2022]
Feb 11, 2022

A key question I would like to answer with the United Airlines data and the undirected graph is which airports within the United States have the most recent activity. That in turn shows which terminals United Airlines can quickly adjust if arrivals or departures increase significantly in a short time frame.

The network data consists of recent United Airlines flights to and from U.S. airports. Several origins and destinations, such as MMMX (Mexico City) and FACT (Cape Town), are outside the USA. I used FlightAware's "Live United Flight Status" page (flightaware.com/live/fleet/UAL) to gather the latest 50 flights. Each flight has data on Flight Number, Airplane Type, Origin, Destination, Departure Time, and Estimated Arrival Time. The nodes in my graph represent airports, identified by their four-letter ICAO codes, and each edge represents a flight path between two airports. I built the graph in Python with pandas, NumPy, NetworkX, and matplotlib.pyplot, using NetworkX to construct an undirected graph.
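Because nodes are airports and edges are airport pairs, NetworkX's `nx.Graph` merges repeated routes and reverse-direction flights into a single edge. The sketch below is illustrative. The `ROUTES` string is a compact re-entry of the 50 flights from the post's DataFrame, and `flights_frame` is a hypothetical helper. The sketch compares `nx.Graph` with `nx.MultiGraph`, which keeps one edge per flight, and lists self-loops such as the KLAX-to-KLAX row in the data.

```python
import pandas as pd
import networkx as nx

# The 50 flights from this post as ORIGIN-DESTINATION tokens,
# in the same order as the 'from'/'to' lists of the original DataFrame.
ROUTES = """
KEWR-MPTO KIAH-MMMX KIAH-MPTO TNCA-KEWR KLAS-KORD MMUN-KEWR KEWR-MMMX MMMX-KEWR KLAX-KORD KIAD-KLAX
KMCO-KEWR KGPI-KDEN MMMX-KIAH MROC-KIAH KSNA-KIAH MMUN-KSFO FACT-KEWR KORD-KAUS MMUN-KIAD MMUN-KDEN
KEWR-KBOS KSNA-KIAH KEWR-KPBI MMUN-KIAD KIAD-KTPA KATL-KORD PHNL-KSFO KIAD-KEWR MMUN-KSFO MBJ/MKJS-KORD
KIAH-KORD KFLL-KEWR TJSJ-KEWR MMSD-KSFO MMUN-KORD KIAD-KSAT KMIA-KEWR KEGE-KEWR KLAX-KLAX PHNL-MROC
KEWR-KORD KLAX-KDEN KLAX-KDEN KEWR-KMCO KEWR-KFLL KORD-KDEN KORD-KIAH KORD-KEWR KORD-KFLL KORD-KSFO
"""


def flights_frame(routes: str) -> pd.DataFrame:
    """Hypothetical helper: turn 'ORIG-DEST' tokens into a from/to DataFrame."""
    pairs = [token.split("-") for token in routes.split()]
    return pd.DataFrame(pairs, columns=["from", "to"])


df = flights_frame(ROUTES)

# nx.Graph keeps at most one edge per airport pair, regardless of direction.
simple = nx.from_pandas_edgelist(df, "from", "to", create_using=nx.Graph())
# nx.MultiGraph keeps one edge per flight, so repeated routes survive.
multi = nx.from_pandas_edgelist(df, "from", "to", create_using=nx.MultiGraph())

print("flights:", len(df))
print("airports:", simple.number_of_nodes())
print("edges in Graph:", simple.number_of_edges())
print("edges in MultiGraph:", multi.number_of_edges())
# NetworkX counts a self-loop twice toward a node's degree.
print("self-loops:", list(nx.selfloop_edges(simple)))
```

On this data the simple graph has 30 airports and 40 edges, versus 50 edges in the multigraph. The single self-loop is the KLAX row, and it adds 2 to KLAX's degree.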

The graph is drawn with a fixed node size (300) and a fixed opacity (alpha = 0.8). The drawing shows how many edges each airport has, which makes high-activity airports easy to spot. The most important nodes in this graph are ‘KEWR’ and ‘KORD’: Newark Liberty International Airport in Newark, NJ, and Chicago O’Hare International Airport. The degree centrality measure, which captures how many connections each airport has, confirms this. KEWR scores about 0.48 because it is connected to 14 of the 29 other airports, and KORD scores about 0.38 with 11 of 29.
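NetworkX defines degree centrality as a node's degree divided by n - 1, where n is the number of nodes. The sketch below checks the output shown at the end of this post. It re-enters the 50 flights as a `ROUTES` string, purely for illustration, and computes the measure both with `nx.degree_centrality` and by hand before ranking the airports. With 30 airports, KEWR's degree of 14 gives 14/29 ≈ 0.483, and KORD's degree of 11 gives 11/29 ≈ 0.379.

```python
import networkx as nx

# The post's 50 flights, re-entered as ORIGIN-DESTINATION tokens.
ROUTES = """
KEWR-MPTO KIAH-MMMX KIAH-MPTO TNCA-KEWR KLAS-KORD MMUN-KEWR KEWR-MMMX MMMX-KEWR KLAX-KORD KIAD-KLAX
KMCO-KEWR KGPI-KDEN MMMX-KIAH MROC-KIAH KSNA-KIAH MMUN-KSFO FACT-KEWR KORD-KAUS MMUN-KIAD MMUN-KDEN
KEWR-KBOS KSNA-KIAH KEWR-KPBI MMUN-KIAD KIAD-KTPA KATL-KORD PHNL-KSFO KIAD-KEWR MMUN-KSFO MBJ/MKJS-KORD
KIAH-KORD KFLL-KEWR TJSJ-KEWR MMSD-KSFO MMUN-KORD KIAD-KSAT KMIA-KEWR KEGE-KEWR KLAX-KLAX PHNL-MROC
KEWR-KORD KLAX-KDEN KLAX-KDEN KEWR-KMCO KEWR-KFLL KORD-KDEN KORD-KIAH KORD-KEWR KORD-KFLL KORD-KSFO
"""

G = nx.Graph()
# Each token is one flight; repeated or reversed pairs merge into one edge.
G.add_edges_from(tuple(token.split("-")) for token in ROUTES.split())


def degree_centrality_by_hand(graph: nx.Graph) -> dict:
    """Degree divided by (n - 1), the normalization NetworkX uses."""
    n = graph.number_of_nodes()
    return {node: deg / (n - 1) for node, deg in graph.degree()}


builtin = nx.degree_centrality(G)
manual = degree_centrality_by_hand(G)

# Rank airports from most to least connected.
ranking = sorted(builtin.items(), key=lambda item: item[1], reverse=True)
for airport, score in ranking[:5]:
    print(f"{airport}: degree={G.degree(airport)}, centrality={score:.3f}")
```

Several airports tie just below KORD (KIAH, MMUN, KLAX, and KIAD all have degree 5). Only the top two positions are clear-cut.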

A non-obvious insight I would like to extract from the latest United Airlines flights is which U.S. airports have the most recent arriving or departing traffic. These insights can be used to quickly reassign employees, move cleaning and mechanical crews between terminals, and/or quickly alter flight plans if a terrorist threat is reported. Another non-obvious use is adjusting when construction crews can work on renovating, modifying, and/or expanding current terminals. Suppose a significant number of planes have recently departed Newark (KEWR) and few or no planes are arriving soon. That creates an opportunity to have construction crews work overtime or to slightly adjust their schedules. This idea can be extended to home in on times within a day or week when certain airports have little to no terminal activity. Those low-activity windows are good slots for corporate employee training.
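The undirected graph cannot tell departures from arrivals, because `nx.Graph` drops edge direction. A minimal way to recover that split is to count each airport's appearances as origin and as destination, sketched here with pandas `value_counts` on the same from/to columns. The `ROUTES` string again re-enters the post's 50 flights, and the `activity` table name is illustrative rather than part of the original code.

```python
import pandas as pd

# The post's 50 flights, re-entered as ORIGIN-DESTINATION tokens.
ROUTES = """
KEWR-MPTO KIAH-MMMX KIAH-MPTO TNCA-KEWR KLAS-KORD MMUN-KEWR KEWR-MMMX MMMX-KEWR KLAX-KORD KIAD-KLAX
KMCO-KEWR KGPI-KDEN MMMX-KIAH MROC-KIAH KSNA-KIAH MMUN-KSFO FACT-KEWR KORD-KAUS MMUN-KIAD MMUN-KDEN
KEWR-KBOS KSNA-KIAH KEWR-KPBI MMUN-KIAD KIAD-KTPA KATL-KORD PHNL-KSFO KIAD-KEWR MMUN-KSFO MBJ/MKJS-KORD
KIAH-KORD KFLL-KEWR TJSJ-KEWR MMSD-KSFO MMUN-KORD KIAD-KSAT KMIA-KEWR KEGE-KEWR KLAX-KLAX PHNL-MROC
KEWR-KORD KLAX-KDEN KLAX-KDEN KEWR-KMCO KEWR-KFLL KORD-KDEN KORD-KIAH KORD-KEWR KORD-KFLL KORD-KSFO
"""

pairs = [token.split("-") for token in ROUTES.split()]
df = pd.DataFrame(pairs, columns=["from", "to"])

# Origin counts = departures in the sample; destination counts = arrivals.
departures = df["from"].value_counts()
arrivals = df["to"].value_counts()

# Align both counts on airport code; an airport missing from one side gets 0.
activity = pd.DataFrame({"departures": departures, "arrivals": arrivals})
activity = activity.fillna(0).astype(int)
activity["total"] = activity["departures"] + activity["arrivals"]
activity = activity.sort_values("total", ascending=False)

print(activity.head(5))
```

Unlike the undirected degree, these counts keep one entry per flight, so repeated routes are not merged.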

The software used was Python, with NetworkX (imported as nx) as the key library.

Many of the bugs I encountered were due to the formatting of the data frame: the initial data lives in an Excel file that must be loaded into a pandas DataFrame in Python. Keeping the code short and efficient was also a challenge. Lastly, the drawing of the undirected graph changes significantly with even a small change to the node size or color settings. This was a big surprise, and I now have a better understanding of how much node size matters when visualizing undirected graphs. Using an API in the future could make the data-gathering process more efficient.
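Since the raw data starts in an Excel sheet, one way to avoid formatting bugs is to normalize the columns before building the graph. This is a sketch under several assumptions. The file path and the "Origin"/"Destination" header names are hypothetical: the post lists those fields but not the exact sheet layout. In addition, `pd.read_excel` needs an engine such as openpyxl installed to read .xlsx files.

```python
import pandas as pd


def clean_routes(raw: pd.DataFrame) -> pd.DataFrame:
    """Normalize a raw flight sheet into the 'from'/'to' frame used for the graph.

    Assumes hypothetical 'Origin' and 'Destination' column headers.
    """
    routes = raw.rename(columns={"Origin": "from", "Destination": "to"})[["from", "to"]]
    # Drop rows where either airport code is missing (e.g. blank spreadsheet rows).
    routes = routes.dropna().copy()
    # Strip stray spaces and force upper case so ' kewr' and 'KEWR' become one node.
    for col in ("from", "to"):
        routes[col] = routes[col].astype(str).str.strip().str.upper()
    return routes.reset_index(drop=True)


def load_routes(path: str) -> pd.DataFrame:
    """Read the first sheet of a (hypothetical) Excel export and clean it."""
    return clean_routes(pd.read_excel(path))


if __name__ == "__main__":
    # In-memory stand-in for a messy sheet, so the example runs without a file.
    messy = pd.DataFrame({
        "Flight Number": ["UA1", "UA2", None],
        "Origin": [" kewr", "KORD ", None],
        "Destination": ["KORD", "kden", None],
    })
    print(clean_routes(messy))
```

Keeping the cleaning step separate from the file read makes it easy to test without an actual spreadsheet.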

The main limitation of the data is that I am capped at 50 records, i.e., 50 flights. How much data can be gathered per hour or per day will directly affect the insights available to decision makers. It would also help to have an option to convert all departure and arrival times to a single time zone for a clean comparison. Currently, I am limited to the six variables listed above.
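Because departure and arrival times are reported in different local time zones, comparing them requires converting both to a common zone such as UTC. The minimal sketch below uses Python's datetime module and assumes fixed standard-time offsets. Those offsets are valid in February, when this data was collected, but not during daylight saving time. The offset table and the flight times are hypothetical, not taken from the dataset.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical lookup: standard-time UTC offsets (hours) for a few airports.
# Fixed offsets ignore daylight saving time; zoneinfo would handle that properly.
STANDARD_UTC_OFFSET = {"KEWR": -5, "KORD": -6, "KDEN": -7, "KLAX": -8}


def to_utc(local_time: datetime, airport: str) -> datetime:
    """Attach the airport's standard-time offset to a naive local time, then convert to UTC."""
    tz = timezone(timedelta(hours=STANDARD_UTC_OFFSET[airport]))
    return local_time.replace(tzinfo=tz).astimezone(timezone.utc)


# Hypothetical flight: departs Newark 08:00 local, arrives Chicago 09:30 local.
departure = to_utc(datetime(2022, 2, 11, 8, 0), "KEWR")
arrival = to_utc(datetime(2022, 2, 11, 9, 30), "KORD")

print("departure (UTC):", departure.isoformat())
print("arrival (UTC):", arrival.isoformat())
print("duration:", arrival - departure)
```

Subtracting the raw local clock times would suggest a 1.5-hour flight. In UTC, the actual elapsed time is 2.5 hours.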

The main takeaway from this network analysis is the clear concentration of recent flights at ‘KEWR’ (Newark Liberty International Airport). Ad-hoc decisions could be made to dispatch additional maintenance, renovation, and/or cleaning crews to this airport immediately. That could improve cleanliness, allow extra maintenance, and/or help construction crews pull project deadlines forward by several days or weeks. In particular, the pandemic is winding down, so there is an opportunity to add construction resources to work through the terminal delays it caused over the past two years. This applies not only to Newark Liberty International Airport but also to ‘KORD’ (Chicago O’Hare International Airport). Additional workers in the terminals could quickly reduce those pandemic-related construction backlogs.

Additionally, many aircraft have been parked in hangars and need a significant amount of maintenance before returning to service. This makes it a good time to increase mechanical and maintenance resources for the aircraft themselves. More generally, the undirected graph can be used to increase or decrease employee resources at each airport.

Lastly, if an emergency occurs while an aircraft is in flight, air traffic control could use the undirected graph to redirect the aircraft to the closest airport with the least recent activity. For example, consider a plane that is about to arrive at Newark Liberty International Airport. Given the high level of recent activity at KEWR, the air traffic controller could recommend an emergency landing at Baltimore/Washington International Airport (BWI) near Baltimore, Maryland instead.
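The diversion idea can be sketched as a minimum over a shortlist of alternates. The sketch assumes the shortlist comes from outside this dataset and is ordered nearest first, because the flight data has no distances. Airports absent from the graph, such as KBWI, count as having zero recent activity. The function name and the toy edges are hypothetical.

```python
from typing import Sequence

import networkx as nx


def least_busy_alternate(graph: nx.Graph, shortlist: Sequence[str]) -> str:
    """Pick the alternate with the fewest recent connections.

    `shortlist` is assumed to be ordered nearest-first, so ties go to the closer airport.
    Airports that do not appear in the graph count as having no recent activity.
    """
    def activity(airport: str) -> int:
        return graph.degree(airport) if airport in graph else 0

    # min() returns the first minimal element, which keeps the nearest-first tie-break.
    return min(shortlist, key=activity)


# Toy graph built from a few routes in the post's data.
G = nx.Graph([("KEWR", "KORD"), ("KEWR", "KIAD"), ("KEWR", "KMCO"),
              ("KIAD", "KLAX"), ("KORD", "KDEN")])

print(least_busy_alternate(G, ["KEWR", "KIAD", "KBWI"]))
```

A real diversion decision would also weigh distance, runway length, weather, and fuel. Degree is only a proxy for how busy an airport has been.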

Screenshot of the FlightAware page used to collect recent United Airlines flight data:

Python Code:

# libraries
import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
# ------- UNDIRECTED

# Build a DataFrame of connections: each row is one flight (origin, destination)
# IMPORTANT NOTE: in an undirected graph a pair can appear in either order
# (e.g. KEWR -> MMMX and MMMX -> KEWR); both map to the same edge.
df = pd.DataFrame({ 'from':['KEWR', 'KIAH', 'KIAH', 'TNCA','KLAS', 'MMUN', 'KEWR', 'MMMX', 'KLAX', 'KIAD', 'KMCO', 'KGPI', 'MMMX',
'MROC', 'KSNA', 'MMUN', 'FACT', 'KORD', 'MMUN', 'MMUN',
'KEWR','KSNA','KEWR','MMUN','KIAD','KATL','PHNL','KIAD','MMUN','MBJ/MKJS','KIAH','KFLL','TJSJ',
'MMSD', 'MMUN','KIAD','KMIA','KEGE','KLAX','PHNL','KEWR','KLAX','KLAX','KEWR','KEWR','KORD','KORD','KORD','KORD','KORD'
],

'to':['MPTO', 'MMMX', 'MPTO', 'KEWR','KORD', 'KEWR', 'MMMX', 'KEWR', 'KORD', 'KLAX', 'KEWR',
'KDEN', 'KIAH', 'KIAH', 'KIAH', 'KSFO','KEWR','KAUS', 'KIAD', 'KDEN',
'KBOS','KIAH','KPBI','KIAD','KTPA','KORD','KSFO','KEWR','KSFO','KORD','KORD','KEWR','KEWR',
'KSFO','KORD','KSAT','KEWR','KEWR','KLAX','MROC','KORD','KDEN','KDEN','KMCO','KFLL','KDEN','KIAH','KEWR','KFLL','KSFO'
]})
# Build the undirected graph from the edge list; nx.Graph() merges duplicate pairs
G = nx.from_pandas_edgelist(df, 'from', 'to', create_using=nx.Graph())

# Draw the undirected graph with airport labels
nx.draw(G, with_labels=True, node_size=300, alpha=0.8)
plt.title("Undirected")
plt.show()
# Number of flights (rows) used to build the graph
print("Number of Flights =", len(df))

# Degree centrality: based on the assumption that important nodes have many connections
deg_centrality = nx.degree_centrality(G)
print(deg_centrality)

Output of Python Code:

Number of Flights = 50
{'KEWR': 0.48275862068965514, 'MPTO': 0.06896551724137931, 'KIAH': 0.1724137931034483, 'MMMX': 0.06896551724137931, 'TNCA': 0.034482758620689655, 'KLAS': 0.034482758620689655, 'KORD': 0.3793103448275862, 'MMUN': 0.1724137931034483, 'KLAX': 0.1724137931034483, 'KIAD': 0.1724137931034483, 'KMCO': 0.034482758620689655, 'KGPI': 0.034482758620689655, 'KDEN': 0.13793103448275862, 'MROC': 0.06896551724137931, 'KSNA': 0.034482758620689655, 'KSFO': 0.13793103448275862, 'FACT': 0.034482758620689655, 'KAUS': 0.034482758620689655, 'KBOS': 0.034482758620689655, 'KPBI': 0.034482758620689655, 'KTPA': 0.034482758620689655, 'KATL': 0.034482758620689655, 'PHNL': 0.06896551724137931, 'MBJ/MKJS': 0.034482758620689655, 'KFLL': 0.06896551724137931, 'TJSJ': 0.034482758620689655, 'MMSD': 0.034482758620689655, 'KSAT': 0.034482758620689655, 'KMIA': 0.034482758620689655, 'KEGE': 0.034482758620689655}
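Given how sensitive the drawing was to node size, one option is to derive each node's size from the data instead of fixing it at 300. This is a sketch assuming a simple linear scale. The `base` and `scale` values and the helper names are arbitrary illustrative choices.

```python
import networkx as nx


def centrality_node_sizes(graph: nx.Graph, base: float = 100.0, scale: float = 2000.0) -> list:
    """Node sizes in graph.nodes order: base + scale * degree centrality."""
    centrality = nx.degree_centrality(graph)
    return [base + scale * centrality[node] for node in graph.nodes]


def draw_scaled(graph: nx.Graph) -> None:
    """Draw like the original plot, but with data-driven node sizes."""
    import matplotlib.pyplot as plt  # imported here so the size helper works without matplotlib

    nx.draw(graph, with_labels=True, node_size=centrality_node_sizes(graph), alpha=0.8)
    plt.title("Undirected (node size ~ degree centrality)")
    plt.show()


# Toy example: one hub with three spokes.
G = nx.Graph([("KEWR", "KORD"), ("KEWR", "KIAD"), ("KEWR", "KMCO")])
print(dict(zip(G.nodes, centrality_node_sizes(G))))
```

Calling `draw_scaled` on the full flight graph would make KEWR and KORD the two largest nodes, matching the centrality ranking.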
