Visualizing Air Routes for Indian Airlines
Analyzing the structure of a web based network.
According to a recent article in Forbes, India’s aviation industry has grown fast over the past few years: in 2018, the International Air Transport Association (IATA) predicted that India would be the world’s third largest aviation market by 2024 and would grow from 158 million passengers in 2017 to 572 million by 2037. However, after four years of double-digit demand growth, air traffic in India rose only 5.1% in 2019, down from 18.9% growth in 2018, according to IATA. To understand these statistics, it is important to analyze the routes to and from Indian airports. This analysis can help businesses determine which air routes are the busiest and how connectivity among routes can be improved to give customers a more reliable way to travel by air.
The idea of this analysis is to map the different routes and determine how well connected Indian airports are within the country, so that businesses and government bodies can focus on network planning and thereby improve the operation and management of airlines.
Tools and Technology:
Python was used to perform the analysis. Several Python libraries were imported for it, including Pandas, Requests, NetworkX, Matplotlib and Plotly. The final map of airline routes was visualized using Mapbox.
Data Source and Data Pre-Processing:
The data used for this model was obtained from the Travelpayouts website. The Travelpayouts data API provides travel data about countries, airports, airlines, routes and more.
For this model we extracted data for countries, airports and routes in JSON format, which was then converted into Pandas data frames.
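#Get data for the airports
import json
import urllib3

# Assumption: `http` is a urllib3 PoolManager (the original snippet does not show where it is created)
http = urllib3.PoolManager()
url = "http://api.travelpayouts.com/data/en/airports.json"
airport = http.request('GET', url)
airport.status
# Decode the JSON data into a Python object
airport_data = json.loads(airport.data.decode('utf-8'))
airport_data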
The above code returns a file with a list of airports from the database.
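The decoding step can also be wrapped in a small helper that fails loudly when the request did not succeed. This is only a sketch: the function name `decode_json_payload` and the two-record sample payload are made up for illustration and are not part of the original analysis.

```python
import json


def decode_json_payload(status, body):
    """Decode a UTF-8 JSON response body, raising on a non-200 status."""
    if status != 200:
        raise RuntimeError(f"request failed with HTTP status {status}")
    return json.loads(body.decode("utf-8"))


# Hypothetical payload standing in for the real airports.json response
sample_body = (
    b'[{"code": "DEL", "name": "Indira Gandhi International Airport"},'
    b' {"code": "BOM", "name": "Chhatrapati Shivaji International Airport"}]'
)
airports = decode_json_payload(200, sample_body)
print(len(airports))
```

With the real response, the call would look like `decode_json_payload(airport.status, airport.data)`.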
After converting the dataset to a pandas data frame, we obtain 9749 rows. We keep only the columns needed for our analysis, namely:
- code — airport IATA code
- name — airport name
- country_code — country IATA code
- city_code — city IATA code
- coordinates — latitude and longitude of the airport
- time_zone — time zone relative to GMT
airport_df = pd.DataFrame.from_records(airport_data, columns=['name', 'code', 'city_code', 'country_code', 'coordinates', 'time_zone'])
airport_df.head()
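Later steps use separate `Lat` and `Lon` columns, so the `coordinates` column has to be flattened at some point. The article does not show this step; the sketch below is one possible way, and it assumes each `coordinates` value is a nested dict of the form `{"lat": ..., "lon": ...}`. The sample records are hypothetical.

```python
import pandas as pd

# Hypothetical records shaped like the airport entries described above.
# Assumption: `coordinates` is a nested dict with "lat" and "lon" keys.
sample_airports = [
    {"name": "Indira Gandhi International Airport", "code": "DEL", "city_code": "DEL",
     "country_code": "IN", "coordinates": {"lat": 28.56, "lon": 77.10},
     "time_zone": "Asia/Kolkata"},
    {"name": "Chhatrapati Shivaji International Airport", "code": "BOM", "city_code": "BOM",
     "country_code": "IN", "coordinates": {"lat": 19.09, "lon": 72.87},
     "time_zone": "Asia/Kolkata"},
]
columns = ["name", "code", "city_code", "country_code", "coordinates", "time_zone"]
airport_df = pd.DataFrame.from_records(sample_airports, columns=columns)


def pick(key):
    """Return a function that reads `key` from a coordinates dict, or None if missing."""
    return lambda c: c.get(key) if isinstance(c, dict) else None


# Pull latitude and longitude out of the nested dict into flat columns
airport_df["Lat"] = airport_df["coordinates"].apply(pick("lat"))
airport_df["Lon"] = airport_df["coordinates"].apply(pick("lon"))
airport_df = airport_df.drop(columns="coordinates")
print(airport_df[["code", "Lat", "Lon"]])
```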
Similarly, we obtain the data for countries from Travelpayouts, which gives us each country’s name along with its IATA code.
http://api.travelpayouts.com/data/en/countries.json
After obtaining the data for countries, we merge it with the airport data set obtained earlier, based on country_code.
#Merge countries Data and airport data.
df = pd.merge(airport_df, countries)
df.head()
The result is a single data frame that pairs each airport with its country name.
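When `pd.merge` is called without `on`, pandas joins on every column name the two frames share, which can silently pick the wrong keys. A minimal sketch of a more explicit merge follows. It assumes the countries data has `code` and `name` columns, which is a guess about the file layout, and it uses hypothetical sample rows.

```python
import pandas as pd

# Hypothetical sample rows; the column names of `countries` are an assumption
airport_df = pd.DataFrame({
    "name": ["Indira Gandhi International Airport", "Heathrow Airport"],
    "code": ["DEL", "LHR"],
    "country_code": ["IN", "GB"],
})
countries = pd.DataFrame({"code": ["IN", "GB"], "name": ["India", "United Kingdom"]})

# Rename the country columns first so they cannot collide with the airport columns
countries = countries.rename(columns={"code": "country_code", "name": "Country"})

# Join explicitly on country_code; how="left" keeps airports with no matching country
df = pd.merge(airport_df, countries, on="country_code", how="left")
print(df[["code", "country_code", "Country"]])
```

Renaming before merging also produces the `Country` column that the later filtering step relies on.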
Now, let’s obtain the data set for routes. It is obtained in the same way, with another HTTP request.
#Get data of all the routes
url = "http://api.travelpayouts.com/data/routes.json"
route = http.request('GET', url)
route.status
# Decode the JSON data into a Python object
route_data = json.loads(route.data.decode('utf-8'))
route_data
The obtained data is then loaded into a pandas data frame with the required columns. To ease our analysis and avoid confusion, we rename a few columns as follows:
- air_iata = Airline
- departure_airport_iata = Source
- arrival_airport_iata = Dest (destination)
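The renaming can be sketched as follows. The field names are taken from the list above, the target name `Dest` follows the later code snippets, and the two sample records are hypothetical.

```python
import pandas as pd

# Hypothetical records using the field names listed above
route_data = [
    {"air_iata": "AI", "departure_airport_iata": "DEL", "arrival_airport_iata": "BOM"},
    {"air_iata": "BA", "departure_airport_iata": "LHR", "arrival_airport_iata": "DEL"},
]

wanted = ["air_iata", "departure_airport_iata", "arrival_airport_iata"]
routes = pd.DataFrame.from_records(route_data, columns=wanted).rename(
    columns={
        "air_iata": "Airline",
        "departure_airport_iata": "Source",
        "arrival_airport_iata": "Dest",
    }
)
print(routes)
```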
Since we are interested only in Indian air routes, we keep only the routes in and out of India. First, we print the IATA codes of all the airports in India.
airport_ind = airports[airports['Country'] == 'India']['IATA'].unique().tolist()
print("Indian Airport IATAs: {0}".format(airport_ind))
Now I will extract routes where the departure or arrival airport is in India.
route_ind = routes[(routes['Source'].isin(airport_ind)) | (routes['Dest'].isin(airport_ind))]
Next, we compute the total number of unique flight routes to and from India. To get this number, we group the routes by source and destination and count the airlines operating on each pair.
route_ind = pd.DataFrame(route_ind.groupby(['Source', 'Dest']).size().reset_index(name='airline_count'))
There are a total of 861 unique flight routes in and out of India.
We then determine how busy an airport is by appending all destination airports to the series of all source airports and counting how often each airport appears.
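To make the filtering and grouping concrete, here is a small sketch on a hypothetical five-row routes table. The airport codes and airlines are illustrative only and do not come from the real data set.

```python
import pandas as pd

# Hypothetical toy data: two airlines fly DEL to BOM, one route does not touch India
routes = pd.DataFrame({
    "Airline": ["AI", "6E", "AI", "BA", "BA"],
    "Source": ["DEL", "DEL", "DEL", "LHR", "LHR"],
    "Dest": ["BOM", "BOM", "LHR", "DEL", "JFK"],
})
airport_ind = ["DEL", "BOM"]

# Keep routes whose departure or arrival airport is in India
route_ind = routes[routes["Source"].isin(airport_ind) | routes["Dest"].isin(airport_ind)]

# One row per (Source, Dest) pair, with the number of airlines on that pair
route_ind = route_ind.groupby(["Source", "Dest"]).size().reset_index(name="airline_count")
print(route_ind)
print("unique routes:", len(route_ind))
```

Note that DEL to LHR and LHR to DEL count as two separate routes here, because the grouping is direction-sensitive.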
# Get flight count in and out of an airport, counting both ways
# (Series.append was removed in pandas 2.0, so pd.concat is used instead)
count = pd.concat([routes['Source'], routes['Dest']]).value_counts()
count = pd.DataFrame({'IATA': count.index, 'flight_count': count.values})
Our data is now ready and we can go ahead with the analysis. To perform it, we used the Python library NetworkX.
NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. It takes advantage of Python dictionaries to store node and edge data.
Node: A node can be any hashable object such as a string, a function, a file and more.
Edges: Edges can be used to hold any arbitrary data such as weights, time-series and more.
In our network, the airports are the nodes and the routes between them are the edges. The size of a node is defined by the number of flights in and out of the airport, and the width of an edge is determined by the number of airlines operating on that route.
There are a total of 146 airports of interest. This number is the length of the list of unique nodes, i.e. the number of unique airports.
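The article does not show how `nodes`, `node_df` and `edge_df` (used in the plotting code) are assembled. One possible way, assuming an airport table with `IATA`, `Lat` and `Lon` columns and the `count` table of flight counts, is sketched below with hypothetical toy data.

```python
import pandas as pd

# Hypothetical toy inputs standing in for route_ind, the airport table and count
route_ind = pd.DataFrame({
    "Source": ["DEL", "DEL", "LHR"],
    "Dest": ["BOM", "LHR", "DEL"],
    "airline_count": [2, 1, 1],
})
airports = pd.DataFrame({
    "IATA": ["DEL", "BOM", "LHR", "JFK"],
    "Lat": [28.56, 19.09, 51.47, 40.64],
    "Lon": [77.10, 72.87, -0.45, -73.78],
})
count = pd.DataFrame({
    "IATA": ["DEL", "BOM", "LHR", "JFK"],
    "flight_count": [4, 2, 3, 1],
})

# Nodes: every airport that appears as a source or destination of an Indian route
nodes = pd.concat([route_ind["Source"], route_ind["Dest"]]).unique().tolist()

# Node table: coordinates plus flight count for each node
node_df = airports[airports["IATA"].isin(nodes)].merge(count, on="IATA", how="left")

# Edge table: the grouped routes already have one row per edge
edge_df = route_ind

print(len(nodes), "airports of interest")
```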
Graph Analysis:
Now, let’s begin plotting. We first create an empty NetworkX graph and add the airport nodes to it:
#Adding airport nodes
G = nx.Graph()
for node in nodes:
    G.add_node(node)
#Let's draw the graph
nx.draw_networkx(G)
plt.show()
The above graph shows all the airports (nodes) connected to India. We now go ahead and add the edges, i.e. the routes between these nodes.
for index, row in edge_df.iterrows():
    G.add_edge(row['Source'], row['Dest'], weight=row['airline_count'])
nx.draw_networkx(G)
plt.show()
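One detail worth knowing about this construction: `nx.Graph` is undirected, so when both DEL to LHR and LHR to DEL appear in the edge table, the second `add_edge` call overwrites the weight of the first. The sketch below shows this on hypothetical toy data, along with a directed alternative and a simple connectivity measure (degree centrality). It illustrates the behavior and is not the author's method.

```python
import networkx as nx
import pandas as pd

# Hypothetical edge table with both directions of the DEL/LHR route
edge_df = pd.DataFrame({
    "Source": ["DEL", "DEL", "LHR"],
    "Dest": ["BOM", "LHR", "DEL"],
    "airline_count": [2, 1, 3],
})

# Undirected graph: the LHR -> DEL row replaces the weight set by DEL -> LHR
G = nx.Graph()
for row in edge_df.itertuples(index=False):
    G.add_edge(row.Source, row.Dest, weight=row.airline_count)

# Directed graph: each direction keeps its own edge and attribute
D = nx.from_pandas_edgelist(
    edge_df, source="Source", target="Dest",
    edge_attr="airline_count", create_using=nx.DiGraph,
)

# Degree centrality: share of the other airports each airport connects to
centrality = nx.degree_centrality(G)
best_connected = max(centrality, key=centrality.get)
print(G.number_of_edges(), D.number_of_edges(), best_connected)
```

Whether to keep directions separate or to sum both directions into one undirected weight depends on what the edge width is meant to show.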
Moving ahead with our analysis, we position the nodes using the geographical coordinates of the airports.
x, y = node_df['Lon'].values, node_df['Lat'].values
pos_dict = {}
for index, iata in enumerate(node_df['IATA']):
    pos_dict[iata] = (x[index], y[index])
for iata, coordinate in pos_dict.items():
    G.nodes[iata]['pos'] = coordinate
Let’s do the plotting:
#Draw the nodes
nx.draw_networkx_nodes(G=G, pos=pos_dict, node_color='darkblue', node_size=(node_df['flight_count']/4).tolist())
#Draw the edges
nx.draw_networkx_edges(G=G, pos=pos_dict, edge_color='turquoise', alpha=0.5, width=edge_df['airline_count'], arrows=False)
The above plot doesn’t give us a clear picture, and such a graphical representation is difficult to read. The size of a node (dark blue) shows how busy an airport is, but it would be better to know which airport it is and where it is located.
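NetworkX applies a list of node sizes in the order of the graph's own nodes (and edge widths in the order of its edges) unless `nodelist` or `edgelist` is given, so a list taken straight from a data frame only lines up if the two orders happen to match. A hedged sketch of building aligned lists, using hypothetical toy data, could look like this:

```python
import networkx as nx
import pandas as pd

# Hypothetical toy graph; node_df is deliberately in a different order than G.nodes
G = nx.Graph()
G.add_edge("DEL", "BOM", weight=2)
G.add_edge("DEL", "LHR", weight=1)
node_df = pd.DataFrame({"IATA": ["LHR", "BOM", "DEL"], "flight_count": [400, 200, 800]})

# Look sizes up by airport code so each size follows its own node
size_by_iata = dict(zip(node_df["IATA"], node_df["flight_count"] / 4))
nodelist = list(G.nodes)
node_sizes = [size_by_iata[n] for n in nodelist]

# Read edge widths from the graph itself so they follow the edge order
edgelist = list(G.edges)
edge_widths = [G[u][v]["weight"] for u, v in edgelist]

print(list(zip(nodelist, node_sizes)))
```

These lists can then be passed as `nodelist=nodelist, node_size=node_sizes` to `nx.draw_networkx_nodes` and as `edgelist=edgelist, width=edge_widths` to `nx.draw_networkx_edges`.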
To obtain this result on a map we use Mapbox.
Mapbox is an American provider of custom online maps for websites and applications. We obtain a token from the Mapbox API and call a function to read it; the token is available from Mapbox’s official website after creating an account. The result is an interactive map. When we hover the mouse over a node, details such as the airport name, number of flights, country and city pop up; similarly, when we hover over an edge, the flight route pops up. Presenting the result this way makes it easier to understand.
Color representation:
- Red: International Flight routes
- Green: Domestic Flight routes
- Purple Dots: Airports with higher traffic. (The size of the node determines how busy an airport is)
- Orange dots: Smaller airports with fewer routes.
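The plotting code for the map is not included in the article. As a rough sketch of the red/green color rule only, the function below builds one line trace per route as a plain dict in the shape of Plotly's `scattermapbox` trace type. The function name `route_traces`, the airport lookup table and its values are all hypothetical. Creating the actual figure (for example by passing the traces to `plotly.graph_objects.Figure` together with a Mapbox access token) is left out.

```python
def route_traces(edges, airports, home_country="India"):
    """Build one Plotly-style line trace (a plain dict) per route.

    A route is drawn green when both endpoints are in `home_country`,
    and red otherwise.
    """
    traces = []
    for source, dest in edges:
        a, b = airports[source], airports[dest]
        domestic = a["country"] == home_country and b["country"] == home_country
        traces.append({
            "type": "scattermapbox",
            "mode": "lines",
            "lon": [a["lon"], b["lon"]],
            "lat": [a["lat"], b["lat"]],
            "line": {"width": 1, "color": "green" if domestic else "red"},
            "hoverinfo": "text",
            "text": f"{source} to {dest}",
        })
    return traces


# Hypothetical lookup table: IATA code -> country and coordinates
airports = {
    "DEL": {"country": "India", "lat": 28.56, "lon": 77.10},
    "BOM": {"country": "India", "lat": 19.09, "lon": 72.87},
    "LHR": {"country": "United Kingdom", "lat": 51.47, "lon": -0.45},
}
traces = route_traces([("DEL", "BOM"), ("DEL", "LHR")], airports)
print([t["line"]["color"] for t in traces])
```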
Limitation:
The data obtained is incomplete, so our analysis doesn’t give a fully accurate result. When we examine the graph, we notice that there are no flight connections between India and Canada, and only three direct flights from India to the United States, which is highly unlikely.
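A gap like this can also be checked numerically rather than by eye, by counting outbound international routes per destination country. The sketch below uses a hypothetical route table and a hypothetical code-to-country lookup; with the real data, the lookup would come from the merged airport table.

```python
import pandas as pd

# Hypothetical toy routes and country lookup
route_ind = pd.DataFrame({
    "Source": ["DEL", "DEL", "BOM", "DEL", "LHR"],
    "Dest": ["BOM", "JFK", "EWR", "LHR", "DEL"],
})
country_by_iata = {
    "DEL": "India", "BOM": "India",
    "JFK": "United States", "EWR": "United States",
    "LHR": "United Kingdom",
}

# Routes that depart from India, labeled with the destination country
outbound = route_ind[route_ind["Source"].map(country_by_iata) == "India"].copy()
outbound["dest_country"] = outbound["Dest"].map(country_by_iata)

# Count international routes per destination country
international = outbound[outbound["dest_country"] != "India"]
per_country = international.groupby("dest_country").size()

print(per_country)
print("routes to Canada:", per_country.get("Canada", 0))
```

A count of zero or an implausibly small number for a major destination is a sign that the source data is missing routes.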
Conclusion:
From the above analysis, we can determine all the routes and their frequencies. This tells us which route is the busiest and from which parts of India people travel abroad the most. The analysis can help business owners set fares based on frequency, and it could even help them start new routes where the frequency is low.
We can further expand this analysis by including Airlines data and providing a network structure of a particular airline.