Visualizing Air Routes for Indian Airlines

Analyzing the structure of a web-based network.

Rohit Kabra
Web Mining [IS688, Spring 2021]
May 8, 2021


According to a recent article in Forbes, India’s aviation industry has grown fast over the past few years: in 2018, the International Air Transport Association (IATA) predicted that India would be the world’s third largest aviation market by 2024, and would grow from 158 million passengers in 2017 to 572 million by 2037. However, after four years of double-digit demand growth, air traffic in India rose only 5.1% in 2019, down from 18.9% growth in 2018, according to IATA. To understand these statistics, it is important to analyze the routes to and from Indian airports. This analysis can help businesses determine which air route is the busiest and how connectivity between routes can be improved to give customers a more reliable way to travel by air.

The idea of this analysis is to identify the different routes and determine how well connected Indian airports are, so that businesses and government bodies can focus on network planning and thus improve the operation and management of airlines.

Tools and Technology:

Python was used to perform the analysis, with the libraries Pandas, Requests, NetworkX, Matplotlib and Plotly. The final map of airline routes was visualized using Mapbox.

Data Source and Data Pre-Processing:

The data used for this model was obtained from the Travelpayouts website. The Travelpayouts data API provides travel data about countries, airports, airlines, routes and more.

For this model we extracted the data for countries, airports and routes in JSON format and converted it into Pandas dataframes.

The airports request returns a file with a list of airports from the database.

Airport Data

After converting our dataset to a Pandas dataframe, we obtain 9749 rows. We then keep only the columns needed for our analysis, namely:

  • code — airport IATA code
  • name — airport name
  • country_code — country IATA code
  • city_code — city IATA code
  • coordinates — latitude and longitude of the airport
  • time_zone — time zone relative to GMT
Airport Data in Pandas Dataframe
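A minimal sketch of this conversion step, assuming the parsed JSON response (for example the result of `requests.get(url).json()`) is a list of dictionaries with a nested `coordinates` object. The sample records, the helper name `airports_to_frame` and the flattened `lat`/`lon` column names are hypothetical and only illustrate the shape of the data.

```python
import pandas as pd

# Hypothetical records that mimic the shape of the airport JSON.
sample_airports = [
    {"code": "DEL", "name": "Indira Gandhi International Airport",
     "country_code": "IN", "city_code": "DEL",
     "coordinates": {"lat": 28.5665, "lon": 77.1031},
     "time_zone": "Asia/Kolkata", "flightable": True},
    {"code": "JFK", "name": "John F Kennedy International Airport",
     "country_code": "US", "city_code": "NYC",
     "coordinates": {"lat": 40.6413, "lon": -73.7781},
     "time_zone": "America/New_York", "flightable": True},
]

# Columns kept for the analysis (coordinates split into lat/lon).
KEEP = ["code", "name", "country_code", "city_code", "lat", "lon", "time_zone"]


def airports_to_frame(records):
    """Flatten a list of airport dicts and keep only the needed columns."""
    df = pd.json_normalize(records)  # nested keys become 'coordinates.lat', ...
    df = df.rename(columns={"coordinates.lat": "lat", "coordinates.lon": "lon"})
    return df[KEEP]


airports = airports_to_frame(sample_airports)
print(airports)
```

Splitting the coordinates into two plain columns is a design choice made here so that they can later be used directly as plot positions.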

Similarly, we obtain the countries data from Travelpayouts, which gives us each country's name along with its IATA code.

http://api.travelpayouts.com/data/en/countries.json

After obtaining the countries data, we merge it with the airport data set obtained earlier, based on country_code.

The resulting data frame is as follows:
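The merge could look roughly like the sketch below. The tiny data frames are made-up stand-ins, and the assumption that the countries data uses `code` and `name` columns (renamed here to `country_code` and `country_name` to avoid clashing with the airport columns) is mine, not confirmed by the source.

```python
import pandas as pd

# Illustrative stand-ins for the airport and country data frames.
airports = pd.DataFrame({
    "code": ["DEL", "BOM", "JFK"],
    "name": ["Delhi", "Mumbai", "New York JFK"],
    "country_code": ["IN", "IN", "US"],
})
countries = pd.DataFrame({
    "code": ["IN", "US"],
    "name": ["India", "United States"],
})

# Rename so the join key matches and the country name does not
# collide with the airport name column.
countries = countries.rename(
    columns={"code": "country_code", "name": "country_name"}
)

# A left join keeps every airport, even if its country is missing.
airports_full = airports.merge(countries, on="country_code", how="left")
print(airports_full)
```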

Final Airport dataframe with Country’s name

Now, let's obtain the data set for routes. It is fetched in the same way, using the Requests library.

The data is then loaded into a Pandas data frame with the required columns. To ease our analysis and avoid confusion, we rename a few columns as follows:

Route Data: 64964 rows and 6 columns.
  • air_iata = Airline
  • departure_airport_iata = Source
  • arrival_airport_iata = Destination
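As a small sketch, the renaming can be done with `DataFrame.rename`. The rows are illustrative only, and the original column names are taken as written in the list above.

```python
import pandas as pd

# Illustrative rows; column names follow the list above.
routes = pd.DataFrame({
    "air_iata": ["AI", "6E"],
    "departure_airport_iata": ["DEL", "BOM"],
    "arrival_airport_iata": ["BOM", "DXB"],
})

routes = routes.rename(columns={
    "air_iata": "Airline",
    "departure_airport_iata": "Source",
    "arrival_airport_iata": "Destination",
})
print(routes.columns.tolist())
```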

Since we are interested only in Indian air routes, we filter the routes to those in and out of India. First, we print the IATA codes of all the airports in India.

Output:

IATA code for airports in India.

Now I will extract the routes where the departure or arrival airport is in India.

Next, we count the total number of unique flight routes to and from India. To get this number, we group the routes by source and destination and sum the number of airlines on each route.

There are a total of 861 unique flight routes in and out of India.
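The filtering and grouping described here might be sketched as follows. The sample rows are invented, `airline_count` is a hypothetical column name, and counting rows per source and destination pair is my reading of "sum up the number of airlines".

```python
import pandas as pd

# Invented sample data for illustration.
airports = pd.DataFrame({
    "code": ["DEL", "BOM", "DXB", "LHR"],
    "country_code": ["IN", "IN", "AE", "GB"],
})
routes = pd.DataFrame({
    "Airline": ["AI", "6E", "AI", "EK", "BA"],
    "Source": ["DEL", "DEL", "BOM", "DXB", "LHR"],
    "Destination": ["BOM", "BOM", "DXB", "DEL", "DXB"],
})

# IATA codes of all airports located in India.
india_codes = set(airports.loc[airports["country_code"] == "IN", "code"])

# Keep a route if either endpoint is an Indian airport.
mask = routes["Source"].isin(india_codes) | routes["Destination"].isin(india_codes)
india_routes = routes[mask]

# One row per unique (Source, Destination) pair, with the number of
# airlines operating on it.
route_counts = (
    india_routes.groupby(["Source", "Destination"])["Airline"]
    .count()
    .reset_index(name="airline_count")
)
print(route_counts)
print("Unique routes:", len(route_counts))
```

In this sketch the number of unique routes is simply the number of rows in `route_counts`.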

We will further determine how busy an airport is by appending all Destination airports to the series of all Source airports and counting how often each airport appears.

Our data is now ready and we can go ahead with the analysis. To perform it, we used the Python library NetworkX.
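One way to sketch this count is to concatenate the two columns and use `value_counts`. The sample routes are invented and the variable names are my own.

```python
import pandas as pd

# Invented routes in and out of India.
india_routes = pd.DataFrame({
    "Source": ["DEL", "DEL", "BOM", "DXB"],
    "Destination": ["BOM", "DXB", "DEL", "DEL"],
})

# Stack all sources and destinations into one series of airport codes.
all_endpoints = pd.concat(
    [india_routes["Source"], india_routes["Destination"]],
    ignore_index=True,
)

# Number of route endpoints per airport, busiest first.
airport_traffic = all_endpoints.value_counts()
print(airport_traffic)
```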

Network Structure

NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. It takes advantage of Python dictionaries to store node and edge attributes.

Node: A node can be any hashable object such as a string, a function, a file and more.

Edges: Edges can be used to hold any arbitrary data such as weights, time-series and more.

In our network, the airports are the nodes and the routes between them are the edges. The size of a node is defined by the number of flights in and out of the airport, and the size of an edge is determined by the number of airlines operating on that route.

There are a total of 146 airports of interest. This number is the count of unique nodes in the graph, i.e. the number of unique airports.
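A rough sketch of how such a graph could be built from the grouped routes is shown below. It assumes an undirected graph, made-up route counts, and the hypothetical attribute name `airline_count`; in an undirected graph a route and its reverse share a single edge, so a directed graph may be preferable if direction matters.

```python
import networkx as nx
import pandas as pd

# Made-up grouped routes: one row per route with its airline count.
route_counts = pd.DataFrame({
    "Source": ["DEL", "BOM", "DXB"],
    "Destination": ["BOM", "DXB", "DEL"],
    "airline_count": [2, 1, 1],
})

# Airports become nodes, routes become edges carrying the airline count.
G = nx.from_pandas_edgelist(
    route_counts,
    source="Source",
    target="Destination",
    edge_attr="airline_count",
)

# Node "size": sum of airline counts over all routes touching the airport.
traffic = dict(G.degree(weight="airline_count"))
nx.set_node_attributes(G, traffic, "traffic")

print("Airports:", G.number_of_nodes())
print("Routes:", G.number_of_edges())
```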

Graph Analysis:

Now, let's begin plotting. We first create an empty NetworkX graph and add the airport nodes to it:

Graphical Representation of all the nodes

The above graph shows all the airports (nodes) connected to India. We now go ahead and plot the edges, i.e. the routes between these nodes.

Graphical Representation after the nodes are connected.

Moving ahead with our analysis, we position the nodes using the geographical coordinates of the airports.
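Positioning by coordinates can be sketched as building a `pos` dictionary that maps each node to an (x, y) pair, with longitude as x and latitude as y. The coordinates and the `coords` lookup below are illustrative assumptions.

```python
import networkx as nx

# Illustrative (latitude, longitude) pairs per airport code.
coords = {
    "DEL": (28.5665, 77.1031),
    "BOM": (19.0896, 72.8656),
    "DXB": (25.2532, 55.3657),
}

G = nx.Graph()
G.add_edges_from([("DEL", "BOM"), ("DEL", "DXB")])

# Layout dictionary: x = longitude, y = latitude, only for nodes in G.
pos = {code: (lon, lat) for code, (lat, lon) in coords.items() if code in G}
print(pos)
```

The resulting dictionary can be passed to `nx.draw_networkx(G, pos=pos)` so that the nodes appear roughly where the airports are on a map.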

Let’s do the plotting:

Airports connected based on their coordinates.

The above plot doesn’t give us a clear picture, and such a graphical representation is difficult to interpret. The size of a node (dark blue) shows how busy an airport is, but it would be more useful to also know which airport it is and where it is located.

To obtain this result on a map, we use Mapbox.

Mapbox is an American provider of custom online maps for websites and applications. We obtain an access token from Mapbox, which is available from its official website after creating an account, and call a function to read this token. The result is an interactive map: when we hover over a node, details such as the airport name, number of flights, country and city pop up, and when we hover over an edge, the flight route pops up. Presenting the results this way makes them much easier to understand.

Map Representing airport and the routes.

Color representation:

  • Red: International Flight routes
  • Green: Domestic Flight routes
  • Purple dots: Airports with higher traffic. (The size of the node shows how busy an airport is)
  • Orange dots: Smaller airports with fewer routes.
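The red and green route colors could be assigned with a simple rule like the one sketched here, assuming a route counts as domestic when both endpoints are Indian airports. The helper `route_color` and the sample data are hypothetical.

```python
import pandas as pd

# Hypothetical set of Indian airport codes and a few sample routes.
india_codes = {"DEL", "BOM"}
routes = pd.DataFrame({
    "Source": ["DEL", "BOM", "DXB"],
    "Destination": ["BOM", "DXB", "DEL"],
})


def route_color(source, destination, domestic_codes):
    """Return 'green' for domestic routes and 'red' for international ones."""
    is_domestic = source in domestic_codes and destination in domestic_codes
    return "green" if is_domestic else "red"


routes["color"] = [
    route_color(s, d, india_codes)
    for s, d in zip(routes["Source"], routes["Destination"])
]
print(routes)
```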

Limitation:

The data obtained is incomplete, so our analysis does not give a fully accurate result. When we examine the graph, we notice that there is no flight connection between India and Canada, and only three direct flights from India to the United States, which is highly unlikely.

Conclusion:

From the above analysis, we can determine all the routes and their frequencies. This helps us see which route is the busiest and from which parts of India people travel abroad the most. The analysis can help business owners set fares based on frequency, and it can also help them start new routes where the frequency is low.

We can further expand this analysis by including Airlines data and providing a network structure of a particular airline.
