Air traffic network analysis

An analysis of the air traffic network in Brazil with a Streamlit app

Marcos Vinícius
Aug 8, 2021

This article analyzes the Brazilian air traffic network under different metrics and perspectives. It considers parameters that help explain how a network is configured, both by identifying its most important nodes and by showing how the network is organized and distributed.

Context

Determining which nodes matter most in a network is not trivial, because importance is almost never tied only to the number of connections a node has. Consider a hypothetical network divided into two subnets that are joined by a single node, where each subnet contains multiple nodes with N connections. A quick analysis might conclude that the most significant nodes are those with the most connections, but that is wrong, because it ignores each node’s position and the overall topology of the network. In this case, the most important node is the one that links the two subnets: it has only two links, but without it the network breaks apart.
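The hypothetical case above can be reproduced with a small NetworkX sketch. The two five-node subnets and the node named "bridge" are toy assumptions chosen for illustration, not data from the Brazilian network.

```python
import networkx as nx

# Two fully connected subnets of five nodes each (toy data, not real airports).
left = nx.complete_graph(range(0, 5))
right = nx.complete_graph(range(5, 10))
G = nx.compose(left, right)

# A single node with only two links joins the two subnets.
G.add_edge(0, "bridge")
G.add_edge("bridge", 5)

degree = dict(G.degree())
betweenness = nx.betweenness_centrality(G)

# The bridge has the fewest links of any node...
print(degree["bridge"], min(degree.values()))  # 2 2
# ...yet it has the highest betweenness centrality.
print(max(betweenness, key=betweenness.get))  # bridge

# Removing the bridge splits the network in two.
H = G.copy()
H.remove_node("bridge")
print(nx.is_connected(H))  # False
```

Nodes 1 to 4 each have four links, twice as many as the bridge, yet their betweenness is zero because no shortest path needs to pass through them.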

In this study, we seek to understand the Brazilian air network through metrics and concepts such as network diameter, periphery, degree centrality and betweenness centrality. We also visualize the network through a core decomposition, using K-Core and K-Shell.

Metrics used
The metrics used to analyze this network were:

  • Diameter: the greatest shortest-path distance between any two nodes, that is, the maximum eccentricity of the network (the eccentricity of a node being its distance to the node farthest from it).
  • Radius: the minimum eccentricity of the network.
  • Periphery: the set of all nodes whose eccentricity equals the diameter.
  • Closeness Centrality: scores each node based on its “closeness” to all other nodes in the network.
  • Degree Centrality: assigns an importance score based simply on the number of links held by each node.
  • Betweenness Centrality: shows which nodes act as “bridges” on the shortest paths between other nodes in the network.
  • Eigenvector Centrality: measures a node’s influence based on both the number of links it has and the influence of the nodes it links to.
  • K-Core: the maximal subgraph in which every node has at least k connections to other nodes of that subgraph.
  • K-Shell: the set of all nodes belonging to the k-core of the network but not to the (k+1)-core.
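As a rough illustration of the distance and centrality metrics listed above, the sketch below computes them with NetworkX on a five-node toy graph. The node names "A" to "E" and their links are assumptions made for this example and do not come from the flight data.

```python
import networkx as nx

# Toy graph: a triangle B-C-D with one extra node hanging off B and off D.
G = nx.Graph()
G.add_edges_from([("A", "B"), ("B", "C"), ("C", "D"), ("B", "D"), ("D", "E")])

# Distance-based metrics.
eccentricity = nx.eccentricity(G)   # greatest distance from each node
diameter = nx.diameter(G)           # maximum eccentricity
radius = nx.radius(G)               # minimum eccentricity
periphery = nx.periphery(G)         # nodes whose eccentricity equals the diameter

# Centrality metrics (each returns a dict mapping node -> score).
degree = nx.degree_centrality(G)
closeness = nx.closeness_centrality(G)
betweenness = nx.betweenness_centrality(G)
eigenvector = nx.eigenvector_centrality(G, max_iter=1000)

print(diameter, radius, sorted(periphery))  # 3 2 ['A', 'E']
```

Note that `nx.diameter`, `nx.radius` and `nx.periphery` raise an error on a disconnected graph, so on real data it may be necessary to restrict the analysis to the largest connected component first.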

Technologies used
The technologies used for the development of this work were:

  • Python: programming language.
  • Folium: map visualization.
  • Bokeh: graph visualization with more interactivity.
  • NetworkX: operations in networks.

Finally, we created a simple interactive application with Streamlit.

Datasets

The datasets used in this work can be found in the following repository, along with their documentation and additional information:

Data collection
Almost all the data were made available by ANAC, the National Civil Aviation Agency, ranging from the dataset of all aerodromes to the dataset of all flights in Brazil.

The greatest difficulty in data collection was that the ANAC datasets contained airports with incomplete data, which made it necessary to look for other sources. A second dataset, covering airfields registered worldwide, still left some Brazilian airports with missing information. After another search, we found a dataset offered by DataHub.io, which let us fill in the remaining airports by merging the datasets with a Python script.
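A merge of this kind might look like the following pandas sketch. The column names (icao, name, latitude, longitude), the placeholder airport codes and the coordinates are all assumptions for illustration; the actual script and dataset schemas are not shown in this article.

```python
import pandas as pd

# Primary table with gaps (placeholder values, not real ANAC data).
primary = pd.DataFrame({
    "icao": ["AAAA", "BBBB", "CCCC"],
    "name": ["Airport A", "Airport B", "Airport C"],
    "latitude": [-10.0, None, None],
    "longitude": [-40.0, None, None],
})

# Complementary table from another source (also placeholder values).
fallback = pd.DataFrame({
    "icao": ["BBBB", "DDDD"],
    "latitude": [-20.0, -30.0],
    "longitude": [-45.0, -50.0],
})

# Left join keeps every primary airport; fallback columns get a suffix.
merged = primary.merge(fallback, on="icao", how="left", suffixes=("", "_fallback"))

# Fill only the missing values, then drop the helper columns.
for column in ("latitude", "longitude"):
    merged[column] = merged[column].fillna(merged[f"{column}_fallback"])
merged = merged.drop(columns=["latitude_fallback", "longitude_fallback"])

# Airports that are still incomplete and need yet another source.
still_missing = merged.loc[merged["latitude"].isna(), "icao"].tolist()
print(still_missing)  # ['CCCC']
```

Listing the airports that remain incomplete after each merge makes it easy to see whether one more data source is needed.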

Methodology

To load the data into NetworkX and begin this study, it was necessary to convert the original datasets to the GraphML format. To do this, we cross-referenced the datasets in Python, producing flight network data structured as follows:

Node attributes:
The id of each node is the ICAO airport code.

  • name: airport name.
  • country: country where the airport is located.
  • latitude: latitude of the airport reference point.
  • longitude: longitude of the airport reference point.

Edge attributes:

  • flight_count: number of flights carried out between these airports.
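A minimal sketch of a graph with these node and edge attributes, written to GraphML and read back, could look like this. The airport records and flight list are placeholders, and treating the network as undirected is an assumption; the original conversion script may differ.

```python
import io
import networkx as nx

# Placeholder airports and flights (not real data).
airports = [
    {"icao": "AAAA", "name": "Airport A", "country": "Brazil", "latitude": -10.0, "longitude": -40.0},
    {"icao": "BBBB", "name": "Airport B", "country": "Brazil", "latitude": -20.0, "longitude": -45.0},
    {"icao": "CCCC", "name": "Airport C", "country": "Brazil", "latitude": -25.0, "longitude": -50.0},
]
flights = [("AAAA", "BBBB"), ("AAAA", "BBBB"), ("BBBB", "AAAA"), ("BBBB", "CCCC")]

G = nx.Graph()  # assumption: direction of travel is ignored

# The ICAO code is the node id; the remaining fields become node attributes.
for airport in airports:
    attributes = {key: value for key, value in airport.items() if key != "icao"}
    G.add_node(airport["icao"], **attributes)

# Each flight increments the flight_count of the edge between its airports.
for origin, destination in flights:
    if G.has_edge(origin, destination):
        G[origin][destination]["flight_count"] += 1
    else:
        G.add_edge(origin, destination, flight_count=1)

# Round trip through GraphML using an in-memory buffer instead of a file.
buffer = io.BytesIO()
nx.write_graphml(G, buffer)
buffer.seek(0)
restored = nx.read_graphml(buffer)

print(restored["AAAA"]["BBBB"]["flight_count"])  # 3
```

In practice the buffer would be replaced by a file path, so that the same GraphML file can be reused by the analysis and visualization steps.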

After this conversion, it was possible to apply different functions from NetworkX, a library focused on network analysis. It exposes simpler metrics, such as the diameter and periphery of a network, and also makes it possible to build logic for more sophisticated analyses. Combining this with Folium and Bokeh, other libraries in the Python ecosystem focused on map and graph visualization, we were able to create an interactive visualization in which users can carry out their own analysis.

Results and conclusions

The study achieved its goal: it was possible to analyze the Brazilian air traffic network interactively and under different metrics, as shown in the presentation below:

Table 1: Some of the many indicators available to make a simple analysis

In the app available on Streamlit, you can choose which year (or several years) you want to view and watch the tables, graphs and maps update in real time.

Below you can see how the app is structured for user interaction: you can switch between the tabs (metrics) and use Bokeh to zoom in and inspect each node more precisely, as in this example, where we inspect the Diamantina airport.

Figure 1: node ranking visualization by Bokeh.

In addition, an interactive map shows all flights, with edge thickness reflecting quantitative indicators. Several tables also present rankings associated with these maps, such as the “Top 5 airports with more degree” or “Top 5 trips that happened the most”.

Figure 2: airports and flights visualization by Folium.
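The rankings and the edge thickness described above could be derived from the graph roughly as follows. The placeholder routes, the helper `edge_width` and its linear scaling between 1 and 8 are assumptions for illustration, not the app's actual code.

```python
import networkx as nx

# Placeholder routes with made-up flight counts.
G = nx.Graph()
G.add_weighted_edges_from(
    [("AAAA", "BBBB", 120), ("AAAA", "CCCC", 30), ("AAAA", "DDDD", 60), ("BBBB", "CCCC", 10)],
    weight="flight_count",
)

# "Top 5" tables: airports by degree and routes by number of flights.
top_airports = sorted(G.degree(), key=lambda pair: pair[1], reverse=True)[:5]
top_routes = sorted(G.edges(data="flight_count"), key=lambda edge: edge[2], reverse=True)[:5]

def edge_width(flight_count, max_count, min_width=1.0, max_width=8.0):
    """Hypothetical helper: scale a flight count linearly to a line width."""
    return min_width + (max_width - min_width) * flight_count / max_count

max_count = max(count for _, _, count in G.edges(data="flight_count"))
widths = {
    (origin, destination): edge_width(count, max_count)
    for origin, destination, count in G.edges(data="flight_count")
}

print(top_airports[0])  # ('AAAA', 3)
print(top_routes[0])    # ('AAAA', 'BBBB', 120)
```

Each width could then be passed as the `weight` option of a `folium.PolyLine` drawn between the two airports' coordinates, so that busier routes appear thicker on the map.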

Finally, we can also compare how airports behaved before, during and after the COVID-19 pandemic, in other words, how the number of flights decreased and how the network’s metrics changed over the years.
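A year-by-year comparison of this kind can be sketched by building one graph per selected set of years and summarizing each. The flight records below are placeholders and the helper names `build_graph` and `summarize` are hypothetical; they only illustrate the idea of filtering by year before computing metrics.

```python
import networkx as nx

# Placeholder records: (year, origin, destination). Not real flight data.
flights = [
    (2019, "AAAA", "BBBB"), (2019, "BBBB", "CCCC"),
    (2019, "CCCC", "DDDD"), (2019, "AAAA", "CCCC"),
    (2020, "AAAA", "BBBB"), (2020, "BBBB", "CCCC"),
]

def build_graph(records, years):
    """Build an undirected graph from the flights in the selected years."""
    G = nx.Graph()
    for year, origin, destination in records:
        if year not in years:
            continue
        previous = G.get_edge_data(origin, destination, default={}).get("flight_count", 0)
        G.add_edge(origin, destination, flight_count=previous + 1)
    return G

def summarize(G):
    """Collect a few indicators for one graph."""
    return {
        "airports": G.number_of_nodes(),
        "routes": G.number_of_edges(),
        "flights": sum(count for _, _, count in G.edges(data="flight_count")),
        "density": nx.density(G),
    }

# One summary per year, plus a graph combining several selected years.
summary = {year: summarize(build_graph(flights, {year})) for year in (2019, 2020)}
combined = build_graph(flights, {2019, 2020})

print(summary[2019]["routes"], summary[2020]["routes"])  # 4 2
```

Passing a set of years mirrors the multi-year selection in the app: the same function serves a single year or any combination of them.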

Additional analysis
An analysis was also done on the core decomposition layers, known as K-Core and K-Shell. You can see it in the picture below and in the app.

Figure 3: visualization of the k-core and k-shell of the network.
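For readers who want to reproduce a core decomposition, NetworkX provides `core_number`, `k_core` and `k_shell`. The sketch below applies them to a six-node toy graph, which is an assumption for illustration and unrelated to the airport data.

```python
import networkx as nx

# Toy graph: a fully connected group of four nodes (0-3),
# node 4 attached to two of them, and node 5 attached only to node 4.
G = nx.complete_graph(4)
G.add_edges_from([(4, 0), (4, 1), (5, 4)])

# Core number: the largest k for which the node belongs to the k-core.
core = nx.core_number(G)
print(core)  # {0: 3, 1: 3, 2: 3, 3: 3, 4: 2, 5: 1}

# The 3-core keeps only nodes with at least 3 links inside the subgraph.
three_core = nx.k_core(G, k=3)
print(sorted(three_core.nodes))  # [0, 1, 2, 3]

# The 2-shell holds the nodes in the 2-core that are not in the 3-core.
two_shell = nx.k_shell(G, k=2)
print(sorted(two_shell.nodes))  # [4]
```

Node 4 has three links in total, but one of them goes to node 5, which is pruned first, so node 4 ends up in the 2-shell rather than in the 3-core.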

This article was written for the network analysis course of the Bachelor’s degree in Information Technology (BTI) at the Federal University of Rio Grande do Norte (UFRN), taught by Professor Ivanovitch Silva.
