Inferring The General Sentiment of Flying in Various Areas Around the World

Published in Social Media: Theories, Ethics, and Analytics

7 min read · Oct 27, 2020

This article is a continuation of Boarding Now: An Exploratory Look Into US Airline Sentiment Tweets.

For those who are well-traveled: what’s your experience with flying through different airports? Has your airport experience varied vastly from one country to another? Does the location you’re arriving at or departing from matter that much when it comes to travel? For the sake of your finances or stress levels, are there places you should avoid traveling to, or at least mentally prepare for?

Source: https://blog.aci.aero/mid-sized-airports-have-their-day-in-the-sun/

The Twitter US Airline Sentiment dataset from 2015 contains tweets from users who tweeted at airlines with issues or feedback. A sentiment analysis conducted on these tweets added categories such as sentiment (positive, neutral, or negative) and negative reason (the reason behind a negative tweet). What I am interested in, however, is whether there are any trends in how Twitter users feel about their flying experience with regard to location. How do people feel about flying in various areas around the world? Is flying to and from particular countries or cities more or less stressful than others? Let’s see if we can infer something about the actual areas and airports people have tweeted about.

Steps to Creating The Network

To explore this question, I used Gephi, an open-source network analysis tool.

Prior to creating the network (and even opening Gephi), the first step was to organize the data by creating a file with just the sentiment and time zone columns to import into Gephi. The dataset does contain a tweet location column, but I used the time zone column instead because its data was more consistent. While the location column would seem more accurate for this purpose, many of its rows held irrelevant text that said nothing about where the tweet came from, and it had many more blanks than the time zone column. Even if several of the values in the time zone column are not very specific (e.g., Central Time (US & Canada)), we can still grasp a general idea of what trends may be present. This is the primary limitation of this network analysis.
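As a rough sketch of this preparation step, the snippet below keeps only the sentiment and time zone columns, drops rows with a blank time zone, and writes an edge list with one row per time zone and sentiment pair. The column names `airline_sentiment` and `user_timezone` are assumptions based on the public release of the dataset, the sample rows are made up, and pre-counting the pairs into a `Weight` column is only one possible way to prepare the file, not necessarily how the original file was built.

```python
import csv
import io
from collections import Counter

# Column names are assumptions; adjust them if your copy of the data differs.
SENTIMENT_COL = "airline_sentiment"
TIMEZONE_COL = "user_timezone"


def count_edges(rows):
    """Count tweets per (time zone, sentiment) pair, skipping blank values."""
    counts = Counter()
    for row in rows:
        timezone = (row.get(TIMEZONE_COL) or "").strip()
        sentiment = (row.get(SENTIMENT_COL) or "").strip()
        if timezone and sentiment:
            counts[(timezone, sentiment)] += 1
    return counts


def write_gephi_edges(counts, out_file):
    """Write a Source,Target,Weight edge list for import into Gephi."""
    writer = csv.writer(out_file)
    writer.writerow(["Source", "Target", "Weight"])
    for (timezone, sentiment), weight in sorted(counts.items()):
        writer.writerow([timezone, sentiment, weight])


# Tiny made-up sample standing in for the real file.
sample_csv = io.StringIO(
    "airline_sentiment,user_timezone,tweet_location\n"
    "negative,Eastern Time (US & Canada),NYC\n"
    "negative,Eastern Time (US & Canada),\n"
    "positive,London,somewhere over the rainbow\n"
    "neutral,,Earth\n"
)
edge_counts = count_edges(csv.DictReader(sample_csv))

edge_file = io.StringIO()
write_gephi_edges(edge_counts, edge_file)
print(edge_file.getvalue())
```

With the real data, `sample_csv` would be replaced by the opened dataset file and `edge_file` by a file on disk.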

Once I imported this file into Gephi, the result looked like Figure 1. Before moving forward, I turned on the text labels and adjusted the size of the nodes. As you can see, however, the network is rather messy, and I needed to make some adjustments to make it easier to analyze. The next step was to run the Network Diameter statistic with the graph treated as undirected. Then, I used the Yifan Hu, Expansion, and Contraction layouts to shape the network better.

Layout Metrics
Nodes: 88 | Edges: 179 | Network Diameter: 4 | Avg. Path Length: 2.191
Contraction Scale Factor: 0.8 | Expansion Scale Factor: 1.2
Yifan Hu properties | Optimal Distance: 200 | Quadtree Max Level: 15
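To make the diameter and average path length figures more concrete, here is a minimal sketch of how both can be computed on an undirected graph with breadth-first search. The zone names and edges are hypothetical, and Gephi's own implementation may differ in details, so treat this as an illustration of the two metrics rather than a reproduction of the numbers above.

```python
from collections import deque


def build_undirected(edges):
    """Build an adjacency map, treating every edge as undirected."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bfs_distances(graph, start):
    """Return hop counts from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def diameter_and_avg_path(graph):
    """Longest and mean shortest-path length over all connected node pairs."""
    lengths = []
    for node in graph:
        dist = bfs_distances(graph, node)
        lengths.extend(d for other, d in dist.items() if other != node)
    return max(lengths), sum(lengths) / len(lengths)


# Hypothetical time zone -> sentiment edges, not taken from the real data.
toy_edges = [
    ("Zone A", "negative"), ("Zone A", "positive"),
    ("Zone B", "negative"), ("Zone B", "positive"),
    ("Zone C", "negative"), ("Zone D", "positive"),
]
toy_graph = build_undirected(toy_edges)
diameter, avg_path = diameter_and_avg_path(toy_graph)
print(diameter, round(avg_path, 3))
```

In a network like this one, where time zones only connect to sentiment nodes, the longest paths run between time zones that share no sentiment type.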

Applying colors (red → purple → blue) to the edges by weight ranking shows which locations point to which sentiment the most. The thicker and bluer the edge from a time zone node to a sentiment node, the more tweets there are between that time zone and that sentiment. For example, a thick blue edge between the Eastern Time Zone node and the negative node means that a large number of tweets from the Eastern Time Zone had a negative sentiment. Thinner, redder edges represent the opposite, and medium, purple edges fall somewhere in between.
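The idea behind the color ranking can be sketched as a simple linear ramp from red (lowest weight) through purple to blue (highest weight). The function name `weight_to_rgb` and the linear interpolation are assumptions for illustration; Gephi's ranking may interpolate colors differently.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def weight_to_rgb(weight, min_weight, max_weight):
    """Map an edge weight onto a red -> purple -> blue ramp."""
    if max_weight == min_weight:
        t = 0.0  # all edges weigh the same, so use the low end of the ramp
    else:
        t = (weight - min_weight) / (max_weight - min_weight)
    t = min(1.0, max(0.0, t))  # clamp weights outside the given range
    red = round(lerp(255, 0, t))
    blue = round(lerp(0, 255, t))
    return (red, 0, blue)


# Hypothetical weights: 1 tweet on the lightest edge, 101 on the heaviest.
for w in (1, 51, 101):
    print(w, weight_to_rgb(w, 1, 101))
```

A weight halfway between the minimum and maximum gets equal red and blue components, which is the purple in the middle of the ramp.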

Finally, using the Closeness Centrality ranking, I sized the nodes by how central they are in the network. More central nodes are time zones whose tweets cover all sentiment types, whereas nodes farther from the center are time zones whose tweets cover only one or two sentiment types. The point of this technique is to be able to see clusters of time zones by sentiment type. Each of these clusters is given its own color. The overall result is depicted in Figure 2. It is difficult to read from a distance, so I zoomed in on several parts of the network; Figures 3–9 depict these areas.
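A small sketch can show why time zones linked to more sentiment types end up more central. It uses the common textbook definition of closeness (number of reachable nodes divided by the sum of shortest-path distances to them), which is an assumption here; Gephi's exact formula can vary by version. The zone names and edges are hypothetical.

```python
from collections import deque


def build_undirected(edges):
    """Build an adjacency map, treating every edge as undirected."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def closeness(graph, start):
    """Reachable node count divided by total shortest-path distance."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical edges: Zone A has all three sentiments, Zone C two, Zone B one.
toy_graph = build_undirected([
    ("Zone A", "negative"), ("Zone A", "neutral"), ("Zone A", "positive"),
    ("Zone B", "negative"),
    ("Zone C", "negative"), ("Zone C", "positive"),
])
scores = {zone: closeness(toy_graph, zone) for zone in ("Zone A", "Zone B", "Zone C")}
print(scores)
```

Here the zone with all three sentiment types scores highest and the zone with a single sentiment type scores lowest, which matches the way the network is laid out.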

Cluster colors:
Green: negative
Orange: positive
Blue: neutral
Teal: positive and neutral
Black: positive and negative
Pink: negative and neutral
Purple: negative, positive, neutral
Gray: the sentiment type nodes themselves
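The grouping behind these colors can be sketched by collecting the set of sentiment types each time zone is linked to and looking up the matching color. The color names come from the list above; the helper names and the sample edges are hypothetical, and this is only one way the clusters could be derived.

```python
from collections import defaultdict

# Color names copied from the cluster list; the lookup logic is illustrative.
CLUSTER_COLORS = {
    frozenset({"negative"}): "green",
    frozenset({"positive"}): "orange",
    frozenset({"neutral"}): "blue",
    frozenset({"positive", "neutral"}): "teal",
    frozenset({"positive", "negative"}): "black",
    frozenset({"negative", "neutral"}): "pink",
    frozenset({"negative", "positive", "neutral"}): "purple",
}


def sentiments_by_timezone(edges):
    """Collect the set of sentiment types each time zone is linked to."""
    seen = defaultdict(set)
    for timezone, sentiment in edges:
        seen[timezone].add(sentiment)
    return dict(seen)


def cluster_color(sentiments):
    """Look up the cluster color for a set of sentiment types."""
    return CLUSTER_COLORS[frozenset(sentiments)]


# Hypothetical edges, not taken from the real data.
toy_edges = [
    ("Zone A", "negative"), ("Zone A", "neutral"), ("Zone A", "positive"),
    ("Zone B", "negative"),
    ("Zone C", "positive"), ("Zone C", "neutral"),
]
clusters = {
    zone: cluster_color(sentiments)
    for zone, sentiments in sentiments_by_timezone(toy_edges).items()
}
print(clusters)
```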

Figure 3: Depiction of the purple cluster in the middle of the network | Figure 4: Depiction of the pink cluster on the left

Figure 5: Depiction of the black cluster in the top right | Figure 6: Depiction of the teal cluster on the bottom right

Figure 7: Depiction of the green cluster | Figure 8: Depiction of the blue cluster | Figure 9: Depiction of the orange cluster

Table 1: Timezones and Their Sentiment Type

Across these figures, we can see a wide spread of locations and sentiment types, but the majority of locations sit in the center of the network, connected to all three sentiment types (positive, negative, and neutral). Fewer nodes lie farther from the center. Table 1 lists the time zones with only one or two sentiment types.

Time zones with all three sentiment types include the following: America/New York, Edinburgh, Brasilia, Central Time (US & Canada), Central America, Melbourne, America/Chicago, London, Buenos Aires, Helsinki, Amsterdam, Beijing, Brisbane, Tehran, Eastern Time (US & Canada), Athens, Pacific Time (US & Canada), Atlantic Time (Canada), Caracas, Mountain Time (US & Canada), Casablanca, America/Los Angeles, Brussels, Hawaii, Alaska, Adelaide, Greenland, Paris, Quito, Mid-Atlantic, Dublin, Arizona, Rome, Madrid, Sydney, Santiago, Indiana (East), Berlin, and New Caledonia. That’s a total of 39 time zones/locations.

Based on the edges of the network, most negative tweets came from locations in North America, most notably Eastern Time (US & Canada), Central Time (US & Canada), and Pacific Time (US & Canada). Eastern Time (US & Canada) also drew more neutral and positive tweets than the other locations. Most of the other areas had fewer tweets of every sentiment type. Overall, this would imply that airports in the Eastern Time zone received the most criticism and feedback. Additionally, out of all locations in the network, only the Eastern Time zone has a blue edge toward the negative node, potentially meaning that flying to and from the eastern regions of the US and Canada can be the least ideal. It’s also important to note that not all locations had negative feedback, and not all had positive feedback either.

In the dataset, there are a total of 14,640 tweets, of which 9,178 are negative. Among those 9,178 negative tweets, 4,793 have time zones in the US and parts of Canada, meaning that approximately 52.22% of the negative tweets came from users in time zones across the US and Canada. This would suggest that most of the people who had issues with airlines were flying in North America, specifically in the United States.
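The arithmetic behind that percentage can be checked with a few lines of Python. The three counts are taken from the paragraph above; the other two ratios are simply derived from the same counts for context.

```python
TOTAL_TWEETS = 14_640
NEGATIVE_TWEETS = 9_178
NEGATIVE_US_CANADA = 4_793


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


# Share of negative tweets that come from US and Canada time zones.
negative_from_us_canada = share(NEGATIVE_US_CANADA, NEGATIVE_TWEETS)
# Share of all tweets that are negative.
negative_overall = share(NEGATIVE_TWEETS, TOTAL_TWEETS)
# Share of all tweets that are negative and from US and Canada time zones.
negative_us_canada_overall = share(NEGATIVE_US_CANADA, TOTAL_TWEETS)

print(f"{negative_from_us_canada:.2f}% of negative tweets are from US/Canada time zones")
print(f"{negative_overall:.2f}% of all tweets are negative")
print(f"{negative_us_canada_overall:.2f}% of all tweets are negative and from US/Canada")
```

Note that the 52.22% figure is a share of the negative tweets, not of the whole dataset.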

Limitations, Implications, and Ethical Concerns
Does this really mean flying in the United States is the worst, and that flying elsewhere in the world is less stressful? Is it also correct to assume, based on this set of tweets, that airports in Lima or Taipei are worse than airports in Prague or Istanbul? What does that mean for airports in various regions? Are negative flight experiences really an issue with airlines, or are they an issue with the airport a user is flying to and from? Do airport operations differ across countries in ways that affect a user’s experience? We may need to gather information from airline and airport employees on how each location operates and what the general level of customer satisfaction is. With only this network analysis, it would be too soon to assume that certain regions are better to fly to than others.

One limitation (as mentioned earlier) is that these tweets are not categorized by their corresponding airport and are only identified by the time zone of the user who tweeted. There may be significant differences between flying from NYC and flying from Miami. This is where the location column in the dataset would come in handy, if the locations in it were more accurate and properly labeled by users. We may not be able to narrow things down to each airport in each time zone, but the time zone column still gives us an idea of which countries or areas of the world have had better user experiences.

Another important limitation of this analysis is that users can be, and likely are, biased in their tweets. What would be more valuable for answering this question further is looking into the negative reason given for the tweets in these locations. Do the issues stem from how airlines run their operations or from the actual employees at certain airports?

Overall, it’s probably a bit extreme to say that you should avoid traveling to or within the US in your lifetime, but perhaps flying to and from here is more stressful than in other nations. The unfortunate thing is that flying will likely continue to be a stressful experience for some. If traveling is stressing you out, perhaps a staycation or a virtual conference call may be in order.

Written by Megan Resurreccion