Natural Language Processing to gain company insights

Scraping and Analysing customer review text data to uncover findings for British Airways

Arthur Chong
Artificial Corner
8 min read · Aug 29, 2023


Photo by Isaac Struna on Unsplash

Hello! I am back with another Virtual Experience Programme! If you have yet to see my first article on it, check it out here!

Anyway, here is a quick summary of what a Virtual Experience Programme is: a programme designed by a company around real-world challenges that the business faces. Your job is to ‘work’ for them and help solve their problems! This is a perfect way to showcase your skills and build up your portfolio!

This time, I embarked on a Data Science journey with British Airways (BA), the flag carrier airline of the UK. As data scientists at BA, our job is to apply our data analysis and machine learning skills to derive insights that help BA drive revenue upwards. Customers who book a flight with BA will experience many interaction points with the brand. Hence, we have to understand their feelings, needs and reviews on the service that they have received.

There are 2 tasks for us in this programme and they are:

  1. Web scraping to gain company insights
  2. Predicting customer buying behaviour

In this article, I will be going through my approach to both of these tasks and hopefully help anyone out there who is facing difficulty with them!

For those interested in my code, you can check it out here!

Task 1: Web scraping to gain company insights

The first task that we are given is to scrape airline review data from the web. For this, we are provided with the website link to scrape from over here.

We are told to focus on the customer reviews of the airline itself, so the first thing we have to do is scrape the reviews! I used Python’s Scrapy framework to accomplish this task and here is my code for it:

import scrapy


class BaForageSpider(scrapy.Spider):
    name = "ba_forage"
    allowed_domains = ["www.airlinequality.com"]
    start_urls = ["https://www.airlinequality.com/airline-reviews/british-airways/?sortby=post_date%3ADesc&pagesize=100"]

    def parse(self, response):
        # Each review sits in its own <article itemprop="review"> element
        review_container = response.xpath('//article[@itemprop="review"]')
        for review in review_container:
            rating = review.xpath('.//span[@itemprop = "ratingValue"]/text()').get()
            text = review.xpath('.//div[@class = "text_content "]/text()').getall()
            country = review.xpath('.//h3/span/following-sibling::node()[1]').get()
            time = review.xpath('.//time/text()').get()
            yield {
                'rating': rating,
                # Strip surrounding spaces and parentheses; guard against a missing value
                'country': country.strip(" ()") if country else None,
                'text': text,
                'date': time,
            }

        # Follow the link in the last pagination item, if there is one
        pagination = response.xpath('//article[contains(@class, "pagination position")]')
        next_page_url = pagination.xpath('.//ul/li[last()]/a/@href').get()
        self.logger.info("Next page: %s", next_page_url)
        if next_page_url:
            yield response.follow(url=next_page_url, callback=self.parse)

As seen above, in addition to the review text, I also scraped the numerical rating that the customer gave, the country the customer is from and the date that the review was posted. I was hoping that they would give some additional insights into the reviews!

After collecting the data, it was time to preprocess it for analysis. We were told to look at topic modelling, sentiment analysis or wordclouds to gain some insight into the content of the reviews. I started with sentiment analysis, using NLTK’s VADER tool. VADER is a lexicon and rule-based tool that can predict whether a sentence is positive or negative without needing to be trained on any historical data. It also takes into account the capitalisation of words and the context in which a word is used, such as negation. For example, the sentence ‘I LOVE this’ would carry a more positive sentiment than ‘I love this’, and ‘I do not love this’ would be seen as negative. VADER summarises the overall sentiment in a single compound score between -1 and 1.

A compound score above 0 represents a positive sentiment while a score below 0 represents a negative sentiment. To check how well VADER did, I compared its predictions against the numerical rating that the customer gave. I assumed that a rating of 6 and above represents a positive review, a rating of 5 a neutral experience, and anything below 5 a negative experience. The outcome can be seen in this picture!

Comparing it to the numerical score, the model achieved an accuracy of 0.72, which is not bad considering that we did not need to perform much work to achieve this outcome!
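For anyone who wants to try the comparison themselves, here is a minimal sketch of it. The thresholds come from the description above, but the function names and sample numbers are made up for illustration, and leaving rating-5 (neutral) reviews out of the accuracy is an assumption rather than a confirmed detail of my notebook. In practice the compound scores would come from NLTK’s `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`, which needs the `vader_lexicon` resource downloaded first.

```python
def label_from_compound(compound):
    """Positive if the VADER compound score is above 0, negative if below."""
    if compound > 0:
        return "positive"
    if compound < 0:
        return "negative"
    return "neutral"  # assumption: a score of exactly 0 counts as neutral


def label_from_rating(rating):
    """Rating of 6 and above is positive, 5 is neutral, below 5 is negative."""
    if rating >= 6:
        return "positive"
    if rating == 5:
        return "neutral"
    return "negative"


def sentiment_accuracy(compounds, ratings):
    """Share of reviews where both labels agree, skipping neutral ratings."""
    pairs = [
        (label_from_compound(c), label_from_rating(r))
        for c, r in zip(compounds, ratings)
        if label_from_rating(r) != "neutral"  # assumption: neutral ratings are left out
    ]
    if not pairs:
        return 0.0
    return sum(pred == true for pred, true in pairs) / len(pairs)


# Hypothetical compound scores and customer ratings, for illustration only.
compounds = [0.85, -0.60, 0.30, -0.10, 0.45]
ratings = [9, 2, 3, 1, 5]
accuracy = sentiment_accuracy(compounds, ratings)
print(accuracy)
```

The third review shows the typical failure case: the text reads mildly positive to VADER even though the customer gave a low rating.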

After reviewing the sentiments of customers, I cleaned up the text (tokenizing it, removing stopwords and punctuation, and lemmatizing the remaining words). I then separated the dataset into positive and negative reviews according to the step above and created a wordcloud for each of them. Here are the results!

Negative Reviews

Positive Reviews

The first wordcloud is for negative reviews while the second wordcloud is for the positive reviews.

Some things become apparent from the 2 wordclouds:

  1. Most negative reviews talk about flight delays and booking issues.
  2. Most positive reviews talk about the comfortable flight and good meal.

It seems that the airline is good at delivering a quality in-flight experience but falls short in the pre-flight booking and check-in experience. Through these wordclouds, we can see which areas the airline should look into and which processes it should review.
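For reference, the cleaning and word counting behind a wordcloud can be sketched with the standard library alone. This is a simplified stand-in, not my exact code: the stopword list is a tiny illustrative one, the sample reviews are invented, and `lemmatize` is a hypothetical hook where something like NLTK’s `WordNetLemmatizer().lemmatize` could be passed in.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); NLTK's English list is far longer.
STOPWORDS = {
    "the", "a", "an", "and", "was", "were", "is", "to", "of",
    "in", "on", "for", "my", "our", "we", "i", "it",
}


def clean_review(text, lemmatize=None):
    """Lowercase, tokenize, drop punctuation and stopwords, then lemmatize."""
    # Keeping only runs of letters removes punctuation and digits in one pass.
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [t for t in tokens if t not in STOPWORDS]
    if lemmatize is not None:
        tokens = [lemmatize(t) for t in tokens]
    return tokens


def top_words(reviews, n=3, lemmatize=None):
    """Count cleaned tokens over all reviews and return the n most common."""
    counts = Counter()
    for review in reviews:
        counts.update(clean_review(review, lemmatize))
    return counts.most_common(n)


# Invented negative reviews, for illustration only.
negative_reviews = [
    "The flight was delayed and the booking was a mess.",
    "Delayed again! Booking refund took weeks.",
]
negative_top = top_words(negative_reviews, n=2)
print(negative_top)
```

A wordcloud is essentially a picture of these counts: the more often a word appears, the larger it is drawn.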

Next up is topic modelling, a type of statistical modelling in Natural Language Processing that identifies topics across a collection of documents. Using topic modelling in this case, we are able to see the aspects of the airline that customers talk about most!

To start off, I used TfidfVectorizer from Scikit-Learn. TF-IDF stands for term frequency-inverse document frequency. The main idea of this vectorizer is that it calculates a weight for each word, reflecting how meaningful the word is to our analysis. It takes 2 things into consideration: how many times the word appears in a specific document, and how many documents it appears in. The more documents a word appears in, the less weight it carries. So words like ‘flight’ and ‘crew’, which are not stopwords but appear in most reviews, will not dominate our analysis.

Fitting this vectorizer on our customer reviews and transforming them gives us a sparse matrix. I then imported non-negative matrix factorisation (NMF) from sklearn.decomposition. NMF approximates this matrix as the product of two smaller non-negative matrices, which reduces the dimensionality of our data and extracts meaningful features (the topics) from our set of vectors. After this process, we can view the top words that appear in each topic and try to name the topics! After looking at the results, I came up with 4 topics that customers frequently talk about and they are:

  1. in-flight services
  2. flight booking
  3. flight seats
  4. check-in process

This is the distribution of the review topics:

After using topic modelling and sentiment analysis, we can see that flight booking is the most talked-about topic among customers, so BA should focus on improving its booking process!

Task 2: Predicting customer buying behaviour

In this task, we are given a dataset with various features that might affect whether a customer books a flight with BA or not. Some of these features include the number of passengers, the flight duration, whether the customer wants a preferred seat, whether they want in-flight entertainment, and the length of stay. Here is a brief overview of the dataset:

Our task is to train a machine learning model that can predict the target outcome, which is whether or not the customer makes a booking given the input features.

After reading in the dataset and looking at the distribution of booking status, I realised that the dataset is really imbalanced, with the majority of observations being customers who did not complete a booking.

For imbalanced datasets like this, it is important that we deal with the imbalance. Otherwise the model can become biased towards the majority class and will not generalize well to unseen data points.

To work around this, I used undersampling: I randomly sampled rows from the booking-not-completed class so that their number equals the number of booking-completed rows. The two classes are then equally represented in the new dataset.
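Here is a small standard-library sketch of that undersampling step. The column name `booking_complete`, the toy rows and the fixed seed are assumptions for illustration, and the function assumes class 1 (booking completed) is the smaller class.

```python
import random


def undersample(rows, target_key="booking_complete", seed=42):
    """Keep all minority rows and an equally sized random sample of majority rows."""
    minority = [r for r in rows if r[target_key] == 1]
    majority = [r for r in rows if r[target_key] == 0]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sampled_majority = rng.sample(majority, k=len(minority))
    balanced = minority + sampled_majority
    rng.shuffle(balanced)  # avoid having all completed bookings at the top
    return balanced


# Toy dataset: 10 completed bookings out of 50 rows (hypothetical values).
rows = [
    {"num_passengers": i % 4 + 1, "booking_complete": int(i % 5 == 0)}
    for i in range(50)
]
balanced = undersample(rows)
print(len(balanced), sum(r["booking_complete"] for r in balanced))
```

With a pandas DataFrame, the same idea can be expressed by calling `.sample(n=..., random_state=...)` on the majority-class rows and concatenating the result with the minority-class rows.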

After that, we can proceed with preprocessing the data, such as encoding categorical variables and scaling our numerical variables. I then imported mutual_info_classif from sklearn.feature_selection. It estimates the mutual dependence between each feature variable and the target variable. This is the graph obtained from the code.
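To make the idea of mutual information more concrete, here is a hand-written version for discrete features. It is an illustration of the quantity being estimated, not what Scikit-Learn runs internally: `mutual_info_classif` also handles continuous features with a nearest-neighbour estimator, so its numbers will not match exactly. The feature names and values are made up.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """I(X;Y) in nats for two equal-length sequences of discrete values."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # counts of each x value
    py = Counter(ys)              # counts of each y value
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x, y) / (p(x) * p(y)) simplifies to count * n / (count_x * count_y)
        mi += p_xy * log(count * n / (px[x] * py[y]))
    return mi


# Hypothetical toy data: route fully determines the outcome, wants_meal does not.
booked = [1, 1, 0, 0]
route = ["A", "A", "B", "B"]
wants_meal = [1, 0, 1, 0]

mi_route = mutual_information(route, booked)
mi_meal = mutual_information(wants_meal, booked)
print(mi_route, mi_meal)
```

A feature that tells us nothing about the target scores 0, and the score grows as knowing the feature removes more uncertainty about the target.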

The graph shows that the route of the flight has the strongest relationship with whether or not the customer completes the booking with British Airways! After this, I proceeded to train the model in a few steps.

  1. Split the dataset into a training set and a test set.
  2. Fit the RandomForestClassifier onto our training set
  3. Evaluate the performance of the model

To evaluate the performance of the model, I used the accuracy score and the Area Under the Curve (AUC) metric. The higher both of these scores are, the better the model performs. In the end, the model had both an accuracy score and an AUC score of 0.71. This is not bad, but it could be improved if we had more features to consider, such as customer demographics (gender, age, job type). With this, we can now tell BA which areas it should look into to increase its revenue!
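The three training steps above can be sketched as follows. The data here is synthetic (generated with `make_classification`) and only stands in for the balanced, preprocessed booking dataset, so the scores it prints say nothing about the real results. The split ratio, number of trees and random seeds are assumptions, and computing the AUC from predicted probabilities is one reasonable choice rather than a confirmed detail of my notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the balanced, preprocessed booking data (assumption).
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=4, random_state=0
)

# 1. Split the dataset into a training set and a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 2. Fit the RandomForestClassifier on the training set.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# 3. Evaluate on the held-out test set.
accuracy = accuracy_score(y_test, model.predict(X_test))
# AUC uses the predicted probability of the positive class, not the hard labels.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

print(f"accuracy={accuracy:.2f}, auc={auc:.2f}")
```

Evaluating on a test set that the model never saw during training is what makes these two numbers a fair estimate of how it would do on new customers.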

That is all for this Virtual Experience Programme and I would like to thank Forage and British Airways for this opportunity! Thank you for reading!

Connect with me!

LinkedIn
Email: arthurchong01@gmail.com


Undergraduate Data Science and Analytics student at The National University of Singapore interested in Machine Learning and AI