Airline Satisfaction Rates? What’s the reason?

Ethan Meyer
INST414: Data Science Techniques
Dec 9, 2023
Three Different Methods For Clustering

For this module, I chose to analyze a Kaggle dataset that records the satisfaction of airline customers along with a variety of variables. These include demographic information such as gender, age, and location, as well as details about the flight itself that could influence satisfaction. I cleaned and normalized the dataset to keep only the most relevant variables, and converted some variables between string and integer types. In the end, the script analyzes five selected variables: ‘Age’, ‘Flight Distance’, ‘Ease of Online booking’, ‘Departure/Arrival time convenient’, and ‘Seat comfort’.

These variables were clustered using two techniques, K-Means and Agglomerative Clustering, giving three clustering runs in total. For K-Means, we used the Euclidean distance metric. For Agglomerative Clustering, both Euclidean and cosine distances were explored to capture different aspects of similarity. The implementation used scikit-learn, a widely used machine learning library for Python.

Using the script as reference, I created three clusters, each representing a segment of customers who share similar characteristics in terms of age, flight distance, ease of online booking, departure/arrival time convenience, and seat comfort. The K-Means clusters were:
Cluster 0: high seat comfort, likely representing highly satisfied passengers.
Cluster 1: moderate seat comfort, suggesting a moderately satisfied group.
Cluster 2: low seat comfort, indicative of less satisfied passengers.

For Agglomerative Clustering with Euclidean distance, Cluster 0 was most similar to K-Means Cluster 2, indicating dissatisfaction with seat comfort. Cluster 1 aligned with K-Means Cluster 0, with high seat comfort. Finally, Cluster 2 aligned with K-Means Cluster 1, with moderate seat comfort.
With Agglomerative Clustering using cosine distance, Cluster 0 had high seat comfort with high satisfaction, Cluster 1 had moderate seat comfort, similar to K-Means Cluster 1, and Cluster 2 was associated with low seat comfort, akin to K-Means Cluster 2.
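Because cluster numbers are arbitrary, matching an agglomerative cluster to a K-Means cluster comes down to comparing which passengers they share. The post does not say exactly how the alignments were made (they may have been read off the seat-comfort profiles), so the following is one possible approach, sketched under that assumption: build a contingency table of the two labelings with `pandas.crosstab`, then choose the one-to-one matching with the largest overlap using SciPy's `linear_sum_assignment`. `align_labels` is a hypothetical helper, and the six labels are made up purely to show the mechanics.

```python
import numpy as np
import pandas as pd
from scipy.optimize import linear_sum_assignment


def align_labels(reference, other):
    """Map each label in `other` to the `reference` label it overlaps with most,
    using a one-to-one (Hungarian) matching on the contingency table."""
    table = pd.crosstab(pd.Series(other, name="other"), pd.Series(reference, name="reference"))
    # linear_sum_assignment minimizes total cost, so negate the counts to maximize overlap.
    rows, cols = linear_sum_assignment(-table.to_numpy())
    return {int(table.index[r]): int(table.columns[c]) for r, c in zip(rows, cols)}


# Hypothetical labels for six passengers.
kmeans_labels = np.array([0, 0, 1, 1, 2, 2])
agg_labels = np.array([1, 1, 2, 2, 0, 0])
print(align_labels(kmeans_labels, agg_labels))  # → {0: 2, 1: 0, 2: 1}
```

In this toy example, agglomerative Cluster 0 maps to K-Means Cluster 2, Cluster 1 to Cluster 0, and Cluster 2 to Cluster 1, the same pattern reported above for the Euclidean run.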

In conclusion, based on our analysis and visuals, we can see a strong association between seat comfort and satisfaction, as well as an association between in-flight entertainment and satisfaction. Beyond these obvious insights, this kind of analysis could give an airline a foundation for analyzing flight data in relation to customer satisfaction. My analysis is limited by the variables available in the dataset, but the same approach applied on a larger scale could help improve the airline industry.
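Clustering by itself does not measure correlation, so one way to put a number behind the seat comfort and entertainment observations is to correlate each rating with a 0/1 satisfaction flag, which gives the point-biserial correlation. This sketch assumes the dataset has an ‘Inflight entertainment’ rating and a ‘satisfaction’ column whose positive value is "satisfied"; adjust the names if the file differs. `satisfaction_correlations` is a hypothetical helper, and the six-row table is invented, so its output only illustrates the mechanics.

```python
import pandas as pd


def satisfaction_correlations(df, columns, target="satisfaction", positive="satisfied"):
    """Pearson correlation of each rating column with a 0/1 satisfaction flag
    (equivalent to the point-biserial correlation), strongest first."""
    satisfied = (df[target] == positive).astype(int)
    return df[columns].corrwith(satisfied).sort_values(ascending=False)


# Tiny hypothetical table; column names and label values are assumptions.
toy = pd.DataFrame({
    "Seat comfort": [5, 4, 2, 1, 5, 3],
    "Inflight entertainment": [4, 5, 1, 2, 3, 3],
    "satisfaction": ["satisfied", "satisfied", "neutral or dissatisfied",
                     "neutral or dissatisfied", "satisfied", "neutral or dissatisfied"],
})
print(satisfaction_correlations(toy, ["Seat comfort", "Inflight entertainment"]))
```

On this toy table, seat comfort correlates with satisfaction at about 0.89 and entertainment at about 0.77; on the real data the values would need to be computed from the full file.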

Link to GitHub
https://github.com/EthanMeyer41/414-Module-Assignments/blob/main/MOd%204.py
