As a mom, when I book a hotel, I want it to be family friendly, close to the sights, and relatively quiet. My standards are different when I book a romantic weekend for my husband and me: we look for hotels with great food that are close to bars, and musical events are a plus. My goal for this project is to accurately match customers with hotel inventory in this highly competitive market.
There are two basic architectures for a recommendation system: Content-Based (CB) systems and Collaborative-Filtering (CF) systems. CB systems focus on the properties of the items. They recommend an item based on how similar it is to other items. In contrast, CF systems focus on the relationship between users and items. They recommend items to a user by matching that user with items that other, similar users have rated.
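To make the difference concrete, here is a minimal sketch in plain Python. The hotel names, property vectors, and ratings are made up for illustration and are not taken from the Expedia data. The same cosine similarity is applied to item properties (the CB view) and to user rating vectors (the CF view).

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical item properties: [family friendly, near bars, quiet]
item_features = {
    "hotel_a": [1, 0, 1],
    "hotel_b": [1, 0, 1],
    "hotel_c": [0, 1, 0],
}

# Hypothetical user ratings of the same hotels (0 means "not rated")
user_ratings = {
    "user_1": {"hotel_a": 5, "hotel_b": 0, "hotel_c": 1},
    "user_2": {"hotel_a": 5, "hotel_b": 4, "hotel_c": 1},
}

# CB view: two hotels are similar because their properties are similar.
cb_sim = cosine(item_features["hotel_a"], item_features["hotel_b"])

# CF view: two users are similar because they rated hotels similarly.
hotels = sorted(item_features)
cf_sim = cosine(
    [user_ratings["user_1"][h] for h in hotels],
    [user_ratings["user_2"][h] for h in hotels],
)
```

In the CB view, hotel_b would be suggested to anyone who liked hotel_a. In the CF view, hotel_b would be suggested to user_1 because the similar user_2 rated it highly.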
This hotel recommender engine is a model-based CB recommendation system. In other words, I extracted some information from the dataset and used that as a “model” to make recommendations without having to use the complete dataset every time. This approach offers the benefits of both speed and scalability.
I used Seaborn for the internal data visualization. I built the recommender engine with several Python classifiers, including XGBoost, Random Forest, Naive Bayes and others. I used the Surprise package for the memory-based CF baseline. To put it all together, I wrote a small web app using Flask to present my results.
The data was provided by Expedia, one of the largest online travel agencies. Three sets of information were given: users, hotels, and destinations. A detailed description of the data can be found on the Kaggle website. Aside from the commonly known features, such as check-in/check-out date, hotel price, and hotel location, Expedia also provided a destination matrix that describes the search region. This information could cover things such as how easy it is to find a parking space in the area, whether there are any good restaurants near the hotel, whether the hotel is close to public transportation, etc. This destination matrix provided the key information that made it possible to build a personalized hotel engine.
In addition to the given features, I added a couple of time features that focus on the user experience before and during the stay. These features include how many days ahead the user booked the hotel, how long the user stayed, weekday vs. weekend, hot vs. cold season, etc. The time features help me capture customers’ demand more accurately.
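As a rough sketch, time features like these can be derived from three dates per search. The function name, the definition of a weekend stay (any Friday or Saturday night), and the choice of June through August as the hot season are my assumptions for illustration, not the exact rules used in the project.

```python
from datetime import date, timedelta

# Assumption: June through August counts as the "hot" season.
HOT_MONTHS = {6, 7, 8}

def time_features(search_date, check_in, check_out):
    """Derive simple time features from the search and stay dates."""
    nights = (check_out - check_in).days
    stay_nights = [check_in + timedelta(days=i) for i in range(nights)]
    return {
        # How far in advance the user searched or booked
        "days_ahead": (check_in - search_date).days,
        # Number of nights stayed
        "length_of_stay": nights,
        # Assumption: a weekend stay includes a Friday (4) or Saturday (5) night
        "weekend_stay": any(d.weekday() in (4, 5) for d in stay_nights),
        # Season flag based on the check-in month
        "hot_season": check_in.month in HOT_MONTHS,
    }

features = time_features(date(2014, 6, 1), date(2014, 7, 4), date(2014, 7, 6))
```

A short weekend trip in summer and a long midweek stay in winter end up with very different feature values, which is the signal the classifiers can pick up on.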
I used the Mean Average Precision@5 (MAP@5) as the evaluation metric:

$$\text{MAP@5} = \frac{1}{|U|} \sum_{u=1}^{|U|} \sum_{k=1}^{\min(5,\,n)} P(k)$$

where |U| is the number of user events, P(k) is the precision at cutoff k, and n is the number of predicted hotel clusters.
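Here is a minimal sketch of how MAP@5 can be computed. It assumes each user event has exactly one true hotel cluster, so that precision only counts at the rank where the correct cluster appears and an event's score reduces to 1/rank (or 0 if the cluster is not in the top five). The function names and sample values are hypothetical.

```python
def average_precision_at_k(actual, predicted, k=5):
    """Score one event: 1/rank of the true cluster within the top k, else 0."""
    for rank, cluster in enumerate(predicted[:k], start=1):
        if cluster == actual:
            return 1.0 / rank
    return 0.0

def map_at_k(actuals, predictions, k=5):
    """Average the per-event scores over all user events."""
    scores = [average_precision_at_k(a, p, k) for a, p in zip(actuals, predictions)]
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical true clusters and ranked predictions for three events
actuals = [3, 7, 1]
predictions = [
    [3, 1, 2, 4, 5],  # hit at rank 1 -> 1.0
    [1, 7, 2, 3, 4],  # hit at rank 2 -> 0.5
    [9, 8, 6, 5, 4],  # no hit        -> 0.0
]
score = map_at_k(actuals, predictions)
```

Because the score depends on the rank of the hit, putting the right hotel cluster first is worth twice as much as putting it second.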
Before implementing the model-based recommender engine, I tried two memory-based models as base models for comparison. The first model is based on hotel popularity. The drawback of this recommender is that it is not personalized: every user gets the same recommended hotels. Its score is really low, 0.01. The second base model is a memory-based CF recommender. The rows of my utility matrix are user IDs, and the columns are the hotel clusters. Each entry is scored based on the user's interaction: clicking the hotel link is scored a 1, and booking the hotel is scored a 5. The recommender engine finds the most similar user profiles and gives recommendations based on those users’ hotel preferences. This base model is also not robust, as it does not take the hotel and destination properties into account. Its score is a bit higher, however, at 0.06.
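The two baselines can be sketched in plain Python as follows. This is a simplified illustration and not the Surprise-based code used in the project: the event tuples, the function names, the use of cosine similarity, and the rule of keeping the highest score per user and cluster are all assumptions. Only the click = 1 and booking = 5 scores come from the description above.

```python
import math
from collections import Counter, defaultdict

CLICK_SCORE, BOOKING_SCORE = 1, 5  # scores described in the text

# Hypothetical events: (user_id, hotel_cluster, is_booking)
events = [
    ("u1", 10, True), ("u1", 20, False),
    ("u2", 10, True), ("u2", 30, True),
    ("u3", 40, False),
]

def most_popular(events, k=5):
    """Baseline 1: the same top-k clusters for every user."""
    counts = Counter(cluster for _, cluster, _ in events)
    return [cluster for cluster, _ in counts.most_common(k)]

def build_utility(events):
    """Utility matrix as nested dicts: user -> {cluster: score}."""
    utility = defaultdict(dict)
    for user, cluster, is_booking in events:
        score = BOOKING_SCORE if is_booking else CLICK_SCORE
        # Assumption: keep the strongest interaction per user and cluster
        utility[user][cluster] = max(score, utility[user].get(cluster, 0))
    return utility

def cosine(a, b):
    """Cosine similarity between two sparse score dicts."""
    dot = sum(a[c] * b[c] for c in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend(user, utility, k=5):
    """Baseline 2: rank unseen clusters by similarity-weighted scores."""
    scores = defaultdict(float)
    for other, ratings in utility.items():
        if other == user:
            continue
        sim = cosine(utility[user], ratings)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for cluster, score in ratings.items():
            if cluster not in utility[user]:
                scores[cluster] += sim * score
    return sorted(scores, key=scores.get, reverse=True)[:k]

utility = build_utility(events)
popular = most_popular(events)
personal = recommend("u1", utility)
```

In this toy data, u1 and u2 both booked cluster 10, so u1 is recommended cluster 30 from u2's history. Neither baseline looks at what the hotels or destinations are actually like, which is the weakness noted above.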
I ensembled four models: Random Forest, XGBoost, a Stochastic Gradient Descent (SGD) classifier, and Bernoulli Naive Bayes (NB). I computed the weighted average of the predicted probabilities from the four models and picked the top five hotel clusters as my final recommendation.
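The blending step can be sketched like this. The weights and probability values below are invented for illustration and are not the ones used in the project. In practice, each list would hold one model's predicted class probabilities for a single user event (for example, one row of the output of a scikit-learn-style `predict_proba`).

```python
def ensemble_top_k(model_probs, weights, k=5):
    """Blend per-model class probabilities and return the top-k class indices.

    model_probs: dict of model name -> list of probabilities, one per hotel cluster
    weights:     dict of model name -> non-negative weight
    """
    total_weight = sum(weights[name] for name in model_probs)
    n_classes = len(next(iter(model_probs.values())))
    blended = [
        sum(weights[name] * probs[i] for name, probs in model_probs.items()) / total_weight
        for i in range(n_classes)
    ]
    ranked = sorted(range(n_classes), key=lambda i: blended[i], reverse=True)
    return ranked[:k], blended

# Hypothetical probabilities over six hotel clusters for one user event
model_probs = {
    "random_forest": [0.10, 0.40, 0.05, 0.20, 0.15, 0.10],
    "xgboost":       [0.05, 0.50, 0.05, 0.10, 0.25, 0.05],
    "sgd":           [0.20, 0.20, 0.10, 0.20, 0.20, 0.10],
    "bernoulli_nb":  [0.30, 0.10, 0.20, 0.10, 0.10, 0.20],
}
# Hypothetical weights: stronger models get a larger say
weights = {"random_forest": 0.35, "xgboost": 0.35, "sgd": 0.20, "bernoulli_nb": 0.10}

top5, blended = ensemble_top_k(model_probs, weights)
```

Giving the weaker models a small but nonzero weight lets them break ties without overriding the stronger models.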
Random Forest and XGBoost gave me better estimates than the other two models. Bernoulli NB performed the worst. The code can be found in my GitHub repository. My final ensemble model gave me a MAP@5 score of 0.48.
I built a small web app to demonstrate my hotel recommender system. I tried two different scenarios: with kids (the first video) and without kids (the second video). As you can see, the recommender managed to give me different sets of hotels appropriate to my needs.
There are some features I would still like to add, such as the crime rate and the water quality in the neighborhood.