How to build a Recommender System for Airbnb in Python

Alexandra Gastone
9 min read · Apr 14, 2020


A simple guide to building a Collaborative Filtering system using Sentiment Intensity estimates for Airbnb data in Boston

Row of houses
Photo by Brodie Vissers on Burst

Why should you care about recommender systems?

Recommender systems are one of the most important and successful applications of data science and machine learning to business. They’ve become vital tools for companies that want to provide personalised user experiences, largely because they offer powerful, scalable ways to personalise content based on a user’s predicted interests. Take Amazon or Netflix for instance: much of their success lies in their ability to drive engagement through valuable, personalised recommendations for each of their users.

While there are a lot of good resources for digging into the theory and novel algorithms, the field of recommender systems is extensive and continuously expanding, making it a bit of a daunting task to find practical, real use cases that put it all together. The aim of this post is therefore to make the scope and implementation of such systems more straightforward.

We’ll be using the publicly available Airbnb dataset in Boston to build a recommender system that will give Airbnb users personalised listing recommendations based on what users with similar tastes have liked in the past. For example, if Joe and Billy both really liked their stays at Houses A and B, and Billy also liked House C, then we can recommend House C to Joe since Joe and Billy seem to have similar preferences in houses. This type of recommendation system is called User-Based Collaborative Filtering.

Collaborative Filtering for personalised user recommendations

Problem

So let’s break the problem down: what are the main tasks we’ll need to tackle in order to build our system?

  1. Determine which users have similar preferences
  2. Make recommendations based on those preferences

Approach

How are we going to tackle these tasks?

There are a few different ways that we could go about determining the similarity between user preferences (i.e. Pearson correlation between rating scores, cosine similarity between rating vectors, etc.). Since our dataset includes all the comments that each user wrote for their individual stays, we’ll extract the sentiment of these comments to gauge how positively or negatively each user rated their stay. Using a model-based approach that estimates latent factors (more on this when we get to the recommendation step), we’ll then predict and rank each user’s sentiment polarity score for listings they haven’t yet visited.

We’ll therefore split our approach into three parts:

  • Part 1: Estimate the sentiment of comments
  • Part 2: Build our recommendation engine to predict sentiment score for all reviewer-listing pairs
  • Part 3: Make personalised recommendations for each user based on their ranked preferences

We’ll build up our recommendation system by exploring and implementing each of these steps sequentially in the sections below, using readily available libraries in Python.

To illustrate what the output of each step should give us, we’ll use a random user with the following comment on a particular stay:

'Great location for both airport and city - great amenities in the house: Plus Islam was always very helpful even though he was away'

Part 1: Estimate the sentiment of comments

Context

Sentiment Analysis is a subfield of NLP (Natural Language Processing) that has been gaining a lot of traction in industry lately, with more and more companies looking to extract the opinions of their clients or users from textual data such as reviews, chats, or emails in order to gain actionable insights. Given that most readily available data comes in the form of text-heavy, unstructured datasets, finding ways to extract valuable information from these sources has become a key target for data-driven company growth.

Preparing our data

The dataset we’ll be working with includes all unique listing and user IDs (listing_id, reviewer_id), and each user’s reviews of the listings they visited (comments).

A first look at our comments feature shows a couple of things we might want to take care of before doing any computations:

  • some entries contain empty or null comments, so we remove those
  • some reviews contain the prototypical “Host canceled this booking x days before…” comment that gets added automatically when a stay is canceled, so we drop those entries too
  • not all reviews are in English, so we detect the language of each comment and, depending on what percentage are non-English, either drop or translate them (non-English comments make up only 0.06% of all comments at this point, so we’ll drop them for now, but if you’d like to try your hand at translating, googletrans is a free Python library that wraps Google Translate’s API)
  • other text normalization steps (e.g. removing numerical characters)
Preprocessing the ‘comments’ column for Sentiment Analysis
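
As a rough sketch of what this preprocessing might look like in code (the file path, DataFrame name and the exact cancellation phrase matched are assumptions, and langdetect stands in for the language detection step):

    import pandas as pd
    from langdetect import detect
    from langdetect.lang_detect_exception import LangDetectException

    def detect_language(text):
        # langdetect raises on strings it cannot classify (e.g. emoji-only comments)
        try:
            return detect(text)
        except LangDetectException:
            return 'unknown'

    # reviews.csv from the Boston Airbnb dataset: listing_id, reviewer_id, comments, ...
    reviews = pd.read_csv('reviews.csv')

    # drop empty or null comments
    reviews = reviews.dropna(subset=['comments'])
    reviews = reviews[reviews['comments'].str.strip() != '']

    # drop the automated cancellation comments (exact phrasing assumed)
    reviews = reviews[~reviews['comments'].str.contains('canceled this', case=False)]

    # strip numerical characters, but keep punctuation and capitalization for Vader
    reviews['comments'] = reviews['comments'].str.replace(r'\d+', '', regex=True)

    # keep English comments only (non-English ones are a tiny fraction of the data)
    reviews = reviews[reviews['comments'].apply(detect_language) == 'en']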

Note that we’re purposefully not going to be removing punctuation or capitalization, which are typically part of the text normalization step of the NLP pipeline. The reason for this is that the Sentiment Analyzer we’ll be working with actually incorporates this kind of information when estimating sentiment! NLTK’s Vader (Valence Aware Dictionary and sEntiment Reasoner) lexicon and pre-trained sentiment analysis tool has had a lot of success in dealing with reviews and social sentiment. Part of the reason it works so well is that it adds valence to words that are capitalized, that have degree modifiers (eg. very), are accompanied by punctuation (eg. !!!) or emoticons (eg. :) ), and it can even handle shifts in sentiment in a single sentence, also known as conjunctions.

Implementation

Vader actually returns separate positive, negative, and neutral scores, along with a compound score that aggregates them into a single normalised value; we’ll only be using the compound score for our system.

Generating polarity scores
Example of sentiment polarity compound scores for comments in English
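
A minimal sketch of the scoring step, continuing from the preprocessing sketch above (the polarity column name is an assumption):

    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download('vader_lexicon')  # one-time download of the Vader lexicon
    sia = SentimentIntensityAnalyzer()

    # polarity_scores returns 'neg', 'neu', 'pos' and 'compound'; we keep only the compound score
    reviews['polarity'] = reviews['comments'].apply(lambda c: sia.polarity_scores(c)['compound'])

    print(reviews[['comments', 'polarity']].head())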

For our example comment (it’s the second comment in the image above):

polarity = 0.91

which seems to agree pretty well with the content of the message. Other negative and positive comments also look like they’re being picked up pretty well, so it looks like our Sentiment Analyzer is doing a good job!

Data exploration and data-driven actionable insights

Since the goal of this project is not to analyze the polarity feature per se, we won’t go into too much detail here, but there are a couple of preliminary observations that might pique your interest. I also created this dashboard using Streamlit and deployed it on Heroku to visualize some relevant data, so feel free to play around with it!

Screencast of polarity dashboard

Distribution

Looking at the distribution of polarity, we can see that the values are overwhelmingly positive. Some food for thought: do we tend to write comments more often when they are positive? Do we avoid writing negative comments? Do we have a high threshold of what constitutes a negative experience?

Relation to other features

Relation between polarity and other selected features

We can also explore what contributes to a positive comment by examining the correlation between polarity scores and other features in our data, such as the rating scores for cleanliness, location, value, etc. Polarity is most highly correlated with the overall review_scores_rating feature. Interestingly, location seems to matter least for comment sentiment: polarity is least correlated with review_scores_location. If we further group the polarity scores by neighbourhood, we see that the neighbourhoods ranked highest by polarity differ from those ranked highest by location score. For instance, the top three neighbourhoods by polarity are the Leather District, Roslindale, and Jamaica Plain, whereas the top three by review_scores_location are the Longwood Medical Area, Back Bay, and the North End.

If you were in charge of advising hosts on how to increase their rate of positive comments, you’re in luck! Location is arguably the hardest feature for a host to change, but the data suggests that improving their other scores (e.g. cleanliness, ease of check-in) will be reflected more strongly in their comments.
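
As a rough sketch of that comparison, assuming the polarity scores have been merged with the listings data into a hypothetical DataFrame called merged (the review_scores_* and neighbourhood_cleansed columns come from the listings file):

    # correlation between polarity and the listing-level review score columns
    score_cols = ['review_scores_rating', 'review_scores_cleanliness',
                  'review_scores_location', 'review_scores_value']
    print(merged[['polarity'] + score_cols].corr()['polarity'].sort_values(ascending=False))

    # neighbourhood rankings by mean polarity vs mean location score
    by_hood = merged.groupby('neighbourhood_cleansed')[['polarity', 'review_scores_location']].mean()
    print(by_hood.sort_values('polarity', ascending=False).head(3))
    print(by_hood.sort_values('review_scores_location', ascending=False).head(3))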

Part 2: Build our recommendation engine

Context

There are many different models for Collaborative Filtering that we can use to predict our reviewers’ polarity scores on listings they haven’t visited. Matrix factorization tends to yield the best results: it tells us how strongly a user is aligned with a set of k latent features, or underlying tastes. Singular Value Decomposition (SVD), the model we’ll be using, is one such matrix factorization approach.

You may be familiar with SVD as a dimensionality reduction technique because, amongst other operations, it gives us the eigenvectors of an input matrix. It has also, however, proven itself to be quite useful for building recommendation systems. Essentially, SVD states that any given matrix M can be factorized as a product of three matrices: M = U ∑ Vᵀ. Here U is the n x k user-latent features matrix, V is the m x k listing-latent features matrix, and ∑ is a k x k diagonal matrix containing the singular values of our original matrix M, where each singular value represents how important a latent feature is for predicting user preferences. Using these three matrices, we can therefore recover the full M matrix of polarity values for all reviewer-listing pairs.
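
To make the shapes concrete, here’s a tiny numpy illustration of the factorization on a toy matrix (the numbers are made up purely for illustration):

    import numpy as np

    # a toy 4-reviewer x 3-listing matrix of polarity values
    M = np.array([[0.9, 0.1, 0.5],
                  [0.8, 0.2, 0.4],
                  [0.1, 0.9, 0.7],
                  [0.2, 0.8, 0.6]])

    U, sigma, Vt = np.linalg.svd(M, full_matrices=False)
    print(U.shape, sigma.shape, Vt.shape)  # (4, 3), (3,), (3, 3)

    # multiplying the three factors back together recovers M (up to floating point error)
    print(np.allclose(M, U @ np.diag(sigma) @ Vt))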

Preparing our data

There’s a key processing step we need to take before we can use our dataset for SVD: transform it into a utility matrix! A utility matrix is essentially a pivot of our 3 column dataset (reviewer_id, listing_id, polarity) into a sparse n x m table, where the axes correspond to our reviewer_id and listing_id columns, and the table is populated with polarity values for known reviewer_id/listing_id pairs and NaN if unknown. We can generate this table super easily with panda’s pivot_table function.

Then we just subtract the per-item average from the entries, and voilà! We’re ready to run our SVD algorithm!
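
A sketch of the pivot and de-meaning, continuing with the reviews DataFrame from the earlier sketches (filling the unknown entries with 0 after de-meaning is an extra assumption so that SVD can run on a complete matrix):

    # pivot the (reviewer_id, listing_id, polarity) triples into a utility matrix
    utility = reviews.pivot_table(index='reviewer_id', columns='listing_id', values='polarity')

    # subtract each listing's mean polarity, then fill the unknown entries with 0
    item_means = utility.mean(axis=0)
    utility_demeaned = (utility - item_means).fillna(0)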

Implementation

We’ll be using scipy’s svds function to recover the U, , and V matrices, then convert the ∑ values to a diagonal matrix and use matrix multiplication of all three matrices to recover the full matrix and get the predicted polarity values.

Building our recommendation engine with SVD
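
A minimal sketch of that step, continuing from the de-meaned utility matrix above (variable names are illustrative, though all_predicted_polarity matches the matrix referenced in Part 3; note that k must be smaller than both matrix dimensions):

    import numpy as np
    import pandas as pd
    from scipy.sparse.linalg import svds

    # factorize the de-meaned utility matrix, keeping k latent features
    U, sigma, Vt = svds(utility_demeaned.to_numpy(), k=100)

    # rebuild the full matrix and add the per-listing means back in
    all_predicted_polarity = U @ np.diag(sigma) @ Vt + item_means.to_numpy()

    # wrap the predictions in a DataFrame indexed like the utility matrix
    predicted_df = pd.DataFrame(all_predicted_polarity,
                                index=utility_demeaned.index,
                                columns=utility_demeaned.columns)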

With k=100 latent features, our model performs pretty well on the training data, with an RMSE of 0.17. Take this with a grain of salt, however: we didn’t split the dataset into train and test sets, because the cold start issue of Collaborative Filtering techniques (i.e. we can’t introduce unseen listings or users) would require manually creating those datasets, and we might inadvertently introduce selection bias into our model. The prediction score here is therefore based solely on training data.
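
For reference, a training RMSE like the one quoted above can be computed over just the observed reviewer-listing pairs, for example:

    import numpy as np

    # evaluate only on the reviewer-listing pairs we actually observed
    mask = utility.notna().to_numpy()
    errors = predicted_df.to_numpy()[mask] - utility.to_numpy()[mask]
    print('training RMSE:', np.sqrt(np.mean(errors ** 2)))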

Part 3: Make recommendations!

Now that we’ve taken care of our two main tasks, all that’s left is generating our personalised recommendations for a particular user.

Using our initial example, we simply identify the user row in our full all_predicted_polarity matrix and rank our predictions according to polarity.

Generating our top N recommendations for a particular user
Listing recommendations for our example user based on top 10 predicted polarities
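
A rough sketch of that ranking step, reusing the predicted and utility matrices from the SVD sketch above (the helper function name is made up):

    def recommend_listings(reviewer_id, predicted_df, utility, n=10):
        # predicted polarities for this reviewer, highest first
        preds = predicted_df.loc[reviewer_id].sort_values(ascending=False)

        # drop the listings the reviewer has already visited
        already_visited = utility.loc[reviewer_id].dropna().index
        preds = preds.drop(already_visited, errors='ignore')

        return preds.head(n)

    # example usage with one reviewer id from the utility matrix index
    example_reviewer = utility.index[0]
    print(recommend_listings(example_reviewer, predicted_df, utility, n=10))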

Conclusions

Let’s recap: using Sentiment Analysis, we were able to determine sentiment polarity for reviewer comments; using SVD for Collaborative Filtering, we were able to predict polarity values for all reviewer-listing pairs based on latent user preferences; and finally, we generated recommendations by ranking the predicted polarities for any given user.

Once you’ve built your base recommender system, you could take it anywhere your data or business goals lead you. I’ve briefly mentioned a few insights that I thought may lead to impactful recommendations, such as exploring what drives positive polarities or how to maximize them for hosts, but feel free to use the tools we’ve learnt and implemented to explore to your heart’s content.

Thanks for reading! To see the full code, check out my Github repository for this project.

If you want to further delve into the topic, I’d suggest these posts as a good starting point:

And if you want to learn more about me, check out my Linkedin or personal portfolio page!
