Data scientists at Airbnb collect and use data to optimize products, identify problem areas, and inform business decisions. For most guests, however, the defining moments of the “Airbnb experience” happen in the real world — when they are traveling to their listing, being greeted by their host, settling into the listing, and exploring the destination. These are the moments that make or break the Airbnb experience, no matter how great we make our website. The purpose of this post is to show how we can use data to understand the quality of the trip experience, and in particular how the ‘Net promoter score’ adds value.
Currently, the best information we can gather about the offline experience is from the review that guests complete on Airbnb.com after their trip ends. The review, which is optional, asks for textual feedback and rating scores from 1–5 for the overall experience as well as subcategories: Accuracy, Cleanliness, Checkin, Communication, Location, and Value. Starting at the end of 2013, we added one more question to our review form, the NPS question.
NPS, or the “Net Promoter Score”, is a widely used customer loyalty metric introduced by Fred Reichheld in 2003. We ask guests “How likely are you to recommend Airbnb to a friend?”, a question called “likelihood to recommend” or LTR. Guests who respond with a 9 or 10 are labeled “promoters”, or loyal enthusiasts, while guests who respond with a score of 0 to 6 are “detractors”, or unhappy customers. Those who leave a 7 or 8 are considered “passives”. Our company’s NPS is then calculated by subtracting the percentage of detractors from the percentage of promoters, and ranges from -100 (worst case: all responses are detractors) to +100 (best case: all responses are promoters).
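As a minimal sketch of the calculation just described, the function below turns a list of 0 to 10 responses into an NPS. The function name and the toy responses are ours, chosen for illustration only.

```python
from typing import Iterable


def net_promoter_score(ltr_scores: Iterable[int]) -> float:
    """Return NPS (from -100 to +100) for a collection of 0-10 LTR responses."""
    scores = list(ltr_scores)
    if not scores:
        raise ValueError("need at least one LTR response")
    promoters = sum(1 for s in scores if s >= 9)   # responses of 9 or 10
    detractors = sum(1 for s in scores if s <= 6)  # responses of 0 through 6
    # Passives (7 or 8) count toward the total but toward neither group.
    return 100.0 * (promoters - detractors) / len(scores)


# Toy responses, made up for illustration: 3 promoters, 1 passive, 1 detractor.
print(net_promoter_score([10, 9, 10, 7, 3]))  # 40.0
```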
By measuring customer loyalty as opposed to satisfaction with a single stay, NPS surveys aim to be a more effective way to determine the likelihood that the customer will return to book again, spread the word to their friends, and resist market pressure to defect to a competitor. In this blog post, we look to our data to find out if this is actually the case. We find that higher NPS does in general correspond to more referrals and rebookings. However, once we control for other factors, LTR does not significantly improve our ability to predict whether a guest will book on Airbnb again in the next year. Therefore, the business impact of increasing NPS may be less than what we would estimate from a naive analysis.
We will refer to a single person’s response to the NPS question as their LTR (likelihood to recommend) score. While NPS ranges from -100 to +100, LTR is an integer that ranges from 0 to 10. In this study, we look at all guests with trips that ended between January 15, 2014 and April 1, 2014. If a guest took more than one trip within that time frame, only the first trip is considered. We then try to predict if the guest will make another booking with Airbnb, up to one year after the end of the first trip.
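The cohort and outcome definitions above can be sketched roughly as follows. The data layout (pairs of guest ID and trip end date), the inclusive window boundaries, and the use of 365 days for "one year" are our assumptions for illustration, not a description of the actual pipeline.

```python
from datetime import date, timedelta

WINDOW_START = date(2014, 1, 15)
WINDOW_END = date(2014, 4, 1)
HORIZON = timedelta(days=365)  # assumption: "one year" taken as 365 days


def first_trips_in_window(trips):
    """Map guest_id -> end date of that guest's earliest trip ending in the window.

    `trips` is an iterable of (guest_id, trip_end_date) pairs (hypothetical layout).
    """
    first = {}
    for guest_id, trip_end in trips:
        if WINDOW_START <= trip_end <= WINDOW_END:
            if guest_id not in first or trip_end < first[guest_id]:
                first[guest_id] = trip_end
    return first


def rebooked(first_trip_end, booking_dates):
    """True if any booking was made within one year after the first trip ended."""
    deadline = first_trip_end + HORIZON
    return any(first_trip_end < b <= deadline for b in booking_dates)
```

For a guest with two trips in the window, only the earlier one is kept, and the one-year clock for rebooking starts at the end of that trip.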
One thing to note is that leaving a review after a trip is optional, as are the various components of the review itself. A small fraction of guests do not leave a review, or leave a review but choose not to respond to the NPS question. While NPS is typically calculated only from responders, in this analysis we include non-responders: both guests who do not leave a review and those who leave a review but choose not to answer the NPS question.
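One way to keep responders and both kinds of non-responders in the same analysis is to assign every guest to a group. The sketch below is an assumption about how that labeling could look; the function name and the `no_review` label are ours, while `no_nps` is the label we use later in this post.

```python
from typing import Optional


def ltr_group(left_review: bool, ltr: Optional[int]) -> str:
    """Assign a guest to one of five groups based on review and LTR response."""
    if not left_review:
        return "no_review"   # hypothetical label for guests with no review at all
    if ltr is None:
        return "no_nps"      # reviewed, but skipped the NPS question
    if not 0 <= ltr <= 10:
        raise ValueError(f"LTR must be between 0 and 10, got {ltr}")
    if ltr >= 9:
        return "promoter"
    if ltr >= 7:
        return "passive"
    return "detractor"
```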
To assess the predictive power of LTR, we control for other parameters that are correlated with rebooking. These include:
- Overall review score and responses to review subcategories. All review categories are on a scale of 1–5.
- Guest acquisition channel (e.g. organic or through marketing campaigns)
- Trip destination (e.g. America, Europe, Asia)
- Origin of guest
- Previous bookings by the guest on Airbnb
- Trip length
- Number of guests
- Price per night
- Month of checkout (to account for seasonality)
- Room type (entire home, private room, shared room)
- Number of other listings the host owns
We acknowledge that our approach may have the following shortcomings:
- There may be other forms of loyalty not captured by rebooking. While we do look at referrals submitted through our company’s referral program, customer loyalty can also be manifested through word-of-mouth referrals that are not captured in this study.
- There may be a longer time horizon for some guests to rebook. We look one year out, but some guests may travel less frequently and would rebook in two to three years.
- One guest’s LTR may not be a direct substitute for the aggregate NPS. It is possible that even if we cannot accurately predict one customer’s likelihood to rebook based on their LTR, we would fare better if we used NPS to predict an entire cohort’s likelihood to rebook.
Despite these shortcomings, we hope that this study will provide a data informed way to think about the value NPS brings to our understanding of the offline experience.
Descriptive Stats of the Data
Our data covers more than 600,000 guests. Of the guests who submitted a review, two-thirds were NPS promoters, and more than half gave an LTR of 10. Of the 600,000 guests in our data set, only 2% were detractors.
While the overall review score for a trip is aimed at assessing the quality of the trip, the NPS question serves to gauge customer loyalty. We look at how correlated these two variables are by looking at the distributions of LTR scores broken down by overall review score. Although the LTR and overall review rating are correlated, they do provide some differences in information. For example, of the small number of guests who had a disappointing experience and left a 1-star review, 26% were actually promoters of Airbnb, indicating that they were still very positive about the company.
Keeping in mind that a very small fraction of our travelers are NPS detractors and that LTR is heavily correlated with the overall review score, we investigate how LTR correlates with rebooking rates and referral rates.
We count a guest as a referrer if they referred at least one friend via our referral system in the 12 months after trip end. We see that out of guests who responded to the NPS question, higher LTR corresponds to a higher rebook rate and a higher referral rate.
Without controlling for other variables, someone with an LTR of 10 is 13% more likely to rebook and 4% more likely to submit a referral in the next 12 months than someone who is a detractor (0–6). Interestingly, we note that the increase in rebooking rates for responders is nearly linear with LTR (we did not have enough data to differentiate between people who gave responses between 0–6). These results imply that for Airbnb, collapsing people who respond with a 9 versus a 10 into one “promoter” bucket results in loss of information. We also note that guests who did not leave a review behave much like detractors. In fact, they are slightly less likely to rebook and submit a referral than guests with an LTR of 0–6. However, guests who submitted a review but did not answer the NPS question (labeled as “no_nps”) behave similarly to promoters. These results indicate that when measuring NPS, it is important to keep track of response rate as well.
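A breakdown like this one can be sketched as a simple grouped rate. The code below collapses 0–6 into a single bucket and keeps 7, 8, 9 and 10 separate, so that the difference between a 9 and a 10 stays visible. The helper names and the records are made up for illustration and are not Airbnb data.

```python
from collections import defaultdict


def ltr_bucket(ltr: int) -> str:
    """Collapse 0-6 into one bucket; keep 7, 8, 9 and 10 separate."""
    return "0-6" if ltr <= 6 else str(ltr)


def rate_by_bucket(records):
    """Return {bucket: share of guests whose outcome is True}.

    `records` is an iterable of (ltr, outcome) pairs, where outcome is a bool
    such as "rebooked within 12 months" or "referred at least one friend".
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for ltr, outcome in records:
        bucket = ltr_bucket(ltr)
        totals[bucket] += 1
        hits[bucket] += int(outcome)
    return {bucket: hits[bucket] / totals[bucket] for bucket in totals}


# Made-up records for illustration only.
toy = [
    (10, True), (10, True), (10, True), (10, False),
    (9, True), (9, False),
    (3, False), (6, True), (0, False), (5, False),
]
print(rate_by_bucket(toy))  # {'10': 0.75, '9': 0.5, '0-6': 0.25}
```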
Next, we look at how other factors might influence rebooking rates. For instance, we find just from our 10 weeks of data that rebooking rates are seasonal. This is likely because off-season travelers are more likely to be loyal customers and frequent travelers.
We see that guests who had shorter trips are more likely to rebook. This could be because some guests will use Airbnb mostly for longer stays and they just aren’t as likely to take another one of those in the next year.
We also see that the rebooking rate has a roughly parabolic relationship to the price per night of the listing. Guests who stayed in very expensive listings are less likely to rebook, but guests who stayed in very cheap listings are also unlikely to rebook.
Which review categories are most predictive of rebooking?
In addition to the overall star rating and the LTR score, guests can choose to rate the following subcategories in their review, all on a 1–5 scale: Accuracy, Cleanliness, Checkin, Communication, Location, and Value.
In this section we will investigate the power of review ratings to predict whether or not a guest will take another trip on Airbnb in the 12 months after trip end. We will also study which subcategories are most predictive of rebooking.
To do this, we compare a series of nested logistic regression models. We start off with a base model, whose independent variables include only the non-review characteristics of the trip that we mentioned in the above section:
f0 = 'rebooked ~ dim_user_acq_channel + n_guests + nights + I(price_per_night*10) + I((price_per_night*10)^2) + guest_region + host_region + room_type + n_host_listings + first_time_guest + checkout_month'
Then, we build a series of models adding one of the review categories to this base model:
f1 = f0 + communication
f2 = f0 + cleanliness
f3 = f0 + checkin
f4 = f0 + accuracy
f5 = f0 + value
f6 = f0 + location
f7 = f0 + overall_score
f8 = f0 + ltr_score
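Read as formula strings, the models above could be assembled as in the following sketch. This only builds the strings and fits nothing. The term names are copied from the formulas above, and the helper variables are ours. Note that the `^2` inside `I(...)` follows R formula conventions; a Python formula library such as patsy would need `**2` there.

```python
BASE_TERMS = [
    "dim_user_acq_channel", "n_guests", "nights",
    "I(price_per_night*10)", "I((price_per_night*10)^2)",
    "guest_region", "host_region", "room_type",
    "n_host_listings", "first_time_guest", "checkout_month",
]
REVIEW_TERMS = [
    "communication", "cleanliness", "checkin", "accuracy",
    "value", "location", "overall_score", "ltr_score",
]

# Base model: trip, guest and host characteristics only.
f0 = "rebooked ~ " + " + ".join(BASE_TERMS)

# f1..f8: the base model plus exactly one review term each.
formulas = {
    f"f{i}": f"{f0} + {term}" for i, term in enumerate(REVIEW_TERMS, start=1)
}

print(formulas["f8"].endswith("+ ltr_score"))  # True
```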
We compare the quality of each of the models `f1` to `f8` against that of the nested model `f0` by comparing the Akaike information criterion (AIC) of the fits. AIC trades off between the goodness of the fit of the model and the number of parameters, thus discouraging overfitting.
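For reference, AIC is defined as 2k − 2 ln(L), where k is the number of fitted parameters and L is the maximized likelihood; lower values are better. The sketch below shows the trade-off with made-up log-likelihoods: an extra parameter only pays off if it raises the log-likelihood by more than 1.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def aic_improvement(base_ll: float, base_k: int, ext_ll: float, ext_k: int) -> float:
    """Positive when the extended model has a lower (better) AIC than the base."""
    return aic(base_ll, base_k) - aic(ext_ll, ext_k)


# Made-up numbers: one extra parameter, log-likelihood rises by 5.0.
print(aic_improvement(-1000.0, 12, -995.0, 13))  # 8.0
# Made-up numbers: one extra parameter, log-likelihood rises by only 0.5.
print(aic_improvement(-1000.0, 12, -999.5, 13))  # -1.0
```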
If we were to include just one review category, LTR and overall score are essentially tied for first place. Adding any one of the subcategories also improves the model, but not as much as if we were to include overall score or LTR.
Next, we adjust our base model to include LTR and repeat the process to see which review category is the best second addition.
Given LTR, the review category that improves our model the most is the overall review score. Adding a second review category to the model only marginally improves the fit of the model (note the difference in scale of the two graphs).
We repeat this process, incrementally adding review categories to the model until the improvement is no longer statistically significant. We are left with the following set of review categories:
- LTR
- Overall score
- Any three of the six subcategories
These findings show that because the review categories are strongly correlated with one another, once we have the LTR and the overall score, we only need three of the six subcategories to optimize our model. Adding more subcategories will add more degrees of freedom without significantly improving the predictive accuracy of the model.
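The incremental procedure is a form of greedy forward selection, which can be sketched as below. Two things here are assumptions rather than what we ran: the stopping rule uses a minimum gain in a lower-is-better score such as AIC instead of a significance test, and `toy_score` is a stand-in with made-up numbers in place of actually fitting a model.

```python
from typing import Callable, List, Sequence


def forward_select(
    candidates: Sequence[str],
    score: Callable[[List[str]], float],
    min_gain: float = 0.0,
) -> List[str]:
    """Greedy forward selection: repeatedly add the term that lowers `score` most.

    `score(terms)` returns a lower-is-better criterion (for example AIC) for a
    model with the base terms plus `terms`. Selection stops when no remaining
    candidate improves the score by more than `min_gain`.
    """
    selected: List[str] = []
    remaining = list(candidates)
    best = score(selected)
    while remaining:
        new_best, term = min((score(selected + [t]), t) for t in remaining)
        if best - new_best <= min_gain:
            break
        selected.append(term)
        remaining.remove(term)
        best = new_best
    return selected


def toy_score(terms: List[str]) -> float:
    """Hypothetical stand-in for a fitted model's AIC (made-up gains)."""
    gains = {"ltr_score": 50.0, "overall_score": 10.0, "value": 1.5, "location": 0.5}
    # Each term lowers the score by its gain but costs 2 for the extra parameter.
    return 1000.0 - sum(gains[t] for t in terms) + 2 * len(terms)


print(forward_select(["ltr_score", "overall_score", "value", "location"], toy_score))
# ['ltr_score', 'overall_score']
```

In the toy run, `value` is rejected because its gain of 1.5 does not cover the penalty of 2 for an extra parameter, which mirrors how weakly informative categories drop out.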
Finally we tested the predictive accuracies of our models:
Categories | Accuracy
LTR only | 55.997%
Trip info only | 63.495%
Trip info + LTR | 63.58%
Trip info + other review categories | 63.593%
Trip info + LTR + other review categories | 63.595%
Using only a guest’s LTR at the end of the trip, we can correctly predict whether they will rebook in the next 12 months 56% of the time. Given just basic information we know about the guest, host and trip, we improve this predictive accuracy to 63.5%. Adding review categories (not including LTR) yields a further improvement of about 0.1 percentage points. Given all this, adding LTR to the model improves the predictive accuracy by only another 0.002 percentage points.
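The increments quoted here are differences between rows of the accuracy table, expressed in percentage points. A small sketch of that arithmetic, with the accuracies copied from the table and a helper name of our own:

```python
# Accuracies (in percent) copied from the table above.
ACCURACY = {
    "LTR only": 55.997,
    "Trip info only": 63.495,
    "Trip info + LTR": 63.58,
    "Trip info + other review categories": 63.593,
    "Trip info + LTR + other review categories": 63.595,
}


def gain_pp(before: str, after: str) -> float:
    """Accuracy gain from model `before` to model `after`, in percentage points."""
    return round(ACCURACY[after] - ACCURACY[before], 3)


# Review categories on top of trip info.
print(gain_pp("Trip info only", "Trip info + other review categories"))  # 0.098
# LTR on top of trip info and the other review categories.
print(gain_pp("Trip info + other review categories",
              "Trip info + LTR + other review categories"))  # 0.002
```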
Post-trip reviews (including LTR) only marginally improve our ability to predict whether or not a guest rebooks in the 12 months after checkout. Controlling for trip and guest characteristics, review star ratings improve our predictive accuracy by only ~0.1 percentage points. Out of all the review categories, LTR is the most useful in predicting rebooking, but it adds only 0.002 percentage points of predictive accuracy if we control for the other review categories. This is because LTR and review scores are highly correlated.
Reviews serve purposes other than predicting rebooking. They enable trust in the platform, help hosts build their reputation, and can also be used for host quality enforcement. We found that guests with higher LTR are more likely to refer someone through our referral program. They could also be more likely to refer through word of mouth. Detractors could actually deter potential guests from joining the platform. These additional ways in which NPS could be connected to business performance are not explored here. But given the extremely low number of detractors and passives and the marginal power post-trip LTR has in predicting rebooking, we should be cautious about putting excessive weight on guest NPS.