Travel Recommendations at Hopper
In this three-part series, we’ll explore how to represent the travel preferences of Hopper users for use in machine learning (ML). We’ll first review how machine learning in travel differs from other large-scale recommendation problems. In part two, we’ll introduce a nice approach for representing “cyclic” features like time of day by using two variables instead of one, which we can easily generalize to geographical features. Finally, in part three, we’ll show how this idea lets us represent “preference distributions,” rather than single feature values, which can even be useful for non-cyclic features.
Hopper was built on the premise that the combination of big data + machine learning could empower travelers in a completely unprecedented way. We collect huge volumes of airfare and hotel pricing data in real-time — we’re now adding about a trillion price points/month, with almost five years of history — which we use to help consumers make smarter travel purchase decisions.
We build a “trusted advisor” relationship with our users, offering advice about when and even where to fly, as well as the best time to buy in the face of rapidly changing future prices. This conversation is mediated through push notifications, along with in-app interaction and feedback. We’ve sent over two billion notifications to date, and today about 90% of our sales result directly from push notifications.
What’s different about travel recommendation?
The scale and structure of travel recommendations has some key differences compared to product recommendation in other industries (e.g. Netflix, Amazon, Spotify).
First is that the number of potential “products” is very large. There are approximately 100,000 scheduled flights every day, roughly double if you count codeshares, each of which might be priced in tens or hundreds of variants (using different “fare basis codes”). A typical itinerary involves about four flights (though not picked uniformly at random, of course), as well as a choice of departure and return dates within the upcoming year. If we counted every possible priced itinerary as a separate product, we would rapidly pass trillions. This makes it impossible to model each “product” independently, and also implies that almost all our price time series are sparse.
It also doesn’t make sense to treat itineraries independently from a user perspective. Indeed, Hopper’s raison d’être is to help users navigate the myriad potential tradeoffs between alternative flight and hotel options, based on each user’s unique constraints and perception of value vs price. This means that it’s important for us to model features that capture relationships between “products”: two flights to the same destination on the same date are much more likely to be interesting to a given user than a flight elsewhere on a different date.
Another challenge is that travel is a lower frequency business than many other categories. You might watch several TV shows a day, or buy from Amazon several times a month, but probably only travel two to three times per year. This has tended to make travel a highly transactional business, with sellers focused on capturing the customer’s attention during those infrequent moments that they’re ready to buy. Hopper tackles this problem by encouraging users to watch trips much further in advance, so that we can offer advice and collect feedback on user intent throughout this longer consideration period, but the volume of user-product “rating” data is still relatively small on a per-user basis.
The user preference data we collect in travel is also a little different from other verticals. Obviously there is a strong geographic component which is central to understanding origin preferences (how far are you willing to drive to save money on airfare?), and is also a strong factor in destination selection (although thematic factors are also important there). Data about time is also critical, in multiple dimensions: when you prefer to interact with Hopper, when you should buy your ticket (buy now, or wait for a lower price?), as well as when you should travel (is it worth shifting your trip a week later, or switching to an earlier flight on the same day?). Both geography and time can be challenging to represent in a way that’s useful for ML: do we care about absolute time or place, time or distance relative to some other attribute, and/or some cyclical component like time of day, day of week or annual seasonality? But handled carefully they can be powerful predictive elements.
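To see why the cyclical component is tricky, consider a minimal sketch of hour-of-day distance (the function names here are illustrative, not part of any Hopper system). Treating the hour as an ordinary number makes 11 PM and 1 AM look far apart, when they’re really only two hours apart on the clock:

```python
def naive_hour_distance(h1, h2):
    # Hour of day treated as an ordinary number: 23 and 1 look 22 apart.
    return abs(h1 - h2)

def cyclic_hour_distance(h1, h2):
    # Respecting the 24-hour wrap-around: 23 and 1 are only 2 apart.
    d = abs(h1 - h2) % 24
    return min(d, 24 - d)

naive_hour_distance(23, 1)   # 22 — misleadingly large
cyclic_hour_distance(23, 1)  # 2 — the true distance on the clock
```

The same wrap-around issue appears with day of week, annual seasonality, and longitude, which is why these features need special handling.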
Representing User and Product Features
The goal of any recommendation algorithm is to find the best ways to match objects from one set (typically users) to objects in another set (typically products). In most cases, this boils down to measuring similarities between objects, either within a set (users like you also liked…; users who bought X also bought Y), or between two sets via a shared set of attributes (users who like sci-fi probably like Star Wars). In almost all cases, this reduces to a dot-product calculation, where we generate some kind of similarity score, comparing (say) a vector of user preferences u with (say) a vector of product features v by calculating u · v. The score might be used directly to rank products for users, or might be transformed to calculate a probability such as the likelihood that a given user would buy a particular product. Depending on the ML technique, we might directly measure the attributes in each vector, fit them based on observed outcomes, or some hybrid of the two. In this discussion, we’re specifically interested in features where we’re selecting an explicit representation, as opposed to (say) latent features that emerge from optimization.
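The scoring idea above can be sketched in a few lines. The feature names and values here are purely illustrative (not Hopper’s actual features), but the mechanics are the same: a dot product gives a similarity score, which can rank products directly or be squashed into a probability:

```python
import math

# Hypothetical shared attributes: nonstop preference, price sensitivity,
# morning-departure preference. Values are made up for illustration.
user     = [0.9, 0.2, 0.5]   # user preference weights
flight_a = [1.0, 0.3, 0.4]   # attributes of one itinerary
flight_b = [0.1, 0.9, 0.6]   # attributes of another

def score(u, v):
    """Similarity as a dot product u · v."""
    return sum(ui * vi for ui, vi in zip(u, v))

def purchase_prob(u, v):
    """One common transformation: logistic squash of the score
    into a (0, 1) purchase likelihood."""
    return 1.0 / (1.0 + math.exp(-score(u, v)))

score(user, flight_a)  # higher than score(user, flight_b),
                       # so flight A ranks first for this user
```

In practice the vectors would be NumPy arrays and the scores computed in bulk, but the one-comparison-at-a-time version makes the structure explicit.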
In most cases, the individual features in the vectors are either numeric variables (for example, a flight price, flight duration, or number of stops), or collections of binary features used to represent a categorical variable via “dummy coding” or “one-hot encoding”. For example, in a movie recommendation problem, an individual feature might be “suspense preference.” A user who likes suspense will have a high value for that feature, and will thus get higher similarity scores for movies that are also rated as suspenseful, with some implied linear relationship between the user preference and the movie score. A categorical feature like primary dialogue language might use a dummy-coded feature, with a French movie represented as (French: 1, English: 0, German: 0, …). A user who prefers French but also watches in English might be represented as (French: 0.7, English: 0.3, German: 0).
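The language example can be made concrete with a small sketch (the three-language vocabulary is a simplification, of course). The dot product of the user’s preference vector with the movie’s one-hot vector simply picks out the user’s affinity for that movie’s language:

```python
LANGUAGES = ["French", "English", "German"]

def one_hot(lang):
    # Dummy-code a categorical attribute as a binary vector.
    return [1.0 if l == lang else 0.0 for l in LANGUAGES]

movie_fr  = one_hot("French")      # [1.0, 0.0, 0.0]
user_pref = [0.7, 0.3, 0.0]        # prefers French, also watches English

# Dot product of preference and one-hot vectors recovers the
# user's affinity for the movie's language: 0.7 here.
affinity = sum(u * m for u, m in zip(user_pref, movie_fr))
```

One limitation worth noting: one-hot coding treats every pair of categories as equally dissimilar, which is fine for languages but, as we’ll see, a poor fit for features like hour of day where neighboring values are genuinely close.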
In part II, we’ll look at another representation that’s particularly useful for cyclic and geographic variables that can “wrap around”. This even sheds some light on the topic of the average Presidential birthday recently addressed by Nate Silver…
— Patrick Surry, Chief Data Scientist at Hopper
Hopper is an award-winning mobile app that uses big data to predict and analyze airfare and accommodations. Hopper provides travelers with the information they need to get the best deals on flights and hotels, and notifies them when prices are at their predicted lowest points.
We’re hiring for data science and engineering! Click on our Jobs page to apply and for more information.