AI in travel, part 3: representing preference distributions

Hopper · Published in Life at Hopper · 5 min read · Jun 4, 2018

The idea of measuring distance between different values of a cyclic feature suggests the concept of a preference distribution over such a feature. In figure 4, imagine that the user’s preferred origin is p and that q is the origin of a trip they’re considering. (Obviously the likelihood of switching origin from New York to Paris is usually low!) Now, just like the dot product of a full user preference vector and product feature vector produces a similarity score, the dot product p · q produces an origin similarity score, rising as the two origins get closer together. If we imagine q varying over all possible origins, we end up with a distribution of relative preferences over the globe, falling off as q gets farther from p.
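To make this concrete, here is a minimal sketch (not Hopper's production code) of the origin encoding and similarity score. The `origin_vector` helper and the coordinates used are illustrative assumptions:

```python
import math

def origin_vector(lat_deg, lon_deg):
    """Encode a point on the globe as a 3-D unit vector (x, y, z)."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def origin_similarity(p, q):
    """Dot product p . q: 1 for identical origins, falling as they move apart."""
    return sum(pi * qi for pi, qi in zip(p, q))

nyc   = origin_vector(40.71, -74.01)   # New York
bos   = origin_vector(42.36, -71.06)   # Boston
paris = origin_vector(48.86, 2.35)     # Paris

# A nearby origin scores higher than a distant one.
assert origin_similarity(nyc, bos) > origin_similarity(nyc, paris)
```

The dot product of two unit vectors is the cosine of the angle between them, so the score is exactly the "closeness" of the two origins on the sphere.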

Next, consider relaxing the condition that p be a unit vector. As with homogeneous coordinates, all vectors in the same direction encode the same value of our underlying feature, but we'll use the extra degree of freedom given by the vector's length |p| to measure the "strength of preference" for that value. This gives us a simple, symmetric, and flexible representation of a probability distribution over our feature.

In a typical learning algorithm, the components of p will be part of a full user vector u, for example u = (px, py, pz, u4, u5, …) and the components of q will similarly be part of the product vector v. So the total score u · v = p · q + u* · v* is just the sum of the origin score plus similarity scores due to other features. Varying |p| changes the importance of the origin feature relative to the others in a model. With a non-linear link between score and modeled probability, such as a logit or sigmoid activation function, changes to |p| will also affect the ‘width’ of the resulting distribution.
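As a hedged sketch of how the blocks compose (the particular weights and the two extra features here are invented for illustration):

```python
import math

def sigmoid(s):
    """Non-linear link from score to modeled probability."""
    return 1.0 / (1.0 + math.exp(-s))

def score(u, v):
    """Full dot product u . v; the first 3 components are the origin blocks p and q."""
    return sum(ui * vi for ui, vi in zip(u, v))

# Hypothetical user: origin direction (0.6, 0.0, 0.8) scaled by |p| = 2,
# plus two other (made-up) preference weights u4, u5.
p = (2 * 0.6, 2 * 0.0, 2 * 0.8)
u = p + (0.5, -0.3)

v_near = (0.6, 0.0, 0.8) + (1.0, 1.0)   # product at the preferred origin
v_far  = (0.0, 1.0, 0.0) + (1.0, 1.0)   # product at a distant origin

# Same non-origin features, so the origin block alone drives the difference.
assert sigmoid(score(u, v_near)) > sigmoid(score(u, v_far))
```

Doubling |p| here would double the origin block's contribution to the score while leaving u4 and u5 untouched, which is exactly how the model trades off the origin feature against the others.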

As a concrete example, consider the geographical case in Figure 4 and assume that our preference score p · q models the log of the probability that the user will choose origin q (or equivalently a logit, for low scores). That is, the probability P{q} = c · exp(p · q) = c′ · exp(|p| cos α), since q is a unit vector. For small values of α, where cos α = 1 − α²/2 + O(α⁴), figure 6 confirms that this looks a lot like a Gaussian distribution with variance |p|⁻¹. That is, |p| = 0 indicates a user with no origin preference (infinite variance), with increasingly concentrated preference for p as |p| → ∞.
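The Gaussian approximation is easy to check numerically. Here's a small sketch comparing exp(|p| cos α), normalised to 1 at α = 0, against a Gaussian with variance 1/|p| (the value |p| = 10 is an arbitrary choice for illustration):

```python
import math

p_len = 10.0  # |p|, the strength of the origin preference (illustrative value)

def rel_pref(alpha):
    """Preference exp(|p| cos(alpha)), normalised so rel_pref(0) == 1."""
    return math.exp(p_len * (math.cos(alpha) - 1.0))

def gaussian(alpha):
    """Un-normalised Gaussian with variance 1/|p|."""
    return math.exp(-p_len * alpha ** 2 / 2.0)

# For small angles the two curves agree closely.
for alpha in (0.0, 0.1, 0.2, 0.3):
    assert abs(rel_pref(alpha) - gaussian(alpha)) < 0.01
```

The agreement degrades as α grows (cos α flattens out near α = π while the Gaussian keeps falling), which is fine: far from p both curves assign low preference anyway.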

Instead of using only points on the unit sphere to represent places on earth, we interpret an arbitrary vector (x, y, z) as representing the place where that direction intersects the sphere, with the length of the vector |(x, y, z)| indicating strength of preference. The point (0, 0, 0) represents indifference (no preference), with longer vectors indicating increasing preference for the place represented by that direction.
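Reading such a vector back is just a matter of splitting it into direction and length. A minimal sketch (the `decode` helper is hypothetical):

```python
import math

def decode(v):
    """Split a preference vector into (unit direction, strength |v|).

    The zero vector has no direction and means indifference."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return None, 0.0
    return tuple(x / norm for x in v), norm

direction, strength = decode((0.0, 0.0, 2.5))
assert direction == (0.0, 0.0, 1.0)   # the place: where the direction meets the sphere
assert strength == 2.5                # the strength of preference
assert decode((0.0, 0.0, 0.0)) == (None, 0.0)   # indifference
```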

We can apply exactly the same idea to our time-of-day example. The figure below shows how a distribution of time-of-day preferences shifts as we increase our preference for 8am.

All vectors at 120° past midnight indicate a preferred time of 8am. These vectors have the Euclidean form (½√3 c, -½ c) where larger c values indicate stronger preferences. For c = 0 (left) the user is indifferent, and all times of day have equal scores. At c = 1 (middle) the probability distribution has shifted towards 8am, with the shaded area in any sector from the origin showing the relative preference for the corresponding time interval. As c increases further (right) the probability density becomes increasingly concentrated around 8am.
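The shifting distribution in the figure can be reproduced with a few lines of code. This sketch (an illustration, not the figure's actual source) scores every hour of the day against the 8am preference vector (½√3 c, -½ c) and normalises with a softmax:

```python
import math

def time_vector(hour):
    """Encode hour-of-day on the unit circle, midnight at the top, clockwise."""
    theta = 2 * math.pi * hour / 24.0
    return (math.sin(theta), math.cos(theta))

def time_distribution(c, hours=range(24)):
    """Softmax over hours of the dot product with the 8am preference vector."""
    p = (c * math.sqrt(3) / 2.0, -c / 2.0)   # (1/2 sqrt(3) c, -1/2 c): 8am, strength c
    scores = [math.exp(p[0] * q[0] + p[1] * q[1])
              for q in (time_vector(h) for h in hours)]
    total = sum(scores)
    return [s / total for s in scores]

uniform = time_distribution(0.0)
peaked  = time_distribution(3.0)

assert abs(uniform[8] - 1 / 24.0) < 1e-12     # c = 0: all hours equally likely
assert peaked[8] == max(peaked)               # mass concentrates at 8am
assert peaked[8] > time_distribution(1.0)[8]  # larger c, sharper peak
```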

Preference distributions for non-cyclic features

As a final note, we can even apply these ideas to non-cyclic features. For example, consider price. Normally we'd model a simple linear relationship with price (or some transformation of price) using a single weight. But it's unclear that a monotonic relationship with price always makes sense when modeling user travel preferences. Certainly users prefer lower prices, all other things being equal, but often users are seeking the best trip they can get for some target budget. Other examples of continuous features where users often prefer an intermediate rather than extreme value include length of stay, layover time, and flight duration or distance.

A traditional approach in this case would be to bucket the feature and dummy code each bucket, but that introduces a new parameter for each bucket, requires us to select the break points between buckets, and loses any notion of “ordering” between buckets.
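For contrast, here is a quick sketch of that bucket-and-dummy-code approach (the break points are arbitrary, which is one of its drawbacks), showing how one-hot buckets discard ordering:

```python
import bisect

BREAKS = [100.0, 300.0, 1000.0]   # hand-picked break points between buckets

def dummy_code(price):
    """One-hot encode a price into len(BREAKS) + 1 buckets."""
    onehot = [0] * (len(BREAKS) + 1)
    onehot[bisect.bisect_right(BREAKS, price)] = 1
    return onehot

assert dummy_code(50)   == [1, 0, 0, 0]
assert dummy_code(250)  == [0, 1, 0, 0]
assert dummy_code(5000) == [0, 0, 0, 1]
# Adjacent buckets look no more similar than distant ones:
# the dot product between any two distinct one-hot codes is 0.
```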

An alternative approach is to pretend that price is a cyclic feature, but map it onto only half of the unit circle — say the 00–12 range of our 24-hour clock in Figure 2. Then we can represent the feature using (x, y) coordinates on the unit circle, with all values encoded as points with x > 0. The figure below shows an example of encoding values of clamped log price. Now we use two parameters to model price preference in our dot-product formulation, just like with a cyclic feature, and can capture both a preferred price point and a preference strength, i.e. a distribution around that preference. In fact, by restricting ourselves to the half circle, we can even interpret vectors pointing in the negative x direction as "anti-preferences"! That is, a vector (x, y) where x < 0 indicates that our least preferred value is the one encoded by (-x, -y).
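A sketch of the half-circle encoding, using the $10–$2000 clip range from the figure (the exact mapping of angles is an illustrative assumption):

```python
import math

LOW, HIGH = math.log(10.0), math.log(2000.0)   # clip range for log price

def price_vector(price):
    """Map clamped log price onto the right half of the unit circle.

    $10 maps to angle +90 deg (0, 1), $2000 to -90 deg (0, -1),
    so every encoded price has x >= 0."""
    t = (min(max(math.log(price), LOW), HIGH) - LOW) / (HIGH - LOW)   # 0..1
    theta = math.pi / 2 - t * math.pi                                 # +90..-90 deg
    return (math.cos(theta), math.sin(theta))

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

cheap, mid, expensive = price_vector(10), price_vector(141), price_vector(2000)

# Every encoding stays on the half circle (x >= 0) ...
assert all(v[0] >= -1e-12 for v in (cheap, mid, expensive))
# ... and nearby prices score higher together than prices at opposite extremes.
assert dot(cheap, mid) > dot(cheap, expensive)
```

Because only half the circle is used, the encoding is one-to-one, and a learned preference vector with x < 0 has a free interpretation as an anti-preference rather than colliding with some other encoded price.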

Encoding user price preference using a half-cyclic representation. We first transform price by clipping between $10 and $2000 and taking the log. Next we map the prices considered by a user to half the circumference of a unit circle (shown as blue rays). This particular user has a strong preference (red vector, with length nearly 1) for prices near $285, a price point slightly below average, with a distribution much less spread out than the price distribution across all users (shaded grey region).

If you missed Part 2 of our series, you can find it here.

Patrick Surry, Chief Data Scientist at Hopper

Hopper is an award-winning mobile app that uses big data to predict and analyze airfare and accommodations. Hopper provides travelers with the information they need to get the best deals on flights and hotels, and notifies them when prices are at their predicted lowest points.

We’re hiring for data science and engineering! Click on our Jobs page to apply and for more information.
