Predicting Sparse Down-Funnel Events in Home Shopping with Transfer and Multi-target Learning

Sangdi Lin
Published in Zillow Tech Hub
8 min read · Apr 16, 2020

Over the past two years, Zillow has been transforming from a website where people search for and dream about homes into an end-to-end platform that guides customers through the often stressful home buying and selling processes. Figure 1 gives a simplified view of a real estate customer funnel. To guide this process, we model and optimize the user experience at various stages of the conversion funnel. We do this by learning from both noisy signals, like views of a home details page, and more direct signals, like a customer contacting an agent.

When tackled with common approaches, modeling and learning from down-funnel events can be difficult due to signal sparsity. In this blog post, we explore how to overcome these challenges with multi-target learning and transfer learning techniques.

Figure 1: A simplified view of a customer funnel in real estate shopping

Events in Different Home Shopping Stages

There are three main types of user events: page views, home saves, and contact requests, referred to as “view”, “save”, and “contact” respectively. A view event is triggered when a user opens a Home Details Page, as shown in Figure 2. Customers can save a home by clicking the heart icon in the upper right corner so that they receive instant updates whenever there is a price or status change for that home. Zillow customers can also contact an agent through Zillow, when they are ready, by clicking the “Contact Agent” button.

Figure 2: Examples of how view, save, and contact events are triggered in a mock mobile Home Details Page

Although these three events are often connected, they each indicate a different point in a customer’s progression through the home shopping journey (see Figure 1). During the exploration stage, customers may visit Zillow for various reasons, such as researching the market, checking out nearby home prices when visiting a new neighborhood, or just doing a bit of “window shopping”. When customers become more serious about buying a home, they start to save homes, share homes with their shopping partners, or contact an agent. However, we’ve found that even though the most expensive homes often receive the most views, view counts don’t necessarily translate into the number of serious buyers on the market. Table 1 provides an example where a more expensive listing attracts significantly more views but fewer saves compared to a more affordable listing closer to the local median home price in the same market.

Table 1: A more expensive listing (listing 1) attracts significantly more views but fewer saves compared to a more affordable listing (listing 2) closer to the local median home price in the same market

As we go further down the funnel, signals become increasingly sparse. For example, in one US market, out of every 100 homes viewed on a given day, only 32 are saved, and of those views and saves only 4 homes may drive a connection. This sparsity makes these events hard to predict accurately and can result in highly unbalanced training data. Upper-funnel events such as views and saves have the advantage of availability and are therefore easier to model, but they are not always leading indicators of business objectives. We would therefore like to transfer the relevant signals and knowledge learned from the more abundant view and save events to the task of predicting contact events. Naturally, this is an area where transfer learning and multi-target learning can be effectively applied.

Problem Definition

Let’s consider the problem of predicting the likelihood that a listing drives a contact request tomorrow, given the listing’s content features and its historical engagement (the number of views, saves, and contacts the listing received on previous days), as depicted in Figure 3. This prediction tells us ahead of time how popular a listing is going to be the next day, and can be used as an input to our recommender system, or to determine a default sort order for new users (i.e., cold start).

Figure 3: Overview of the prediction task and the model structure
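To make the setup concrete, here is a minimal sketch of such a single-target model in Keras. The feature count, layer sizes, and the use of tf.keras are our illustrative assumptions, not Zillow's production setup:

```python
import tensorflow as tf

# Hypothetical input: listing content features (price, beds, baths, ...)
# concatenated with historical engagement counts (views, saves, contacts
# over previous days), as in Figure 3. The size is illustrative.
NUM_FEATURES = 64

def build_baseline(num_features=NUM_FEATURES):
    """Single-target baseline: P(contact tomorrow | listing, history)."""
    inputs = tf.keras.Input(shape=(num_features,), name="listing_and_engagement")
    x = tf.keras.layers.Dense(128, activation="relu")(inputs)
    x = tf.keras.layers.Dense(64, activation="relu")(x)
    # Sigmoid output: probability that the listing drives a contact tomorrow.
    y_contact = tf.keras.layers.Dense(1, activation="sigmoid", name="contact")(x)
    model = tf.keras.Model(inputs, y_contact)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC(name="auc")])
    return model
```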

Transfer Learning

Instead of directly modeling the contact event, we first train neural network models to predict targets derived from the denser view and save signals. Figure 4 describes the transfer learning steps. More specifically, we define new binary targets, y_view and y_save, according to whether the listing receives more views (or saves) than the average listing in its city region: +1 for more, 0 for the same or fewer. After the models for y_view and y_save are trained, we freeze the weights of the initial layers of the neural network, which forces the network to reuse the features (hidden layers) learned from the dense signals (Figure 4, left). Then only the last few layers are re-trained on the contact target y_contact (Figure 4, right).

Figure 4: Transfer learning: from views or saves to contacts. The input layer consists of listing features and historical engagement features, as in Figure 3.
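Below is a minimal sketch of this freeze-and-retrain recipe in tf.keras; which layers to freeze and their sizes are our assumptions for illustration:

```python
import tensorflow as tf

def build_source_model(num_features=64):
    """Train first on a dense target, e.g. y_save (binary: more saves
    than the city-region average -> 1, else 0)."""
    inputs = tf.keras.Input(shape=(num_features,))
    x = tf.keras.layers.Dense(128, activation="relu", name="shared_1")(inputs)
    x = tf.keras.layers.Dense(64, activation="relu", name="shared_2")(x)
    out = tf.keras.layers.Dense(1, activation="sigmoid", name="save")(x)
    model = tf.keras.Model(inputs, out)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

def transfer_to_contact(source_model):
    """Freeze the initial layers learned on the dense signal, then
    re-train only the head on the sparse contact target."""
    for name in ("shared_1", "shared_2"):
        source_model.get_layer(name).trainable = False
    # Put a fresh contact head on top of the frozen representation
    # (Figure 4, right); only this head is trained on y_contact.
    frozen_features = source_model.get_layer("shared_2").output
    y_contact = tf.keras.layers.Dense(1, activation="sigmoid",
                                      name="contact")(frozen_features)
    contact_model = tf.keras.Model(source_model.input, y_contact)
    contact_model.compile(optimizer="adam", loss="binary_crossentropy")
    return contact_model
```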

Multi-target Learning

Another way to leverage information from all of the relevant signals is multi-target learning. Here we experiment with a simple strategy that trains a neural network to predict multiple targets at the same time (see Figure 5). The common input is fed through a few shared network layers. On top of the shared network, we construct three task-specific tower networks, one each for predicting y_view, y_save, and y_contact. This simple strategy is referred to as “hard parameter sharing” in [1]. We train such a network with a joint loss function:

L = L_view + L_save + L_contact

where each component loss is a cross-entropy loss for its corresponding binary classification task.

Figure 5: Illustration of multi-target learning. The input layer consists of listing features and historical engagement features, as in Figure 3.
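A sketch of hard parameter sharing in tf.keras follows; the tower sizes and the equal loss weights are illustrative assumptions (the weighting is not specified above):

```python
import tensorflow as tf

def build_multi_target(num_features=64):
    """Hard parameter sharing: one shared trunk, three task towers."""
    inputs = tf.keras.Input(shape=(num_features,))
    # Shared layers consume the common input (Figure 5).
    shared = tf.keras.layers.Dense(128, activation="relu")(inputs)
    shared = tf.keras.layers.Dense(64, activation="relu")(shared)

    # One task-specific tower per target.
    outputs = {}
    for task in ("view", "save", "contact"):
        tower = tf.keras.layers.Dense(32, activation="relu")(shared)
        outputs[task] = tf.keras.layers.Dense(1, activation="sigmoid",
                                              name=task)(tower)

    model = tf.keras.Model(inputs, outputs)
    # Joint loss: sum of the three per-task cross-entropy losses.
    # Equal weights here; in practice they could be tuned.
    model.compile(optimizer="adam",
                  loss={t: "binary_crossentropy" for t in outputs},
                  loss_weights={t: 1.0 for t in outputs})
    return model
```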

Multi-gate Mixture-of-Expert (MMoE)

Multi-target learning is often sensitive to the relationships between tasks. Forcing different tasks to share the same layers makes strong assumptions about which signals matter to each task and may result in unwanted negative effects. Ma et al. [2] proposed the Multi-gate Mixture-of-Experts (MMoE), a more advanced multi-target learning model that better handles the relationships between tasks. As shown in Figure 6, the model replaces the common shared layers with multiple expert networks, and for each task it uses a gating network with a softmax activation to combine the experts’ contributions in an ensemble.

Figure 6: The network structure for MMoE. For visual simplicity, similar gating networks (in the middle) for the view and save targets aren’t shown in the figure. The input layer consists of listing features and historical engagement features, as in Figure 3.
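The sketch below shows one way to express the MMoE structure in tf.keras; the number of experts and the layer sizes are our assumptions, and the gating follows the softmax-combination idea of [2]:

```python
import tensorflow as tf

def build_mmoe(num_features=64, num_experts=4, expert_dim=32):
    """Multi-gate Mixture-of-Experts (Ma et al., KDD 2018), sketched.

    Each task combines the same pool of experts through its own
    softmax gate, instead of one hard-shared trunk."""
    inputs = tf.keras.Input(shape=(num_features,))

    # Expert networks: each maps the input to an expert_dim vector.
    experts = [tf.keras.layers.Dense(expert_dim, activation="relu")(inputs)
               for _ in range(num_experts)]
    # Stack to shape (batch, num_experts, expert_dim).
    expert_stack = tf.keras.layers.Lambda(
        lambda e: tf.stack(e, axis=1))(experts)

    outputs = {}
    for task in ("view", "save", "contact"):
        # Per-task gate: softmax weights over experts, conditioned on input.
        gate = tf.keras.layers.Dense(num_experts, activation="softmax")(inputs)
        # Gate-weighted combination of expert outputs for this task.
        mixed = tf.keras.layers.Lambda(
            lambda t: tf.einsum("be,bed->bd", t[0], t[1]))([gate, expert_stack])
        tower = tf.keras.layers.Dense(16, activation="relu")(mixed)
        outputs[task] = tf.keras.layers.Dense(1, activation="sigmoid",
                                              name=task)(tower)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss={t: "binary_crossentropy" for t in outputs})
    return model
```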

Experimental Results

In the offline evaluation, we train the models on data collected prior to a certain date and evaluate them on the events collected on that date. We consider evaluation metrics that capture two different aspects:

  1. Due to highly unbalanced class labels, we use AUC to evaluate the accuracy of the classification models. Table 2 contains the results.
  2. We use the NDCG metric to evaluate the offline performance of a default ranking that orders listings by predicted contact probability, from highest to lowest: the higher the NDCG, the more relevant the top-ranked items. We define a binary relevance based on whether an item is viewed or contacted during a search session, which yields a view-based NDCG metric and a contact-based NDCG metric measuring relevance at the top and the bottom of the customer funnel, respectively (see the sketch after this list). Figure 7 reports the percentage lift relative to the baseline model at different positions.
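For reference, here is a minimal NumPy sketch of NDCG@k with binary relevance, using the standard log2 position discount; the exact relevance definition and cutoff used in our experiments are not spelled out above, so these are illustrative:

```python
import numpy as np

def dcg_at_k(relevance, k):
    """Discounted cumulative gain over the top-k ranked items."""
    rel = np.asarray(relevance, dtype=float)[:k]
    if rel.size == 0:
        return 0.0
    # Standard log2 position discount: positions 1..k -> 1/log2(i+1).
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    return float(np.sum(rel * discounts))

def ndcg_at_k(relevance_in_ranked_order, k):
    """NDCG@k with binary relevance (1 if the item was viewed or
    contacted during the session, else 0)."""
    actual = dcg_at_k(relevance_in_ranked_order, k)
    ideal = dcg_at_k(sorted(relevance_in_ranked_order, reverse=True), k)
    return actual / ideal if ideal > 0 else 0.0

# Example: listings ranked by predicted contact probability; a 1 marks
# a listing actually contacted during the session.
print(ndcg_at_k([1, 0, 0, 1, 0], k=5))
```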

We compare these models against a baseline: a single-target neural network trained directly on the binary contact target. Due to stochasticity in training, each model is trained and evaluated 10 times, and we summarize and compare model performance based on the mean of each metric.

Table 2: Comparison of the different models on the AUC metric

Based on the AUC metric reported in Table 2, both transfer learning and multi-target learning (naive or MMoE) significantly improve prediction accuracy for the contact target relative to the baseline. MMoE and the naive multi-target learning method perform very similarly. This could be because the three tasks are closely related, in which case the simpler shared network works as well as the more complex MMoE architecture. The results demonstrate the usefulness of leveraging signals from the denser upper-funnel events when predicting sparse down-funnel events.

One interesting finding is that the transfer learning model that transfers the save signal achieves the highest AUC, showing that save events carry information highly predictive of contact events. This can be explained by save events being more plentiful than contact events and, as mid-funnel events, reflecting relatively strong home-buying intent.

Figure 7: Comparison of different models on the contact-based NDCG metric (left) and view-based NDCG metric (right)

Similar conclusions can be drawn from the comparison of the NDCG metrics in Figure 7. Transfer learning from the save target and the two multi-target models all demonstrate similar improvements over the baseline on the contact-based NDCG metric, with the MMoE model holding a slight advantage on the view-based NDCG metric. On the other hand, transfer learning from the view target doesn’t provide as strong a lift as the other methods, which could be because view events are noisier and sit further from contact events in the customer funnel.

Conclusion

In this blog post, we covered how we use transfer learning and multi-target learning to improve the prediction of sparse down-funnel events at Zillow. Although our discussion centered on the simple task of predicting the likelihood of contact events, these techniques can also be applied to user-item recommendation problems to drive valuable down-funnel engagement. The signal sparsity exhibited by down-funnel events is a common challenge across domains such as e-commerce, search, and advertising, so we hope our learnings are useful to other data science practitioners. Given the strong experimental results, we will continue exploring applications of multi-target learning and transfer learning in the personalization domain.

If you find this work interesting and if you would like to apply your data science and machine learning skills to our large-scale, rich and continuously evolving real-estate data, please reach out.

References

[1] Ruder, Sebastian. “An overview of multi-task learning in deep neural networks.” arXiv preprint arXiv:1706.05098 (2017).

[2] Ma, Jiaqi, Zhe Zhao, Xinyang Yi, Jilin Chen, Lichan Hong, and Ed Chi. “Modeling task relationships in multi-task learning with multi-gate mixture-of-experts.” KDD 2018.
