Robustness for User Response Prediction
User response prediction is a central problem in the computational advertising domain. Quantifying user intent allows advertisers to target ads towards the right users. This leads to a judicious use of marketing dollars and also renders a pleasant user experience.
Existing classifiers such as logistic regression and factorization machines, which have seen widespread adoption for the response prediction problem, treat the user signals as the absolute truth.
In this article, we describe the pitfalls of such an approach and advocate the need for classifiers that model the inherent noise and uncertainty in user signals and perform gracefully even in the worst-case scenario.
Our work, Robust Factorization Machines for User Response Prediction, accepted at the WWW'18 conference, is an attempt to treat data uncertainty as a first-class citizen in the classification setting.
To explore robust factorization machines at a deeper level, refer to this blog.
Understanding the Advertising Ecosystem
Users typically interact with an advertiser's app or website, perform actions such as item views and add-to-carts, and might navigate away without making a purchase.
To re-engage these users, advertisers bid for them on the open web so that they can show them a personalized ad. The bid is computed as a function of the user's propensity to click or convert given an ad impression.
User response prediction, the umbrella term for conversion or click prediction, is generally formulated as a binary classification task given the user's site activity signals and the associated context.
Conversion prediction (CVR modeling): Will the user purchase if shown an ad impression?
Click prediction (CTR modeling): Will the user click on the ad impression?
Logistic regression (LR) has been the preferred classification algorithm for response prediction because it is scalable and yields interpretable models. The downside of using LR is that the effect of feature interactions is not captured. For example, if the combination of 'user device = mobile' and 'category = clothing' is a strong indicator of purchase, the LR model cannot capture this association in its per-feature weights unless the combination is added explicitly as a separate feature.
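To make the limitation concrete, here is a minimal sketch in Python. The feature names come from the example above; the weights, the bias, and the helper names (`lr_logit`, `interaction_effect`) are made up for illustration. The sketch measures how much the log-odds change when both features are active together, beyond what each contributes alone. For a plain LR model this joint effect is always zero, whatever the weights are.

```python
def lr_logit(weights, bias, features):
    """Log-odds of a plain logistic regression: bias plus a sum of per-feature terms."""
    return bias + sum(weights[name] * value for name, value in features.items())


def interaction_effect(logit_fn):
    """Extra log-odds when both features are on, beyond their individual effects."""
    def at(mobile, clothing):
        return logit_fn({"device=mobile": mobile, "category=clothing": clothing})
    return at(1, 1) - at(1, 0) - at(0, 1) + at(0, 0)


# Hypothetical weights, chosen only for illustration.
weights = {"device=mobile": 0.4, "category=clothing": 0.7}
bias = -2.0


def plain(features):
    # Plain LR: purely additive in the features.
    return lr_logit(weights, bias, features)


# Hypothetical weight for a hand-crafted cross feature (mobile AND clothing).
cross_weight = 1.5


def crossed(features):
    # LR plus one explicit cross feature.
    both = features["device=mobile"] * features["category=clothing"]
    return plain(features) + cross_weight * both


print(interaction_effect(plain))    # numerically zero for any choice of weights
print(interaction_effect(crossed))  # recovers the cross-feature weight
```

Hand-crafting a cross feature for every pair quickly becomes impractical as the number of features grows, which is the gap that factorization machines aim to fill.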
Factorization machines (FMs), proposed by Steffen Rendle, allow feature interactions to be captured in a latent space. That is, a p-dimensional vector is learnt for every feature, and the strength of the interaction between two features is given by the dot product of their latent vectors.
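The following is a minimal sketch of the second-order FM scoring function in plain Python, assuming the standard form of the model: a bias, per-feature linear weights, and a dot product of latent vectors for every pair of features. The names `fm_score` and `fm_probability` and all numeric values are illustrative, and the pairwise loop is written for readability rather than speed.

```python
import math
from itertools import combinations


def fm_score(x, w0, w, V):
    """Second-order FM score.

    x  : list of feature values
    w0 : global bias
    w  : list of per-feature linear weights
    V  : list of latent vectors, one p-dimensional vector per feature
    """
    linear = w0 + sum(w_i * x_i for w_i, x_i in zip(w, x))
    pairwise = 0.0
    for i, j in combinations(range(len(x)), 2):
        # Interaction strength of features i and j: dot product of latent vectors.
        dot = sum(a * b for a, b in zip(V[i], V[j]))
        pairwise += dot * x[i] * x[j]
    return linear + pairwise


def fm_probability(x, w0, w, V):
    """Map the FM score to a response probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-fm_score(x, w0, w, V)))


# Toy example with three one-hot style features and p = 2 (values are made up).
x = [1.0, 1.0, 0.0]
w0 = -1.0
w = [0.1, 0.2, 0.3]
V = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]

print(fm_score(x, w0, w, V))
print(fm_probability(x, w0, w, V))
```

In this toy input only the first two features are active, so the single contributing pair adds the dot product of their latent vectors (0.5) to the linear part (-0.7), for a score of approximately -0.2. Because the interaction weight is factorized, a pair of features that rarely co-occurs in the training data can still receive a meaningful interaction estimate through its latent vectors.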
Cookies and device IDs are the main identifiers through which an advertiser can access a user's previous activity on the site.
In an ideal setting, an advertiser has a complete view of user activity for generating the purchase/click probability. However, users visit an advertiser's site through multiple touch points, and different avatars of the same user might show different browsing patterns.
For example, the same user A can visit the advertiser's app on mobile and, some time later, view other products on desktop. For the advertiser, though, there are two partial views, A1 and A2, of this same user. On mobile, the advertiser may see a bursty browsing pattern indicating a casual browser, whereas the same user might seem to be an avid shopper on desktop.
Additional noise-inducing factors that lead to a corrupted user view at the advertiser's end are:
- High cookie churn rate.
- Variable network connection speeds.
- Operating system nuances.
The advertiser therefore has a fragmented view of the user, owing to the factors described above. While bidding, the advertiser uses the signals from only one of these partial views to compute the response probability. Had the advertiser known the consolidated user view, the response prediction would have been more accurate.
How serious is the problem?
A study by Criteo highlights that nearly 31% of online transactions involve two or more devices, and that buyer journeys and conversion rates are ~40% higher in a user-centric view than in a partial device-centric view.
Hence, it is pivotal to model this potential incompleteness in the user signals available to the advertiser. However, the existing algorithms used for response prediction assume the user signals to be precisely known and are sensitive to any perturbation in the input signals.
Since complete user profile consolidation remains an open problem, the classifiers will have to step up and model the data uncertainty.
Robust Factorization Machines (RFM) and Robust Field-Aware Factorization Machines (RFFM), proposed recently at WWW 2018, model the data uncertainty using principles of robust optimization. The overall idea is to learn a classifier that exhibits noise resilience by minimizing the worst-case loss. Check out our blog for an intuitive understanding of robust factorization machines.
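To give a feel for what "minimizing the worst-case loss" means, the sketch below works through the simplest case: a linear classifier with logistic loss, where each feature value may be off by at most a known amount. This is an illustration of the general robust optimization idea, not the RFM or RFFM formulation from the paper. The function names, the per-feature bounds `eta`, and the numbers are assumptions made for the example. For a linear model, the worst perturbation inside such a box shrinks the margin by the sum of `eta[j] * |w[j]|`, so the worst-case loss has a closed form; a brute-force search over the corners of the box is included as a sanity check.

```python
import math
from itertools import product


def logistic_loss(margin):
    """Numerically stable log(1 + exp(-margin))."""
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def nominal_loss(w, b, x, y):
    """Logistic loss when the observed features x are taken at face value (y is -1 or +1)."""
    margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
    return logistic_loss(margin)


def worst_case_loss(w, b, x, y, eta):
    """Largest logistic loss over all perturbations with |delta_j| <= eta_j.

    For a linear model the adversarial perturbation reduces the margin
    by sum_j eta_j * |w_j|.
    """
    margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
    penalty = sum(ej * abs(wj) for ej, wj in zip(eta, w))
    return logistic_loss(margin - penalty)


def brute_force_worst_case(w, b, x, y, eta):
    """Sanity check: evaluate the loss at every corner of the uncertainty box."""
    worst = 0.0
    for signs in product((-1.0, 1.0), repeat=len(x)):
        shifted = [xj + s * ej for xj, s, ej in zip(x, signs, eta)]
        worst = max(worst, nominal_loss(w, b, shifted, y))
    return worst


# Made-up model, example and uncertainty bounds.
w, b = [0.8, -1.2], 0.1
x, y = [1.0, 0.5], 1
eta = [0.2, 0.1]

print(nominal_loss(w, b, x, y))
print(worst_case_loss(w, b, x, y, eta))
print(brute_force_worst_case(w, b, x, y, eta))
```

Training on the worst-case loss instead of the nominal loss penalizes large weights on features whose values are uncertain, which is one way to see why a robust classifier is less sensitive to perturbed inputs.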
In the end...
Robustness is a desirable property, not just in the computational advertising domain, where the presence of multiple touch points makes noise resilience imperative, but in any noise-sensitive domain.
RFM and RFFM are generic predictors which can be used for any classification task.
Robust classifiers take a rather conservative view by modeling the worst-case loss. Can we use a paradigm that leverages the data distribution to learn the underlying uncertainty? Distributional robustness and data-driven robust optimization are two interesting directions that can be explored for this.