If there is an ad display banner on a website and no-one sees it, was there really an ad display on that website¹? From its earliest origins, the main heuristic in advertising for deciding where to place an advertisement, be it a billboard on a busy road or a short jingle in the middle of a popular TV show, has been to maximise the visibility of the advertisement. In times when it was very difficult to measure and quantify the incremental effectiveness of advertising choices, one thing was still certain: maximising visibility was imperative to the success of a campaign. The measurement uncertainty of those early days is aptly summed up in the famous quote below:
"Half the money I spend on advertising is wasted; the trouble is I don’t know which half"
John Wanamaker (1838–1922)
In today’s age of targeted advertising², with its fine-tuned campaigns and real-time feedback loops, we are closer to answering this question than we have ever been. Yet, as reported in a recent Google infographic³ and other viewability research⁴, up to 56% of all displays made on websites are never viewed at all. At Criteo, the large majority of clicks (~96%) are on viewed displays (the remaining 4% are either mis-clicks or clicks on displays visible at less than 50%). Measuring display viewability therefore brings us that much closer to resolving the dilemma of wasted advertising spend. It is clear that the promised effectiveness of an online advertising campaign rests on a barely acknowledged premise: that the display impression be visible to the user.
A display being viewed is the precursor to any meaningful interaction with the brand and the product being advertised, as a display never visible to the user cannot be clicked (discounting misclicks) or acquaint a user with a brand. Hence it is in the interest of advertisers and their targeting partners to invest in a model that maximizes the viewability of a display at bidding time and ensures that the budget is directed towards viewed displays.
Recent advances in detecting the viewability of displays in the browser, and in separating non-tracked from non-viewed displays, finally opened the possibility of building highly accurate viewability models. Combined with parallel developments in online advertising towards a privacy-oriented future⁵, it was clear that a viewability model should form part of our long-standing strategy geared towards user-privacy-oriented advertising⁶. An in-house viewability model that only uses publisher, partner, and banner level features fits neatly into this strategy, while also pushing Criteo towards a full-funnel DSP future with better control over view-rates for our clients.
Tracking and Measuring Viewability
The most crucial aspect of any model is the availability and reliability of its input data, following the adage “Garbage in, garbage out”. There are several ways to quantify the visibility of a banner to the user, including measures of duration, the extent of the banner viewed, or a combination of the two. In 2014, the Interactive Advertising Bureau and the Media Rating Council, two organisations that set standards for the international advertising industry, laid out the official definition of a viewed banner: one that was at least 50% visible on-screen for more than 1 second. For video displays, the accepted standard is at least 50% of the display visible for at least 2 seconds.
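The IAB/MRC criterion above can be captured in a few lines; the following is a minimal sketch, with illustrative names rather than Criteo's actual implementation:

```python
def is_viewed(visible_fraction: float, visible_seconds: float,
              is_video: bool = False) -> bool:
    """Return True if an impression meets the IAB/MRC viewed standard:
    at least 50% of the banner on-screen for at least 1 second
    (2 seconds for video displays)."""
    min_duration = 2.0 if is_video else 1.0
    return visible_fraction >= 0.5 and visible_seconds >= min_duration
```

For example, a banner 60% on-screen for 1.2 seconds counts as viewed for a standard display, but the same exposure falls short of the 2-second bar for video.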
Each major browser has its own support for measuring viewability, as there is no standard solution that works everywhere. At Criteo, we use several tracking methods depending on the browser. The most common is IntersectionObserver viewability, which watches for an intersection between two defined elements, in our case the browser viewport and the outer frame of the banner. We also developed other methods to measure viewability on legacy browsers where the IntersectionObserver API is not available. On some browsers, we infer the viewability of the ad from frame-rate variations: browsers render hidden elements at a lower frame-rate, so the observed frame-rate tells us whether the ad is in view (BrowserOptimization viewability). On other browsers, we use ElementFromPoint viewability, which returns the elements present at specified coordinates with respect to the viewport, and null if the specified point is outside the visible bounds. You can find more technical details on these tracking methods here⁷.
The measurement of viewability is defined with respect to the accepted definition of a viewed banner, which quantifies how much of the banner needs to be visible, and for how long, for it to be considered viewed. We work with different solutions to define the accepted measures for different types of banners. The implementation of TRACKED_VIEW lets us know whether we are able to track the viewability of a display at all, while other events offer a more precise view of the duration and minimum extent to which the banner was viewed.
Criteo serves several ad formats, including HTML Standard, Video, and Native banners. An overwhelming proportion (81%) of our displays are in the HTML Standard format, and we are able to measure viewability on 95% of them. In fact, we measure viewability on all types of displays except Native displays, for which the measurement capability currently doesn’t exist. Native displays are ads that blend into the page content and style, and therefore often look very much like an integral part of the webpage, for example in social media feeds, or as recommended content on a web page. Video displays, on the other hand, use a different type of measurement method, and since they represent only 0.5% of our displays, we decided to exclude them from this iteration to avoid unnecessary complexity in our first viewability model.
Both TRACKED_VIEW and ONE_SECOND_50_PERCENT_VIEW together allow us to determine whether a display was tracked and viewed, tracked and not viewed, or not tracked at all. The ability to differentiate between tracked-but-not-viewed and untracked displays will prove crucial to the model’s ability to predict the viewability of a display with reasonable accuracy, and thereby better determine its worth (used to set the bid price) at auction time. While tracked displays can be used to train the model, untracked displays need to be dealt with separately: everything depends on how well a model trained on tracked displays generalises to untracked ones. And since we get no feedback on the accuracy of predictions for untracked displays, we need to devise different scenarios and sanity checks to determine their validity.
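The three display classes derive directly from the two events above. A toy sketch of the labelling logic (the event names follow the article; the function and record layout are hypothetical):

```python
def display_class(has_tracked_view: bool,
                  has_one_second_50_percent_view: bool) -> str:
    """Derive the display class from the TRACKED_VIEW and
    ONE_SECOND_50_PERCENT_VIEW events."""
    if not has_tracked_view:
        return "untracked"          # no viewability signal: excluded from training
    if has_one_second_50_percent_view:
        return "tracked_viewed"     # positive training example
    return "tracked_not_viewed"     # negative training example
```

Only the first two branches produce training labels; untracked displays are handled separately via the generalisation checks described below.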
Now that we have laid the groundwork for viewability and its measurement in the browser, let us go deeper into the model-building process, which begins with the selection of predictive features that hold information on the visibility of a display to a user visiting a webpage. We can expect the visibility of a given display to depend largely on the publisher (including its vertical, the engagement of the content, the probability of scrolling down, etc.), the banner itself (size, type, position on the page), and even the device and browser type (in-app mobile vs desktop). Going further, user-level features combined with publisher-level features can capture interesting interactions between user and publisher (C. Wang et al.⁸). For instance, users are likely to behave differently depending on whether they have previously made purchases through ads on the publisher, or may scroll further down if the content of the webpage matches their interests. However, when evaluating the mutual information of these user features with viewability, we did not see any strong signal, especially compared to publisher, banner, and device level features. Combined with our mission of moving towards privacy-preserving models, this led us to leave out user-level features and their interactions with publisher or device features, limiting ourselves to banner, device, and publisher level features so as to keep the model contextually focussed. To further reduce the number of features, we used univariate feature selection to weed out low-performing features, arriving at a final shortlist of 40 features, including hashed high-cardinality features like the publisher id or truncated URLs.
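Univariate selection by mutual information can be sketched as below, in the spirit of the shortlisting step just described. The data is synthetic and the threshold arbitrary; the real feature set and cut-off are Criteo-internal:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n = 5000
informative = rng.integers(0, 8, size=n)   # e.g. a hashed publisher id
noise = rng.integers(0, 8, size=n)         # an uninformative feature
# Noisy binary label that depends on the informative feature only.
y = (informative >= 4).astype(int) ^ (rng.random(n) < 0.1)
X = np.column_stack([informative, noise])

# Score each feature independently against the label.
mi = mutual_info_classif(X, y, discrete_features=True, random_state=0)
# Keep only features whose mutual information clears a chosen threshold.
selected = [i for i, score in enumerate(mi) if score > 0.01]
```

Here only the informative column survives the threshold; in our case the same procedure reduced the candidate set to the final 40 features.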
To confirm that a model trained on tracked displays generalises to untracked displays, we need to ensure that the input data distribution is comparable across features and modalities for the two types of displays, for example that no features or modalities are missing in one of them. This can be checked either by training a model to predict whether a display is trackable, or, more simply, by looking at the features that are most informative about the trackability of displays. If there were a stark difference in the distribution of input features between tracked and untracked displays, trackability would be highly predictable, and the most predictive features would point us to the culprits. Looking at the information gain ratio between the features and the label is_tracked, we found that being Native/Video was, as expected, highly predictive of trackability, followed by the size of the display. This makes sense: while more than 95% of other banner types are tracked, no Native banners are tracked, and Video banners do not share the same definition of being viewed. Going further, we saw no overlap between display sizes for Native/Video and other displays, so we would indeed have a generalisability problem for Native/Video displays. This is mainly because, in our dataset, display sizes for Native and Video are placeholders rather than actual sizes, whereas display sizes for other display types are the actual size of the banner.
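One way to run the first check mentioned above is to fit a simple classifier to predict is_tracked and inspect its discriminative power: an AUC well above 0.5 signals that some feature separates tracked from untracked displays. A sketch on synthetic data mimicking the situation described (column names and distributions are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 4000
is_native = rng.random(n) < 0.2
# As in the article's data: Native displays are never tracked,
# ~95% of other displays are.
is_tracked = np.where(is_native, False, rng.random(n) < 0.95)
# Placeholder size (0) for Native, realistic sizes otherwise.
banner_size = np.where(is_native, 0.0, rng.normal(300.0, 50.0, n))

X = np.column_stack([is_native.astype(float), banner_size / 100.0])
clf = LogisticRegression(max_iter=1000).fit(X, is_tracked)
auc = roc_auc_score(is_tracked, clf.predict_proba(X)[:, 1])
# A high AUC here means format/size features leak trackability:
# exactly the generalisability problem discussed above.
```

In this toy setting the classifier separates the two classes easily, mirroring how Native/Video format and placeholder sizes flagged the problem in our data.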
Removing Native/Video and re-evaluating the mutual information gain ratio for tracked/untracked displays left us with only very low information-gain features, allowing us to assume generalisability over the remaining untracked displays. As for Native/Video traffic at inference time, we decided not to use the viewability prediction in our bidding models on that part of the traffic. This makes sense since all Native displays are untracked and Video displays use a different measurement standard, both of which rule out generalisability given the unique behaviour of these displays compared to other display types. Removing them leaves us with untracked HTML displays, which earlier made up 20% of all untracked displays and now make up 99.9% of untracked displays in the training dataset.
The first results for the viewability model were promising: the distribution of predictions over untracked displays closely follows that of tracked displays, even if shifted to slightly lower values, with the mean prediction being 0.95 times that for tracked displays.
Further sanity checks confirmed that our model predicts the viewability of displays better than the historical viewability provided by Google: we are able to separate viewable and non-viewable displays within every Google viewability bucket. From the second figure, we see that the average predicted viewability for untracked displays follows that for tracked displays, confirming the generalisability over untracked displays (after excluding Native/Video displays).
We tested several combinations of hyper-parameters using a randomised grid search, including the number of days of training data, regularisation, and different samplings of the majority class label. We tested including the viewability prediction (pView) in our models in two main ways: as one of the features, or as a factor multiplying a model that predicts landing/click probabilities given a viewed display. The major difference between the two strategies lies in how they respond to the generalisability problem of untracked displays: in the pView-as-a-feature scenario we can simply mask the viewability feature where the model does not generalise (Native/Video displays), whereas in the pView-as-a-model scenario we would need an entirely different model for the non-generalisable displays. We tested several variations of both strategies on part of our business and found conclusive evidence that pView-as-a-feature performs better than pView-as-a-model, with uplifts of 0.6% for tracked displays and ~1% for untracked displays in the best-case scenario, which included modifications like custom bucketisation of pView and masking over Native/Video and Facebook displays (uplifts are measured in log-likelihood, relative to the model in production).
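The contrast between the two strategies can be sketched schematically. These are toy functions, not the production models; the bucketisation scheme and the "MASKED" modality are illustrative:

```python
def p_click_pview_as_model(p_view: float, p_click_given_view: float) -> float:
    """pView-as-a-model: the viewability prediction multiplies a click
    model trained on viewed displays, P(click) = P(view) * P(click | view)."""
    return p_view * p_click_given_view

def bucketise_pview(p_view, masked: bool = False, n_buckets: int = 10):
    """pView-as-a-feature: the prediction is discretised into coarse
    buckets before entering the click model, with a dedicated modality
    for displays (Native/Video) where the prediction is masked."""
    if masked:
        return "MASKED"
    return min(int(p_view * n_buckets), n_buckets - 1)
```

The feature form makes masking trivial (one extra modality), whereas the multiplicative form would force a separate model wherever pView is unreliable; that asymmetry is what made pView-as-a-feature the more robust choice.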
With solid offline uplifts, we moved on to online testing of viewability as a feature in our business models. Online testing involves deploying the modified models on a sizeable test population and comparing the reaction of users in this population to that of a reference population via carefully chosen metrics. A test model is only released to production if we see a statistically significant uplift; statistical significance⁹ allows us to separate random fluctuations in metrics from differences arising from the modifications we are testing.
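A minimal example of the kind of significance check used to gate a release is a two-sided z-test on the view-rates of the test and reference populations. The counts below are made up for illustration; this is a textbook test, not Criteo's actual AB-test tooling:

```python
from math import erf, sqrt

def two_proportion_z_test(k1: int, n1: int, k2: int, n2: int) -> float:
    """Two-sided p-value for H0: the two view-rates are equal.
    k = viewed displays, n = total displays, for each population."""
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)                      # pooled rate under H0
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))     # standard error of p1 - p2
    z = (p1 - p2) / se
    # Two-sided p-value from the normal CDF.
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# A 3 pp view-rate difference on 10k displays per arm is highly significant.
p_value = two_proportion_z_test(5300, 10000, 5000, 10000)
```

With identical rates in both arms the p-value is 1, so the test correctly refuses to flag a non-existent uplift.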
We AB-tested viewability as a feature on most of our traffic and observed statistically significant uplifts over the long and short term across all of them. Subsequently, we have rolled out our in-house viewability model as a feature across several of our business models used in bidding from Prospecting to Retargeting.
Major insights from the AB-tests
While our overall uplift varied from 1–2% in long-term revenue across different scopes, with a 3 pp increase in view-rate globally, some sub-scopes showed impressive uplifts, solidifying our confidence that the viewability model brings value exactly where we expected it to. We also observed an overall increase in spend, given that we are now buying higher-quality (more visible) displays. The most interesting of these sub-scopes were:
- Web displays are the major contributor to uplift: Web displays, which usually have much lower viewability than in-app displays, are the source of the lion’s share of the observed view-rate uplift, with up to 5 pp uplift in web view-rate. As a consequence, web displays also account for the major share of the revenue uplift resulting from viewability as a feature.
- Below-the-fold displays show the largest view-rate improvement: We also receive information on the positioning of displays with respect to the user’s screen. Displays above the fold are visible when the page loads, while displays below the fold only become visible when the user scrolls down. As expected, displays above the fold are more viewable and more costly; if one could infer the viewability of a below-the-fold display given the publisher, one could get the same value at a lower cost. After including the viewability prediction as a feature in bidding models, we see that below-the-fold displays account for the majority of the view-rate and revenue uplift. This also means that we are buying below-the-fold inventory that is more likely to be seen by the user, improving both quality and value.
- Displays without any pre-computed viewability (Google/AppNexus historical viewability) show the biggest improvement: Given that we also use viewability information provided by some SSPs, we compared the relative uplifts on displays with and without this information, to ensure the model was not relying too heavily on these features when predicting viewability. The largest uplift in view-rate comes from displays without any pre-computed historical viewability, with relative revenue uplifts twice as high as for displays with this information. This is because displays without historical viewability information benefit much more from the new signal than those for which the model already had a historical proxy to learn from.
- We perform better than the Google/AppNexus-provided historical viewability: We observed this trend in offline tests as well, with the viewability model able to separate viewed from unviewed displays across all ranges of Google historical viewability. In the online test population, we observe view-rate uplifts on displays across all ranges of viewability provided by Google RTB, with the largest uplifts coming from displays classified by Google RTB as 0 or unknown viewability, resulting in view-rate increases of 10 to 13 pp and revenue uplifts of 4 to 30%. At the same time, we see a down-lift of up to 25% in the number of displays in these buckets, while spend increases as we buy the most valuable of these displays and offset the cost with an uplift in advertiser value. We also see a corresponding increase in the buying of higher-viewability displays.
- Uplifts on untracked displays: We observed a revenue uplift on untracked displays as well, on the order of 1%, which corresponds closely to our offline observations and attests to the model generalising well to these displays.
- Uplifts on masked displays: Interestingly, we also observed positive revenue uplifts for Native and Video displays, for which the viewability predictions were masked. This could be due to the model inferring landing rates for these displays better thanks to an overall better model with more optimal weights, even when the new feature is masked.
- Most beneficial for inventory with sparse user information and prospecting campaigns: Inventory where user historical data (including the degree of interest in a given partner and other user behavioural features) is most lacking derives the largest uplifts from the viewability feature. This makes sense, as it is also the inventory with the largest potential for improvement due to the sparsity of user historical data. This brings us back to the point made at the start of the article about the future of advertising being privacy-oriented and contextual, and this is exactly where viewability seems to provide the most value.
- Increase in average view-time: Since we are able to measure the time for which a display was viewed at 100%, we can check whether adding viewability information results in increased view-times. We saw a 3–5% down-lift in the buying of displays that are never viewed completely (100% of the banner in view) and a corresponding increase in the buying of fully viewable displays.
Viewability for Video
While we excluded video displays from the above model given their specific measurement methods, another enterprising team went ahead with a view-prediction model designed solely for video awareness campaigns, to fulfil clients’ needs. These campaigns address a business need for clients who want to focus on brand awareness rather than clicks or conversions. Since one cannot quantify the completeness of exposure to a video in seconds of viewing time, given the varying durations of videos, the team implemented relative measures that determine how many quarters of the video were viewed, i.e. played while concurrently visible to the user, all the way up to a complete view. A complete video view here means that the video was in the visible part of the browser, i.e. the viewport, and therefore visible to the user, while being played to the end. This involved implementing custom JS code for videos with VPAID-enabled trackers, which constitute about 70–80% of all our video displays. It enabled the team to develop a model predicting the probability of a fully viewed completion of a video based on contextual (publisher/partner/device) and video-level features. After successful offline testing, the AB-tests came back solidly positive, with a doubling of the CTR and a 35–40% increase in in-view video completion rates, adding another much-awaited capability to our bidding arsenal.
The current model for viewability was meant to provide a first boost to the bidding models as well as develop our contextual offering. However, the above model is only the beginning of continuous analysis and improvement of our contextual offer, of which viewability prediction forms a crucial component. Some of the most interesting and relevant projects we’re already working on are:
Apart from binary measurements like 1 second at 50% view, we also measure the time for which the display was viewed at 100%. This gives us a more accurate measure of the user’s interest and attention, as well as the likelihood that the user actually saw the ad and the time for which they were exposed to it, offering promising improvement potential for the viewability model. Subsequent analysis showed that viewed time is highly correlated with CTR and sales, and therefore deserves to be tested in our models. Displays viewed for 1 second at 50% and those viewed for 5 seconds at 100% provide very different exposures and resulting click probabilities. We are currently testing different strategies to include this information in our models, including combining it with the existing viewability model via loss weighting or crossing, as well as different ways to represent the information (discretisation, custom bucketisation, masking, etc.). Work already done in this direction shows promising results, with up to 8% uplift in prediction accuracy for non-viewable displays.
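One illustrative way to combine the binary viewed label with the 100%-view time into a single ordinal exposure feature is sketched below. The thresholds and bucket scheme are hypothetical, not the representation actually deployed:

```python
def exposure_bucket(viewed_1s_50pct: bool, seconds_at_100pct: float) -> int:
    """Ordinal exposure feature:
    0 = not viewed (fails the IAB/MRC standard),
    1 = viewed per the standard (1 second at 50%),
    2 = fully on-screen for a sustained period (illustrative 5s threshold)."""
    if not viewed_1s_50pct:
        return 0
    if seconds_at_100pct >= 5.0:
        return 2
    return 1
```

Such a discretisation separates the "1 second at 50%" and "5 seconds at 100%" exposures mentioned above, which carry very different click probabilities.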
Being able to predict the viewability of a display before bidding on a slot opens up several new possibilities in terms of offerings to our clients. One of these is to allow them to set thresholds and further fine-tune the minimum view-rates for their campaigns. This also marks another step in Criteo’s development towards a full-funnel DSP which will allow clients to set their own criteria and levers in accordance with their specific needs.
Adding other information-rich features
Another future development is an in-house historical viewability measure, including information on the historical view-rate of publishers and ad-spots; we are currently developing this measure for our inventory. A further source of information-rich features for the viewability model relates to the type of content being viewed, relevant in terms of its engagement, category, and context. We are currently testing several methods, including contextual embeddings, to integrate these features into our models.
If you enjoyed reading this article and would like to go into further details into interesting research and development projects going on at Criteo, follow us at https://ailab.criteo.com/
Thanks for reading!