Visions of Machine Learning at Qchain (Without the Buzzwords)

It is almost impossible to talk about machine learning without buzzwords such as “deep learning” and “neural networks.” (“Machine learning” itself can be considered a buzzword.) What I mean by the title, rather, is that I will not simply follow the formula of making grand claims — e.g., DeepMind (insert any research-heavy tech company) just beat the world’s best players at Go (insert any impressive benchmark), so we think the same technology (why don’t we just call it “AI”?) will change digital advertising (insert any industry).

Instead, I will take a bird’s-eye view of the machine learning models currently used in adtech and digital advertising and explain what these models actually do. As native advertising becomes the norm, we will see the limitations of these models. I will then discuss how we can address those limitations with state-of-the-art machine learning and lay out visions of machine learning for native advertising at Qchain.

In most cases that I am aware of, adtech models focus on audience and effect, with the goal of optimizing effect for the right audience. The effect here can be measured by many different metrics: impressions, click-through rate, and whatever else an AdSense dashboard might show. The fundamental problem is that an advertiser has constrained resources and cannot put their ads in front of every person all the time. Therefore, they need to be selective about the audience to whom the ads are shown.

We can think of adtech machine learning as a constrained optimization problem, which, in essence, is what every machine learning problem is. Since AlphaGo is also a constrained optimization model, where the constraints are the rules of the game and the available memory, one might think the same technology will solve digital advertising. Of course it will not, because we cannot simulate all possible scenarios of an ad campaign under a set of explicit rules the way we can simulate a game of Go. (Wait… didn’t Elon Musk say we live in a simulation?)

But back to the question at hand: who in the audience should an advertiser target? Suppose we have data about internet users, collected from browsers and other sources, that describe their characteristics. Let’s call the data X. We can analyze the distribution of these characteristics, P(X), and find subgroups of users defined by certain characteristics. Knowing these subgroups, we can then target some over others. This kind of machine learning is called clustering, a form of unsupervised learning. You can use something as simple as k-means or, potentially, a deep neural network autoencoder. Regardless of the specific model, it essentially performs audience segmentation.
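
To make this concrete, here is a minimal sketch of audience segmentation with k-means (using scikit-learn). The user characteristics, their distributions, and the number of clusters are all invented for illustration; this is not a description of any real data pipeline.

```python
# Minimal sketch of audience segmentation with k-means.
# The features, their distributions, and k are illustrative only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Toy X: one row per user; columns are hypothetical characteristics
# (say, age, pages visited per day, seconds spent on ad-supported pages).
X = rng.normal(loc=[35, 12, 90], scale=[10, 4, 30], size=(1000, 3))

# Scale features so no single characteristic dominates the distance metric.
X_scaled = StandardScaler().fit_transform(X)

# Cluster users into k segments; k is a modeling choice, not given by the data.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_scaled)

# Each user now carries a segment label an advertiser could target (or skip).
print("users per segment:", np.bincount(kmeans.labels_))
```

The hard part in practice is not running the clustering but choosing which characteristics go into X and how many segments are actually meaningful.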

Moving beyond only looking at X, you may also have data about the effect, which we will call Y. You can then build predictive models for Y using X, modeling the conditional distribution P(Y|X). Again, there are many ways of doing this. You can assume all the data points are independent, build a model, and deploy it. Alternatively, you can treat Y as a time series (since the data are presumably collected over time) and try to capture patterns by day, week, and so on. You can even update your model continuously as new data arrive, which is called “online learning.” The possible models range from regression models (multinomial logistic regression handles multi-class classification) to deep Gaussian processes (models that compose Gaussian processes in layers, much like a deep neural network, which the Ferrari Formula One team reportedly uses).
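
As a toy illustration of modeling P(Y|X) with online updates, here is a sketch using scikit-learn’s SGDClassifier, which fits a logistic regression incrementally via partial_fit. The features, the simulated click probabilities, and the batch sizes are all made up; a production model would look quite different.

```python
# Toy model of P(Y|X): click / no-click predicted from user features,
# updated batch by batch ("online learning"). All numbers are invented.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(1)

def next_batch(n=500):
    """Simulate a new batch of (X, y): user features and observed clicks."""
    X = rng.normal(size=(n, 3))
    p_click = 1.0 / (1.0 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 2])))
    return X, rng.binomial(1, p_click)

# Logistic regression fit with stochastic gradient descent, so it can be
# updated incrementally (the loss is named "log" in older scikit-learn).
model = SGDClassifier(loss="log_loss", random_state=0)

for _ in range(20):                       # one batch at a time, as data arrive
    X_batch, y_batch = next_batch()
    model.partial_fit(X_batch, y_batch, classes=[0, 1])

X_new, y_new = next_batch()
print("held-out accuracy:", round(model.score(X_new, y_new), 3))
```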

Lastly, there is a class of models that actively optimizes Y, such as the multi-armed bandit. Here we need to extend our definition of X beyond audience characteristics. Imagine an ad campaign as a black box with knobs. Some of the knobs correspond to audience characteristics, while others may change the ad format (e.g. the size of a banner) or distribution (e.g. the time an email is sent). The models we have described so far use data to study how the knobs change the black box’s output, whereas a multi-armed bandit algorithm literally tunes the knobs, in two phases. In the exploration phase, the algorithm tries many different combinations of knob settings. Then, based on the outcomes of these explorations, the algorithm probabilistically finds (extrapolates, with some uncertainty) the most effective knob settings in the exploitation phase.
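
Here is a minimal epsilon-greedy sketch of that idea. Note that epsilon-greedy interleaves exploration and exploitation rather than running them as two strictly separate phases, and the “true” click-through rates below exist only to simulate feedback.

```python
# Minimal epsilon-greedy multi-armed bandit over ad "knob settings".
# The true click-through rates are invented purely to simulate feedback.
import numpy as np

rng = np.random.default_rng(2)

true_ctr = np.array([0.020, 0.031, 0.025, 0.012])   # one arm per knob setting
n_arms = len(true_ctr)
pulls = np.zeros(n_arms)         # how often each setting has been tried
clicks = np.zeros(n_arms)        # clicks observed for each setting
epsilon = 0.1                    # fraction of traffic reserved for exploration

for t in range(50_000):
    if rng.random() < epsilon or pulls.min() == 0:
        arm = rng.integers(n_arms)              # explore: try a random setting
    else:
        arm = np.argmax(clicks / pulls)         # exploit: current best estimate
    reward = rng.random() < true_ctr[arm]       # simulated click / no click
    pulls[arm] += 1
    clicks[arm] += reward

print("estimated CTR per arm:", np.round(clicks / pulls, 4))
print("traffic share per arm:", np.round(pulls / pulls.sum(), 3))
```

In a real campaign the feedback would come from live traffic rather than a simulator, and more sample-efficient strategies such as Thompson sampling are common.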

Aside from the limitations of specific models, the elephant in the room is that these modeling approaches do not really look at the content of advertisements. This is fine for current digital advertising because, as you can tell (or as you do not remember), most ads (excluding videos) are quite simple and similar. The text in a traditional ad is essentially a headline or a short piece of copy. A typical sponsored article on the New York Times’ website, on the other hand, has significantly more content. We believe that in order to generate insights about native advertising, the models (and the people) must pay more attention to the ad content — whether it’s an interactive article on immigration, a reported feature on paid escorts, a personality quiz based on a TV show, or another format.

[Figure: each dot represents an online article.]

This is exactly where state-of-the-art machine learning comes in. (Cue background crowd cheering, ayyyye-eye.) The real progress in machine learning is that, beyond quantitative and categorical data, we can now build models for images and text. These models can recognize objects and process language roughly at the level of an average six-year-old human (yes, “human level”). Major tech companies — many of which are also in advertising — already leverage these models: think Facebook friend-tagging and Amazon answers. We believe there are opportunities to use these models for native advertising.

More specifically, we can build image and text models to analyze the content of native ads and the content of publishers at scale, in order to find the best “fit.” In addition, we want to build these models in an open and interpretable way, rather than simply wrapping them in a catchy name that includes “AI.” In doing so, Qchain hopes to help both advertisers and publishers meet their goals more efficiently with the right audiences — forging a path for more authentic marketing.
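
As a closing, deliberately simplified illustration of what “fit” could mean in practice, here is a sketch that scores a native ad’s text against candidate publisher articles using TF-IDF vectors and cosine similarity. The snippets are invented, and TF-IDF is a stand-in for the richer image and text models described above.

```python
# Toy "fit" scoring between native-ad text and publisher content.
# TF-IDF + cosine similarity stands in for richer text models; snippets are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

native_ad = (
    "An interactive feature on how families navigate the immigration system, "
    "told through personal stories and data."
)

publisher_articles = [
    "Analysis: new immigration policy and what it means for visa applicants.",
    "Recap of last night's reality TV finale and the fan theories it spawned.",
    "How local schools are adapting to an influx of newly arrived families.",
]

# Embed the ad and the candidate articles in the same vector space.
vectorizer = TfidfVectorizer(stop_words="english")
vectors = vectorizer.fit_transform([native_ad] + publisher_articles)

# Cosine similarity between the ad (row 0) and each article, highest fit first.
scores = cosine_similarity(vectors[0], vectors[1:]).ravel()
for article, score in sorted(zip(publisher_articles, scores), key=lambda p: -p[1]):
    print(f"{score:.3f}  {article}")
```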