Impact of Machine Learning Advances on MarTech

Introduction

These are exciting times for Machine Learning and Artificial Intelligence. Over the last few years, there has been a steep increase in the number and types of tasks for which machine learning algorithms and models have matched or bested human-level performance. In addition to research and academic achievements, there have been major applications of machine learning in real-world settings as well. Together, these developments have led to forecasts of an AI economy with great economic potential in the next few years (see 1, 2, 3 for example). Most of these advances can be traced back to a combination of hardware improvements, the proliferation of data and, most importantly, the rise of a branch of machine learning called Deep Learning.

In this article, we discuss some of the successes of deep learning in various fields, then examine the current state of MarTech and its readiness for deep learning models and technologies to create value. Finally, we discuss the need for a smart, new-age predictive Data Management Platform (DMP) that is able to extract the best value from the large, varied and sparse data that characterises the signals available to marketers.

The state of Deep Learning

Deep learning uses ideas from artificial neuronal models to build large and complex networks from simple building blocks (see 4). Figure 1 shows a simplified single unit model. Figure 2 shows a deep neural network of a simple kind (feed-forward) that uses multiple fully connected single units as building blocks. These networks perform very well on learning tasks from a variety of domains, sometimes even reaching or surpassing human-level performance. Compared to traditional (shallow) neural networks, which are only a couple of layers deep, current models can be hundreds of layers deep, which helps them capture fine-grained signals and features from the problem domain. The structure of these networks is adapted to the kind (referred to as modality) of data that they operate on. The most popular models are convolutional neural networks (CNNs) for image data and recurrent neural networks (RNNs) for sequential data such as speech, text and video. Long Short-Term Memory (LSTM) units give recurrent networks a gated memory, and attention mechanisms let a model focus on the most relevant parts of its input. In addition, ResNets, HighwayNets and DenseNets allow the training of really deep networks.

Figure 1: A simplified single unit model. The w’s are adjusted (learnt) based on agreement with the actual data.
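To make the single unit of Figure 1 concrete, here is a minimal sketch in Python. The sigmoid activation, the log-loss gradient step and the learning rate are assumptions chosen for illustration; the figure itself only specifies that the w's are adjusted to agree with the data.

```python
import math

def unit_output(weights, bias, inputs):
    """Weighted sum of the inputs passed through a sigmoid (assumed activation)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def learn_step(weights, bias, inputs, target, lr=0.5):
    """One gradient step on the log loss: nudge each w to reduce the error."""
    error = unit_output(weights, bias, inputs) - target
    new_weights = [w - lr * error * x for w, x in zip(weights, inputs)]
    return new_weights, bias - lr * error

# Illustrative data: one input example whose true label is 1.
weights, bias = [0.0, 0.0], 0.0
example = [1.0, 2.0]
before = unit_output(weights, bias, example)
for _ in range(20):
    weights, bias = learn_step(weights, bias, example, target=1.0)
after = unit_output(weights, bias, example)
```

With all weights at zero the unit is undecided; repeated steps move its output towards the observed label, which is the sense in which the w's are "learnt".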

The most common type of data on which deep learning has been applied is image data. By extension, there are quite a few advancements in deep learning for video data as well. These results are already applied at large scale in the real world in autonomous driving, video/image captioning, satellite imagery processing, and screening and disease detection in medical images. Deep learning models have also improved the performance of speech recognition and generation systems. Conversational agents, chatbots and speech interfaces have proliferated in the last few years due to these advancements. Another area which has seen a lot of improvement is natural language processing/understanding (NLP/NLU) systems. Machine translation, entity recognition, sentiment analysis and parsing are some of the NLP tasks that have benefitted greatly from deep learning pipelines. Aside from these, deep learning has been successfully applied to control and navigation tasks in the robotics, automotive and avionics industries, recommender systems and game playing, among a host of other areas (see 5).

Figure 2: A feed-forward neural network architecture with multiple hidden layers, where each layer transforms the output of the previous layer. A neural network is typically called a deep network when there is more than one hidden layer. State-of-the-art deep networks can be hundreds of layers deep.
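The layered structure in Figure 2 can be sketched as a chain of fully connected layers, each consuming the previous layer's output. The layer sizes, the tanh activation and the random initial weights below are illustrative assumptions, not details taken from the figure.

```python
import math
import random

def dense_layer(inputs, weights, biases):
    """One fully connected layer: every unit sees every input, then applies tanh."""
    return [
        math.tanh(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for a layer with n_out units."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(inputs, layers):
    """Feed the input through each layer in turn."""
    activation = inputs
    for weights, biases in layers:
        activation = dense_layer(activation, weights, biases)
    return activation

rng = random.Random(0)
sizes = [4, 8, 8, 2]  # 4 inputs, two hidden layers of 8 units, 2 outputs
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
output = forward([0.5, -1.0, 0.25, 2.0], layers)
```

Making the network deeper amounts to adding entries to `sizes`; training it requires backpropagating errors through every layer, which this forward-only sketch leaves out.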

One key advantage of using deep networks for machine learning is that the features are extracted automatically from the input data. This not only saves a lot of time which would have been spent on task-specific feature engineering but also builds feature representations that are typically much better than hand-crafted features. At 1plusX, we have built such an automated pipeline for representation learning across the various input data types that one typically encounters in AdTech/MarTech. Our proprietary multidimensional 1plusX embedding space (see title image for a three dimensional visualisation) is able to extract and encode useful features from textual content, browsing stream logs, app usage data and ad impression/conversion logs.

The state of MarTech and AdTech

In the marketing and advertising fields, there has been a need for data-driven decision making from the beginning, and especially since the advent of internet marketing. The rise of Data Management Platforms (DMPs) is a step in this direction, deriving valuable insights from large volumes of data. However, the job of insight generation (the actual predictive and inference tasks on the collected data) is still performed mostly by human experts. This means that a lot of these experts’ time is spent sifting through large amounts of data and experimenting with different signals to produce these insights, as opposed to choosing the right insight to act upon. Resources are lost to expensive A/B tests or long-winded trial-and-error processes. On the positive side, a recent survey by Walker Sands Communications (see 6) has indicated an increased readiness to invest in and apply technology for marketing and advertising decision making. With the cost of integration into newer platforms and services also shrinking considerably over the past few years, more companies are trying out partners or service providers who claim to automatically generate insights from their data streams. However, the techniques used are still rudimentary and usually fail to justify the hefty price tag. The main frustrations with these offerings fall into one of two types:

  1. Old school tagging and analytics solutions that rebrand themselves as “smart” and try to ride the AI hype wave
  2. Rehashing existing machine learning solutions and offering stand-alone data science platforms that are then very difficult to integrate with the existing tools that martech and adtech companies use

Deep Learning and MarTech

The data and prediction problems in advertising and marketing are characterized by the following features:

  • Large but sparse data: There is a large amount of data but it is still sparse (few data points per end-user and a very limited view of the users’ activities)

At 1plusX, we use insights from large scale deep representation learning models in order to embed the users and items (topics, content, products, apps, etc.) in our 1plusX embedding space. Using these models, we are able to find meaningful features that allow the interaction of the users and items to be captured in an efficient manner.
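As a rough illustration of why a shared embedding space is useful (this is not the 1plusX model), the sketch below places users and items in the same three-dimensional space and ranks items for a user by cosine similarity. The vectors, identifiers and the choice of cosine similarity are all hypothetical; in practice the vectors would be learnt from data.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical embeddings; a trained model would produce these.
user_vectors = {"user_a": [0.9, 0.1, 0.0], "user_b": [0.0, 0.2, 0.9]}
item_vectors = {
    "football_article": [0.8, 0.2, 0.1],
    "recipe_app": [0.1, 0.1, 0.9],
    "car_review": [0.5, 0.7, 0.1],
}

def rank_items(user_id):
    """Items sorted from most to least similar to the given user."""
    user = user_vectors[user_id]
    return sorted(
        item_vectors,
        key=lambda item: cosine(user, item_vectors[item]),
        reverse=True,
    )

ranking_a = rank_items("user_a")
ranking_b = rank_items("user_b")
```

Because users and items live in one space, a single similarity function covers articles, apps and products alike, even for user and item pairs that were never observed together.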

  • Multi-task prediction problems: We would like to predict multiple attributes per user from the same input data.

Our models are multi-task capable by design. In fact, we find that having to do well on multiple prediction tasks improves our metrics for individual tasks as well. The representations we learn are fine-tuned by error feedback across a number of different prediction tasks with varying feedback volume and frequency. For instance, our user socio-demographics, user interest prediction and item classification modules are co-trained with the same network in order to improve the user and item representations that are learnt.
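One common way to realise co-training is a shared hidden layer with one output head per task, so that every task's error flows back into the same representation. The sketch below assumes that design; the class, task names and toy data are hypothetical and it is not a description of the production system. Note that an example may carry labels for only some tasks.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class MultiTaskNet:
    """One shared hidden layer feeding a separate logistic head per task."""

    def __init__(self, n_inputs, n_hidden, tasks, seed=0):
        rng = random.Random(seed)
        self.shared = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                       for _ in range(n_hidden)]
        self.heads = {task: [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                      for task in tasks}

    def encode(self, x):
        """Shared representation used by every task."""
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.shared]

    def predict(self, x):
        h = self.encode(x)
        return {task: sigmoid(sum(v * hj for v, hj in zip(head, h)))
                for task, head in self.heads.items()}

    def train_step(self, x, targets, lr=0.2):
        """Update the labelled heads, then the shared layer with their summed error."""
        h = self.encode(x)
        grad_h = [0.0] * len(h)
        for task, target in targets.items():
            head = self.heads[task]
            error = sigmoid(sum(v * hj for v, hj in zip(head, h))) - target
            for j in range(len(h)):
                grad_h[j] += error * head[j]  # accumulate before changing the head
                head[j] -= lr * error * h[j]
        for j, row in enumerate(self.shared):
            grad_pre = grad_h[j] * (1.0 - h[j] ** 2)  # derivative of tanh
            for i in range(len(row)):
                row[i] -= lr * grad_pre * x[i]

def total_loss(net, examples):
    """Summed log loss over every available (example, task) label."""
    loss = 0.0
    for x, targets in examples:
        predictions = net.predict(x)
        for task, target in targets.items():
            p = predictions[task]
            loss -= math.log(p if target == 1 else 1.0 - p)
    return loss

examples = [
    ([1.0, 0.0, 1.0], {"age_18_34": 1, "interest_sports": 1}),
    ([0.0, 1.0, 0.0], {"age_18_34": 0, "interest_sports": 0}),
    ([1.0, 1.0, 0.0], {"interest_sports": 1}),  # only one label available
]
net = MultiTaskNet(n_inputs=3, n_hidden=4, tasks=["age_18_34", "interest_sports"])
loss_before = total_loss(net, examples)
for _ in range(300):
    for x, targets in examples:
        net.train_step(x, targets)
loss_after = total_loss(net, examples)
```

The third example shows how tasks with different feedback volumes can still share one network: a missing label simply contributes no error for that head.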

  • Extreme sparsity of ground truth data: We have few or no labels for training machine learning algorithms. This requires figuring out the right proxies to use for end-to-end solutions.

At 1plusX, we use ideas from distant supervision and proxy learning in order to pre-train our models both for representation learning and for multi-task prediction. We then fine-tune our models with actual ground truth data where available. The identification of suitable distant/proxy labels is done through statistical tests that allow us to identify correlations between hard-to-obtain labels and easily available proxy tasks.
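The text does not say which statistical tests are used, so the following is only one plausible sketch: a Pearson chi-square test of independence between a binary candidate proxy and a binary ground-truth label, on the small subset of users where both are known. The proxy names and counts are invented for illustration; 3.841 is the standard critical value for one degree of freedom at the 5% level.

```python
def chi_square_2x2(proxy, label):
    """Pearson chi-square statistic for two binary sequences of equal length."""
    n = len(label)
    counts = [[0, 0], [0, 0]]  # counts[proxy value][label value]
    for p, y in zip(proxy, label):
        counts[p][y] += 1
    row = [sum(counts[0]), sum(counts[1])]
    col = [counts[0][0] + counts[1][0], counts[0][1] + counts[1][1]]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected > 0:
                stat += (counts[i][j] - expected) ** 2 / expected
    return stat

CRITICAL_VALUE = 3.841  # chi-square, 1 degree of freedom, 5% significance

# Hypothetical data: 40 users with a known label and two candidate proxies.
labels = [1] * 20 + [0] * 20
proxies = {
    "visits_sports_pages": [1] * 16 + [0] * 4 + [1] * 5 + [0] * 15,
    "uses_mobile_app": [1] * 10 + [0] * 10 + [1] * 10 + [0] * 10,
}
scores = {name: chi_square_2x2(values, labels) for name, values in proxies.items()}
selected = [name for name, score in scores.items() if score > CRITICAL_VALUE]
```

A proxy that passes such a test can then supply plentiful, if noisy, labels for pre-training, with the scarce ground truth reserved for fine-tuning.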

  • Continuous feedback: Ability to handle real-time feedback on the performance of the learning algorithms and adjust the predictions accordingly.

Since our models are trained and fine-tuned in an online manner, we are able to easily accommodate continuous feedback and make real-time adjustments to our predictions.
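Online training means the model is updated after each feedback event instead of being retrained on a full batch. The sketch below assumes a simple logistic model and a hypothetical click feedback stream; the class name, features and learning rate are illustrative only.

```python
import math

class OnlineClickModel:
    """Logistic model whose weights are updated after every feedback event."""

    def __init__(self, n_features, lr=0.3):
        self.weights = [0.0] * n_features
        self.lr = lr

    def predict(self, features):
        z = sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, clicked):
        """Single stochastic gradient step on the log loss for one event."""
        error = self.predict(features) - clicked
        self.weights = [w - self.lr * error * x
                        for w, x in zip(self.weights, features)]

model = OnlineClickModel(n_features=2)
sports_ad = [1.0, 0.0]

# Phase 1: users click the ad, so the predicted click probability rises.
for _ in range(30):
    model.update(sports_ad, clicked=1)
early = model.predict(sports_ad)

# Phase 2: behaviour shifts and clicks stop; the prediction follows the feedback.
for _ in range(60):
    model.update(sports_ad, clicked=0)
late = model.predict(sports_ad)
```

No retraining job is needed between the two phases: each event adjusts the weights directly, which is what lets predictions track changing behaviour.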

In each of these cases, recent research has demonstrated that an automated feature learning system greatly outperforms solutions built on top of handcrafted features. Deep learning is the solution of choice for learning robust and resilient features that generalise from large, sparse data and across multiple tasks. It is also easy to adapt to continuous feedback.

However, before we rush to implement an expensive and complex deep learning system that is thousands of layers deep, we need to have a few initial systems up and running and feeding into this system. A good way to think about readiness for deep learning is the AI hierarchy-of-needs pyramid (Figure 3 below; see 7 for the original figure and accompanying explanations). As we traverse up the pyramid, we can identify the evolution of online advertising and the accompanying technology stack. We at 1plusX believe that we are in the age of AI-enabled marketing/advertising solutions powered by deep learning, as long as the lower layers of the pyramid are built in a manner that supports this transformation.

Figure 3: Deep Learning hierarchy of needs. At 1plusX, we recognize and acknowledge these requirements and build our infrastructure and models keeping in mind that the machine learning solutions are the icing on the cake and not the cake itself.

Needs of the Industry

Generating value from data in the adtech and martech space requires a thorough understanding of multiple components that have evolved over time. These include but are not limited to user tracking technology, cookie lifetime, cross-device identification, user journey and behaviour models, deep content and action understanding, pricing models, conversion tracking and general adtech stack integrations. In addition to these technical and business requirements, the right technology partner that delivers value to customers should be able to:

  • Respect privacy considerations of the end users and legal entities
  • Deliver personalization techniques for both content and ads
  • Allow customers to generate business insights on top of their data
  • Operate in real time and adjust to rapidly evolving user behaviour patterns

These requirements reinforce the earlier point that an off-the-shelf machine learning solution to deep problems in this space is bound to fail. On the other hand, given the long evolution of the technology stack in this space, it would be difficult for traditional adtech companies to harness the benefits of modern machine learning techniques and easily build them into their stack of solutions. One solution to this apparent chicken-and-egg problem is to build a technology company from the ground up with integrated machine learning solutions at all layers of the stack, backed by a sound understanding of the martech/adtech space. This is what 1plusX attempts to do.

Our team consists of experienced professionals in marketing/advertising technology and also renowned practitioners of machine learning and AI. Our approach of building a modern DMP that leverages data science and machine learning at all stages of the data journey allows us to provide novel value to our customers. From data collection, to our own proprietary data lake for exploratory analysis and insight generation, to our own semantic item space for content categorization, to advanced deep learning algorithms for socio-demographic and interest predictions, all the way to our intuitive UI that allows optimization of marketing/advertising spend across multiple attributes, we have data science woven into the fabric of a Data Management Platform (DMP).

Using our expertise in addressing the right problems and the deep machine learning techniques outlined above, we have successfully demonstrated the efficacy of the following features on real customer use cases:

  • socio-demographic label prediction
  • user interest prediction
  • audience expansion and optimisation
  • semantic targeting
  • content recommendation
  • campaign optimisation

Summary

In summary, we at 1plusX believe that there is a real need for deep learning based solutions to some of the problems in AdTech/MarTech. These solutions need not be limited to tech giants such as Google, Facebook and Amazon; they can be applied at smaller-scale companies as well. However, the journey to get there is not an easy one. It requires careful planning and the right partner who understands the business, the technology and the math that will drive the industry in the near future.