Can Machine Learning Help to Predict User Engagement?
By: Nattiya Kanhabua, Senior Data Scientist and Andreas Kaltenbrunner, Director of Data Analytics at NTENT
User engagement, or customer engagement, measures how users respond to your marketing campaigns. Building engagement is a key strategy that most leading companies use to delight their users and to win and retain their mindshare. User engagement strategies vary from product to product. Choosing the right tools involves more than side-by-side comparisons and depends heavily on quality.
At NTENT we use a combination of techniques to increase user engagement. NTENT-powered browsers and search functionality include content recommendations, push notifications, and highlighted trending searches, to mention just a few. Sending push notifications is a double-edged sword, as it may have the opposite effect on some users and actually reduce their engagement. That is why it’s critical to test and assess your push campaigns and to use historical data to optimize them.
NTENT works with partners to maximize the impact of push notifications. We extract different types of data signals from our user behavior logs, including search query categories (going beyond click counts), and perform a large-scale analysis on them. We combine these signals and employ word embedding and deep learning techniques to predict the impact of push notifications on user engagement.
In more detail, NTENT proposes a set of input variables (features) to predict the potential success rate of sending a given push notification to a specific user. Since we aim to minimize the cost of feature engineering, we employ word-embedding features (i.e., pre-trained, deep learning based representations of words as vectors of real numbers) where possible. However, using word-embedding features alone might not give the best result. Therefore, we also manually identify a number of non-textual behavioral features, which are based on users’ interactions and push notification attributes.
To lay the base for deep learning insights we categorize our proposed features into three main classes: (1) push features, (2) user context features, and (3) interaction features.
- Push features: derived from push attributes or metadata for each push campaign, e.g., push type, push time and push text. The push type can be, for example, a breaking news event, sports news or a new app feature. These features include the word embedding of the push text and of the related URL content.
- User context features: coming from the users’ inferred demographic information, search activities, and search history, e.g., a sequence of their previous activities, which can represent the interests they have displayed across time. We also extract the word embedding of users’ activity history.
- Interaction features: coming from previous interactions of the user with push notifications of different types, e.g., the number of clicks over different types of pushes.
This is merely a sketch of the features that we could be using to build the feature vector to predict interactions with the push notification.
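To make the three feature classes concrete, here is a minimal sketch of how such a feature vector might be assembled. Everything in it is an illustrative assumption rather than NTENT's actual pipeline: the toy 3-dimensional embedding table, the list of push types, the helper names (`mean_embedding`, `build_feature_vector`), and the choice to average word vectors into a single text embedding.

```python
from typing import Dict, List

# Hypothetical 3-dimensional word embeddings. A real system would load
# pre-trained vectors with far more dimensions and a far larger vocabulary.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "breaking": [0.9, 0.1, 0.0],
    "news": [0.8, 0.2, 0.1],
    "football": [0.1, 0.9, 0.3],
}
EMBEDDING_DIM = 3
# Assumed push types, taken from the examples in the text.
PUSH_TYPES = ["breaking_news", "sports", "app_feature"]


def mean_embedding(text: str) -> List[float]:
    """Average the vectors of all known words; return zeros if none is known."""
    words = text.lower().split()
    vectors = [TOY_EMBEDDINGS[w] for w in words if w in TOY_EMBEDDINGS]
    if not vectors:
        return [0.0] * EMBEDDING_DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def build_feature_vector(
    push_type: str,
    push_hour: int,
    push_text: str,
    user_history_text: str,
    clicks_by_type: Dict[str, int],
) -> List[float]:
    # (1) Push features: one-hot push type, scaled hour of day, text embedding.
    push_features = [1.0 if push_type == t else 0.0 for t in PUSH_TYPES]
    push_features.append(push_hour / 23.0)
    push_features += mean_embedding(push_text)
    # (2) User context features: embedding of the user's activity history.
    user_features = mean_embedding(user_history_text)
    # (3) Interaction features: past clicks per push type.
    interaction_features = [float(clicks_by_type.get(t, 0)) for t in PUSH_TYPES]
    return push_features + user_features + interaction_features


vector = build_feature_vector(
    "sports", 20, "Breaking football news", "football scores", {"sports": 4}
)
print(len(vector))  # 3 + 1 + 3 + 3 + 3 = 13 values
```

Concatenating the three groups keeps each feature class easy to inspect or switch off when comparing models.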
In the image below, we illustrate the proposed features and our machine learning steps, including model training and prediction. Each feature extraction step consists of partly manual and automated processes, which will be performed by running various programs locally or on a Hadoop cluster.
To create a ground truth dataset, we need to filter out users who have low activity rates. To compute activity rates, we employ different user-based metrics, such as the sum of behaviors of a given type, the number of distinct behaviors, and the number of distinct queries, as well as the per-day rates of these metrics. In addition, we exclude some types of content that are irrelevant to this task.
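The activity metrics described above could be computed along the following lines. This is only a sketch under stated assumptions: the record layout `(user_id, day, behavior_type, query)`, the function names, the use of active days as the denominator for per-day rates, and the threshold of two behaviors per day are all hypothetical.

```python
from collections import defaultdict


def activity_metrics(records):
    """Compute per-user activity metrics from (user_id, day, behavior, query) rows."""
    rows_by_user = defaultdict(list)
    for user_id, day, behavior, query in records:
        rows_by_user[user_id].append((day, behavior, query))

    metrics = {}
    for user_id, rows in rows_by_user.items():
        active_days = len({day for day, _, _ in rows})
        distinct_queries = {query for _, _, query in rows if query}
        metrics[user_id] = {
            "behaviors": len(rows),
            "distinct_behaviors": len({behavior for _, behavior, _ in rows}),
            "distinct_queries": len(distinct_queries),
            # Assumption: rates are normalized by the number of active days.
            "behaviors_per_day": len(rows) / active_days,
            "queries_per_day": len(distinct_queries) / active_days,
        }
    return metrics


def active_users(records, min_behaviors_per_day=2.0):
    """Keep only users whose activity rate reaches the (hypothetical) threshold."""
    return {
        user_id
        for user_id, m in activity_metrics(records).items()
        if m["behaviors_per_day"] >= min_behaviors_per_day
    }


log = [
    ("u1", "2020-01-01", "search", "cats"),
    ("u1", "2020-01-01", "click", None),
    ("u1", "2020-01-02", "search", "dogs"),
    ("u1", "2020-01-02", "search", "cats"),
    ("u2", "2020-01-01", "search", "news"),
]
print(active_users(log))  # u2 falls below the threshold and is dropped
```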
To train a prediction model, we need to partition our dataset by taking the time dimension into account. Intuitively, we divide the user behavior dataset into training and testing sets, where the training set consists of data records prior to a certain time point and the testing set is composed of data records after that time point. In the end, the training set represents 85% of the original dataset and the testing set 15%.
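A time-based split of this kind can be sketched as follows. The sketch assumes that each record carries a sortable `timestamp` field and that the cutoff is chosen so that roughly 85% of the records fall before it; both the field name and the function name `temporal_split` are illustrative.

```python
def temporal_split(records, train_fraction=0.85):
    """Split records chronologically: the earliest part trains, the rest tests."""
    ordered = sorted(records, key=lambda record: record["timestamp"])
    cut = round(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Toy records with integer timestamps, deliberately given out of order.
records = [{"timestamp": t, "clicked": t % 2} for t in reversed(range(20))]
train, test = temporal_split(records)
print(len(train), len(test))
```

Unlike a random split, this keeps every test record later than every training record, so the model is never evaluated on events that precede its training data. If several records share the timestamp at the cut, they may land on both sides; a stricter variant would cut on the timestamp value itself.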
We employ a set of prediction models based on neural networks. In general, our model learning pipeline takes a multidimensional input feature vector and produces a probability of success for a given user and a given push notification. Our baseline prediction model is a logistic regression (LR). We improve upon this LR-based model with more advanced models that have multiple layers, i.e., deep neural networks, which also include convolutional layers. Such Convolutional Neural Networks (CNNs) stack multiple layers to encode and compress features. CNNs can capture non-linear relationships, which can yield better classification performance.
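As an illustration of the baseline only, the following is a minimal logistic regression trained with batch gradient descent in plain Python. It is a sketch, not the model used at NTENT: the function names, the learning rate, the number of epochs, and the toy data (a single feature standing for the number of past clicks on a push type) are assumptions, and the deeper CNN-based models are not covered here.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    """Probability of success for one feature vector."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train_logistic_regression(X, y, learning_rate=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_samples, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for features, label in zip(X, y):
            error = predict_proba(weights, bias, features) - label
            for j, value in enumerate(features):
                grad_w[j] += error * value
            grad_b += error
        weights = [
            w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)
        ]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


# Toy data: one feature (past clicks), label 1 if the push was opened.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
weights, bias = train_logistic_regression(X, y)
print(predict_proba(weights, bias, [0.0]), predict_proba(weights, bias, [3.0]))
```

The output is a probability between 0 and 1 for a given user and push, which matches the interface the text describes for the full pipeline.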
The developed prediction model can be applied in several different ways:
- To decide whether a specific user should receive a given push notification.
- To select, from a set of curated push notifications of different types and content, the one with the highest likelihood of increasing user retention.
- To design push campaigns that maximize the aggregated impact on user engagement over all the targeted user groups.
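The first two uses in the list above can be sketched with a small helper that scores each candidate push for a user and sends the best one only if it clears a threshold. The `predict` callable stands in for any trained model, and the function name `best_push` and the 0.5 threshold are hypothetical choices for illustration.

```python
def best_push(user, candidate_pushes, predict, min_probability=0.5):
    """Return the push with the highest predicted success, or None to send nothing."""
    scored = [(predict(user, push), push) for push in candidate_pushes]
    if not scored:
        return None
    probability, push = max(scored, key=lambda pair: pair[0])
    return push if probability >= min_probability else None


def toy_predict(user, push):
    """Stand-in for a trained model: reads a precomputed score from the push."""
    return push["score"]


pushes = [
    {"id": "breaking_news", "score": 0.2},
    {"id": "sports", "score": 0.7},
    {"id": "app_feature", "score": 0.4},
]
chosen = best_push("user-1", pushes, toy_predict)
print(chosen["id"] if chosen else "no push sent")
```

Returning `None` when no candidate clears the threshold covers the case where a push would more likely reduce a user's engagement than increase it.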
A possible refinement of this model would be to create different predictors for different push types, as these are likely to have different interest distributions; an ensemble of independent models that each tackle one push type might achieve higher performance.