**Deep learning to intervene where it counts**

*How we built a feedback loop to optimize learning nudges*

Learning isn’t easy. To make it a little easier, we launched In-Course Help, delivering behavioral and pedagogical nudges as learners move through course material. In this post, we cover our process and learnings in implementing a machine learning feedback loop for personalizing and optimizing these nudges.

In the first implementation of In-Course Help, all learners at a given point in a given course — for example, completing Lecture 9 of Course A, or failing Quiz 3 of Course B — received the same message. This allowed us to intervene in ways that were helpful *on average*, and moved the needle on course progress and retention.

But we also observed that impact varied across learners and messages. Because every learner at a given point in a given course received the same message, we were wary of rolling out too many messages.

For the next implementation, we created a smart feedback loop to control which learners received each message. The model is a neural network that takes as input a wide range of features, including the following:

- The learner’s past click-through rates for various messages
- Her demographics (e.g., gender, age, country, employment level, education level)
- Her on-platform behavioral data (e.g., whether the enrollment is paid, browser language, number of completed courses)
- Course-level characteristics (e.g., domain, difficulty, rating)

Using these features, the model predicts how likely a specific learner is to find a specific type of pop-up message helpful at a particular point in her learning. If it predicts that the message will have a sufficiently positive impact, it triggers the message; otherwise it holds the message back. The model's weights and predictions update nightly while our data science team sleeps, a big improvement over our baseline of complex, long-running nested A/B tests in which the team made manual adjustments to the interventions based on observed results. The feedback loop also naturally extends to choosing among multiple versions of a message that can be sent at the same point to the same learner, triggering only the version predicted to have the most positive outcome for that learner.
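The trigger-and-version-selection logic above can be sketched roughly as follows. This is an illustrative stand-in, not Coursera's actual implementation: the linear scoring function substitutes for the neural network, and the feature vectors, message names, and threshold are all hypothetical.

```python
# Hedged sketch: score each candidate version of a message for a learner,
# trigger only the best-scoring version, and only if it clears a threshold.

def predict_impact(weights, features):
    # Stand-in for the neural network: a simple linear score over features.
    return sum(w * f for w, f in zip(weights, features))

def choose_message(weights, candidate_messages, threshold=0.0):
    """candidate_messages is a list of (name, feature_vector) pairs.
    Returns the name of the version with the highest predicted impact,
    or None if even the best version falls below the threshold."""
    scored = [(predict_impact(weights, feats), name)
              for name, feats in candidate_messages]
    best_score, best_name = max(scored)
    return best_name if best_score > threshold else None
```

For example, with learner weights `[0.5, -0.2]` and two candidate versions, `choose_message` picks the higher-scoring version, and holds everything back when no version scores above the threshold.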

Today we have two levels of filtering: a course-item-state level filtering to decide which messages to keep around because they are sufficiently helpful, and a user-course-item-state level filtering to personalize which messages go to which learners at any given learning moment.

In brief, for each possible nudge on every item state in every course, the course-item-state level model predicts the *average probability of a learner finding the message helpful*, based on past interactions with the message and course-level data. Intuitively, if the model predicts that the message is not sufficiently helpful, we hold back that message at that trigger point altogether (provided the number of impressions is sufficiently large). This trigger-level filtering is especially useful as we expand our message inventory because it automatically detects and filters out messages that are not helpful overall, or that are not helpful for a particular course or trigger point.
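A minimal sketch of this trigger-level filter, under assumptions of our own: the helpfulness threshold and minimum impression count below are illustrative placeholders, not the values Coursera uses.

```python
# Hedged sketch: hold back a message at a course-item-state trigger point
# when its observed helpfulness rate is too low, but only once there are
# enough impressions to trust the estimate.

def keep_trigger(helpful_count, impression_count,
                 min_helpful_rate=0.2, min_impressions=1000):
    if impression_count < min_impressions:
        return True  # not enough data yet; keep the message and keep collecting
    return (helpful_count / impression_count) >= min_helpful_rate
```

The "provided the number of impressions is sufficiently large" guard is what prevents a new message from being filtered out on noisy early data.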

The course-item-state level model is complemented by a similar feedback loop that filters at the user-course-item-state level. Take a simple example: We want to know whether to send a particular message to Alan at a particular point in his learning journey. For exposition, consider a message for which we are directly collecting self-reported helpfulness from the learner. In the current implementation, there are three possibilities.

- Alan could be randomly chosen (today with probability 10%) to receive the message *no matter what*; this ensures that we have enough unbiased data for the model to continue learning and improving nightly.
- Alan could be randomly chosen (today with probability 90%) to potentially receive the message, *but* Alan is a new learner who has barely interacted with our messages. Since we don't have sufficient data on him to make a reliable prediction, we send him the message to collect data.
- Alan could be randomly chosen (with the same 90% probability) to potentially receive the message, *and* he has interacted with enough ALICE messages for the model to make a reliable prediction. Then, based on data from Alan's learner profile and his previous interactions with In-Course Help messages, the model outputs three probabilities: a) the probability that Alan clicks "Yes, this was helpful"; b) the probability that Alan clicks "No, this was not helpful"; and c) the probability that Alan *doesn't interact* with the message.

We send the message if and only if a) sufficiently exceeds b) and c). Today, the feedback loop holds back about 30% of the messages and increases the ratio of helpful to non-helpful reports by 43%.
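The three branches above can be summarized in a short sketch. The exploration probability and minimum-interaction count are the illustrative values from the example, and the plain `>` comparison stands in for whatever "sufficiently exceeds" margin is actually used.

```python
import random

EXPLORE_PROB = 0.10    # probability of sending no matter what (unbiased data)
MIN_INTERACTIONS = 5   # hypothetical cutoff below which a learner is "new"

def should_send(rng, n_past_interactions, predict):
    """Decide whether to send a self-reported-helpfulness message.
    predict() returns (p_helpful, p_not_helpful, p_no_interaction)."""
    if rng.random() < EXPLORE_PROB:
        return True  # branch 1: randomly chosen to receive it no matter what
    if n_past_interactions < MIN_INTERACTIONS:
        return True  # branch 2: new learner, send to collect data
    p_yes, p_no, p_ignore = predict()  # branch 3: trust the model
    return p_yes > p_no and p_yes > p_ignore
```

With a seeded `random.Random`, the same learner either falls into the exploration bucket or is routed through the data-sufficiency check and the model's three-way prediction.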

So what’s next?

First, we’re iterating on the optimization function. The example above considers optimizing for a positive uptake on the call to action (either reporting the message was helpful or clicking through on the recommendation). For some nudges, however, the optimization function can and should be further downstream. For example, if we invite the learner to review important material, her clicking through the link provided does not give us sufficient information on whether that review material actually helped her learning outcomes — only on whether she followed our recommendation. For these types of interventions, we’re extending the optimization function to incorporate downstream learning outcomes such as items completed.

Second, with this fail-safe built in, we are brainstorming and launching new kinds of interventions. Since the model automatically chooses which nudges to keep running where and for whom, we can explore new ways to engage learners, confident that those that are not helpful will be efficiently held back.

*Interested in applying data science to education? **Coursera is hiring**!*