Predictive Modeling for Sequential Behavior: A New Theoretical Algorithm
After writing my last piece about search, match-making, and attribute cohorts, I was inspired to extend one part of that concept (nested cohorts of values that build a “tree” based on the relationships within the data) into a related topic I’ve been thinking about recently: Predictive Behavior Modeling.
The last two pieces I’ve written have been lengthy and fairly taxing for me to pull together in my spare time — so I’m going to keep this one high-level for the time being. Later, as time allows, I’ll return to it and explore some extended applications, as well as take a deeper dive into the theoretical data model.
Anyway, let’s get to the good stuff:
One of the things we often try to do with machine learning is predict what “comes next” in a sequence, based on previously recorded and analyzed data. The thing that “comes next” could be an action (clicking a link, buying a product), a condition (qualifying for an offer, a traffic jam), or a value (a cluster of pixels in an image, a word in a message). The algorithms we use are often multi-purpose and somewhat data-agnostic — they don’t have to “care” what the human context of the prediction is, as long as the data they are fed, act on, and output reflects human considerations. That’s part of what allows machines to be “intelligent” (from our perspective, at least).
However, while writing my previous two pieces about machine learning, I found that keeping the scenario so open made it difficult to tell a clear story; everything blurred into abstractions.
So, for the purposes of this exploration, I’m going to get specific: let’s talk about Predictive Behavior Modeling in the context of online shopping. (But as you read, keep in mind that this model could be shaped to nearly any context, with a little creativity; I’ll try to keep my fingers from wandering off in any of those directions as I type. No promises, though.)
Consider the major actions that a user takes when they’re on an ecommerce website: loading a page, clicking to another page, adding a product to their cart, filling out a form, entering their payment details, completing a purchase, etc. These are the actions that you, the business user, would identify as significant. Let’s call these actions “Nodes.”
Now, consider all the minor interactions and events that roll up into that major action: scrolling the page, pausing to read, leaving the focus of the browser tab/window, expanding an information panel, flipping through an image carousel, etc. These are the events that occur between when the user enters one Node and when they depart it for the next. Or, putting it differently, they’re what the user does when they’re inside a Node. Let’s call these events “Pips.”
Now, consider the user’s journey while on a website: passing from the homepage, clicking to log into their account, navigating to a product category, clicking to a product detail page, adding the product to their cart, etc. Each step along that journey represents a Node. When you record the nodes that a user has passed through in sequence, that sequence becomes a record of their visit to the website. Let’s call that sequence a “Chain.”
There are generally specific actions that we want users to perform: signing up for an email program, completing a purchase, creating an account, etc. Like other actions, these can be considered Nodes; but they often represent the end of a journey (or at least the end of a clearly defined portion of that journey). Let’s call this special type of Node an “Endpoint.”
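To make the vocabulary concrete, here is a minimal Python sketch of the four terms. The class names, field names, and the sample visit are my own assumptions for illustration; nothing above prescribes a particular representation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Pip:
    """A minor interaction that happens inside a Node (e.g. a scroll)."""
    kind: str


@dataclass
class Node:
    """A major action; `is_endpoint` marks the special Endpoint Nodes."""
    name: str
    pips: List[Pip] = field(default_factory=list)
    is_endpoint: bool = False


@dataclass
class Chain:
    """The ordered sequence of Nodes recorded during one visit."""
    nodes: List[Node] = field(default_factory=list)

    def path(self) -> Tuple[str, ...]:
        # The Node names in visit order, handy as a lookup key.
        return tuple(node.name for node in self.nodes)

    def reached_endpoint(self) -> bool:
        return any(node.is_endpoint for node in self.nodes)


# A hypothetical visit that ends in a purchase.
visit = Chain([
    Node("homepage", [Pip("scroll")]),
    Node("product_detail", [Pip("zoom_image"), Pip("scroll")]),
    Node("add_to_cart"),
    Node("purchase", is_endpoint=True),
])
```

Here `visit.path()` gives the Node names in order, and `visit.reached_endpoint()` tells us whether the journey ended where we hoped.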
Now that we’ve sketched the basic model for describing a user’s experience (again, we’re discussing this in the context of a website visit, but the experience could be anything — and for that matter, so could the user), let’s consider how we might build a data set consisting of many users’ visits to a website.
To start, we’re going to want to define the major actions (the Nodes) for the machine. (Eventually, we might decide to make the algorithm smart enough to deduce this on its own through discovery and analysis of a large enough data set — but for now, let’s assume we’re teaching it ourselves). While we’re at it, let’s define the Endpoints as well. These will likely be unique to our specific scenario or context.
When the user visits the website, we’ll record all of their actions: from Pips to Nodes, Nodes to Chains, and finally to (or not to) an Endpoint. We’ll do this for every user who visits the site, capturing their path from action, to action, to outcome. Once we have enough data to draw statistically meaningful conclusions, we can begin analyzing what we’ve captured.
It’s almost certain that different users, in their different journeys, will pass through similar (or identical) sequences of Nodes. That means that their Chains will have a certain number of “links” in common. Because those Chains represent similar paths, it makes sense for us to group them in some way. Let’s call a group of Chains a “Chain Cohort.”
When a Chain Cohort shares only a single starting Node (for example, landing on the homepage), the Chains within it don’t have much in common, so it’s difficult to predict what the user will do next. The more leading Nodes a Chain Cohort shares, the more specific the path it describes.
To account for this, we can group Chain Cohorts within other Chain Cohorts, “nesting” the cohorts according to their specificity and the number of leading Nodes they have in common. For example, consider the following:
Chains beginning with a homepage visit Node:
- Homepage visit Node, then Login Screen Node
- Homepage visit Node, then Shopping Cart Node
- Homepage visit Node, then Product Category Node
Each time we go “down a level”, we encounter a greater number of Chain Cohorts, and each cohort contains fewer individual Chains (i.e. the groupings become smaller as we continue to subdivide). The sequences of nodes in each chain become more specific and unique to the user, and the usefulness of the information we’re capturing increases.
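One way to picture the nesting is as a prefix tree, where each level adds one more shared leading Node. The sketch below is only an illustration under that assumption; `Cohort`, `add_chain`, and `cohort_at` are hypothetical names, and the sample paths are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Cohort:
    """All recorded Chains that share the same leading Nodes."""
    chain_count: int = 0
    children: Dict[str, "Cohort"] = field(default_factory=dict)


def add_chain(root: Cohort, path: List[str]) -> None:
    """Count one Chain in every cohort along its path."""
    cohort = root
    cohort.chain_count += 1
    for node_name in path:
        cohort = cohort.children.setdefault(node_name, Cohort())
        cohort.chain_count += 1


def cohort_at(root: Cohort, prefix: List[str]) -> Optional[Cohort]:
    """Find the cohort for a list of leading Nodes, or None if unseen."""
    cohort = root
    for node_name in prefix:
        cohort = cohort.children.get(node_name)
        if cohort is None:
            return None
    return cohort


root = Cohort()
for path in [
    ["homepage", "login"],
    ["homepage", "category", "product"],
    ["homepage", "category", "product", "cart"],
    ["homepage", "cart"],
]:
    add_chain(root, path)
```

With these four sample Chains, the “homepage” cohort holds all four, “homepage, category” holds two, and the full four-Node path holds just one: the groups shrink as the prefix grows.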
Because we have defined our Endpoints, we know whether or not each Chain captured resulted in a desired outcome. To make this knowledge useful to us, we can attach a “weight” to the Endpoint: a value that indicates how highly we regard that particular outcome (for example, checking out with a purchase might be weighted more heavily than signing up for an email program). The presence of a weighted Endpoint in a Chain indicates that the Chain was “successful.” The absence of a weighted Endpoint indicates that it was not.
We can then average the weights of all Chains in a Chain Cohort to determine how successful the cohort was in aggregate. Moving up a level, we can further average the values of Chain Cohorts that are part of a more general cohort to determine its aggregate success, and so on.
The closer a cohort is to a single Chain with a particular Endpoint, the more strongly that Endpoint’s weight counts. Conversely, the more general the cohort, the weaker the contribution of any one Endpoint’s weight. This reflects the odds that a user at a specific point in their journey will arrive at the Endpoint, given their “proximity” to it. So, we can say that the weight of an Endpoint, passed to its Chain and then to its Chain’s Cohorts, reflects the likelihood that a user will reach a successful outcome given their current trajectory.
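Here is a rough sketch of how the weights might roll up. It makes two assumptions of mine: a Chain’s weight is the weight of its best Endpoint (0.0 if it reached none), and a cohort’s score is the mean weight over every Chain it contains, which is the same as averaging its child cohorts when each child is weighted by its size. The Endpoint weights and sample paths are invented for illustration.

```python
from collections import defaultdict

# Hypothetical Endpoint weights; the values are made up for illustration.
ENDPOINT_WEIGHTS = {"purchase": 1.0, "email_signup": 0.3}


def chain_weight(path):
    """Weight of a Chain: its best Endpoint, or 0.0 if it reached none."""
    return max((ENDPOINT_WEIGHTS.get(name, 0.0) for name in path), default=0.0)


def cohort_scores(paths):
    """Map every leading-Node prefix to the mean weight of its Chains."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for path in paths:
        weight = chain_weight(path)
        for depth in range(1, len(path) + 1):
            prefix = tuple(path[:depth])
            totals[prefix] += weight
            counts[prefix] += 1
    return {prefix: totals[prefix] / counts[prefix] for prefix in totals}


paths = [
    ["homepage", "category", "product", "purchase"],
    ["homepage", "category", "product"],
    ["homepage", "login", "email_signup"],
    ["homepage", "cart"],
]
scores = cohort_scores(paths)
```

In this toy data set the score is about 0.325 for the broad “homepage” cohort, 0.5 for “homepage, category”, and 1.0 for the full purchase path. The single purchase is diluted in the general cohort and counts in full in the specific one.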
To better represent the relationship between Nodes (where they occur in sequence, variations on them, etc.), it will be valuable to capture a bit more information on them as we are recording each user’s journey. We can group similar Nodes into Node Cohorts, to allow us to pull back and better understand the variety inherent in each Node’s actions, the various positions a Node can occupy in Chains, and their relative participation in producing successful outcomes.
Once we have recorded a sufficient number of user interactions, we can begin evaluating new interactions based on the model we’ve just built — and attempting to influence new users to move towards our preferred outcomes.
Here’s a quick look at how that could work:
- When a user first visits a website, they arrive at an initial Node. This begins their Chain.
- Based on that starting Node, we can identify the routes that the user’s journey is likely to take. These are the Cohorts that the Chain we’re building belongs to.
- As the user passes through additional Nodes, their Chain grows. Each time we add a Node to the Chain, the user “descends” deeper into increasingly specific Chain Cohorts.
- The longer a user’s Chain grows, the closer the user draws to a theoretical Endpoint. This means we begin to detect Chain Cohorts that we would like the user to belong to (and by extension, Nodes we would like them to pass through).
- We can then begin to take steps to shape the user’s experience in ways that might direct them towards the Nodes that are most likely to lead to a successful Endpoint.
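The steps above can be sketched as a lookup: take the user’s Chain so far, find the cohorts one Node deeper, and pick the one with the best score. This is a simplified illustration, assuming each recorded Chain comes with a success weight already attached; `build_scores` and `best_next_node` are hypothetical helpers, not a finished recommendation engine.

```python
from collections import defaultdict


def build_scores(recorded):
    """Mean success weight for every leading-Node prefix.

    `recorded` is a list of (path, weight) pairs, where weight is the
    Chain's Endpoint weight, or 0.0 if it reached no Endpoint.
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for path, weight in recorded:
        for depth in range(1, len(path) + 1):
            prefix = tuple(path[:depth])
            totals[prefix] += weight
            counts[prefix] += 1
    return {p: totals[p] / counts[p] for p in totals}


def best_next_node(scores, current_path):
    """Return (next Node, score) for the best-scoring child cohort, or None."""
    current = tuple(current_path)
    candidates = {
        prefix[-1]: score
        for prefix, score in scores.items()
        if len(prefix) == len(current) + 1 and prefix[:-1] == current
    }
    if not candidates:
        return None  # nobody has been recorded going further from here
    name = max(candidates, key=candidates.get)
    return name, candidates[name]


recorded = [
    (["homepage", "category", "product", "purchase"], 1.0),
    (["homepage", "category", "product"], 0.0),
    (["homepage", "login", "email_signup"], 0.3),
    (["homepage", "cart"], 0.0),
]
scores = build_scores(recorded)
suggestion = best_next_node(scores, ["homepage"])
```

For a user who has only reached the homepage, `suggestion` points at the “category” Node, because Chains that continued that way did best in the sample data. How we then nudge the user towards it is a separate question.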
There’s a lot of complex behavior wrapped up in that final bullet — too much to get into in this introductory piece. One thing we can discuss here, however, is the role Pips can play in promoting an outcome.
Pips represent the “flavor” of a Node — the unique characteristics that differentiate it from other similar nodes (for example, a general “add to cart” node might include Pips that carry detail on which product was added, quantity, what the user did on the page, etc.).
When the user is at a particular Node in their journey, we can pull back to examine the trends revealed by its parent Node Cohort:
Which Nodes in the cohort eventually led to an Endpoint? (i.e., the Nodes belonging to Chains / Chain Cohorts with high success weighting)
- These help us select the next Node(s) that the user “should” hit one of, if they are to continue on a path to a successful outcome.
In the Nodes that lead to Chains with the highest success weighting, which Pips occur most frequently?
- These represent the micro-actions and other interaction details that most commonly accompanied movement towards an Endpoint or successful outcome.
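As a small illustration of the second question, we could count how often each Pip shows up in the Node visits that went on to succeed. The function name, the shape of `observations`, and the sample Pips are all assumptions made for this sketch.

```python
from collections import Counter


def pip_frequency_in_successes(observations):
    """Share of successful Node visits in which each Pip appeared.

    `observations` is a list of (pips, succeeded) pairs for one Node Cohort.
    """
    successful = [set(pips) for pips, succeeded in observations if succeeded]
    if not successful:
        return {}
    counts = Counter(pip for pips in successful for pip in pips)
    return {pip: count / len(successful) for pip, count in counts.items()}


# Hypothetical recordings for a "product detail" Node Cohort.
observations = [
    (["scroll", "zoom_image"], True),
    (["scroll", "zoom_image", "open_chat"], True),
    (["scroll"], False),
    (["open_chat"], False),
    (["zoom_image"], True),
]
freq = pip_frequency_in_successes(observations)
most_common_pip = max(freq, key=freq.get)
```

In this made-up sample, zooming the image appears in every successful visit, so it would be the first Pip to consider encouraging.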
Then, we can take action to influence user behavior based on our goals:
What “levers” do we have to pull in order to influence the user to behave in a way that aligns with these Pips — raising the likelihood that the user will progress to the preferred next Node in their journey?
The “lever” depends on the nature of the Pip — for example, consider the following potential Pips and the behavior we might initiate to encourage interaction:
- Buying more than one unit of a product > pre-populate the quantity box, or highlight the increment button in some way
- Scrolling down the page > initiate a slight “accidental” scroll to expose lower content, or highlight a downward arrow element in some way
- Clicking on an image to zoom in > highlight the zoom UI element
- Selecting a specific search result > elevate the item on the SERP, or highlight it with a special treatment
- Engaging with onsite chat > open the chat box automatically
Bonus: If we detect that certain Pips are rarely or never present in successful Nodes, we can take steps to suppress or minimize them on the page — effectively narrowing the user’s focus to the engagement options that are statistically most likely to produce a successful outcome.
Since weighting is based on a variety of nodes, chains, and cohorts, there will likely be more than one Pip vying for the user’s attention at a given time. That means the prominence (or “encouragement”) of each element will be determined by its relative weight.
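One simple reading of “relative weight” is each Pip’s share of the total weight among the Pips competing on the page. The sketch below assumes exactly that; the `prominence` helper and the sample weights are hypothetical, and a real system might scale things quite differently.

```python
def prominence(pip_weights):
    """Scale each Pip's weight to its share of the total (0.0 to 1.0)."""
    total = sum(pip_weights.values())
    if total <= 0:
        # Nothing has earned any weight yet, so nothing gets emphasis.
        return {pip: 0.0 for pip in pip_weights}
    return {pip: weight / total for pip, weight in pip_weights.items()}


# Hypothetical weights for three Pips competing on one page.
shares = prominence({"zoom_image": 2.0, "increase_quantity": 1.0, "open_chat": 1.0})
```

Here the zoom control would get half of the available emphasis, and the other two elements a quarter each.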
Because the outcome of a chain is automatically captured based on the presence or absence of an endpoint, we are constantly refreshing the selection of recommended Pips as we adjust the weights of each cohort.
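A minimal sketch of that refresh, assuming each cohort keeps only a running count and a running mean weight, so every finished Chain can be folded in without replaying the full history. `CohortStats` and `record_outcome` are names I made up for illustration.

```python
class CohortStats:
    """Running Chain count and mean success weight for one cohort."""

    def __init__(self):
        self.count = 0
        self.mean_weight = 0.0

    def update(self, weight):
        # Incremental mean: no need to store every past Chain.
        self.count += 1
        self.mean_weight += (weight - self.mean_weight) / self.count


def record_outcome(stats, path, weight):
    """Fold one finished Chain into every cohort along its path."""
    for depth in range(1, len(path) + 1):
        prefix = tuple(path[:depth])
        stats.setdefault(prefix, CohortStats()).update(weight)


stats = {}
record_outcome(stats, ["homepage", "category", "purchase"], 1.0)  # Endpoint reached
record_outcome(stats, ["homepage", "cart"], 0.0)                  # no Endpoint
record_outcome(stats, ["homepage", "category"], 0.0)              # no Endpoint
```

After these three visits the “homepage” cohort has seen three Chains with a mean weight of one third, while “homepage, category” has seen two with a mean of 0.5. Each new visit shifts the numbers a little, and the recommended Pips shift with them.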
I think this is a good stopping point for this piece, for the time being. But first, let’s encapsulate everything we’ve covered (brevity is not my strongest suit, so this gives me a chance to condense):
- Nodes are actions the user takes.
- Pips are details inside those actions that describe the unique “flavor”. In a cohort of Nodes, the Pips are what differentiate between them.
- Chains are a sequence of Nodes.
- Endpoints are Nodes representing positive outcomes.
- Chains are grouped into Cohorts based on the number of nodes they share, beginning at the head. These are nested. The structure is a tree.
- Nodes are grouped into Cohorts separately to compare them to peers.
- Endpoints carry “weight”. This weight is shared to the Chain, then the Cohort, and so on.
- Weight describes the likelihood of success. It decreases with each step away from the Endpoint.
- The combination of all the above allows us to identify when a user is approaching a likely outcome and begin to “pull” them towards it — the closer the outcome, the stronger the pull.
- The entire system “educates” itself by feeding past successes and failures into the data model, shifting the weights of chains and cohorts based on performance.
Given this type of predictive model, what sorts of behavior could we model? When I really think about it, ecommerce, although particularly close to home for me, is honestly one of the least interesting scenarios.
Imagine applying the Chain/Node model to engine performance, movement, and fuel efficiency data streaming through a vehicle as it navigates a variety of environments. How fast is the car going? How open is the throttle? Are the brakes actuated? Is there skid? Do we anticipate a demand for more speed? Are we turning? How smoothly is the driver commanding the vehicle? Are they signalling their turns? Is there trouble? Is there a challenge to be solved — a puzzle to be weighed and balanced and adjusted?
How can we use this model to make things better?
…and don’t even get me started about the idea of sub-chain cohorts and node-jumping. Let’s save that for another day.