
Bot Analytics: Simulating Random Walks to Predict Network Interventions

Building a Network from Conversations

Devin 'meat' Gaffney
9 min read · Oct 3, 2016


Over the summer, I worked with a good friend, Gilad Lotan, as a data science intern at betaworks. It was a good opportunity to see what people are building these days, and to see whether there was room for practical applications of network science. Ultimately, I started working with another former colleague, Greg Leuch, on Poncho, a chat bot in Facebook’s emerging Messenger bot market. As a network scientist, I’m excited by opportunities to apply my methods to data that can be projected as a network with the goal of solving a specific problem. This past summer I had the opportunity to do just that on Poncho’s bot.

Here’s what an interaction looks like with the Poncho bot:

How can we look at all conversations with the Poncho bot as a network? Consider the conversation above as a chain of interactions. Under the hood, Poncho keeps track of what state it is in, and runs code that figures out how to respond to each piece of input it receives. For example, saying “Hi”, “Hey!”, or “Hey there” triggers the “hello” context and a series of potential responses. If we strip the content away from the conversation, we can look at the interaction as a series of state transitions:

In this picture, we are moving through time, from left to right, and as we move through time, we move through a series of interactions. Each interaction elicits a response from the bot, which, again, matches one of a large set of rules within the bot. We can then think about this interaction in terms a little closer to network language:

If we put aside the actual content of the conversation and identify the transitions happening across states and rules, we have effectively surfaced an underlying network, which represents a single user’s journey with Poncho:

If we do the same across all conversations, we can build a massive graph that helps us understand how users are navigating Poncho — which “regions” of the service people visit, where they “hang out”, and where they drop off.
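Concretely, building that graph just means counting state-to-state transitions across every conversation. A minimal sketch, assuming each conversation is available as an ordered list of the state names it passed through (the state names below are invented for illustration):

```python
from collections import Counter

def build_transition_graph(conversations):
    """Aggregate per-conversation state sequences into a weighted,
    directed edge list: (from_state, to_state) -> transit count."""
    edges = Counter()
    for states in conversations:
        for src, dst in zip(states, states[1:]):
            edges[(src, dst)] += 1
    return edges

# Toy example: two short conversations through hypothetical states.
conversations = [
    ["hello", "weather", "weather.details", "goodbye"],
    ["hello", "weather", "goodbye"],
]
graph = build_transition_graph(conversations)
# ("hello", "weather") was traversed twice; every other edge once.
```

The edge counts become the weights in the network diagrams below, and node sizes come from summing the counts on a node’s incident edges.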

Network graph representing user state transitions across all Poncho states

In the image above, the red section represents conversations with Poncho about the weather, its primary topic. Nodes (or states) are sized by the number of times people hit that state; edges are directional and weighted by the number of times a person moves from one state to the next. The yellow and green regions represent other, non-weather-related topic spaces frequently visited by users.

Random Walks

In network science, there’s a useful tool called a random walk. It is a simulation of traffic on a network, where tiny “agents” move from node to node according to probabilities set in the network. Borrowing a passage from the Wikipedia article on random walks:

“…one can imagine a person walking randomly around a city. The city is effectively infinite and arranged in a square grid of sidewalks. At every intersection, the person randomly chooses one of the four possible routes (including the one originally traveled from)… Will the person ever get back to the original starting point of the walk?”

Here’s a very simple graph:

Here we have two nodes, and walkers move once a day. On the left node, we start with 10,000 “walkers” (or “agents”, or “people”). Each day, every walker on this node has a 60% chance of moving to the right node and a 40% chance of staying put. On the right node, we also start with 10,000 walkers, each with a 50% chance of moving to the left node and a 50% chance of staying. After 102 days of moving, where will more walkers end up? Since walkers leave the left node more often than they leave the right node, we actually observe that more people end up on the right node after 102 days. For the 20,000 total walkers moving over 102 days, the left node experienced 1,391,131 total transits, and the right node experienced 1,668,869 total transits. Let’s take a look at a slightly more complicated network:

For this network (now 30,000 people, with 10,000 starting on each node, where the number in the node just indicates the ID of the node), we observe 1,444,562 transits for Node 0, 807,689 transits for Node 1, and 807,749 transits for Node 2.
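These totals can be sanity-checked in expectation. For the two-node example, rather than simulating 20,000 individual walkers, we can iterate the expected number of walkers on each node directly:

```python
# Expected walker counts for the two-node example: each day a walker
# on the left node leaves with probability 0.6 (stays with 0.4), and
# a walker on the right node leaves with probability 0.5.
left, right = 10_000.0, 10_000.0
for _ in range(102):
    left, right = left * 0.4 + right * 0.5, left * 0.6 + right * 0.5

# The occupancy converges to the stationary split 5/11 : 6/11
# (about 9,091 vs. 10,909 walkers), which is why the right node
# accumulates more total transits.
```

The same expected-value iteration extends to the three-node network by replacing the two update equations with the full matrix of transition probabilities.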

Risk

So far, these random walks have been very merciful to the walkers. Unfortunately, we now need to start killing some of them off. Going back to the context of what we’re working on: we have a network where each node represents a state that people hit when interacting with Poncho, and every outbound edge represents one of the next states people hit in their chain of interactions. At each interaction, though, some people stop talking to the bot. They “bounce” from the conversation, and we have lost them for now. So, to make our random walk more realistic, we change the stopping condition and add in this element of people leaving: for every state, we take (number of people who left after arriving at this state) / (number of people who hit this state at all) as the “bounce rate”, or risk, of the node. At every “day”, or time step, we “lose” a share of the walkers on each node proportional to that node’s risk. The simulation ends when everyone has ended their session. Additionally, walkers start where most people actually start their conversations. Most people begin with something like “Hello again!” rather than jumping into the middle of a deep conversation, so we place walkers on the starting nodes in proportion to the number of conversations that begin at each state.
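A minimal sketch of this risk-based walk, computed in expectation rather than walker by walker (the states, bounce rates, and edge weights below are invented for illustration):

```python
def risky_walk(start_mass, risk, edges, eps=1e-6):
    """Propagate expected walker mass until (nearly) everyone has
    bounced.  start_mass: node -> initial walkers; risk: node ->
    bounce rate; edges: node -> {next_node: transition probability}."""
    mass = dict(start_mass)
    transits = {node: 0.0 for node in start_mass}
    while sum(mass.values()) > eps:
        nxt = {node: 0.0 for node in mass}
        for node, m in mass.items():
            transits[node] += m                 # everyone here counts as a transit
            surviving = m * (1.0 - risk[node])  # the rest bounce
            for dst, p in edges[node].items():
                nxt[dst] += surviving * p
        mass = nxt
    return transits

# A made-up three-state network: everyone starts at "hello", and each
# state has its own bounce rate and outbound edge weights (the weights
# out of each node sum to 1).
start = {"hello": 1000.0, "weather": 0.0, "chat": 0.0}
risk = {"hello": 0.2, "weather": 0.5, "chat": 0.4}
edges = {
    "hello": {"weather": 0.7, "chat": 0.3},
    "weather": {"weather": 0.6, "chat": 0.4},
    "chat": {"hello": 1.0},
}
transits = risky_walk(start, risk, edges)
```

Because every node has a nonzero bounce rate, the surviving mass shrinks geometrically and the loop is guaranteed to terminate.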

Linked here is a random walk simulator I’ve written in JavaScript, which demonstrates 50-node networks, randomly stitched together, with a random number of walkers starting on each node and a random risk on each node. Go ahead and run it a few times to get a qualitative sense of how this process works.

As you can see, we’re also storing statistics about how many times people transit through the states, the number of steps along the way, and the number of surviving walkers. We could also measure the number of transits through states as we did in the very simple cases we talked about before.

Applying the Simulation

So what do we gain from this? Consider this extension to the model: by construction, if you run the same simulation with the same numbers many times, most of the statistics we are likely to be interested in (e.g. the total number of transits, and the total number of transits for each state) will be approximately normally distributed — you can try that out in the JavaScript demonstration above. So we only need to run a few dozen simulations to get a sense for the average number of transits for a given network with its various risks, starting points, and traffic volumes for each node (which, again, represents a state in Poncho’s conversation “brain”). What’s interesting is that this allows us to play out counter-factuals.
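To see that normality claim in a toy setting, here is a sketch that uses a single state with a fixed bounce rate as a stand-in for a full network, runs a few dozen stochastic trials, and summarizes them (all numbers are invented for illustration):

```python
import random
import statistics

def total_transits(n_walkers, risk, seed):
    """One stochastic run of a single-state walk with bounce rate
    `risk`: every step, each surviving walker is counted as one
    transit, then bounces with probability `risk`."""
    rng = random.Random(seed)
    transits, alive = 0, n_walkers
    while alive:
        transits += alive
        alive = sum(1 for _ in range(alive) if rng.random() > risk)
    return transits

# A few dozen runs are enough for a stable mean; the totals cluster
# approximately normally around n_walkers / risk (about 3,333 here).
runs = [total_transits(1_000, 0.3, seed) for seed in range(40)]
mean, sd = statistics.mean(runs), statistics.stdev(runs)
```

The mean and standard deviation of those runs are exactly the summary statistics the counter-factual comparison below relies on.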

For example, for some particular state, what if people, on average, were more likely to stay and continue on to another state (that is to say, what if the risk were lower)? Or, what if a few more people transited through the rule on average than we’re currently seeing? In fact, we can construct an infinite number of counter-factuals, but as will be obvious, just these two are enough to be dangerous.

For Poncho, we can observe the full network of all interactions and calculate the initial starting counts, risks, and general traffic for every state in the network, along with the weights for each of the edges and self-loops (e.g. the 0.6 and 0.4 for the left node in the first diagram we went over). Then, we can walk through every state and change its risk to something a bit smaller, or change its traffic to something a bit higher. Then, we can look at the total number of interactions across the system, a number we already know, as well as the numbers for particular states we may want to pay close attention to (perhaps Poncho wants to see whether people are using a new feature, which ultimately means a particular point in the conversation, and wants to get more people to use that feature). For each state, we can run a series of random walks, find the normal distribution of expected outcomes, and calculate whether or not changing the risk would result in a statistically significant difference: more engagement in the service.
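As a toy version of that test (again with a single state standing in for a full network run), we can simulate a baseline and a lowered-risk counter-factual and compute a two-sample z statistic on the trial totals:

```python
import random
import statistics

def session_length(risk, rng):
    """Number of states a single walker visits before bouncing, on a
    one-state loop with bounce probability `risk`."""
    steps = 1
    while rng.random() > risk:
        steps += 1
    return steps

def trial_total(n_walkers, risk, seed):
    """Total transits across all walkers for one simulated trial."""
    rng = random.Random(seed)
    return sum(session_length(risk, rng) for _ in range(n_walkers))

# Baseline risk of 0.30 versus a counter-factual lowered to 0.25.
baseline = [trial_total(1_000, 0.30, s) for s in range(30)]
lowered = [trial_total(1_000, 0.25, s + 100) for s in range(30)]

# Two-sample z statistic on the trial totals, which are close to
# normal by construction.
m1, m2 = statistics.mean(baseline), statistics.mean(lowered)
v1, v2 = statistics.variance(baseline), statistics.variance(lowered)
z = (m2 - m1) / ((v1 / len(baseline) + v2 / len(lowered)) ** 0.5)
# |z| > 1.96 suggests the lowered risk yields significantly more transits.
```

In the real system, each `trial_total` would be a full random walk over the observed network with one state's risk or traffic perturbed.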

In pseudo-code, it may look something like this:

def pseudo_crazy_big_analysis(network_id, num_trials=10):
    results = []
    for rule in bot_rules:  # for each rule the bot uses
        # set up a baseline simulation where nothing is different
        simulation_settings = [
            {"changes": None, "rule_being_tested": rule, "network_id": network_id}
        ]
        # set up num_trials simulations where only this particular
        # rule is different in some way
        for trial in range(num_trials):
            simulation_settings.append(
                {"changes": 0.05, "rule_being_tested": rule, "network_id": network_id}
            )
        # get the results of that comparison, which in turn uses
        # num_trials + 1 Lambda functions to simulate the walks in parallel
        results.append(post_json("/get_analysis", simulation_settings))
    return results

Then, we send these results to an internal dashboard for Poncho, showing them which states should be tweaked in order to drive more engagement with the service:

Internal dashboard highlighting bot optimization recommendations for Poncho

Here the system produced 36 statistically significant interventions, which together surface ways we could increase interactions across the platform by 45.49%. In other words, if we were able to make the suggested fixes, we would very quickly lower our bounce rates and increase engagement. By using AWS Lambda to run the simulations, we can conduct all of this work in about 30 seconds (more on that in a different post!). For the page above, for example, we ran about 15,000 Lambda tasks, the equivalent of asking 15,000 computers to turn on, do 5 seconds of work, then turn off again.

Where this is going

Over the summer we built a bot analytics service that models user journeys as state transitions, with a focus on actionable recommendations for optimization: how to drive more engagement. We’ve put together a number of internal services and dashboards that help Poncho’s product leads and editors identify opportunities for changes that can drive users toward desired goals (longer sessions, signing up for alerts, and so on).

Many of the recommendations being surfaced make clear qualitative sense, and this mechanism can act to automatically determine the minimal space where interventions would actually be efficacious in increasing growth and conversions. Beyond this specific application, there are many ways to adapt the framework to a wide range of applications — we can simulate an infinite number of strategies for our users, and see their likely effects before we run the campaigns.

For example, we could consider a counter-factual design intervention where we give users some extra feature for referring new people, and see how many referrals we would actually need, and from which types of users, in order to roll out growth campaigns smartly. Alternatively, we could explore the effect of retention (the likelihood of starting a new session) across various types of users (comparing people who are very unlikely to return with people who are very likely to return), and see which types of users typically trigger conversions. In general, the network analogy works, and modeling users as transiting through that network via the well-explored random walk methodology could make bot makers much more successful.

We’re also exploring a number of use cases for live prediction, and ways for our system to help make personalized re-routing decisions in realtime. More on this as we progress!

Thoughts, questions or feedback? Find us here — @dgaff | @ponchoirl


Devin 'meat' Gaffney
i ❤ data

PhD Candidate at Northeastern’s Net Sci lab, Internaut, observer of online humans