Markovian Recommendations | Markov Chains

Sergei
Published in Pipedrive R&D Blog
5 min read · Nov 14, 2023

In the last article, we looked at a way to choose the best category for a recommendation from the Naive Bayes model. For example, in Pipedrive’s CRM system, Naive Bayes can be used to predict whether a salesperson’s chance of closing a deal increases or decreases, information that can be used to recommend the next best action. Still, the approach is naive (as the name suggests) because it only takes into account the quantitative weight of a category among the forecasted classes.

However, when we take a more complex approach to the next-best-action recommendation, reality turns out to be richer: the environment and, importantly, time and past events also significantly influence what the best next step should be.

Naive Bayes simplifies the process by not taking dependencies between features into account, which is good for its performance. Nevertheless, we are still interested in the dependence between states, and that is better described and modeled using Markov chains. A recommendation based on a Markov chain is therefore worth considering: since it relies on previous events, it is closer to the logic of how the next action is actually chosen. For example, a salesperson who reaches out to a customer with a follow-up email is more likely to close the deal than a salesperson who does not follow up.

Like Naive Bayes, though, it is still grounded in statistical analysis of past data.

A Markov chain is a common and fairly simple way to model random events. It is used in a variety of areas, from text generation to financial modeling. We’ll omit the symbolic mathematics and theoretical derivations, which are often intimidating, and speak in plain English instead.

Let’s assume that we find ourselves in state B, from which we can perform various actions to move to state C or D.

If we have no other input data then, of course, the decision looks like a 50/50 split because we don’t know anything else and have no preference for either state. However, if we have knowledge about the probability of each possible transition, for example based on a count of events that have already occurred, we can use this knowledge to identify the most popular option.

This process, where the next state depends only on the current one, is called Markovian, after Andrey Markov, the Russian mathematician best known for his work on stochastic processes. Here’s an example where we suggest the next action taken after an email is sent.
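As a minimal sketch of this idea (the action names and sequences below are invented for illustration, not actual Pipedrive data), we can count observed transitions and suggest the most frequent follow-up to a sent email:

```python
from collections import Counter, defaultdict

# Hypothetical historical action sequences, one per deal.
sequences = [
    ["deal_created", "email_sent", "call_made", "deal_won"],
    ["deal_created", "email_sent", "call_made", "deal_lost"],
    ["deal_created", "call_made", "email_sent", "deal_won"],
]

# Count transitions: current state -> counts of next states.
transitions = defaultdict(Counter)
for seq in sequences:
    for current, nxt in zip(seq, seq[1:]):
        transitions[current][nxt] += 1

def suggest_next(state):
    """Return the most frequent next action and its observed probability."""
    counts = transitions[state]
    action, count = counts.most_common(1)[0]
    return action, count / sum(counts.values())

print(suggest_next("email_sent"))  # ('call_made', 0.666...)
```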

However, in reality, we are more often faced with a situation where the next state depends not only on the current state but also on the previous one (or several previous ones).

Such a process, called non-Markovian, can also be used to select the next action.
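A sketch of this second-order variant (same hypothetical action data as above): the transition table is keyed on the pair (previous state, current state) rather than on the current state alone:

```python
from collections import Counter, defaultdict

sequences = [
    ["deal_created", "email_sent", "call_made", "deal_won"],
    ["deal_created", "email_sent", "call_made", "deal_lost"],
    ["deal_created", "call_made", "email_sent", "deal_won"],
]

# Key on the pair (previous, current) instead of the current state alone.
transitions2 = defaultdict(Counter)
for seq in sequences:
    for prev, current, nxt in zip(seq, seq[1:], seq[2:]):
        transitions2[(prev, current)][nxt] += 1

def suggest_next2(prev, current):
    """The suggestion now depends on how we arrived at the current state."""
    counts = transitions2[(prev, current)]
    action, count = counts.most_common(1)[0]
    return action, count / sum(counts.values())

print(suggest_next2("deal_created", "email_sent"))  # ('call_made', 1.0)
```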

Question: how did we get to state B? Suppose we arrived from A, from which we could also have reached B1. To know how we got to A itself, we look one step further back in the state space: at state A0.

However, when A0 is the initial state, how do we choose between A and A1?

Here, we can use a random generator weighted by which transition was observed more often: A0 to A or A0 to A1.
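In code, this weighted pick is a one-liner; a sketch with hypothetical counts of how often each branch out of A0 was observed:

```python
import random

# Hypothetical counts of the first transition out of the initial state A0.
branch_counts = {"A": 70, "A1": 30}

# Pick the first branch at random, weighted by observed frequency.
first_state = random.choices(
    population=list(branch_counts),
    weights=list(branch_counts.values()),
    k=1,
)[0]
print(first_state)  # "A" roughly 70% of the time
```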

What if we have several initial states? We can apply the same counting one level higher to infer which initial state is more common. This is especially relevant if you use a Markov chain to generate sentences and need to select the first word.

However, the choice can be omitted if we only have one initial state, as is the case when we recommend an action for deals. Initially, there is only one starting point, creating a deal, after which multiple branches begin with various kinds of actions: write a letter, make a call, add a note, anything that the CRM system allows us to do. At the same time, we know that the final states are also fixed: the deal is won, lost or deleted.

We can represent all possible states from the creation to the closure of a deal as a weighted directed graph, built on existing knowledge about the many millions of deals created in CRM [1], each of which had its own path from start to finish. Using this vast knowledge network, recommendations can be made from the behavioral model built on top of it.
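As an illustration (the states and edge weights below are invented for the example, not real deal statistics), such a graph can be stored as a weighted adjacency map, and a recommendation path read off by repeatedly following the heaviest edge until a terminal state is reached:

```python
# Hypothetical weighted graph: state -> {next state: observed count}.
deal_graph = {
    "deal_created": {"email_sent": 500, "call_made": 300, "note_added": 200},
    "email_sent": {"call_made": 400, "email_sent": 150, "deal_lost": 50},
    "call_made": {"deal_won": 350, "email_sent": 250, "deal_lost": 100},
    "note_added": {"email_sent": 180, "deal_lost": 20},
}
TERMINAL = {"deal_won", "deal_lost", "deal_deleted"}

def recommend_path(state="deal_created", max_steps=10):
    """Greedily follow the most frequent edge until the deal closes."""
    path = [state]
    while state not in TERMINAL and len(path) <= max_steps:
        state = max(deal_graph[state], key=deal_graph[state].get)
        path.append(state)
    return path

print(recommend_path())
# ['deal_created', 'email_sent', 'call_made', 'deal_won']
```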

In addition, a non-Markovian process is not technically limited to depending on only one previous state; it is possible to take into account two or three previous states, or even everything from the very beginning. The only question is the cost of the calculations and whether such accuracy is really needed. In practice, combining the Markovian modeling process with a non-Markovian process that depends on a single previous state is a workable approach for a number of problems. Additionally, neural networks can find hidden dependencies between non-adjacent states.
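For completeness, here is a sketch where the order is a parameter (the sequences are again toy data): the table is keyed on a tuple of the last k states, and the number of possible keys, and with it the memory and data required, grows roughly as the number of states to the power k, which is exactly the computational cost mentioned above:

```python
from collections import Counter, defaultdict

def build_chain(sequences, order=2):
    """Count transitions keyed on a tuple of the last `order` states."""
    table = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - order):
            table[tuple(seq[i:i + order])][seq[i + order]] += 1
    return table

# Toy sequences; a higher order needs far more data to fill the table.
sequences = [
    ["A0", "A", "B", "C"],
    ["A0", "A", "B", "D"],
    ["A0", "A", "B", "D"],
]
chain = build_chain(sequences, order=2)
print(chain[("A", "B")].most_common(1))  # [('D', 2)]
```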

Thus, by combining all of the above, including recommendations from the Bayesian model, we get a comprehensive method of recommending the next best action, with internal automated selection among several candidates proposed by different algorithms.

NB! An attentive reader may note that transitions can be not only forward but also backward, or even loop back to the same state; this does not affect the essence of the algorithm.

As I mentioned above, the Markov chain is also quite suitable as an assistant for text completion, including in the instant messengers on our phones. Sometimes this assistant even suggests options successfully, especially in simple sentences. The disadvantage is that the algorithm still doesn’t understand meaning and therefore can’t make recommendations based on the sense of the sentence being composed, which makes it unsuitable as a serious text generator compared to modern Large Language Models (LLMs). Still, it is quite capable of generating a more or less coherent unique lorem ipsum, especially since the connections between words are valid: they are based on existing texts.
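For instance, a lorem-ipsum-style generator fits in a few lines (the training text here is just a stand-in; any corpus would do):

```python
import random
from collections import Counter, defaultdict

text = "the deal was won because the salesperson sent the follow up email"
words = text.split()

# First-order word chain: word -> counts of the words that follow it.
chain = defaultdict(Counter)
for current, nxt in zip(words, words[1:]):
    chain[current][nxt] += 1

def generate(start, length=8):
    out = [start]
    for _ in range(length - 1):
        followers = chain[out[-1]]
        if not followers:  # the word never had a successor in the corpus
            break
        out.append(random.choices(list(followers),
                                  weights=list(followers.values()))[0])
    return " ".join(out)

print(generate("the"))  # e.g. "the deal was won because the follow up"
```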

We’ll talk about how you can try to add some “intelligence” to text generation without involving LLMs next time, when we touch on the topics of Natural Language Processing (NLP), imagination, sense, logical decisions and contextual relatedness.

[1] Pipedrive respects the privacy of user data. Pipedrive’s AI models are trained individually for companies and do not interact with each other.
