Act Like Markov

Phelix Juma
Nov 28, 2021 · 8 min read

“To be a true Bayesian is to allow yourself to grow in mind and ideologies based on new evidence and to not allow yourself to be tied down to your ideologies of old.” That is how I concluded my piece on how to Think Like Bayes. In that piece, I concentrated on the need for us to always update our beliefs based on new evidence, but today my thoughts take me a step further: how exactly should we act, having refined our thinking processes? As I pondered this question, the Markovs came to mind. You see, the Markovs were more or less like the Bernoullis; both families were to Mathematics what the Curies were to Physics. In particular, Andrey Markov’s work on Markov processes came to mind.

As a founder, this is something I’ve found quite useful in growing a startup from stage A to stage B. It’s both an art and a science; as a science, it requires us to experiment and measure. Whatever we propose to do, we set it up as a hypothesis, conduct an experiment, measure the outcome and compute its incremental value. If it has a positive incremental value, i.e. it improves our ROI or KPI better than the alternatives, then we adopt it as the optimal action to take when we desire to move from point A to point B. The experiments are incremental and build on each other, and one should always follow the data wherever it leads. This is the concept of the Advanced Bayesian Thinker — the one who follows the “light” just like the three wise men following the star to find baby Jesus. In its basic form, this is in itself a way to “act”, but how do we transition into acting like Markov?

Markov Process

Markov, through Markov processes, tells us that we can look at our lives as a set of states, and then goes ahead to tell us how we move from one state to another. For instance, when a child is currently playing, what should we expect they will do next? Suppose there’s a 10% chance the child keeps playing, a 50% chance they start eating and a 40% chance they start crying. But what if the child was previously playing and is now eating? What could they do next? Is it any different from if they were first crying and are now eating? Markov tells us that only the current state, not the historical states, determines what the child does next. All we care about is that the child is currently eating; what they were doing before that is irrelevant when predicting what they could do next.
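To make the picture concrete, here is a minimal sketch in Python of the child’s Markov chain. The 10% / 50% / 40% row comes from the example above; the other two rows are invented purely for illustration.

```python
import random

# States the child can be in.
states = ["playing", "eating", "crying"]

# Transition probabilities: for each current state, the chance of each next state.
# The "playing" row uses the 10% / 50% / 40% figures from the text;
# the "eating" and "crying" rows are made-up values for illustration only.
transitions = {
    "playing": {"playing": 0.1, "eating": 0.5, "crying": 0.4},
    "eating":  {"playing": 0.6, "eating": 0.2, "crying": 0.2},
    "crying":  {"playing": 0.3, "eating": 0.3, "crying": 0.4},
}

def next_state(current):
    """Sample the next state from the current one alone (the Markov property)."""
    options = transitions[current]
    return random.choices(list(options), weights=list(options.values()))[0]

# next_state never looks at any earlier state; history is irrelevant.
state = "playing"
for _ in range(5):
    state = next_state(state)
    print(state)
```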

The next state only depends on the current state, not historical states.

Markov processes assume a fully stochastic process, but our real lives aren’t fully stochastic; they are partly random and partly controllable. Let’s look at an example: a startup currently has 1000 WAU (Weekly Active Users), its WAUs over the past three weeks were 1200, 800 and 900 respectively, and it seeks to attain 1500 WAU in the next week. The startup can partly control this process by taking actions that increase this KPI, but it’s also partly random because the startup doesn’t fully control the lives of its users, who are affected by external factors outside its system. It turns out this is described by the Markov Decision Process — which is what will guide us on how to act like Markov.

A Markov Decision Process defines a number of components, which I will briefly mention as I intend to keep this non-technical (a small code sketch after the list shows how they fit together):

  1. Agent, which in this case is the startup founder, or the child in the earlier example.
  2. Environment, which is the space the agent operates in, e.g. the market for the startup. The founder has no control over the market’s behaviour.
  3. State, which defines the current position of the startup, i.e. it currently has 1000 WAUs.
  4. Action, which defines the choice the founder makes while at the current state, e.g. acquiring more users so as to increase WAUs.
  5. Policy, which defines how the agent chooses one particular action over the others at a given state.
  6. Reward, which defines what we gain when we move from one state to the next.
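Here is that sketch: a minimal, hypothetical toy MDP for the startup example. All action names, probabilities and rewards below are invented purely for illustration; they are not from any real data.

```python
import random

# A toy Markov Decision Process for the startup example.
# States are WAU levels; all numbers are made up for illustration.
states = [800, 1000, 1500]
actions = ["run_ads", "improve_product"]

def environment_step(state, action):
    """The environment (the market) reacts to the agent's action.

    The agent does not control this part; it is partly random.
    Returns (next_state, reward), where reward is the change in WAU.
    """
    if action == "run_ads":
        next_state = random.choices(states, weights=[0.2, 0.3, 0.5])[0]
    else:  # "improve_product"
        next_state = random.choices(states, weights=[0.1, 0.5, 0.4])[0]
    return next_state, next_state - state

def policy(state):
    """Policy: which action the agent (the founder) picks in a given state."""
    return "run_ads" if state < 1000 else "improve_product"

# One decision step: the agent acts, the environment responds with a new state and a reward.
state = 1000
action = policy(state)
state, reward = environment_step(state, action)
print(action, state, reward)
```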

A Markov Actor thus realizes that he has no control over the environment but can make decisions, or take actions, that move him from his current state to the next state with a given reward. The Markov decision maker knows that his next state depends only on his current state; historical outcomes do not matter. Thus, for the founder to grow to 1500 WAUs, what matters is the current state of 1000 WAUs and not the previous week’s value of 1200. This is one of the most important traits of the Markov Actor: he realizes that he cannot dwell on historical successes. Yahoo was once one of the largest internet companies in the world, but it no longer is. If Yahoo is to grow to the next level, its past successes of the 90s do not matter; only its current state does. Dele Alli of Tottenham football club cannot ride on his past great form if he is to improve; only his current form and status matter.

You are only as good as your last success or last state irrespective of the past

How does the Markov Actor make the best choice? Sometimes there is no knowing with utmost certainty what the best choice is, because the Markov Actor must focus on the return and not the reward alone. The return is the long-term total reward from all actions over time. A Markov Decision Maker is thus sometimes forced to take an action with a low reward right now because it will lead to a greater return in the end — this is what separates the genius Markov decision maker from the amateur. An amateur soccer coach uses his best players week after week against small and big clubs alike because he is focused on the current reward, but what happens when they get injured? What happens when fatigue kicks in? A genius coach knows when to use team A and when to use team B in turns to maximise the total points gained at the end of the season.
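As a rough illustration of the difference, the return is simply the accumulated (optionally discounted) reward over time. The weekly numbers below are invented to mirror the coach example; they are not real figures.

```python
def total_return(rewards, gamma=1.0):
    """Return = sum of rewards over time, optionally discounted by gamma per step."""
    return sum(r * (gamma ** t) for t, r in enumerate(rewards))

# Invented weekly "points" purely for illustration:
greedy   = [10, 10, 2, 1, 1]   # best players every week, then injuries and fatigue
balanced = [6, 6, 6, 6, 6]     # rotate team A and team B

print(total_return(greedy))    # 24
print(total_return(balanced))  # 30 -> the lower-reward choices win on total return
```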

To know when to settle for less right now, even though you could have the best, in anticipation of a greater return in the long run: this is the genius of a Markov Decision Maker.

What Markov tells us is that the long-term return is more important than a one-time reward. For instance, a startup with a lot of funds might decide that the best use of those funds is to go all in on ads and acquire millions of users, instead of first hiring engineers to stabilize the product, because its success is measured by how many users it has. Very soon this proves to be a massive mistake: even though it has a great reward in the current state transition, the startup soon faces a flood of customer issues, complaints, churn and bad reviews that destroy its brand and make it hard to keep operating in the market. The startup that balances customer growth with its ability to serve the growing numbers is thus more likely to get a higher return in the long term, even though it chose lower rewards along the way. This is a concept I learned while working on a two-wheeled robot in college: even though performance was measured by how fast the robot could move from point A to point B, the biggest challenge was balancing speed with stability. Accelerate too fast and the robot falls; move too slowly and it won’t make it in time.

A two-wheel robot: finding the balance between short term gains and long term gains.

Like our 2-wheeled robot, a startup must thus find the perfect balance between speed and stability for a greater long term return.

That’s where the Markov policy becomes very important; it defines which action to take at a given state. In short, what is the set of actions available to us at this specific state, and how do we choose which of them to pick? As a new startup working on an MVP, should we use Google ads, friends or physical salespeople to get our product out there? As a growth startup looking to increase our Weekly Active Users, should we focus on acquiring more users, or focus more on user retention and user experience by improving our product? Those are the sets of actions we could pick from, given our current state. The decision is both an art and a science and, as we said in the beginning, the Advanced Bayesian Thinker will turn to data as a policy guide. He could look at historical data on how each action performed in the past, for himself or for others in that state, or he could resort to experimentation and measurement, following the data wherever it leads.

The Markov Decision Maker thus cannot exist without thinking like Bayes. He has to compare the available actions and their rewards and choose the action that promises the best long-term return. This is the art of the Markov Decision Policy.
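A hypothetical sketch of that policy in code: estimate the long-term return of each available action from your experiments or historical data, then pick the best. The action names and values here are assumptions for illustration only.

```python
# Estimated long-term returns per action, e.g. measured through experiments.
# All names and numbers are made up for illustration.
estimated_returns = {
    "google_ads":        120,  # quick user bump, higher churn later
    "friends_referrals":  90,
    "physical_sales":    150,  # slower start, but users stick around
}

best_action = max(estimated_returns, key=estimated_returns.get)
print(best_action)  # "physical_sales"
```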

Long Term Rewards in Dating.

The Markov Actor exists not only in business but in every aspect of life. Your dating life is a Markov process, and your aim has to be to focus on the long-term return and not short-term gains. A soccer or NBA coach has to act like Markov throughout the season to maximize points at the end of the season. A student or a professor must act like Markov through their entire life in academia. An investor must act like Markov; yes, Masayoshi Son found Jack Ma, but that doesn’t matter now as he aims to find the next Alibaba. If there is anything you can take from this long text, it is this: your historical successes matter not, only your last state does. In a world where short-term benefits can blind us, when aiming for growth to the next stage of life, you must always focus on the overall return or final state, not just the small successes in between states.

Markov Actor as a Chess grandmaster: In Chess, the next move for a piece depends only on where the piece currently is, not its past locations.

Like a Chess Grandmaster, the Markov Actor is a strategist. He knows his game plan beforehand but adjusts it based on how the opponent plays. He knows whether it’s the Queen’s Gambit or the Sicilian Defense he’ll be playing. He knows when to attack and when to defend. He worries not about the past because he knows his moves are restricted by where each piece currently is, not where it has been. He understands the value of sacrificing his Queen now, even though it is a low-reward action, if it accelerates checkmating the opponent — the long-term return.
