Markov Chains — only the present matters
If you order a t-shirt online today, what’s the probability that Google will show you ads of t-shirts tomorrow? Or if you have a Jalapeño pizza now, what’s the probability you’ll have ice-cream after? The similarity between these two questions is not only the word ‘probability’ but also the subtle change in the question passing from present to future.
A Markov chain predicts a sequence of possible states in which the probability of each state depends only on the present state. It is named after Andrei Markov who discovered the process when trying to find the patterns of words in a poem. Today, the basis of it is used in ranking websites like Google’s PageRank algorithm. It’s also used in the medical field to model spread and progression of infectious diseases. In this post, we’ll see how it can be used in the Environment Protection sector.
According to IUCN, at least 14 million tons of plastic end up in the ocean every year. Doing beach clean ups and identifying the type of plastic helps in waste analysis which gives potential solutions for a cleaner environment. Imagine, you’re at a beach for a clean up drive. If you find a plastic bottle cap, what’s the probability that you’ll see a plastic bottle next? Let’s find out.
The state of a Markov chain refers to a particular process at a specific time. Example, the weather today is sunny. In this case, ‘sunny’ would be a state. For the sake of simplicity, consider only two pieces of plastic to be found on the beach — a plastic bottle cap and a plastic bottle. These can form to be the two states in our problem.
Markov Property or Memoryless Property
The Markov Chain property states that only the most recent value, which in our case the most recently picked up plastic is what affects the next value or the next piece of plastic we pick up. Whatever we picked up previously or in the past doesn’t matter.
Chain Probability or a sequence
Now back to our original question of ‘If you find a plastic bottle cap, what’s the probability that you’ll see a plastic bottle next?’.
We denote this question mathematically as —
P represents the probability of finding a plastic bottle next given that you picked up a plastic bottle cap right now.
Let’s modify this question —
What if we want to know the probability of finding a plastic bottle, a plastic bottle cap and then again a plastic bottle?
This becomes a sequence and we denote this mathematically as —
Consider you’ve done clean ups at this beach previously. In total you’ve collected 10 pieces of plastic. Out of that 6 are plastic bottles and 4 are plastic bottle caps. In terms of probability, it means 0.6 and 0.4 respectively.
Every time you picked up a plastic (present) you noted down what came next(future). In this way you found a cap after a bottle 3 times, a bottle after a cap 2 times, a bottle after a bottle 4 times and a cap after a cap 1 time.
Representing it as a table—
We can now calculate the probabilities of each state transitioning from itself to the next state. This is what’s called conditional probability and in Markov Chains we call it Transitional Probabilities.
The probability of finding a plastic bottle next given that you’ve just found a plastic bottle cap is 75%. That’s because
Doing this for each state we get —
You can put the values in a matrix form like so —
To better visualize this we represent it as a directed graph:
We want to know P(plastic bottle, plastic bottle cap, plastic bottle). To find this we use the Markov property mentioned earlier and the transition matrix we just formulated.
The first plastic piece in sequence is a bottle. We have nothing prior to it and represent it as follows
Why 0.6? Because out of the 10 plastic, we found 6 bottles and the probability of finding just a bottle on this beach is 60% or 0.6.
Next in the sequence is a plastic bottle cap. Here, we have something prior to it — the plastic bottle. Therefore as discussed in the ‘Markov Property’ section we consider only the present (plastic bottle) and what we might find in the future (a bottle cap):
We do the same for the next part of the sequence:
Finally, we multiply these values to get the answer to our question.
The probability of finding the plastic pieces in this sequence would be 0.1485.
This sequence probability could help researchers propose a solution to the increasing amount of plastic garbage near coastal areas. In reality, we not only find one or two types of plastic on a beach but several types. Finding the probability of what comes next could answer several questions.
For example, is there a particular sequence to certain garbage on this beach? If after a plastic party cup, finding a plastic straw and then a plastic throwaway plate gives a high probability, it could mean several fast food vendors on the beach use plastic wrapping or maybe, people bring such food in plastic packaging along with them.
This could help researchers in root cause analysis and propose a solution to such problems. Besides the environment sector, you can explore Markov Chains in other domains like sales forecasting, consumer behaviour, game development and much more.