A Short Example to Build Intuition about POMDPs

Aishwarya Pothula
2 min read · Jun 18, 2020


POMDP — Partially Observable Markov Decision Process

A POMDP is really just an MDP: we have a set of states, a set of actions, transitions, and immediate rewards. The actions’ effects on the state in a POMDP are exactly the same as in an MDP. The only difference is whether we can observe the current state of the process. In a POMDP, we add a set of observations to the model. Instead of observing the current state directly, we receive an observation from it that provides a hint about which state the process is in¹.
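To make the added "observations" ingredient concrete, here is a minimal sketch of how an agent turns observations into a belief. The belief is a probability distribution over states. After taking action a and seeing observation o, the standard Bayes-filter update is b'(s') = η · O(o | s') · Σ_s T(s' | s, a) · b(s), where η normalizes the result to sum to 1. The two-state model, the state names "A"/"B", the action "move" and the observations "coffee"/"desk" are all toy assumptions, not from the article. The sketch also simplifies by letting the observation depend only on the new state; many formulations let it depend on the action too.

```python
from typing import Dict

State = str
Action = str
Obs = str

# Toy model (assumed for illustration only).
# T[s][a][s'] = P(s' | s, a): how actions change the hidden state, exactly as in an MDP.
T: Dict[State, Dict[Action, Dict[State, float]]] = {
    "A": {"move": {"A": 0.2, "B": 0.8}},
    "B": {"move": {"A": 0.8, "B": 0.2}},
}

# O[s'][o] = P(o | s'): the observation model, the part a POMDP adds to an MDP.
# Simplification: the observation depends only on the state we land in.
O: Dict[State, Dict[Obs, float]] = {
    "A": {"coffee": 0.9, "desk": 0.1},
    "B": {"coffee": 0.2, "desk": 0.8},
}


def update_belief(belief: Dict[State, float], action: Action, obs: Obs) -> Dict[State, float]:
    """Bayes filter: b'(s') is proportional to O(o|s') * sum_s T(s'|s,a) * b(s)."""
    unnormalized = {}
    for s_next in O:
        # Prediction step: where could the action have taken us?
        predicted = sum(T[s][action][s_next] * p for s, p in belief.items())
        # Correction step: weight by how well each state explains the observation.
        unnormalized[s_next] = O[s_next][obs] * predicted
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation is impossible under the current belief")
    return {s: p / total for s, p in unnormalized.items()}


if __name__ == "__main__":
    # Start fully uncertain, move once, then see a coffee machine.
    print(update_belief({"A": 0.5, "B": 0.5}, "move", "coffee"))
```

Starting from a 50/50 belief, seeing a coffee machine shifts most of the probability toward "A" (9/11 of it in this toy model), but it never pins the state down completely. That is the sense in which an observation only "provides a hint".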

Imagine yourself as a tiny person (agent) in the middle of a vast corporate IT workspace on your first day of work. You are trying to get to your boss’ cabin (goal state) before your appointment time (goal). Reaching the cabin on time will get you on the boss’ good side (reward). Here is the problem: the workspace is made up of multiple grids of cubicles (state space), and there are many paths you can take to reach the cabin.

When you start exploring, you move into different cubicles. Within them, you can observe various things, such as desks, workstations, and chairs, and perhaps a washroom or a coffee machine near a few cubicles, but not the cubicle number. Based on these observations (partially observable data), you may have an idea of where you might be (belief), but you don’t know exactly which cubicle number (state) in the workspace (POMDP) you are at. To get help navigating the workspace, you call a friend who works in that office. She tells you, “If you are at cubicle number x (current state), take a right and walk straight to reach the boss’ cabin” (policy). However, this suggestion (policy) is of no use to you, because you do not know which cubicle number (state) you are in. This is exactly what makes POMDPs hard to solve.
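The sketch below plays out the cubicle story as a small example. Seeing a coffee machine leaves two cubicles equally likely. The friend’s policy, which is keyed on cubicle numbers, gives a different action for each of them, so it cannot be applied as-is. If the number were visible, the same policy would be a plain lookup. The office layout, the cubicle numbers, the directions and the 0.9 "accuracy" of recognizing a landmark are all hypothetical values chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical office layout: the landmark visible from each cubicle.
LANDMARKS: Dict[int, str] = {1: "coffee machine", 2: "desk", 3: "coffee machine", 4: "washroom"}

# Hypothetical version of the friend's advice: a policy mapping cubicle number (state) -> action.
FRIEND_POLICY: Dict[int, str] = {1: "right", 2: "left", 3: "straight", 4: "back"}


def belief_from_observation(obs: str, accuracy: float = 0.9) -> Dict[int, float]:
    """Posterior over cubicles after one observation, starting from a uniform prior."""
    # P(obs | cubicle): high if the cubicle's landmark matches what we see.
    likelihood = {c: accuracy if lm == obs else 1.0 - accuracy for c, lm in LANDMARKS.items()}
    total = sum(likelihood.values())  # the uniform prior cancels out during normalization
    return {c: p / total for c, p in likelihood.items()}


def most_likely_cubicles(belief: Dict[int, float]) -> List[int]:
    """Every cubicle that shares the highest belief."""
    top = max(belief.values())
    return [c for c, p in belief.items() if p == top]


if __name__ == "__main__":
    belief = belief_from_observation("coffee machine")
    candidates = most_likely_cubicles(belief)
    # Partially observable: the policy needs a state, but we only have a belief,
    # and the equally likely cubicles call for different actions.
    print("belief:", belief)
    print("advice per likely cubicle:", {c: FRIEND_POLICY[c] for c in candidates})
    # Fully observable: if we could read the cubicle number, the policy is a lookup.
    print("advice if we know we are at cubicle 3:", FRIEND_POLICY[3])
```

Picking the single most likely cubicle and following the friend’s advice for it is a common shortcut, but it is only a heuristic. Proper POMDP solvers plan over beliefs rather than states.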

Had the cubicles been numbered, and had you been able to observe the cubicle number (state) directly along with the other observations (a fully observable MDP), you could easily have navigated the workspace by following your friend’s suggestion (policy) for your current position.

For a more formal introduction to POMDPs, you can refer to this chapter.
