Bellman Equation (1): Understanding the Recursive Nature of the Bellman Equation in Mathematics

Hossam Hamdy
6 min read · Jul 4, 2023


Richard Ernest Bellman (August 26, 1920 — March 19, 1984)

Have you ever wondered how we can solve complex problems by breaking them down into simpler subproblems? This idea of recursion, where a problem depends on solutions to smaller instances of the same problem, is a profoundly powerful concept in mathematics. Let me walk you through an elegant example of recursion in the form of the Bellman equation.

The Bellman equation is a recursive way to determine the optimal path through a sequence of decisions. It builds up solutions for small subproblems into an overall optimal solution for the full problem. To understand this intuitively, imagine you want to find the shortest path from your home to the grocery store. You would first determine the shortest path from your home to the intersection down the street. Then from that intersection to the next one. And so on, until you reach the grocery store. The full solution is built up from optimal solutions to smaller subproblems.

In mathematics, we can represent this type of recursive problem-solving using equations. The Bellman equation is a great example. It determines the optimal path to maximize reward through a sequence of stages. At each stage, you have to make a choice that will determine the reward you get immediately as well as the choices available to you at the next stage. The Bellman equation finds the optimal path through this sequence of stages by building up solutions for subproblems.

Here is the equation in mathematical terms:

f(n) = max[ r(n) + f(n+1) ]

where the maximum is taken over the options available at stage n.

Where f(n) represents the maximum total reward from stage n to the final stage, r(n) is the immediate reward at stage n, and f(n+1) is the maximum total reward from stage n+1 to the final stage.

To gain intuition, think of f(n) as the optimal “future value” — the total reward from the current stage onwards. We want to choose the option at stage n that maximizes this future value. The future value depends on two things:

1. The immediate reward r(n) at the current stage n. This is the reward we gain right away by choosing an option at stage n.

2. The maximum future value f(n+1) from the next stage n+1 onwards. This represents the total reward we can gain from future stages, given the option we choose at the current stage n.

To find the optimal choice at stage n, we take the maximum, over the available options, of the sum of these two quantities: the immediate reward r(n) plus the maximum future value from the next stage f(n+1). The option that yields the highest total reward, considering both the immediate and future benefits, is the optimal choice for the current stage.

Do you see how this captures the recursive nature of the problem? The optimal solution for any stage n depends on the optimal solutions for future stages, all the way to the final stage. By starting at the final stage and working backward, we can build up the optimal solution for the full sequence of stages in a step-by-step recursive manner.
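This backward-working procedure can be sketched as a short loop. The names below (`solve_backward`, `options`) are illustrative, not from the original text; the sketch assumes each stage offers a set of (immediate reward, next stage) options and that the final stage is worth 0.

```python
def solve_backward(options, first_stage, final_stage):
    """Backward induction: f(n) = max over options of [ r(n) + f(next) ]."""
    f = {final_stage: 0}  # base case: nothing left to win after the final stage
    for n in range(final_stage - 1, first_stage - 1, -1):
        # For each option at stage n, add its immediate reward to the
        # already-computed future value of the stage it leads to.
        f[n] = max(r + f[nxt] for r, nxt in options(n))
    return f

# Toy usage: three stages, each offering reward 1 and moving to the next stage.
f = solve_backward(lambda n: [(1, n + 1)], 1, 3)
print(f)  # {3: 0, 2: 1, 1: 2}
```

Notice that by the time the loop reaches stage n, the value f(n+1) has already been computed, which is exactly the recursive dependency the equation expresses.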

Example

To gain a deeper understanding, it often helps to walk through some examples. Imagine you are on a game show where you have to make a series of choices to maximize your winnings.

At each stage of the game, you are given two options:

Option 1: Win $100 and proceed to the next stage

Option 2: Win $50 and go directly to the final stage

If there are 5 stages in total, what sequence of choices would maximize your winnings?

We can use the Bellman equation to solve this. At stage 5 (the final stage), your maximum reward is $0 (f(5) = $0). At stage 4, you have a choice between $100 + $0 (the reward from stage 5) or $50 (the reward for skipping to the final stage). The maximum is $100 + $0 = $100 (f(4) = $100).

At stage 3, your options are $100 + $100 (the reward from stage 4) or $50. The maximum is $100 + $100 = $200 (f(3) = $200). At stage 2, your options are $100 + $200 (the reward from stage 3) or $50. The maximum is $100 + $200 = $300 (f(2) = $300).

Finally, at stage 1, your options are $100 + $300 (the reward from stage 2) or $50. The maximum reward is $100 + $300 = $400 (f(1) = $400).
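The whole backward computation above fits in a few lines of Python. This is a minimal sketch of the game-show example ($100 to continue, $50 to jump straight to the final stage); the variable names are illustrative.

```python
FINAL_STAGE = 5

f = {FINAL_STAGE: 0}  # f(5) = 0: nothing left to win at the final stage
for n in range(FINAL_STAGE - 1, 0, -1):
    # Option 1: win $100 now plus the future value of the next stage.
    # Option 2: win $50 now, then go straight to the final stage (worth $0).
    f[n] = max(100 + f[n + 1], 50 + f[FINAL_STAGE])

print(f[1])  # 400
```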

So the optimal sequence of choices that maximizes your winnings is:

Stage 1: Option 1

Stage 2: Option 1

Stage 3: Option 1

Stage 4: Option 1

Stage 5: Option 1

Your total winnings are $400 by following this path. Do you see how we built up the solution in a recursive manner using the Bellman equation? We started from the final stage and worked backward, choosing the option that maximized the reward at each step.

Another Example

Let’s consider a directed graph with nodes represented by letters and edges represented by lines connecting these nodes. Each edge has a certain weight, which could represent distance, cost, time, etc. The goal is to find the shortest path from a starting node to a destination node.

Let’s denote the shortest path from node i to node j as SP(i, j). According to the Bellman equation, SP(i, j) is the minimum, over all nodes k directly reachable from i, of the cost from i to k plus the shortest path from k to j. Mathematically, this can be written as:

SP(i, j) = min over k { cost(i, k) + SP(k, j) }

This equation is recursive because the shortest path from i to j depends on the shortest paths from other nodes to j.

Now, let’s consider a simple example. Suppose we have four nodes A, B, C, and D. The edges and their weights are as follows:

A -> B: 1

A -> C: 3

B -> D: 2

C -> D: 1

We want to find the shortest path from A to D. We start from the end:

SP(D, D) = 0 (The cost to reach D from D is zero)

Next, we find the shortest paths to D from each node directly connected to D:

SP(B, D) = cost(B, D) = 2

SP(C, D) = cost(C, D) = 1

Then, we find the shortest path from A to D:

SP(A, D) = min { cost(A, B) + SP(B, D), cost(A, C) + SP(C, D) } = min { 1 + 2, 3 + 1 } = min { 3, 4 } = 3

So, the shortest path from A to D has length 3 and goes through B (A → B → D). This is a simple example, but the same principle applies to larger and more complex ones. The Bellman equation allows us to break down the problem into smaller subproblems, solve each subproblem, and use those solutions to solve the original problem. This is the essence of dynamic programming.

The Bellman equation is a fascinating concept that comes from the field of dynamic programming. The recursive nature of the Bellman equation beautifully illustrates the mathematical concept of “induction.”

The Principle of Induction is a mathematical technique used to prove a statement, a proposition, or a formula for all natural numbers. It’s like climbing a ladder: if you can get on the first rung (base case), and if you can always reach the next rung from the one you’re on (inductive step), then you can eventually get to any rung on the ladder.

The Bellman equation is recursive, meaning it defines the value of a problem in terms of smaller subproblems. This is similar to the inductive step in the Principle of Induction, where we assume a statement is true for one case and then prove it for the next.

The Bellman equation and the Principle of Induction are two sides of the same coin. They both involve breaking down a complex problem into simpler parts and building up a solution step by step. The Bellman equation provides the framework for this process, while the Principle of Induction provides the mathematical justification for it.


Hossam Hamdy

Computational Statistics graduate, MSc Financial Engineering candidate. I love mathematics, AI, data science, and financial engineering!