Understanding recursion, memoization, and dynamic programming: 3 sides of the same coin

Fabian Terh
Published in The Startup · Oct 14, 2019

I previously wrote an article on solving the Knapsack Problem with dynamic programming. In that article, I pretty much skipped to the dynamic programming solution directly, with only a brief introduction of what dynamic programming is and when it can be applied.

I came across another dynamic programming problem recently (Edit Distance) and I wanted to explore dynamic programming in greater detail. Particularly, I wanted to explore how exactly dynamic programming relates to recursion and memoization, and what “overlapping subproblems” and “optimal substructure” mean.

The Problem

Given two words word1 and word2, find the minimum number of operations required to convert word1 to word2.

You have the following 3 operations permitted on a word:

  1. Insert a character
  2. Delete a character
  3. Replace a character

Example 1:

Input: word1 = "horse", word2 = "ros"
Output: 3
Explanation:
horse -> rorse (replace 'h' with 'r')
rorse -> rose (remove 'r')
rose -> ros (remove 'e')

(Problem is copied off LeetCode, and I’ve omitted the rest of the examples. You can find the full problem statement here.)

Solution 1 (naive): Recursion

To solve this problem, we first try to devise an algorithm intuitively, refining the details as we go along. (That’s my strategy for problem-solving, and it works!) We don’t know the exact details of the algorithm yet, but at a high level, we know that it should iterate through each character of each string and compare the characters. When we do that, there are only 2 possible outcomes: (1) the characters match, or (2) they don’t.

In the simplest case, where the characters match, there really isn’t anything to do but to continue the iteration.

If the characters don’t match, this is where the crux of the algorithm lies. This is also where our 3 possible string operations apply: we can insert, delete, or replace a character.

For example, if we have strings s1=“aa” and s2=“ab”, we would replace the last character of s1. For “aa” and “aab”, we would insert an additional character into s1. And finally, for “aa” and “a”, we would delete the last character of s1.

With these observations, we can write a recursive algorithm that calculates the number of edits for all 3 possible operations and returns the minimum of them.

class Solution:
    def minDistance(self, word1: str, word2: str) -> int:
        # Our helper function returns the minimum number of edits
        # to transform word1 into word2.
        def helper(word1, word2, i, j):
            if i == 0 or j == 0:
                return i + j
            last1 = word1[i-1]
            last2 = word2[j-1]

            if last1 == last2:
                return helper(word1, word2, i-1, j-1)
            else:
                return 1 + min(
                    helper(word1, word2, i-1, j-1),  # replacement
                    helper(word1, word2, i, j-1),    # insertion
                    helper(word1, word2, i-1, j))    # deletion

        return helper(word1, word2, len(word1), len(word2))

To understand how helper(word1, word2, i-1, j-1) relates to a character replacement, and how the other two variants relate to insertion and deletion, you can check out the very informative GeeksforGeeks article on this problem. Briefly put though, we consider a smaller problem space (as with most recursive algorithms) by decrementing i and/or j, depending on the operation.

We also use a nifty trick for optimization. Instead of performing O(N) string slicing operations at each level of our recursive call stack, we pass 2 integers i and j as arguments to represent the prefixes word1[0:i] and word2[0:j].
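
To see what that trick saves us, here is a minimal sketch (hypothetical, not the solution above) of the same recursion written with slicing, where every call copies its arguments:

def edits_sliced(word1: str, word2: str) -> int:
    # Hypothetical slicing-based variant, shown only to illustrate
    # the string copies that passing i and j avoids.
    if not word1 or not word2:
        return len(word1) + len(word2)
    if word1[-1] == word2[-1]:
        return edits_sliced(word1[:-1], word2[:-1])  # O(N) slice per call
    return 1 + min(
        edits_sliced(word1[:-1], word2[:-1]),  # replace
        edits_sliced(word1, word2[:-1]),       # insert
        edits_sliced(word1[:-1], word2))       # delete

Every call above builds new string objects, whereas the index-based helper only ever touches two integers.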

The naive recursive solution is straightforward but also terribly inefficient, and it times out on LeetCode.

Solution 2: Memoization

To optimize our naive recursive solution, we could use memoization to store results to avoid re-computation. Notice that the 3 recursive calls in our else block could potentially be repeated many times across recursive calls (visualize the recursion tree). We are wasting a lot of time recomputing the same answers to the same set of parameters.

In this case, only i and j determine the result, since word1 and word2 are immutable. Therefore, we only really need to cache the results of combinations of i and j. The same combination will always produce the same result. In my solution, I use the tuple (i, j) as the key in my dictionary.

class Solution:
    def minDistance(self, word1: str, word2: str) -> int:
        # Our helper function returns the minimum number of edits
        # to transform word1 into word2.
        def helper(word1, word2, i, j, memo):
            if i == 0 or j == 0:
                return i + j

            if (i, j) in memo:
                return memo[(i, j)]

            last1 = word1[i-1]
            last2 = word2[j-1]
            ans = 0

            if last1 == last2:
                ans = helper(word1, word2, i-1, j-1, memo)
            else:
                ans = 1 + min(
                    helper(word1, word2, i-1, j-1, memo),  # replace
                    helper(word1, word2, i, j-1, memo),    # insert
                    helper(word1, word2, i-1, j, memo))    # delete

            memo[(i, j)] = ans
            return ans

        memo = {}
        return helper(word1, word2, len(word1), len(word2), memo)

Runtime: 100 ms, faster than 96.03% of Python3 online submissions for Edit Distance.
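
As an aside: since word1 and word2 are captured by the closure and only (i, j) changes between calls, Python’s functools.lru_cache can manage the cache for us. Here is a sketch of that variant (not the submission timed above):

import functools

class Solution:
    def minDistance(self, word1: str, word2: str) -> int:
        @functools.lru_cache(maxsize=None)  # caches results keyed by (i, j)
        def helper(i, j):
            if i == 0 or j == 0:
                return i + j
            if word1[i-1] == word2[j-1]:
                return helper(i-1, j-1)
            return 1 + min(helper(i-1, j-1),  # replace
                           helper(i, j-1),    # insert
                           helper(i-1, j))    # delete

        return helper(len(word1), len(word2))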

Solution 3: Dynamic Programming

Let’s now really unpack what the terms “optimal substructure” and “overlapping subproblems” mean.

Overlapping Subproblems

The term “overlapping subproblems” simply means that there are subproblems (of a smaller problem space) that arise repeatedly. For instance, the recursive function fibonacci(10) requires the computation of the subproblems fibonacci(9) and fibonacci(8), but fibonacci(9) also requires the computation of fibonacci(8). Thus, we see that there are overlapping subproblems (i.e. subproblems that arise repeatedly).
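
To make this concrete, here is a small sketch (a toy example, separate from the Edit Distance solutions) that counts how often the naive recursion recomputes each fibonacci subproblem:

from collections import Counter

calls = Counter()

def fibonacci(n):
    # Naive recursion, instrumented to count how often each
    # subproblem is computed.
    calls[n] += 1
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

fibonacci(10)
print(calls[8])  # fibonacci(8) is computed 2 times
print(calls[2])  # fibonacci(2) is computed 34 times

Caching each result the first time it is computed collapses all of those repeats into a single computation per subproblem.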

In fact, this is the entire basis for memoization, so if you understood the section above on memoization, you already understand what “overlapping subproblems” means.

Dynamic programming (and memoization) works to optimize the naive recursive solution by caching the results to these subproblems. If there are no overlapping subproblems, there is no point caching these results, since we will never use them again. For instance, recursive binary search has no overlapping subproblems, and so memoization is useless.

Optimal Substructure

I don’t think I can phrase this better than GeeksforGeeks, so I’ll just rephrase their definition:

A given problem has optimal substructure property if the optimal solution of the given problem can be obtained by using the optimal solutions of its subproblems.

In this case, we can observe that the Edit Distance problem has the optimal substructure property: at each level of our recursion tree, the optimal answer for (i, j) is obtained from the optimal answers to its smaller subproblems, namely by taking the minimum of the 3 recursive calls (assuming that the characters differ, of course).

Therefore, we can “work our way upwards”, by incrementally computing the optimal solutions to subproblems, until we arrive at the optimal solution to our given problem.

The Solution

We create a table of size m+1 by n+1, where m and n are the lengths of word1 and word2 respectively. (We offset the lengths by 1 to account for our base cases of an empty string.)

Therefore, in our dynamic programming solution, the value at table[row][col] represents the minimum number of edits required to transform the prefix word1[:row] into word2[:col].
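
For Example 1 above (word1 = "horse", word2 = "ros"), the completed table looks like this, with the answer, 3, in the bottom-right cell:

         ""  r  o  s
    ""    0  1  2  3
    h     1  1  2  3
    o     2  2  1  2
    r     3  2  2  2
    s     4  3  3  2
    e     5  4  4  3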

class Solution:
    def minDistance(self, word1: str, word2: str) -> int:
        if not word1 or not word2:
            return len(word1) + len(word2)

        m, n = len(word1), len(word2)
        dp = [[0] * (n + 1) for _ in range(m + 1)]

        # Prepopulate table for base cases
        for i in range(len(dp[0])):
            dp[0][i] = i
        for i in range(len(dp)):
            dp[i][0] = i

        # Build table
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                c1, c2 = word1[i-1], word2[j-1]
                if c1 == c2:
                    dp[i][j] = dp[i-1][j-1]
                else:
                    dp[i][j] = 1 + min(dp[i-1][j-1],  # replace
                                       dp[i-1][j],    # delete
                                       dp[i][j-1])    # insert

        return dp[-1][-1]

Runtime: 184 ms, faster than 62.60% of Python3 online submissions for Edit Distance.

Memoization vs Dynamic Programming

In fact, memoization and dynamic programming are extremely similar. One way to think about it is that memoization is top-down (you recurse from the top but with caching), while dynamic programming is bottom-up (you build the table incrementally). Some sources, in fact, classify both as variants of dynamic programming.
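
Sticking with fibonacci as a toy example, here is a minimal sketch of the two flavours side by side:

def fib_top_down(n, memo=None):
    # Memoization: recurse from the top, caching results as we go.
    if memo is None:
        memo = {}
    if n < 2:
        return n
    if n not in memo:
        memo[n] = fib_top_down(n - 1, memo) + fib_top_down(n - 2, memo)
    return memo[n]

def fib_bottom_up(n):
    # Dynamic programming: build the table from the base cases upwards.
    table = [0] * (n + 1)
    if n >= 1:
        table[1] = 1
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]
    return table[n]

Both compute the same answers; the top-down version caches results lazily as the recursion reaches them, while the bottom-up version fills the whole table starting from the base cases.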

The key takeaway is that they perform the same function, which is to avoid unnecessary and expensive recomputation of subproblems. This greatly improves the run-time efficiency of many algorithms, such as the classic solution to the counting change problem (to which this post’s title is a reference).
