Longest Common Subsequence
Write a function to find the length of the longest common subsequence between two sequences.
E.g. Given the strings “serendipitous” and “precipitation”, the longest common subsequence is “reipito” and its length is 7.
Recursive approach →
- Create two counters
idx1
andidx2
starting at 0. Our recursive function will compute the LCS ofseq1[idx1:]
andseq2[idx2:]
- If
seq1[idx1]
andseq2[idx2]
are equal, then this character belongs to the LCS ofseq1[idx1:]
andseq2[idx2:]
(why?). Further the length this is LCS is one more than LCS ofseq1[idx1+1:]
andseq2[idx2+1:]
3. If not, then the LCS of seq1[idx1:]
and seq2[idx2:]
is the longer one among the LCS of seq1[idx1+1:], seq2[idx2:]
and the LCS of seq1[idx1:]
, seq2[idx2+1:]
4. If either seq1[idx1:]
or seq2[idx2:]
is empty, then their LCS is empty.
Here’s what the tree of recursive calls looks like:
def lcs_recursive(seq1, seq2, idx1=0, idx2=0):
if idx1 == len(seq1) or idx2 == len(seq2):
return 0
elif seq1[idx1] == seq2[idx2]:
return 1 + lcs_recursive(seq1, seq2, idx1+1, idx2+1)
else:
option1 = lcs_recursive(seq1, seq2, idx1+1, idx2)
option2 = lcs_recursive(seq1, seq2, idx1, idx2+1)
return max(option1, option2)
Algorithm complexity and inefficiency →
Worst case occurs when each time we have to try 2 subproblems i.e. when the sequences have no common elements.
Time complexity is O(2^(m+n)).
Its overcome by using a technique called memoization (i.e. memorization). It is used to store intermediate repetitive results for future reference.
def lcs_memo(seq1, seq2):
memo = {}
def recurse(idx1=0, idx2=0):
key = (idx1, idx2)
if key in memo:
return memo[key]
elif idx1 == len(seq1) or idx2 == len(seq2):
memo[key] = 0
elif seq1[idx1] == seq2[idx2]:
memo[key] = return 1 + recurse(idx1+1, idx2+1)
else:
memo[key] = max(recurse(idx1+1, idx2), recurse(idx1, idx2+1))
return memo[key]
return recurse(0, 0)
But, use of recursion, increases the space complexity because, for each call space needs to be allocated and the process continues. Hence, the solution to this issue is iterative approach, which can be done by:
Dynamic Programming →
- Create a table of size
(n1+1) * (n2+1)
initialized with 0s, wheren1
andn2
are the lengths of the sequences.table[i][j]
represents the longest common subsequence ofseq1[:i]
andseq2[:j]
. Here's what the table looks like (source: Kevin Mavani, Medium).
2. If seq1[i]
and seq2[j]
are equal, then table[i+1][j+1] = 1 + table[i][j]
3. If seq1[i]
and seq2[j]
are equal, then table[i+1][j+1] = max(table[i][j+1], table[i+1][j])
def lcs_dp(seq1, seq2):
n1, n2 = len(seq1), len(seq2)
table = [[0 for x in range(n2)] for x in range(n1)]
for idx1 in range(n1):
for idx2 in range(n2):
if seq1[idx1] == seq2[idx2]:
table[i+1][j+1] = 1 + table[i][j]
else:
table[i+1][j+1] = max(table[i][j+1], table[i+1][j])
return table[-1][-1]
For this approach… Time complexity is O(N1 * N2).