Longest Common Subsequence

Aditi Deodhar
Analytics Vidhya
Published in
3 min readMar 3, 2021

Write a function to find the length of the longest common subsequence between two sequences.

E.g. Given the strings “serendipitous” and “precipitation”, the longest common subsequence is “reipito” and its length is 7.

Recursive approach

  1. Create two counters idx1 and idx2 starting at 0. Our recursive function will compute the LCS of seq1[idx1:] and seq2[idx2:]
  2. If seq1[idx1] and seq2[idx2] are equal, then this character belongs to the LCS of seq1[idx1:] and seq2[idx2:] (why?). Further the length this is LCS is one more than LCS of seq1[idx1+1:] and seq2[idx2+1:]

3. If not, then the LCS of seq1[idx1:] and seq2[idx2:] is the longer one among the LCS of seq1[idx1+1:], seq2[idx2:] and the LCS of seq1[idx1:], seq2[idx2+1:]

4. If either seq1[idx1:] or seq2[idx2:] is empty, then their LCS is empty.

Here’s what the tree of recursive calls looks like:

def lcs_recursive(seq1, seq2, idx1=0, idx2=0):
if idx1 == len(seq1) or idx2 == len(seq2):
return 0
elif seq1[idx1] == seq2[idx2]:
return 1 + lcs_recursive(seq1, seq2, idx1+1, idx2+1)
else:
option1 = lcs_recursive(seq1, seq2, idx1+1, idx2)
option2 = lcs_recursive(seq1, seq2, idx1, idx2+1)
return max(option1, option2)

Algorithm complexity and inefficiency →

Worst case occurs when each time we have to try 2 subproblems i.e. when the sequences have no common elements.

Time complexity is O(2^(m+n)).

Its overcome by using a technique called memoization (i.e. memorization). It is used to store intermediate repetitive results for future reference.

def lcs_memo(seq1, seq2):
memo = {}
def recurse(idx1=0, idx2=0):
key = (idx1, idx2)
if key in memo:
return memo[key]
elif idx1 == len(seq1) or idx2 == len(seq2):
memo[key] = 0
elif seq1[idx1] == seq2[idx2]:
memo[key] = return 1 + recurse(idx1+1, idx2+1)
else:
memo[key] = max(recurse(idx1+1, idx2), recurse(idx1, idx2+1))
return memo[key]
return recurse(0, 0)

But, use of recursion, increases the space complexity because, for each call space needs to be allocated and the process continues. Hence, the solution to this issue is iterative approach, which can be done by:

Dynamic Programming →

  1. Create a table of size (n1+1) * (n2+1) initialized with 0s, where n1 and n2 are the lengths of the sequences. table[i][j] represents the longest common subsequence of seq1[:i] and seq2[:j]. Here's what the table looks like (source: Kevin Mavani, Medium).

2. If seq1[i] and seq2[j] are equal, then table[i+1][j+1] = 1 + table[i][j]

3. If seq1[i] and seq2[j] are equal, then table[i+1][j+1] = max(table[i][j+1], table[i+1][j])

def lcs_dp(seq1, seq2):
n1, n2 = len(seq1), len(seq2)
table = [[0 for x in range(n2)] for x in range(n1)]
for idx1 in range(n1):
for idx2 in range(n2):
if seq1[idx1] == seq2[idx2]:
table[i+1][j+1] = 1 + table[i][j]
else:
table[i+1][j+1] = max(table[i][j+1], table[i+1][j])
return table[-1][-1]

For this approach… Time complexity is O(N1 * N2).

--

--

Aditi Deodhar
Analytics Vidhya

MSIS @ Northeastern University | Data Science Enthusiast