# Longest Common Subsequence

Published in

--

Write a function to find the length of the longest common subsequence between two sequences.

E.g. Given the strings “serendipitous” and “precipitation”, the longest common subsequence is “reipito” and its length is 7.

Recursive approach

1. Create two counters `idx1` and `idx2` starting at 0. Our recursive function will compute the LCS of `seq1[idx1:]` and `seq2[idx2:]`
2. If `seq1[idx1]` and `seq2[idx2]` are equal, then this character belongs to the LCS of `seq1[idx1:]` and `seq2[idx2:]` (why?). Further the length this is LCS is one more than LCS of `seq1[idx1+1:]` and `seq2[idx2+1:]`

3. If not, then the LCS of `seq1[idx1:]` and `seq2[idx2:]` is the longer one among the LCS of `seq1[idx1+1:], seq2[idx2:]` and the LCS of `seq1[idx1:]`, `seq2[idx2+1:]`

4. If either `seq1[idx1:]` or `seq2[idx2:]` is empty, then their LCS is empty.

Here’s what the tree of recursive calls looks like:

`def lcs_recursive(seq1, seq2, idx1=0, idx2=0):    if idx1 == len(seq1) or idx2 == len(seq2):        return 0    elif seq1[idx1] == seq2[idx2]:       return 1 + lcs_recursive(seq1, seq2, idx1+1, idx2+1)    else:       option1 = lcs_recursive(seq1, seq2, idx1+1, idx2)       option2 = lcs_recursive(seq1, seq2, idx1, idx2+1)       return max(option1, option2)`

Algorithm complexity and inefficiency →

Worst case occurs when each time we have to try 2 subproblems i.e. when the sequences have no common elements.

Time complexity is O(2^(m+n)).

Its overcome by using a technique called memoization (i.e. memorization). It is used to store intermediate repetitive results for future reference.

`def lcs_memo(seq1, seq2):    memo = {}    def recurse(idx1=0, idx2=0):        key = (idx1, idx2)        if key in memo:            return memo[key]        elif idx1 == len(seq1) or idx2 == len(seq2):            memo[key] = 0        elif seq1[idx1] == seq2[idx2]:            memo[key] = return 1 + recurse(idx1+1, idx2+1)        else:            memo[key] = max(recurse(idx1+1, idx2), recurse(idx1, idx2+1))        return memo[key]    return recurse(0, 0)`

But, use of recursion, increases the space complexity because, for each call space needs to be allocated and the process continues. Hence, the solution to this issue is iterative approach, which can be done by:

Dynamic Programming →

1. Create a table of size `(n1+1) * (n2+1)` initialized with 0s, where `n1` and `n2` are the lengths of the sequences. `table[i][j]` represents the longest common subsequence of `seq1[:i]` and `seq2[:j]`. Here's what the table looks like (source: Kevin Mavani, Medium).

2. If `seq1[i]` and `seq2[j]` are equal, then `table[i+1][j+1] = 1 + table[i][j]`

3. If `seq1[i]` and `seq2[j]` are equal, then `table[i+1][j+1] = max(table[i][j+1], table[i+1][j])`

`def lcs_dp(seq1, seq2):    n1, n2 = len(seq1), len(seq2)    table = [[0 for x in range(n2)] for x in range(n1)]    for idx1 in range(n1):        for idx2 in range(n2):            if seq1[idx1] == seq2[idx2]:                table[i+1][j+1] = 1 + table[i][j]            else:                table[i+1][j+1] = max(table[i][j+1], table[i+1][j])    return table[-1][-1]`

For this approach… Time complexity is O(N1 * N2).

--

--

MSIS @ Northeastern University | Data Science Enthusiast