Java Program to Longest Common Subsequence

AlishaS
JavaToDev
Published in
7 min readApr 11, 2023

The Longest Common Subsequence (LCS) is a sequence of characters that appears in the same order in two or more strings. It is the longest sequence of characters that is present in all the strings, and it is not necessarily contiguous (i.e., the characters can appear in any order within the string). For example, given the strings “abcdef” and “abcfed”, the LCS is “abc” because it appears in both strings and is the longest sequence of characters common to both strings.

LCS is a well-studied problem in computer science and has many applications in fields such as bioinformatics, text editing, and version control. It is often used to compare and align sequences of DNA, protein, or text, and to identify similarities and differences between them.

There are several algorithms for finding the LCS of two or more strings, and they differ in terms of their time and space complexity. Some of the most common algorithms for finding LCS include dynamic programming, divide and conquer, and suffix trees.

In this blog, we will discuss the algorithm for finding LCS and its implementation in Java interview questions. We will also look at some examples of LCS and its applications in various fields. Finally, we will test the Java LCS program and summarize the key points.

Examples and Explanation of LCS: Here are some examples of LCS
1. Given the strings “abcdef” and “abcfed”, the LCS is “abc” because it appears in both strings and is the longest sequence of characters common to both strings.
2. Given the strings “abcdefg” and “abcefgh”, the LCS is “abcef” because it appears in both strings and is the longest sequence of characters common to both strings.
3. Given the strings “abcdef” and “cdefgh”, the LCS is “cdef” because it appears in both strings and is the longest sequence of characters common to both strings.
4. Given the strings “abcdef” and “bcdefg”, the LCS is “bcdef” because it appears in both strings and is the longest sequence of characters common to both strings.
5. Given the strings “abcdef” and “efghij”, the LCS is “” (empty string) because there are no common characters between the two strings.

Applications of LCS: There are several applications of the Longest Common Subsequence (LCS) algorithm in various fields, including:
1. Bioinformatics: LCS is often used to compare and align sequences of DNA, protein, or other biological molecules. It can help identify similarities and differences between the sequences, and it is a key tool in the analysis of genetic data.
2. Text editing: LCS can be used to compare and merge text documents, and to identify changes made to a document over time. This is useful in version control systems, where multiple users may make changes to the same document.
3. Data compression: LCS can be used to compress data by identifying common patterns and replacing them with shorter sequences.
4. Machine learning: LCS can be used in machine learning algorithms to identify patterns in data and to make predictions.
5. Speech recognition: LCS can be used to compare speech patterns and to identify common features in different languages. This can be useful in developing speech recognition systems that can understand multiple languages.
6. Natural language processing: LCS can be used in natural language processing algorithms to identify common patterns in text and to improve the accuracy of language models.
7. Data mining: LCS can be used to identify patterns and trends in large datasets, and to uncover hidden relationships between different variables.

Algorithm for finding LCS:
There are several algorithms for finding the Longest Common Subsequence (LCS) of two or more strings. One of the most common algorithms is dynamic programming, which solves the problem by building up solutions to smaller subproblems.

The basic idea behind the dynamic programming algorithm for LCS is as follows:
1. Initialize a two-dimensional array, where the rows correspond to the characters in the first string and the columns correspond to the characters in the second string.
2. For each character in the first string, compare it to each character in the second string. If the characters are the same, increment the value in the array at that position by 1. If the characters are different, take the maximum value between the value in the array at the current position and the value in the array at the position above and to the left.
3. Repeat step 2 until all characters in both strings have been compared.
The LCS is the maximum value in the array.

Here is the pseudocode for the dynamic programming algorithm for LCS:

function LCS(X[1..m], Y[1..n])
C = array(0..m, 0..n)
for i = 0 to m
C[i,0] = 0
for j = 0 to n
C[0,j] = 0
for i = 1 to m
for j = 1 to n
if X[i] = Y[j]
C[i,j] = C[i-1,j-1] + 1
else
C[i,j] = max(C[i,j-1], C[i-1,j])
return C[m,n]

This algorithm has a time complexity of O(mn), where m and n are the lengths of the two strings. It has a space complexity of O(mn) as well since it requires a two-dimensional array to store the intermediate values.

Other algorithms for finding LCS include divide and conquer and suffix trees. These algorithms have different time and space complexity characteristics, and may be more suitable for different types of problems.

Implementation of LCS in Java:
Here is a sample implementation of the dynamic programming algorithm for finding the Longest Common Subsequence (LCS) in Java:

import java.util.Arrays;
public class LCS {
public static int findLCS(char[] X, char[] Y, int m, int n) {
int[][] L = new int[m+1][n+1];
for (int i = 0; i <= m; i++) {
for (int j = 0; j <= n; j++) {
if (i == 0 || j == 0)
L[i][j] = 0;
else if (X[i-1] == Y[j-1])
L[i][j] = L[i-1][j-1] + 1;
else
L[i][j] = Math.max(L[i-1][j], L[i][j-1]);
}
}return L[m][n];
}public static void main(String[] args) {
String s1 = “abcdef”;
String s2 = “abcfed”;char[] X = s1.toCharArray();
char[] Y = s2.toCharArray();
int m = X.length;
int n = Y.length;System.out.println(“LCS length = “ + findLCS(X, Y, m, n));
}
}

In this implementation, the LCS function takes as input two character arrays (representing the two strings) and their lengths, and returns the length of the LCS. It uses a two-dimensional array to store the intermediate values, and it applies the dynamic programming algorithm as described in the pseudocode above.

To test the LCS program, you can create a main method and pass in sample test cases as shown above. The expected output for the test case “abcdef” and “abcfed” is 3, which is the length of the LCS “abc”.
You can also modify the LCS function to return the actual LCS itself, by backtracking through the array and reconstructing the LCS from the stored values.

Testing the Java LCS program:
To test the Java implementation of the Longest Common Subsequence (LCS) algorithm, you can create a main method and define a set of test cases.

Here is an example of how you can test the LCS program:

public static void main(String[] args) {
// Test case 1
String s1 = "abcdef";
String s2 = "abcfed";
char[] X = s1.toCharArray();
char[] Y = s2.toCharArray();
int m = X.length;
int n = Y.length;
System.out.println("LCS length = " + findLCS(X, Y, m, n)); // Expected output: 3
// Test case 2
s1 = "abcdefg";
s2 = "abcefgh";
X = s1.toCharArray();
Y = s2.toCharArray();
m = X.length;
n = Y.length;
System.out.println("LCS length = " + findLCS(X, Y, m, n)); // Expected output: 5
// Test case 3
s1 = "abcdef";
s2 = "cdefgh";
X = s1.toCharArray();
Y = s2.toCharArray();
m = X.length;
n = Y.length;
System.out.println("LCS length = " + findLCS(X, Y, m, n)); // Expected output: 4
// Test case 4
s1 = "abcdef";
s2 = "bcdefg";
X = s1.toCharArray();
Y = s2.toCharArray();
m = X.length;
n = Y.length;
System.out.println("LCS length = " + findLCS(X, Y, m, n)); // Expected output: 5
// Test case 5
s1 = "abcdef";
s2 = "efghij";
X = s1.toCharArray();
Y = s2.toCharArray();
m = X.length;
n = Y.length;
System.out.println("LCS length = " + findLCS(X, Y, m, n)); // Expected output: 0
}

You can define as many test cases as you like, and compare the output of the LCS function to the expected results. This will help you ensure that the LCS program is working correctly and producing the correct output for a variety of input strings.

Conclusion:

The Longest Common Subsequence (LCS) is a sequence of characters that appears in the same order in two or more strings. It is the longest sequence of characters that is present in all the strings, and it is not necessarily contiguous. There are several algorithms for finding LCS, including dynamic programming, divide and conquer, and suffix trees.

In this article, we have looked at the dynamic programming algorithm for LCS and its implementation in Java. We have also discussed some possible applications of LCS and potential future developments.
LCS has a wide range of applications in fields such as bioinformatics, text editing, data compression, machine learning, speech recognition, natural language processing, and data mining. It is a powerful tool for comparing and aligning sequences of characters, and for identifying patterns and trends in data.

In the future, LCS algorithms may continue to be developed and improved, and they may be applied to new and emerging fields. For example, LCS may be used to analyze large datasets in the field of data science or to develop new algorithms for natural language processing and machine learning.

--

--

AlishaS
JavaToDev

I am enthusiastic about programming, and marketing, and constantly seeking new experiences.