Longest Common Subsequence: A Dynamic Programming Problem

Aaishwarya Kulkarni
4 min readJul 15, 2024

--

The Longest Common Subsequence (LCS) is a fundamental problem in computer science that showcases the effectiveness of dynamic programming.

Why Use Dynamic Programming for the LCS Problem?

The LCS problem entails addressing overlapping subproblems, meaning the same subproblems are solved multiple times. Dynamic programming stores the results of these subproblems in a table to avoid redundant computations.

To determine the LCS of the strings text1 = ‘abcde’ and text2 = ‘ace’:

  1. We start by initialising a 2D matrix of size (t1+1) x (t2+1) where t1 and t2 are the lengths of text1 and text2 respectively. The matrix is filled in a bottom-up manner, starting from the end of both strings and moving towards the beginning.
  2. For each character pair (text1[i], text2[j]), if they match, the LCS length is 1 + matrix[i+1][j+1]. The 1 accounts for the matching character and the matrix[i + 1][j + 1] represents the length of the LCS for the substrings text1[i: ] and text2[j: ].
  3. If they do not match, the LCS length is the maximum of ignoring either character. i.e. max(matrix[i][j + 1], matrix[i + 1][j])
    matrix[i][j + 1]: The LCS length if we ignore text2[j].
    matrix[i + 1][j]: The LCS length if we ignore text1[i].

This can be confusing, so let’s see a few examples for better understanding!

Fig. 1

As text1[4] matches text2[2], the length of the LCS at matrix[4][2] is calculated as:
matrix[4][2] = 1 + matrix[5][3]

1 for the matched character and matrix[5][3] represents the length of the remaining substrings (text1[5:] and text2[3:]). For text1[5:] and text2[3:], we have 0 characters matched as they are the starting points of the remaining substrings. Therefore LCS at matrix[4][2] is 1.

Fig. 2

As text1[4] does not match text2[1], the length of the LCS at matrix[4][1] is calculated as:
matrix[4][1] = max(matrix[4][2], matrix[5][1])

matrix[4][2] represents the LCS length if we ignore text2[1].
matrix[5][1] represents the LCS length if we ignore text1[4].

Therefore LCS at matrix[4][1] is 1.

Fig. 3

As text1[2] matches text2[1], the length of the LCS at matrix[2][1] is calculated as
matrix[2][1] = 1 + matrix[3][2]

1 for the matched character and matrix[3][2] represents the LCS length of the remaining substrings (text1[3:] and text2[2:]). Where, for text1[3:] and text2[2:] we have 1 character matched i.e. ‘e’. Therefore LCS at matrix[2][1] is 2.

Fig. 4

As text1[0] matches text2[0], the length of the LCS at matrix[0]0] is calculated as
matrix[0][0] = 1 + matrix[1][1]

1 for the matched character and matrix[1][1] represents the LCS length of the remaining substrings (text1[1:] and text2[1:]). Where, for text1[1:] and text2[1:] we have 2 characters matched i.e. ‘c’ and ‘e’. Therefore LCS at matrix[0][0] is of length 3.

Hence, the length of the longest common subsequence for text1 = ‘abcde’ and text2 = ‘ace’ is 3, which can be found at matrix[0][0].

We have the length, but we can also obtain the actual LCS string if needed!

To retrieve the longest common subsequence (LCS) string from the calculated matrix, you can trace back from matrix[0][0] to the end of the matrix.

  1. If text1[i] == text2[j] i.e. characters are matching, we move diagonally to matrix[i+1][j+1] (represented by black arrows).
  2. If text1[i] != text2[j], move to the cell that gave the maximum value:
  3. If matrix[i][j+1] > matrix[i+1][j], move right to matrix[i][j+1]. Otherwise, move down to matrix[i+1][j] (represented by red arrows).
  4. Repeat until you reach the end of the matrix.

Python Implementation

class Solution(object):
def longestCommonSubsequence(self, text1, text2):

t1 = len(text1)
t2 = len(text2)

# Step 1: Create and fill the matrix (bottom-up)
matrix = [[0 for j in range(t2 + 1)] for i in range(t1 + 1)]

for i in range(len(text1) - 1, -1, -1):
for j in range(len(text2) - 1, -1, -1):

if text1[i] == text2[j]:
matrix[i][j] = 1 + matrix[i + 1][j + 1]
else:
matrix[i][j] = max(matrix[i][j + 1],matrix[i + 1][j])

# Step 2: Trace back to get the LCS string (top-down)
i, j = 0, 0
lcs = []
while i < t1 and j < t2:
if text1[i] == text2[j]:
lcs.append(text1[i])
i += 1
j += 1
elif matrix[i][j + 1] > matrix[i + 1][j]:
j += 1
else:
i += 1

return ''.join(lcs)

Time complexity Analysis

  1. Creating a 2D matrix of size (t1 + 1) x (t2 + 1) requires O(t1 * t2) time.
  2. The nested loops iterate over each cell of the matrix once where comparison and assignment requires O(1) time. Therefore filling the matrix takes O(t1 * t2) time.
  3. Tracing back to construct the LCS string (if needed) requires at most t1 + t2 steps as we either move down or right at each step, which comes to O(t1 + t2) time.
  4. Thus, the overall time complexity of the algorithm is O(t1 * t2).

Thanks for reading! Happy Coding!

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

--

--

No responses yet

Write a response