Lukas' Notes

computation

Definition

Longest Common Subsequence Problem

Given two strings and over an alphabet . The longest common subsequence problem asks for the length of a longest string that is a subsequence of both and .

Input:

Brute-Force Algorithm

Brute-Force Algorithm

A brute-force approach enumerates all subsequences of , checks for each whether it is also a subsequence of , and keeps the maximum length found.

int lcs_bruteforce(String S, String T)
    int best := 0
    for each subsequence A of S:
        if is_subsequence(A, T):
            best := max(best, |A|)
    return best

Checking whether is a subsequence of takes time by scanning left to right.

Time complexity: where and .

Recursive Algorithm

Recursive Dynamic Programme

Use zero-based indices. For a string , is the letter at index , and is the substring from index to index , including both endpoints.

Let be the length of an LCS of the prefixes and .

If one prefix is empty, no non-empty common subsequence is possible:

Now consider an LCS of and .

  • If ends with and , then the last letters can be matched. This gives .
  • If ends with but , then is not used. This case is covered by .
  • If does not end with , then is not used. This case is covered by .

In the matching case, taking the matching last letters dominates dropping one of them. Hence the recurrence simplifies to:

int lcs_rec(int i, int j)
    if i < 0 or j < 0:
        return 0
    if memo[i][j] is known:
        return memo[i][j]
    if S[i] == T[j]:
        memo[i][j] := lcs_rec(i-1, j-1) + 1
    else:
        memo[i][j] := max(lcs_rec(i, j-1), lcs_rec(i-1, j))
    return memo[i][j]

Correctness follows from the case split above. The diagram shows the two behaviours of a recursive call.

With memoisation or bottom-up table filling, there are values to compute, and each value takes work after its dependencies are known.

Time complexity: . Without memoisation, the recursion tree can be exponential because the same subproblems reappear.

Examples

The longest common subsequence is with length .

Both and are longest common subsequences, each of length .

The longest common subsequence is with length .

No character appears in both strings. The only common subsequence is the empty string , so .