Skip to main content

Edit Distance

Problem

This is a classic problem of dynamic programming to match 2 strings given a set of operations.

Problem Description

Given 2 strings x=[x1,x2,...,xm]x = [x_1, x_2, ..., x_m] and y=[y1,y2,...,yn]y = [y_1, y_2, ..., y_n] and a set of operations:

  • Insert a character
  • Delete a character
  • Replace a character

Find the minimum number of operations to make xx and yy equal.

Given:

  • 0 <= m, n
    • m = 0 => x = ""
    • n = 0 => y = ""
  • x and y consist of English letters and digits.

Testcases

Input: x = "sunny", y = "snowy"
Output: 3

The operations are:

  1. sunny -> snny (delete 'u')
  2. snny -> snoy (replace 'n' with 'o')
  3. snoy -> snowy (insert 'w')
Input: x = "intention", y = "execution"
Output: 5

The operations are:

  1. intention -> inention (delete 't')
  2. inention -> enention (replace 'i' with 'e')
  3. enention -> exention (replace 'n' with 'x')
  4. exention -> exection (replace 'n' with 'c')
  5. exection -> execution (insert 'u')

Solution

Subproblems

The subproblems depends on what operation we applied to the previous characters.

And apart from the 3 operations given, we must not overlook the possibility of not applying any operation.

With this in mind, there are a total of 7 subproblems when we are at xmx_m and yny_n:

  • From xm1x_{m-1} and yn1y_{n-1}, apply no operation.
  • From xm1x_{m-1} and yny_n, insert a character in front of yny_n to match xm1x_{m-1}.
  • From xmx_m and yn1y_{n-1}, insert a character in front of xmx_m to match yn1y_{n-1}.
  • From xm1x_{m-1} and yny_n, delete xm1x_{m-1} to avoid mismatch with yny_n.
  • From xmx_m and yn1y_{n-1}, delete yn1y_{n-1} to avoid mismatch with xmx_m.
  • At xm1x_{m-1} and yn1y_{n-1}, replace xm1x_{m-1} to match yn1y_{n-1}.
  • At xm1x_{m-1} and yn1y_{n-1}, replace yn1y_{n-1} to match xm1x_{m-1}.

But these are not what we are looking for. For example, replacing xm1x_{m-1} to match yn1y_{n-1}, and replacing yn1y_{n-1} to match xm1x_{m-1} are actually the same.

Therefore, the actual subproblems are the following 3:

  • From xm1x_{m-1} and yn1y_{n-1}, either:
    • Apply no operation.
    • Replace one to match the other.
  • From xm1x_{m-1} and yny_n, either:
    • Insert a character in front of yny_n to match xm1x_{m-1}.
    • Delete xm1x_{m-1} to avoid mismatch with yny_n.
  • From xmx_m and yn1y_{n-1}, either:
    • Insert a character in front of xmx_m to match yn1y_{n-1}.
    • Delete yn1y_{n-1} to avoid mismatch with xmx_m.
Match One String to The Other

Take a closer look at the subproblems, we can see that in fact we can choose to always apply operations to only one of the strings to match the others.

Recurrence Relation

The relation is a bit more complicated in this problem, because we cannot apply no operation to the subproblem if xm1x_{m-1} and yn1y_{n-1} are not matched.

We have to find the minimum of the three operations if they are not matching, and increase the operation count by 1.

As a result, let f(m,n)f(m, n) be the minimum number of operations to make [x1,x2,...,xm][x_1, x_2, ..., x_m] and [y1,y2,...,yn][y_1, y_2, ..., y_n] equal:

f(m,n)={f(m1,n1)if xm1=yn11+min{f(m1,n)f(m,n1)f(m1,n1)otherwisef(m, n) = \begin{cases} f(m - 1, n - 1) & \text{if } x_{m-1} = y_{n-1} \\ 1 + \min \begin{cases} f(m - 1, n) \\ f(m, n - 1) \\ f(m - 1, n - 1) \end{cases} & \text{otherwise} \end{cases}

Base Cases

The base case is simpiy when xx and yy are empty strings:

f(0,0)=0f(0, 0) = 0

Edge Cases

The edge cases happen when one of the string is empty, some of the recursive cases will be invalid.

In this case, we will either delete all of the extra characters of the non-empty string, or insert all of The non-empty string's character to the empty string.

f(m,0)=mf(0,n)=n\begin{aligned} f(m, 0) &= m \\ f(0, n) &= n \end{aligned}

Solving the Problem

We only need to implement the solution according to the recurrence relation.

Edit Distance
function edit_distance(x: str, y: str) -> int {
// the problem sizes
let m = x.length
let n = y.length

// initialize memoization array
let dp = [...] of size (m + 1, n + 1)
zero-based index

// calculate values
for i from 0 to m {
for j from 0 to n {
dp[i][j] = match (i, j) {
// base case
(0, 0) => 0,

// edge cases
(0, j) => j,
(i, 0) => i,

// recursive cases
(i, j) if x[i - 1] == y[j - 1] => dp[i - 1][j - 1],
(i, j) => 1 + min(
dp[i - 1][j],
dp[i][j - 1],
dp[i - 1][j - 1],
),
}
}
}

return dp[m][n]
}

Time complexity: O(mn)O(mn)

Space complexity: O(mn)O(mn)

The solution to the edit distance problem with x=sunnyx = \text{sunny} and y=snowyy = \text{snowy}.

Try it out

x=x =\:

, y=y =\:

snowy
012345
s101234
u211234
n321234
n432234
y543333

Green bold values form the minimum path of edit, and underlined values are where operations are applied.

Operations:

  • Delete u
  • Replace n with o
  • Insert w

Space Optimization

In comparison to the unique paths to corner problem, which subproblems only consist of the top value and the left value

This problem requires the top value, the left value, and the top left value.

If we use the 1D array like the unique paths to corner problem, the top left value will already overwritten by the left value.

So the solution is to use another variable to store the top value of current value for the right value to as we move to the right.

Edit Distance - Space Optimized
function edit_distance(x: str, y: str) -> int {
// the problem sizes
let m = x.length
let n = y.length

// initialize memoization array
let dp = [...] of size n + 1
zero-based index
initialized to 0

// top left variable
let top_left = 0

// calculate values
for i from 0 to m {
for j from 0 to n {
// store the top left value
let new_top_left = dp[j]

dp[j] = match (i, j) {
// base case
(0, 0) => 0,

// edge cases
(0, j) => j,
(i, 0) => i,

// recursive cases
(i, j) if x[i - 1] == y[j - 1] => top_left,
(i, j) => 1 + min(
dp[j],
dp[j - 1],
top_left,
),
}

// update the top left value
top_left = new_top_left
}
}

return dp[n]
}

Time complexity: O(mn)O(mn)

Space complexity: O(n)O(n)

The space optimized solution to the edit distance problem with x=sunnyx = \text{sunny} and y=snowyy = \text{snowy}.

Checkpoint

Which of the following statements are true about the edit distance problem?

If there are n strings, the time complexity and space comlpexity will remain the same
There are 7 subproblems
There can be multiple solutions to the same problem
Both strings have to be editable to solve the problem

Implementation

Edit Distance
def editDistance(x, y):
# the problem sizes
m = len(x)
n = len(y)

# initialize memoization array
dp = [[0] * (n + 1) for _ in range(m + 1)]

# calculate values
for i in range(m + 1):
for j in range(n + 1):
if i == 0 and j == 0:
# base case
dp[i][j] = 0
elif i == 0:
# edge case
dp[i][j] = j
elif j == 0:
# edge case
dp[i][j] = i
elif x[i - 1] == y[j - 1]:
# recursive case
dp[i][j] = dp[i - 1][j - 1]
else:
# recursive case
dp[i][j] = 1 + min(
dp[i - 1][j],
dp[i][j - 1],
dp[i - 1][j - 1],
)

return dp[m][n]
Edit Distance - Space Optimized
def editDistance(x, y):
# the problem sizes
m = len(x)
n = len(y)

# initialize memoization array
dp = [0] * (n + 1)

# top left variable
top_left = 0

# calculate values
for i in range(m + 1):
for j in range(n + 1):
# store the top left value
new_top_left = dp[j]

if i == 0 and j == 0:
# base case
dp[j] = 0
elif i == 0:
# edge case
dp[j] = j
elif j == 0:
# edge case
dp[j] = i
elif x[i - 1] == y[j - 1]:
# recursive case
dp[j] = top_left
else:
# recursive case
dp[j] = 1 + min(
dp[j],
dp[j - 1],
top_left,
)

# update the top left value
top_left = new_top_left

return dp[n]

LeetCode - Edit Distance: https://leetcode.com/problems/edit-distance/