Edit Distance
Problem
This is a classic problem of dynamic programming to match 2 strings given a set of operations.
Problem Description
Given 2 strings and and a set of operations:
- Insert a character
- Delete a character
- Replace a character
Find the minimum number of operations to make and equal.
Given:
0 <= m, n
m = 0 => x = ""
n = 0 => y = ""
x
andy
consist of English letters and digits.
Testcases
Input: x = "sunny", y = "snowy"
Output: 3
The operations are:
- sunny -> snny (delete 'u')
- snny -> snoy (replace 'n' with 'o')
- snoy -> snowy (insert 'w')
Input: x = "intention", y = "execution"
Output: 5
The operations are:
- intention -> inention (delete 't')
- inention -> enention (replace 'i' with 'e')
- enention -> exention (replace 'n' with 'x')
- exention -> exection (replace 'n' with 'c')
- exection -> execution (insert 'u')
Solution
Subproblems
The subproblems depends on what operation we applied to the previous characters.
And apart from the 3 operations given, we must not overlook the possibility of not applying any operation.
With this in mind, there are a total of 7 subproblems when we are at and :
- From and , apply no operation.
- From and , insert a character in front of to match .
- From and , insert a character in front of to match .
- From and , delete to avoid mismatch with .
- From and , delete to avoid mismatch with .
- At and , replace to match .
- At and , replace to match .
But these are not what we are looking for. For example, replacing to match , and replacing to match are actually the same.
Therefore, the actual subproblems are the following 3:
- From and , either:
- Apply no operation.
- Replace one to match the other.
- From and , either:
- Insert a character in front of to match .
- Delete to avoid mismatch with .
- From and , either:
- Insert a character in front of to match .
- Delete to avoid mismatch with .
Take a closer look at the subproblems, we can see that in fact we can choose to always apply operations to only one of the strings to match the others.
Recurrence Relation
The relation is a bit more complicated in this problem, because we cannot apply no operation to the subproblem if and are not matched.
We have to find the minimum of the three operations if they are not matching, and increase the operation count by 1.
As a result, let be the minimum number of operations to make and equal:
Base Cases
The base case is simpiy when and are empty strings:
Edge Cases
The edge cases happen when one of the string is empty, some of the recursive cases will be invalid.
In this case, we will either delete all of the extra characters of the non-empty string, or insert all of The non-empty string's character to the empty string.
Solving the Problem
We only need to implement the solution according to the recurrence relation.
function edit_distance(x: str, y: str) -> int {
// the problem sizes
let m = x.length
let n = y.length
// initialize memoization array
let dp = [...] of size (m + 1, n + 1)
zero-based index
// calculate values
for i from 0 to m {
for j from 0 to n {
dp[i][j] = match (i, j) {
// base case
(0, 0) => 0,
// edge cases
(0, j) => j,
(i, 0) => i,
// recursive cases
(i, j) if x[i - 1] == y[j - 1] => dp[i - 1][j - 1],
(i, j) => 1 + min(
dp[i - 1][j],
dp[i][j - 1],
dp[i - 1][j - 1],
),
}
}
}
return dp[m][n]
}
Time complexity:
Space complexity:
The solution to the edit distance problem with and .
,
s | n | o | w | y | ||
---|---|---|---|---|---|---|
0 | 1 | 2 | 3 | 4 | 5 | |
s | 1 | 0 | 1 | 2 | 3 | 4 |
u | 2 | 1 | 1 | 2 | 3 | 4 |
n | 3 | 2 | 1 | 2 | 3 | 4 |
n | 4 | 3 | 2 | 2 | 3 | 4 |
y | 5 | 4 | 3 | 3 | 3 | 3 |
Green bold values form the minimum path of edit, and underlined values are where operations are applied.
Operations:
- Delete u
- Replace n with o
- Insert w
Space Optimization
In comparison to the unique paths to corner problem, which subproblems only consist of the top value and the left value
This problem requires the top value, the left value, and the top left value.
If we use the 1D array like the unique paths to corner problem, the top left value will already overwritten by the left value.
So the solution is to use another variable to store the top value of current value for the right value to as we move to the right.
function edit_distance(x: str, y: str) -> int {
// the problem sizes
let m = x.length
let n = y.length
// initialize memoization array
let dp = [...] of size n + 1
zero-based index
initialized to 0
// top left variable
let top_left = 0
// calculate values
for i from 0 to m {
for j from 0 to n {
// store the top left value
let new_top_left = dp[j]
dp[j] = match (i, j) {
// base case
(0, 0) => 0,
// edge cases
(0, j) => j,
(i, 0) => i,
// recursive cases
(i, j) if x[i - 1] == y[j - 1] => top_left,
(i, j) => 1 + min(
dp[j],
dp[j - 1],
top_left,
),
}
// update the top left value
top_left = new_top_left
}
}
return dp[n]
}
Time complexity:
Space complexity:
The space optimized solution to the edit distance problem with and .
Which of the following statements are true about the edit distance problem?
Implementation
- Python
- C++
- Rust
def editDistance(x, y):
# the problem sizes
m = len(x)
n = len(y)
# initialize memoization array
dp = [[0] * (n + 1) for _ in range(m + 1)]
# calculate values
for i in range(m + 1):
for j in range(n + 1):
if i == 0 and j == 0:
# base case
dp[i][j] = 0
elif i == 0:
# edge case
dp[i][j] = j
elif j == 0:
# edge case
dp[i][j] = i
elif x[i - 1] == y[j - 1]:
# recursive case
dp[i][j] = dp[i - 1][j - 1]
else:
# recursive case
dp[i][j] = 1 + min(
dp[i - 1][j],
dp[i][j - 1],
dp[i - 1][j - 1],
)
return dp[m][n]
def editDistance(x, y):
# the problem sizes
m = len(x)
n = len(y)
# initialize memoization array
dp = [0] * (n + 1)
# top left variable
top_left = 0
# calculate values
for i in range(m + 1):
for j in range(n + 1):
# store the top left value
new_top_left = dp[j]
if i == 0 and j == 0:
# base case
dp[j] = 0
elif i == 0:
# edge case
dp[j] = j
elif j == 0:
# edge case
dp[j] = i
elif x[i - 1] == y[j - 1]:
# recursive case
dp[j] = top_left
else:
# recursive case
dp[j] = 1 + min(
dp[j],
dp[j - 1],
top_left,
)
# update the top left value
top_left = new_top_left
return dp[n]
#include <string>
#include <vector>
#include <algorithm>
int edit_distance(const std::string& x, const std::string& y) {
// the problem sizes
std::size_t m = x.size();
std::size_t n = y.size();
// initialize memoization array
std::vector<std::vector<int>> dp(
m + 1,
std::vector<int>(n + 1, 0)
);
// calculate values
for (std::size_t i = 0; i <= m; ++i) {
for (std::size_t j = 0; j <= n; ++j) {
if (i == 0 && j == 0) {
// base case
dp[i][j] = 0;
} else if (i == 0) {
// edge case
dp[i][j] = j;
} else if (j == 0) {
// edge case
dp[i][j] = i;
} else if (x[i - 1] == y[j - 1]) {
// recursive case
dp[i][j] = dp[i - 1][j - 1];
} else {
// recursive case
dp[i][j] = 1 + std::min({
dp[i - 1][j],
dp[i][j - 1],
dp[i - 1][j - 1],
});
}
}
}
return dp[m][n];
}
#include <string>
#include <vector>
#include <algorithm>
int edit_distance(const std::string& x, const std::string& y) {
// the problem sizes
std::size_t m = x.size();
std::size_t n = y.size();
// initialize memoization array
std::vector<int> dp(n + 1, 0);
// top left variable
int top_left = 0;
// calculate values
for (std::size_t i = 0; i <= m; ++i) {
for (std::size_t j = 0; j <= n; ++j) {
// store the top left value
int new_top_left = dp[j];
if (i == 0 && j == 0) {
// base case
dp[j] = 0;
} else if (i == 0) {
// edge case
dp[j] = j;
} else if (j == 0) {
// edge case
dp[j] = i;
} else if (x[i - 1] == y[j - 1]) {
// recursive case
dp[j] = top_left;
} else {
// recursive case
dp[j] = 1 + std::min({
dp[j],
dp[j - 1],
top_left,
});
}
// update the top left value
top_left = new_top_left;
}
}
return dp[n];
}
fn edit_distance(x: &str, y: &str) -> usize {
// the problem sizes
let m = x.len();
let n = y.len();
// initialize memoization array
let mut dp = vec![vec![0; n + 1]; m + 1];
// calculate values
for i in 0..=m {
for j in 0..=n {
dp[i][j] = match (i, j) {
// base case
(0, 0) => 0,
// edge cases
(0, j) => j,
(i, 0) => i,
// recursive cases
(i, j) if x.as_bytes()[i - 1] == y.as_bytes()[j - 1] => dp[i - 1][j - 1],
(i, j) => 1 + [
dp[i - 1][j],
dp[i][j - 1],
dp[i - 1][j - 1],
].into_iter().min().unwrap(),
};
}
}
dp[m][n]
}
fn edit_distance(x: &str, y: &str) -> usize {
// the problem sizes
let m = x.len();
let n = y.len();
// initialize memoization array
let mut dp = vec![0; n + 1];
// top left variable
let mut top_left = 0;
// calculate values
for i in 0..=m {
for j in 0..=n {
// store the top left value
let new_top_left = dp[j];
dp[j] = match (i, j) {
// base case
(0, 0) => 0,
// edge cases
(0, j) => j,
(i, 0) => i,
// recursive cases
(i, j) if x.as_bytes()[i - 1] == y.as_bytes()[j - 1] => top_left,
(i, j) => 1 + [
dp[j],
dp[j - 1],
top_left,
].into_iter().min().unwrap(),
};
// update the top left value
top_left = new_top_left;
}
}
dp[n]
}
Related Topics
LeetCode - Edit Distance: https://leetcode.com/problems/edit-distance/