patternrustMinor
Edit distance implementation
Viewed 0 times
editimplementationdistance
Problem
This problem is taken from the book Introduction to Algorithms, Third Edition By Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein:
In order to transform one source string of text x[1..m] to a target
string y[1..n], we can perform various transformation operations.
Our goal is, given x and y, to produce a series of transformations
that change x to y. We use an array ́z—assumed to be large enough to
hold all the characters it will need—to hold the intermediate results.
Initially, z is empty, and at termination, we should have z[j] = y[j]
for j = 1, 2,..., n. We maintain current indices i into x and j into
z, and the operations are allowed to alter z and these indices.
Initially, i = j = 1. We are required to examine every character in x
during the transformation, which means that at the end of the sequence
of transformation operations, we must have i = m + 1.
We may choose from among six transformation operations:
Copy a character from x to z by setting ́z[j] = x[i] and then incrementing both i and j. This operation examines x[i].
Replace a character from x by another character c, by setting z[j] = c, and then incrementing both i and j . This operation examines x[i].
Delete a character from x by incrementing i but leaving j alone. This operation examines x[i].
Insert the character c into z by setting z[j] = c and then incrementing j , but leaving i alone. This operation examines no
characters of x.
Twiddle (i.e., exchange) the next two characters by copying them from x to z but in the opposite order; we do so by setting z[j] = x[i
+ 1] and ́z[j + 1] = x[i] and then setting i = i + 2 and j = j + 2. This operation examines x[i] and x[i + 1].
Kill the remainder of x by setting i = m + 1. This operation examines all characters in x that have not yet been examined. This
operation, if performed, must be the final operation.
a. Given two sequences x[1..m] and y[1..n]
In order to transform one source string of text x[1..m] to a target
string y[1..n], we can perform various transformation operations.
Our goal is, given x and y, to produce a series of transformations
that change x to y. We use an array ́z—assumed to be large enough to
hold all the characters it will need—to hold the intermediate results.
Initially, z is empty, and at termination, we should have z[j] = y[j]
for j = 1, 2,..., n. We maintain current indices i into x and j into
z, and the operations are allowed to alter z and these indices.
Initially, i = j = 1. We are required to examine every character in x
during the transformation, which means that at the end of the sequence
of transformation operations, we must have i = m + 1.
We may choose from among six transformation operations:
Copy a character from x to z by setting ́z[j] = x[i] and then incrementing both i and j. This operation examines x[i].
Replace a character from x by another character c, by setting z[j] = c, and then incrementing both i and j . This operation examines x[i].
Delete a character from x by incrementing i but leaving j alone. This operation examines x[i].
Insert the character c into z by setting z[j] = c and then incrementing j , but leaving i alone. This operation examines no
characters of x.
Twiddle (i.e., exchange) the next two characters by copying them from x to z but in the opposite order; we do so by setting z[j] = x[i
+ 1] and ́z[j + 1] = x[i] and then setting i = i + 2 and j = j + 2. This operation examines x[i] and x[i + 1].
Kill the remainder of x by setting i = m + 1. This operation examines all characters in x that have not yet been examined. This
operation, if performed, must be the final operation.
a. Given two sequences x[1..m] and y[1..n]
Solution
Here's a lightweight review...
Unfortunately, I found the code very dense, which made reading over it fairly difficult. It's possible some more verbose names could help. Here's a cherry-picked sample of code that takes a second to mentally process :
```
extern crate rand;
use std::collections::HashMap;
#[derive(Debug, Copy, Clone, PartialEq, Eq, Hash)]
enum EditOp {
Copy,
Replace,
Delete,
Insert,
Twiddle,
Kill,
}
#[derive(Debug, Copy, Clone)]
struct FindBetterName {
idx: usize,
prev: char,
current: char,
}
impl FindBetterName {
fn valid_twiddle(&self, other: &FindBetterName) -> bool {
self.idx > 1 && other.idx > 1 && self.current == other.prev && self.prev == other.current
}
}
impl EditOp {
fn valid(&self, i: FindBetterName, j: FindBetterName) -> bool {
use EditOp::*;
let threshold = |p| i.idx != p || j.idx == p;
match *self {
Copy => threshold(1) && i.current == j.current,
Replace => threshold(1),
Delete => i.idx > 1,
Twiddle => threshold(2) && i.valid_twiddle(&j),
_ => true,
}
}
fn displacement(&self) -> (usize, usize) {
use EditOp::*;
match *self {
Copy | Replace => (1, 1),
Delete => (1, 0),
Insert => (0, 1),
Twiddle => (2, 2),
Kill => (0, 0),
}
}
fn apply(&self, mut from_itr: F, mut to_itr: T, result: &mut String)
where F: Iterator,
T: Iterator
{
use EditOp::*;
match *self {
Copy => {
result.push(from_itr.next().unwrap());
to_itr.next();
}
Replace => {
result.push(to_itr.next().unwrap());
from_itr.next();
}
Delete => {
from_itr.next();
}
Insert => {
result.push(to_itr.next().unwrap());
}
Twiddle => {
let second = from_itr.next().unwrap();
let first = from_itr.next().unwrap();
result.push(first);
result.push(second);
to_itr.next();
to_itr.next();
}
Kill => {}
}
}
}
type CostMap = HashMap;
fn create_tables(from_len: usize,
to_len: usize,
costs: &CostMap)
-> (Vec>, Vec>) {
let mut op_table = vec![ vec![EditOp::Kill; to_len]; from_len ];
let mut cost_table = vec![ vec![0usize; to_len]; from_len ];
let delete_cost = costs.get(&EditOp::Delete).cloned().unwrap_or(0);
for (i, c) in (1..from_len).map(|i| (i, i * delete_cost)) {
cost_table[i][0] = c;
op_table[i][0] = EditOp::Delete;
}
(op_table, cost_table)
}
fn edit_distance(from: &str, to: &str, costs: &CostMap) -> (usize, Operations) {
if costs.is_empty() {
panic!("map of costs is empty");
}
let from_len = from.chars().count();
let to_len = to.chars().count();
let (mut op_table, mut cost_table) = create_tables(from_len + 1, to_len + 1, costs);
let (mut lasti, mut lastj) = (' ', ' ');
for (i, ichar) in from.chars().enumerate() {
for (j, jchar) in to.chars().enumerate() {
let (i, j) = (i + 1, j + 1);
let ii = FindBetterName {
idx: i,
- You can combine imports with
{}:use std::collections::{HashMap, VecDeque};
- Consider importing enum variants in methods that exhaustively match on a lot of them. This helps avoid duplication and rightward pushing of code.
- Accept a generic type that implements an iterator, instead of the specific
std::str::Chars.
- Introduce more and unique types, especially when you have repeated parallel names (
i,iprev,icurrent). This also allows you to move functionality into methods.
- Introduce type aliases. This is a tiny step towards official types, but gives an opportunity to introduce names and shorten them.
- Instead of
VecDeque, use a normalVec, then you can useIterator::collectand reverse it.
- Use
a == &binstead ofa.eq(&b).
- I always recommend using
expectinstead ofunwrap. When the panic happens, you will be much happier having some context of where the failure is.
- Maybe introduce a dedicated struct for the cost mapping. Then you can implement
Default, and there's no chance of missing a key.
- Use
&[T]instead of&Vec. Faster and more usable.
- [Clippy] Combine identical
matcharms withFoo | Bar.
- [Clippy] Use
map_orinstead ofmap().unwrap_or.
- [Clippy] Can use
clonedinstead of|x| *x.
- [Clippy] Calls to
editandedit_distancehave extra references.
- Check out rustfmt, which, let's say disagrees, with some of your style choices.
Unfortunately, I found the code very dense, which made reading over it fairly difficult. It's possible some more verbose names could help. Here's a cherry-picked sample of code that takes a second to mentally process :
.map(|(x, c)| {
let (di, dj) = x.displacement();
(x, cost_table[i - di][j - dj] + c)
})```
extern crate rand;
use std::collections::HashMap;
#[derive(Debug, Copy, Clone, PartialEq, Eq, Hash)]
enum EditOp {
Copy,
Replace,
Delete,
Insert,
Twiddle,
Kill,
}
#[derive(Debug, Copy, Clone)]
struct FindBetterName {
idx: usize,
prev: char,
current: char,
}
impl FindBetterName {
fn valid_twiddle(&self, other: &FindBetterName) -> bool {
self.idx > 1 && other.idx > 1 && self.current == other.prev && self.prev == other.current
}
}
impl EditOp {
fn valid(&self, i: FindBetterName, j: FindBetterName) -> bool {
use EditOp::*;
let threshold = |p| i.idx != p || j.idx == p;
match *self {
Copy => threshold(1) && i.current == j.current,
Replace => threshold(1),
Delete => i.idx > 1,
Twiddle => threshold(2) && i.valid_twiddle(&j),
_ => true,
}
}
fn displacement(&self) -> (usize, usize) {
use EditOp::*;
match *self {
Copy | Replace => (1, 1),
Delete => (1, 0),
Insert => (0, 1),
Twiddle => (2, 2),
Kill => (0, 0),
}
}
fn apply(&self, mut from_itr: F, mut to_itr: T, result: &mut String)
where F: Iterator,
T: Iterator
{
use EditOp::*;
match *self {
Copy => {
result.push(from_itr.next().unwrap());
to_itr.next();
}
Replace => {
result.push(to_itr.next().unwrap());
from_itr.next();
}
Delete => {
from_itr.next();
}
Insert => {
result.push(to_itr.next().unwrap());
}
Twiddle => {
let second = from_itr.next().unwrap();
let first = from_itr.next().unwrap();
result.push(first);
result.push(second);
to_itr.next();
to_itr.next();
}
Kill => {}
}
}
}
type CostMap = HashMap;
fn create_tables(from_len: usize,
to_len: usize,
costs: &CostMap)
-> (Vec>, Vec>) {
let mut op_table = vec![ vec![EditOp::Kill; to_len]; from_len ];
let mut cost_table = vec![ vec![0usize; to_len]; from_len ];
let delete_cost = costs.get(&EditOp::Delete).cloned().unwrap_or(0);
for (i, c) in (1..from_len).map(|i| (i, i * delete_cost)) {
cost_table[i][0] = c;
op_table[i][0] = EditOp::Delete;
}
(op_table, cost_table)
}
fn edit_distance(from: &str, to: &str, costs: &CostMap) -> (usize, Operations) {
if costs.is_empty() {
panic!("map of costs is empty");
}
let from_len = from.chars().count();
let to_len = to.chars().count();
let (mut op_table, mut cost_table) = create_tables(from_len + 1, to_len + 1, costs);
let (mut lasti, mut lastj) = (' ', ' ');
for (i, ichar) in from.chars().enumerate() {
for (j, jchar) in to.chars().enumerate() {
let (i, j) = (i + 1, j + 1);
let ii = FindBetterName {
idx: i,
Code Snippets
.map(|(x, c)| {
let (di, dj) = x.displacement();
(x, cost_table[i - di][j - dj] + c)
})extern crate rand;
use std::collections::HashMap;
#[derive(Debug, Copy, Clone, PartialEq, Eq, Hash)]
enum EditOp {
Copy,
Replace,
Delete,
Insert,
Twiddle,
Kill,
}
#[derive(Debug, Copy, Clone)]
struct FindBetterName {
idx: usize,
prev: char,
current: char,
}
impl FindBetterName {
fn valid_twiddle(&self, other: &FindBetterName) -> bool {
self.idx > 1 && other.idx > 1 && self.current == other.prev && self.prev == other.current
}
}
impl EditOp {
fn valid(&self, i: FindBetterName, j: FindBetterName) -> bool {
use EditOp::*;
let threshold = |p| i.idx != p || j.idx == p;
match *self {
Copy => threshold(1) && i.current == j.current,
Replace => threshold(1),
Delete => i.idx > 1,
Twiddle => threshold(2) && i.valid_twiddle(&j),
_ => true,
}
}
fn displacement(&self) -> (usize, usize) {
use EditOp::*;
match *self {
Copy | Replace => (1, 1),
Delete => (1, 0),
Insert => (0, 1),
Twiddle => (2, 2),
Kill => (0, 0),
}
}
fn apply<F, T>(&self, mut from_itr: F, mut to_itr: T, result: &mut String)
where F: Iterator<Item = char>,
T: Iterator<Item = char>
{
use EditOp::*;
match *self {
Copy => {
result.push(from_itr.next().unwrap());
to_itr.next();
}
Replace => {
result.push(to_itr.next().unwrap());
from_itr.next();
}
Delete => {
from_itr.next();
}
Insert => {
result.push(to_itr.next().unwrap());
}
Twiddle => {
let second = from_itr.next().unwrap();
let first = from_itr.next().unwrap();
result.push(first);
result.push(second);
to_itr.next();
to_itr.next();
}
Kill => {}
}
}
}
type CostMap = HashMap<EditOp, usize>;
fn create_tables(from_len: usize,
to_len: usize,
costs: &CostMap)
-> (Vec<Vec<EditOp>>, Vec<Vec<usize>>) {
let mut op_table = vec![ vec![EditOp::Kill; to_len]; from_len ];
let mut cost_table = vec![ vec![0usize; to_len]; from_len ];
let delete_cost = costs.get(&EditOp::Delete).cloned().unwrap_or(0);
for (i, c) in (1..from_len).map(|i| (i, i * delete_cost)) {
cost_table[i][0] = c;
op_table[i][0] = EditOp::Delete;
}
(op_table, cost_table)
}
fn edit_distance(from: &str, to: &str, costs: &CostMap) -> (usize, Operations) {
if costs.is_empty() {
panic!("map of costs is empty");
}
let from_len = from.chars().count();
let to_len = to.chars().count();
let (mut op_table, mut cost_table) = create_tables(from_len + 1, to_len Context
StackExchange Code Review Q#142275, answer score: 3
Revisions (0)
No revisions yet.