HiveBrain v1.2.0
Get Started
← Back to all entries
patternrustMinor

Edit distance implementation

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
editimplementationdistance

Problem

This problem is taken from the book Introduction to Algorithms, Third Edition By Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein:


In order to transform one source string of text x[1..m] to a target
string y[1..n], we can perform various transformation operations.
Our goal is, given x and y, to produce a series of transformations
that change x to y. We use an array ́z—assumed to be large enough to
hold all the characters it will need—to hold the intermediate results.
Initially, z is empty, and at termination, we should have z[j] = y[j]
for j = 1, 2,..., n. We maintain current indices i into x and j into
z, and the operations are allowed to alter z and these indices.
Initially, i = j = 1. We are required to examine every character in x
during the transformation, which means that at the end of the sequence
of transformation operations, we must have i = m + 1.


We may choose from among six transformation operations:


Copy a character from x to z by setting ́z[j] = x[i] and then incrementing both i and j. This operation examines x[i].


Replace a character from x by another character c, by setting z[j] = c, and then incrementing both i and j . This operation examines x[i].


Delete a character from x by incrementing i but leaving j alone. This operation examines x[i].


Insert the character c into z by setting z[j] = c and then incrementing j , but leaving i alone. This operation examines no
characters of x.


Twiddle (i.e., exchange) the next two characters by copying them from x to z but in the opposite order; we do so by setting z[j] = x[i
+ 1] and ́z[j + 1] = x[i] and then setting i = i + 2 and j = j + 2. This operation examines x[i] and x[i + 1].


Kill the remainder of x by setting i = m + 1. This operation examines all characters in x that have not yet been examined. This
operation, if performed, must be the final operation.


a. Given two sequences x[1..m] and y[1..n]

Solution

Here's a lightweight review...

  • You can combine imports with {}: use std::collections::{HashMap, VecDeque};



  • Consider importing enum variants in methods that exhaustively match on a lot of them. This helps avoid duplication and rightward pushing of code.



  • Accept a generic type that implements an iterator, instead of the specific std::str::Chars.



  • Introduce more and unique types, especially when you have repeated parallel names (i, iprev, icurrent). This also allows you to move functionality into methods.



  • Introduce type aliases. This is a tiny step towards official types, but gives an opportunity to introduce names and shorten them.



  • Instead of VecDeque, use a normal Vec, then you can use Iterator::collect and reverse it.



  • Use a == &b instead of a.eq(&b).



  • I always recommend using expect instead of unwrap. When the panic happens, you will be much happier having some context of where the failure is.



  • Maybe introduce a dedicated struct for the cost mapping. Then you can implement Default, and there's no chance of missing a key.



  • Use &[T] instead of &Vec. Faster and more usable.



  • [Clippy] Combine identical match arms with Foo | Bar.



  • [Clippy] Use map_or instead of map().unwrap_or.



  • [Clippy] Can use cloned instead of |x| *x.



  • [Clippy] Calls to edit and edit_distance have extra references.



  • Check out rustfmt, which, let's say disagrees, with some of your style choices.



Unfortunately, I found the code very dense, which made reading over it fairly difficult. It's possible some more verbose names could help. Here's a cherry-picked sample of code that takes a second to mentally process :

.map(|(x, c)| {
    let (di, dj) = x.displacement();
    (x, cost_table[i - di][j - dj] + c)
})


```
extern crate rand;

use std::collections::HashMap;

#[derive(Debug, Copy, Clone, PartialEq, Eq, Hash)]
enum EditOp {
Copy,
Replace,
Delete,
Insert,
Twiddle,
Kill,
}

#[derive(Debug, Copy, Clone)]
struct FindBetterName {
idx: usize,
prev: char,
current: char,
}

impl FindBetterName {
fn valid_twiddle(&self, other: &FindBetterName) -> bool {
self.idx > 1 && other.idx > 1 && self.current == other.prev && self.prev == other.current
}
}

impl EditOp {
fn valid(&self, i: FindBetterName, j: FindBetterName) -> bool {
use EditOp::*;

let threshold = |p| i.idx != p || j.idx == p;

match *self {
Copy => threshold(1) && i.current == j.current,
Replace => threshold(1),
Delete => i.idx > 1,
Twiddle => threshold(2) && i.valid_twiddle(&j),
_ => true,
}
}

fn displacement(&self) -> (usize, usize) {
use EditOp::*;

match *self {
Copy | Replace => (1, 1),
Delete => (1, 0),
Insert => (0, 1),
Twiddle => (2, 2),
Kill => (0, 0),
}
}

fn apply(&self, mut from_itr: F, mut to_itr: T, result: &mut String)
where F: Iterator,
T: Iterator
{
use EditOp::*;

match *self {
Copy => {
result.push(from_itr.next().unwrap());
to_itr.next();
}
Replace => {
result.push(to_itr.next().unwrap());
from_itr.next();
}
Delete => {
from_itr.next();
}
Insert => {
result.push(to_itr.next().unwrap());
}
Twiddle => {
let second = from_itr.next().unwrap();
let first = from_itr.next().unwrap();
result.push(first);
result.push(second);
to_itr.next();
to_itr.next();
}
Kill => {}
}
}
}

type CostMap = HashMap;

fn create_tables(from_len: usize,
to_len: usize,
costs: &CostMap)
-> (Vec>, Vec>) {
let mut op_table = vec![ vec![EditOp::Kill; to_len]; from_len ];
let mut cost_table = vec![ vec![0usize; to_len]; from_len ];
let delete_cost = costs.get(&EditOp::Delete).cloned().unwrap_or(0);

for (i, c) in (1..from_len).map(|i| (i, i * delete_cost)) {
cost_table[i][0] = c;
op_table[i][0] = EditOp::Delete;
}

(op_table, cost_table)
}

fn edit_distance(from: &str, to: &str, costs: &CostMap) -> (usize, Operations) {
if costs.is_empty() {
panic!("map of costs is empty");
}

let from_len = from.chars().count();
let to_len = to.chars().count();

let (mut op_table, mut cost_table) = create_tables(from_len + 1, to_len + 1, costs);
let (mut lasti, mut lastj) = (' ', ' ');

for (i, ichar) in from.chars().enumerate() {
for (j, jchar) in to.chars().enumerate() {
let (i, j) = (i + 1, j + 1);
let ii = FindBetterName {
idx: i,

Code Snippets

.map(|(x, c)| {
    let (di, dj) = x.displacement();
    (x, cost_table[i - di][j - dj] + c)
})
extern crate rand;

use std::collections::HashMap;

#[derive(Debug, Copy, Clone, PartialEq, Eq, Hash)]
enum EditOp {
    Copy,
    Replace,
    Delete,
    Insert,
    Twiddle,
    Kill,
}

#[derive(Debug, Copy, Clone)]
struct FindBetterName {
    idx: usize,
    prev: char,
    current: char,
}

impl FindBetterName {
    fn valid_twiddle(&self, other: &FindBetterName) -> bool {
        self.idx > 1 && other.idx > 1 && self.current == other.prev && self.prev == other.current
    }
}

impl EditOp {
    fn valid(&self, i: FindBetterName, j: FindBetterName) -> bool {
        use EditOp::*;

        let threshold = |p| i.idx != p || j.idx == p;

        match *self {
            Copy => threshold(1) && i.current == j.current,
            Replace => threshold(1),
            Delete => i.idx > 1,
            Twiddle => threshold(2) && i.valid_twiddle(&j),
            _ => true,
        }
    }

    fn displacement(&self) -> (usize, usize) {
        use EditOp::*;

        match *self {
            Copy | Replace => (1, 1),
            Delete => (1, 0),
            Insert => (0, 1),
            Twiddle => (2, 2),
            Kill => (0, 0),
        }
    }

    fn apply<F, T>(&self, mut from_itr: F, mut to_itr: T, result: &mut String)
        where F: Iterator<Item = char>,
              T: Iterator<Item = char>
    {
        use EditOp::*;

        match *self {
            Copy => {
                result.push(from_itr.next().unwrap());
                to_itr.next();
            }
            Replace => {
                result.push(to_itr.next().unwrap());
                from_itr.next();
            }
            Delete => {
                from_itr.next();
            }
            Insert => {
                result.push(to_itr.next().unwrap());
            }
            Twiddle => {
                let second = from_itr.next().unwrap();
                let first = from_itr.next().unwrap();
                result.push(first);
                result.push(second);
                to_itr.next();
                to_itr.next();
            }
            Kill => {}
        }
    }
}

type CostMap = HashMap<EditOp, usize>;

fn create_tables(from_len: usize,
                 to_len: usize,
                 costs: &CostMap)
                 -> (Vec<Vec<EditOp>>, Vec<Vec<usize>>) {
    let mut op_table = vec![ vec![EditOp::Kill; to_len]; from_len ];
    let mut cost_table = vec![ vec![0usize; to_len]; from_len ];
    let delete_cost = costs.get(&EditOp::Delete).cloned().unwrap_or(0);

    for (i, c) in (1..from_len).map(|i| (i, i * delete_cost)) {
        cost_table[i][0] = c;
        op_table[i][0] = EditOp::Delete;
    }

    (op_table, cost_table)
}

fn edit_distance(from: &str, to: &str, costs: &CostMap) -> (usize, Operations) {
    if costs.is_empty() {
        panic!("map of costs is empty");
    }

    let from_len = from.chars().count();
    let to_len = to.chars().count();

    let (mut op_table, mut cost_table) = create_tables(from_len + 1, to_len 

Context

StackExchange Code Review Q#142275, answer score: 3

Revisions (0)

No revisions yet.