HiveBrain v1.2.0
Get Started
← Back to all entries
patternrustMinor

Peter Norvigs spellcheck in Rust

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
rustnorvigsspellcheckpeter

Problem

For a Rust project I'm working on I needed a (simple) spellcheck algorithm and I set out to re-implement Peter Norvigs spellcheck algorithm in Rust.

```
extern crate regex;

use regex::Regex;

use std::collections::HashMap;
use std::collections::HashSet;
use std::iter::FromIterator;
use std::path::Path;
use std::io::prelude::*;
use std::fs::File;

static ALPHABET : &'static str = "abcdefghijklmnopqrstuvwxyz";

fn deletes(s: &str) -> Vec {
(0..s.len())
.map(|i| {
let mut delete = s.to_owned();
delete.remove(i);

delete
})
.collect()
}

fn inserts(s: &str) -> Vec {
(0..s.len() + 1)
.flat_map(|i| {
ALPHABET.chars().map(|chr| {
let mut insert = s.to_owned();
insert.insert(i, chr);

insert
}).collect::>()
})
.collect()
}

fn replaces(s: &str) -> Vec {
(0..s.len())
.flat_map(|i| {
ALPHABET.chars().map(|chr| {
let mut replace = String::with_capacity(s.len());
replace.push_str(&s[0..i]);
replace.push(chr);
replace.push_str(&s[i + 1..]);

replace
}).collect::>()
})
.collect()
}

fn transposes(s: &str) -> Vec {
let bytes = s.as_bytes();

(1..bytes.len())
.map(|i| {
let mut transpose = bytes.to_owned();
transpose.swap(i - 1, i);

String::from_utf8(transpose).expect("Invalid UTF-8")
})
.collect()
}

fn read_dictionary_words(dictionary_path: &str) -> String {
let mut buffer = String::new();

File::open(&Path::new(dictionary_path))
.unwrap()
.read_to_string(&mut buffer)
.unwrap();

buffer
}

fn word_count(all_words: &str) -> HashMap {
let re = Regex::new(r"\w+").unwrap();
let lowercase = all_words.to_lowercase();

let mut map = HashMap::new();

for word in re.captures_

Solution

A few comments, since no one else has posted here:

Set up some proper error handling so you don't have so many unwraps and expects. I think the error_chain crate is the recommended/most popular way to do error handling, but is probably overkill for your case. However, error_chain also provides the quick_error macro, which allows you to easily define your own errors and wrap external errors. For very, very basic handling you can just do:

pub mod err {
    quick_error! {
        #[derive(Debug)]
        pub enum Error {
            Other(err: Box) {
                from(e: &'static str) -> (e.into())
                from(e: ::std::io::Error) -> (e.into())                
                description(err.description())
                display("{}", err)
            }    
        }
    }
    pub type Result = ::std::result::Result;
}


And then wrap your functions in Result, and you can use try/? everywhere.

Not really a big deal for your case since you only call it once, but it is good practice to wrap Regex::new in lazy_static to avoid compiling it multiple times, e.g.:

lazy_static! {
    static ref RE: Regex = Regex::new(r"\w+").unwrap();
}


You can work around the String allocation with the Entry API like (not sure if there is a more graceful way to do this...):

for word in re.captures_iter(&lowercase) {
    let word0 = word[0];
    if let Some(e) = map.get_mut(word0) {
        *e += 1;
        continue;
    }
    map.insert(word0.into(), 1);
}


I would suggest returning an Option or Result from known where the empty case returns a None like type, so that you don't have to check candidates.len() > 0 multiple times in create_candidates. You could chain the different candidates then return the default if none apply.

ALSO! If your performance seems bad, make sure that you are doing cargo run --release or cargo build --release to run your program.

Hope this helps—

Code Snippets

pub mod err {
    quick_error! {
        #[derive(Debug)]
        pub enum Error {
            Other(err: Box<::std::error::Error + Send + Sync>) {
                from(e: &'static str) -> (e.into())
                from(e: ::std::io::Error) -> (e.into())                
                description(err.description())
                display("{}", err)
            }    
        }
    }
    pub type Result<T> = ::std::result::Result<T, Error>;
}
lazy_static! {
    static ref RE: Regex = Regex::new(r"\w+").unwrap();
}
for word in re.captures_iter(&lowercase) {
    let word0 = word[0];
    if let Some(e) = map.get_mut(word0) {
        *e += 1;
        continue;
    }
    map.insert(word0.into(), 1);
}

Context

StackExchange Code Review Q#155399, answer score: 2

Revisions (0)

No revisions yet.