patternrustMinor
Peter Norvigs spellcheck in Rust
Viewed 0 times
rustnorvigsspellcheckpeter
Problem
For a Rust project I'm working on I needed a (simple) spellcheck algorithm and I set out to re-implement Peter Norvigs spellcheck algorithm in Rust.
```
extern crate regex;
use regex::Regex;
use std::collections::HashMap;
use std::collections::HashSet;
use std::iter::FromIterator;
use std::path::Path;
use std::io::prelude::*;
use std::fs::File;
static ALPHABET : &'static str = "abcdefghijklmnopqrstuvwxyz";
fn deletes(s: &str) -> Vec {
(0..s.len())
.map(|i| {
let mut delete = s.to_owned();
delete.remove(i);
delete
})
.collect()
}
fn inserts(s: &str) -> Vec {
(0..s.len() + 1)
.flat_map(|i| {
ALPHABET.chars().map(|chr| {
let mut insert = s.to_owned();
insert.insert(i, chr);
insert
}).collect::>()
})
.collect()
}
fn replaces(s: &str) -> Vec {
(0..s.len())
.flat_map(|i| {
ALPHABET.chars().map(|chr| {
let mut replace = String::with_capacity(s.len());
replace.push_str(&s[0..i]);
replace.push(chr);
replace.push_str(&s[i + 1..]);
replace
}).collect::>()
})
.collect()
}
fn transposes(s: &str) -> Vec {
let bytes = s.as_bytes();
(1..bytes.len())
.map(|i| {
let mut transpose = bytes.to_owned();
transpose.swap(i - 1, i);
String::from_utf8(transpose).expect("Invalid UTF-8")
})
.collect()
}
fn read_dictionary_words(dictionary_path: &str) -> String {
let mut buffer = String::new();
File::open(&Path::new(dictionary_path))
.unwrap()
.read_to_string(&mut buffer)
.unwrap();
buffer
}
fn word_count(all_words: &str) -> HashMap {
let re = Regex::new(r"\w+").unwrap();
let lowercase = all_words.to_lowercase();
let mut map = HashMap::new();
for word in re.captures_
```
extern crate regex;
use regex::Regex;
use std::collections::HashMap;
use std::collections::HashSet;
use std::iter::FromIterator;
use std::path::Path;
use std::io::prelude::*;
use std::fs::File;
static ALPHABET : &'static str = "abcdefghijklmnopqrstuvwxyz";
fn deletes(s: &str) -> Vec {
(0..s.len())
.map(|i| {
let mut delete = s.to_owned();
delete.remove(i);
delete
})
.collect()
}
fn inserts(s: &str) -> Vec {
(0..s.len() + 1)
.flat_map(|i| {
ALPHABET.chars().map(|chr| {
let mut insert = s.to_owned();
insert.insert(i, chr);
insert
}).collect::>()
})
.collect()
}
fn replaces(s: &str) -> Vec {
(0..s.len())
.flat_map(|i| {
ALPHABET.chars().map(|chr| {
let mut replace = String::with_capacity(s.len());
replace.push_str(&s[0..i]);
replace.push(chr);
replace.push_str(&s[i + 1..]);
replace
}).collect::>()
})
.collect()
}
fn transposes(s: &str) -> Vec {
let bytes = s.as_bytes();
(1..bytes.len())
.map(|i| {
let mut transpose = bytes.to_owned();
transpose.swap(i - 1, i);
String::from_utf8(transpose).expect("Invalid UTF-8")
})
.collect()
}
fn read_dictionary_words(dictionary_path: &str) -> String {
let mut buffer = String::new();
File::open(&Path::new(dictionary_path))
.unwrap()
.read_to_string(&mut buffer)
.unwrap();
buffer
}
fn word_count(all_words: &str) -> HashMap {
let re = Regex::new(r"\w+").unwrap();
let lowercase = all_words.to_lowercase();
let mut map = HashMap::new();
for word in re.captures_
Solution
A few comments, since no one else has posted here:
Set up some proper error handling so you don't have so many
And then wrap your functions in
Not really a big deal for your case since you only call it once, but it is good practice to wrap
You can work around the String allocation with the Entry API like (not sure if there is a more graceful way to do this...):
I would suggest returning an
ALSO! If your performance seems bad, make sure that you are doing
Hope this helps—
Set up some proper error handling so you don't have so many
unwraps and expects. I think the error_chain crate is the recommended/most popular way to do error handling, but is probably overkill for your case. However, error_chain also provides the quick_error macro, which allows you to easily define your own errors and wrap external errors. For very, very basic handling you can just do:pub mod err {
quick_error! {
#[derive(Debug)]
pub enum Error {
Other(err: Box) {
from(e: &'static str) -> (e.into())
from(e: ::std::io::Error) -> (e.into())
description(err.description())
display("{}", err)
}
}
}
pub type Result = ::std::result::Result;
}And then wrap your functions in
Result, and you can use try/? everywhere.Not really a big deal for your case since you only call it once, but it is good practice to wrap
Regex::new in lazy_static to avoid compiling it multiple times, e.g.:lazy_static! {
static ref RE: Regex = Regex::new(r"\w+").unwrap();
}You can work around the String allocation with the Entry API like (not sure if there is a more graceful way to do this...):
for word in re.captures_iter(&lowercase) {
let word0 = word[0];
if let Some(e) = map.get_mut(word0) {
*e += 1;
continue;
}
map.insert(word0.into(), 1);
}I would suggest returning an
Option or Result from known where the empty case returns a None like type, so that you don't have to check candidates.len() > 0 multiple times in create_candidates. You could chain the different candidates then return the default if none apply.ALSO! If your performance seems bad, make sure that you are doing
cargo run --release or cargo build --release to run your program.Hope this helps—
Code Snippets
pub mod err {
quick_error! {
#[derive(Debug)]
pub enum Error {
Other(err: Box<::std::error::Error + Send + Sync>) {
from(e: &'static str) -> (e.into())
from(e: ::std::io::Error) -> (e.into())
description(err.description())
display("{}", err)
}
}
}
pub type Result<T> = ::std::result::Result<T, Error>;
}lazy_static! {
static ref RE: Regex = Regex::new(r"\w+").unwrap();
}for word in re.captures_iter(&lowercase) {
let word0 = word[0];
if let Some(e) = map.get_mut(word0) {
*e += 1;
continue;
}
map.insert(word0.into(), 1);
}Context
StackExchange Code Review Q#155399, answer score: 2
Revisions (0)
No revisions yet.