PHP spell checker with suggestions for misspelled words
Problem
I built a simple PHP spellchecker and suggestions app that uses PHP's similar_text() and levenshtein() functions to compare words from a dictionary that is loaded into an array.
- How it works: first I load the contents of the dictionary into an array.
- I split the user's input into words and spell check each word.
- I spell check a word by checking whether it is in the dictionary array.
- If it is, I echo a congratulations message and move on.
- If not, I iterate through the dictionary array, comparing each dictionary word with the assumed misspelling.
- If the input word, lower-cased and stripped of punctuation, is 90% or more similar to a word in the dictionary array, I copy that word from the dictionary array into an array of suggestions.
- If no suggestions were found using the 90%-or-higher similarity comparison, I use levenshtein() to do a more liberal comparison and add its matches to the suggestions array.
- Then I iterate through the suggestions array and echo each suggestion.
I noticed that this is running slowly. Slow enough to notice. I was wondering how I could improve the speed and efficiency of this spell checker.
Any and all changes, improvements, suggestions, and code are welcome and appreciated.
Here is the code:
```
<?php
// The part of this listing above the Levenshtein check is reconstructed
// from the description; the file name, regex, success message, and the
// distance cutoff of 2 are assumptions.
$words = array_map('strtolower', file('dictionary.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES));
function checkSpelling($input, $words){
    $input = strtolower(preg_replace('/[^a-z0-9]+/i', '', $input));
    if(in_array($input, $words)){
        echo "You spelled that right. ";
    }
    else{
        $suggestions = array();
        foreach($words as $word){
            similar_text($input, $word, $percentageSimilarity);
            if($percentageSimilarity>=90 && $percentageSimilarity<100){
                if(!in_array($word, $suggestions)){
                    array_push($suggestions, $word);
                }
            }
        }
        if(empty($suggestions)){
            foreach($words as $word){
                $levenshtein = levenshtein($input, $word);
                if($levenshtein<=2 && $levenshtein>0){
                    // Avoid adding the same suggestion twice.
                    if(!in_array($word, $suggestions)){
                        array_push($suggestions, $word);
                    }
                }
            }
        }
        echo "Looks like you spelled that wrong. Here are some suggestions: ";
        foreach($suggestions as $suggestion){
            echo $suggestion." ";
        }
    }
}
if(isset($_GET['check'])){
    $input = trim($_GET['check']);
    $sentence='';
    if(stripos($input, ' ')!==false){
        $sentence = explode(' ', $input);
        foreach($sentence as $item){
            checkSpelling($item, $words);
        }
    }
    else{
        checkSpelling($input, $words);
    }
}
?>
```
Solution
Here are a couple of tweaks that could help performance:
- Rather than storing the dictionary in memory, offload it to a database (potentially even caching commonly misspelled words as an optimization).
- Ignore words under a certain length (for example, MySQL's fulltext searching ignores words with fewer than 4 characters by default).
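As a rough illustration of both tweaks, here is a minimal sketch that skips short words and remembers suggestions for misspellings it has already seen. The `MIN_WORD_LENGTH` cutoff, the `suggestCached()` helper, and the plain array used as a cache are all assumptions made for this example, not part of the original code; the cache could just as well live in a database table.

```php
<?php
// Hypothetical cutoff, mirroring the MySQL default mentioned above.
const MIN_WORD_LENGTH = 4;

/**
 * Returns suggestions for $word, running the expensive lookup at most
 * once per distinct (lower-cased) word.
 *
 * $lookup is any callable that performs the full dictionary scan and
 * returns an array of suggestions; $cache is passed by reference so
 * results survive between calls.
 */
function suggestCached(string $word, callable $lookup, array &$cache): array
{
    $word = strtolower($word);

    // Short words are ignored entirely: no scan, no suggestions.
    if (strlen($word) < MIN_WORD_LENGTH) {
        return [];
    }

    // Only scan the dictionary the first time a word is seen.
    if (!isset($cache[$word])) {
        $cache[$word] = $lookup($word);
    }

    return $cache[$word];
}
```

With this shape, the existing dictionary scan can be passed in as `$lookup` unchanged, so a repeated misspelling costs one array lookup instead of a full pass over the dictionary.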
The thing that concerns me most about your algorithm is how much time it takes to compare the input against every single word in the dictionary. The problem compounds with each additional word in the search query.
There has to be a way to quickly filter the dictionary down to a smaller list of likely matches (e.g. by word length or first letter).
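One way such a filter might look is to group the dictionary by word length once, then compare the input only against words of similar length. This is a sketch under assumptions: `indexByLength()`, `suggest()`, and the default `$maxDistance` of 2 are hypothetical names and values chosen for the example. The length filter loses nothing, because the Levenshtein distance between two strings is never smaller than the difference in their lengths.

```php
<?php
/**
 * Groups dictionary words into buckets keyed by word length.
 * Build this once and reuse it for every lookup.
 */
function indexByLength(array $words): array
{
    $index = [];
    foreach ($words as $word) {
        $index[strlen($word)][] = $word;
    }
    return $index;
}

/**
 * Returns dictionary words within $maxDistance edits of $input,
 * scanning only the length buckets that could contain such a word.
 */
function suggest(string $input, array $index, int $maxDistance = 2): array
{
    $suggestions = [];
    $length = strlen($input);

    // A word whose length differs by more than $maxDistance cannot
    // be within $maxDistance edits, so those buckets are skipped.
    for ($len = max(1, $length - $maxDistance); $len <= $length + $maxDistance; $len++) {
        foreach ($index[$len] ?? [] as $word) {
            $distance = levenshtein($input, $word);
            if ($distance > 0 && $distance <= $maxDistance) {
                $suggestions[] = $word;
            }
        }
    }

    return $suggestions;
}
```

Filtering by first letter as well would shrink the candidate list further, but unlike the length filter it is a heuristic: it would miss misspellings where the first letter itself is wrong.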
Context
StackExchange Code Review Q#27173, answer score: 3