checkmate - C spelling corrector 2.0
Problem
Since I posted my first version of the spelling corrector here, I've been working on improving it a little in my free time. I've also put the project up on GitHub so that others can contribute to it if they wish.
checkmate.c:
```
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#define TABLE_SIZE 5013
#define ALPHABET_SIZE (sizeof(alphabet) - 1)
char *dictionary = "5k.txt";
const char alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz"
" '";
void *checkedMalloc(size_t len)
{
void *ret = malloc(len);
if (!ret)
{
fputs("Out of memory!", stderr);
exit(0);
}
return ret;
}
int arrayExist(char **array, int rows, char *word)
{
for (int i = 0; i 0)
{
memcpy(&dst[*dstLen], &src[srcBegin], length);
*dstLen += length;
}
dst[*dstLen] = 0;
}
int deletion(char *word, char **array, int start)
{
int i = 0;
size_t length = strlen(word);
for (; i = resMax)
{
// initially allocate 50 entries, after double the size
if (resMax == 0) resMax = 50;
else resMax *= 2;
}
res = realloc(res, sizeof(char *) * resMax);
res[resSize++] = e1[j];
}
}
}
*e2_rows = resSize;
return res;
}
char *bestMatch(char **array, int rows)
{
char *maxWord = NULL;
int maxSize = TABLE_SIZE;
ENTRY *e;
for (int i = 0; i data data;
maxWord = e->key;
}
}
return maxWord;
}
char *correct(char *word)
{
char **e1 = NULL;
char **e2 = NULL;
char *e1_word = NULL;
char *e2_word = NULL;
char *resWord = word;
int e1_rows = 0;
char e2_rows = 0;
if (find(word)) return word;
e1_rows = (unsigned) totalEdits(word);
if (e1_rows)
{
e1 =
```
Solution
Program exit code
The `checkedMalloc` function does `exit(0)` in case you run out of memory.
Exit code 0 usually means success, so it would be better to use something else.
The `main` function returns -1 if a problem happens while reading the dictionary.
The `readDictionary` function does `exit(-1)` if it cannot add a word to the hash table.
It's confusing to have multiple exit points scattered around in the program.
It's also hard to keep track of exit codes that are magic numbers.
As a first step, it would be good to put the exit codes in well-named constants.
As a second step, it would be good to centralize the exit points if possible.
(For out-of-memory it's probably not practical,
but for the others it might be, especially considering that
`readDictionary` sometimes returns on errors instead of exiting.)

Error handling and function return values
`readDictionary` behaves very confusingly:
- Return 0 if opening the dictionary file failed
- Return 0 if getting stats on the dictionary file failed
- Return -1 if `mmap` failed
- Exit the program with -1 if adding an entry to the hash table failed
- Return 1 on success

The return value of this function is checked with `!readDictionary(...)`,
so the returned -1 will be considered success.
It's also unfortunate that the `fileName` parameter of the function is exactly the same as the `dictionary` global variable.
Either use the global variable and drop the parameter,
or use the parameter instead of the global variable.
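
The two points above (named codes instead of magic numbers, and one consistent return convention) could be combined roughly as follows. This is only a sketch under assumptions: `LoadStatus`, `AddWordFn`, `loadWords`, `countWord` and `exitCodeFor` are hypothetical names, the callback stands in for the hash-table insertion, and plain `fopen`/`fgets` replaces the `mmap` approach of the original to keep the example short.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes: 0 is success and every failure is non-zero,
 * so a check like `status != LOAD_OK` cannot mistake an error for success. */
typedef enum {
    LOAD_OK = 0,
    LOAD_OPEN_FAILED,
    LOAD_READ_FAILED,
    LOAD_INSERT_FAILED
} LoadStatus;

/* Hypothetical callback standing in for the hash-table insertion.
 * It returns 0 on success and non-zero on failure. */
typedef int (*AddWordFn)(const char *word, void *context);

/* Reads one word per line from `path` and hands each one to `addWord`.
 * Every failure is reported through the return value; the function never
 * calls exit(). Lines longer than the buffer are split, which is acceptable
 * for a sketch but not for production code. */
LoadStatus loadWords(const char *path, AddWordFn addWord, void *context)
{
    FILE *file = fopen(path, "r");
    if (!file) return LOAD_OPEN_FAILED;

    LoadStatus status = LOAD_OK;
    char line[256];
    while (fgets(line, sizeof line, file))
    {
        line[strcspn(line, "\r\n")] = '\0';   // strip the line ending
        if (line[0] == '\0') continue;        // skip blank lines
        if (addWord(line, context) != 0)
        {
            status = LOAD_INSERT_FAILED;
            break;
        }
    }
    if (status == LOAD_OK && ferror(file)) status = LOAD_READ_FAILED;

    fclose(file);
    return status;
}

/* Example callback: counts the words it is given and never fails. */
int countWord(const char *word, void *context)
{
    (void)word;
    ++*(size_t *)context;
    return 0;
}

/* The single place where a load status is turned into a process exit code. */
int exitCodeFor(LoadStatus status)
{
    return status == LOAD_OK ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

With this shape, `main` would be the only function that decides to terminate, for example by returning `exitCodeFor(loadWords(path, countWord, &count))`, and out-of-memory handling could remain the one exception that exits directly.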

Naming
When I see `dictionary`, I'm thinking of some kind of hash table.
But in this program it's a `char*` variable,
storing the name of the dictionary file.
So I'd call it `dictionary_path`.

Usability
Instead of hardcoding `correctArray` and `checkArray` inside `main`,
it would be easier to play with and test the program if it took filenames as command line arguments.
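
One possible way to take the filenames from the command line is sketched below. The argument layout (`words-file [dictionary-file]`), the `Options` struct, and the names `parseOptions` and `printUsage` are assumptions made for illustration; only the default `5k.txt` comes from the posted code.

```c
#include <stdio.h>

/* Hypothetical container for the input paths the program needs. */
typedef struct {
    const char *wordsPath;       // file with the words to check
    const char *dictionaryPath;  // dictionary file, defaults to "5k.txt"
} Options;

/* Fills `options` from the command line.
 * Accepts: program words-file [dictionary-file]
 * Returns 0 on success and -1 if the number of arguments is wrong. */
int parseOptions(int argc, char **argv, Options *options)
{
    if (argc < 2 || argc > 3) return -1;

    options->wordsPath = argv[1];
    options->dictionaryPath = (argc == 3) ? argv[2] : "5k.txt";
    return 0;
}

/* Prints a one-line usage message to the given stream. */
void printUsage(FILE *stream, const char *programName)
{
    fprintf(stream, "usage: %s words-file [dictionary-file]\n", programName);
}
```

In `main`, a failed `parseOptions` call would typically be followed by `printUsage(stderr, argv[0])` and a non-zero return value, which also keeps the exit points in one place.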

Writing style
I was a bit surprised by this code at first:

    return (length) +                    // deletion
           (length - 1) +                // transposition
           (length * ALPHABET_SIZE) +    // alteration
           (length + 1) * ALPHABET_SIZE; // insertion

At first I didn't really get what those vertically aligned expressions were.
It became clearer as I read the right end of the lines.
Written this way, it would have been more obvious right off the bat:

    return (length)                        // deletion
         + (length - 1)                    // transposition
         + (length * ALPHABET_SIZE)        // alteration
         + (length + 1) * ALPHABET_SIZE;   // insertion
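
To make the four terms concrete, here is a minimal sketch that evaluates the same expression. `editCount` is a hypothetical name, since the signature of the original function does not survive in the posted code. `alphabet` and `ALPHABET_SIZE` follow the question's definitions: 26 upper-case letters, 26 lower-case letters, a space and an apostrophe, 54 characters in total.

```c
#include <stddef.h>
#include <string.h>

static const char alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                               "abcdefghijklmnopqrstuvwxyz"
                               " '";
#define ALPHABET_SIZE (sizeof(alphabet) - 1)   // 54; the -1 drops the terminating NUL

/* Number of candidate strings one edit away from `word`, duplicates included.
 * Assumes a non-empty word: for length 0 the transposition term would
 * wrap around because size_t is unsigned. */
size_t editCount(const char *word)
{
    size_t length = strlen(word);

    return (length)                        // deletion: remove one character
         + (length - 1)                    // transposition: swap two neighbours
         + (length * ALPHABET_SIZE)        // alteration: replace one character
         + (length + 1) * ALPHABET_SIZE;   // insertion: add one character anywhere
}
```

For a three-letter word such as "cat" this gives 3 + 2 + 162 + 216 = 383 candidates, or 110 * length + 53 in general with this alphabet.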

Context
StackExchange Code Review Q#98069, answer score: 11