HiveBrain v1.2.0
Get Started
← Back to all entries
patterncMinor

Spell Check and Trie implementation

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
spelltrieandimplementationcheck

Problem

I have written this code for an Edx course called CS50x (a beginner course).

This problem set required me to write a program which:

  • loaded a dictionary into some sort of data structure (I choose to implement a trie) and which could take an input text and search for misspellings



  • uses a file with an already implemented spell checker that lacked certain functions,



  • one skeleton file to use as the base to create the functions that the first file required



  • and a problem spec



It's hard to include info from the spec in this post (since a lot of it doesn't make since out of context), but I will try.

Also, a library is included with the virtual machine that is used for the course, so any function that you see which is not defined in the base libraries and you don't see defined in my two code files, then it is probably in the library.

Finally, this code works. This code probably won't compile for you since it is specifically made for one development environment (The CS50 "Appliance").

TLDR Here is the important thing to know: This is a code probably won't work for you since it is staged in a highly customized environment and comes with a custom library (Not sure if I used it in this program)
dictionary.c

The first file is called "dictionary.c" and the goal of it, per the spec, is to implement check, load, size, and unload which will, respectively, check for a word in the data structure, load a word in the data structure, check the size of the data structure, and unload the data structure (free the memory used for it). Dictionary.c contains no main function and is just composed of functions which are used in the second file, "speller.c".
speller.c

Speller.c's purpose in life is to check the spelling of a provided file against the spelling of words in a dictionary. In order to make this file work (it was provided as I stated in the second paragraph) I had to implement dictionary.c. I will include it as it may help your understanding of my code, but it is not im

Solution

-
Naming

Many names in your program are not very meaningful. check can mean anything. In your context it means that the word is known, valid, correct (pick one), so the function shall be named along the lines of isWordKnown. Pretty much the same can be said about calculate.

-
main

It does too much. As a rule of thumb, main shall parse a command line and call appropriate functions. A heavy duty for loop must be replaced by a higher-level actions. As much as I can tell, what it's supposed to be doing is

read_a_word;
validate_the_word;


Try to accommodate this pattern.

-
Global variables

Try to avoid them at all costs. The wider is the scope, the harder is reasoning about the variable. At least restrict them to a file scope:

static struct trieNode * root;


-
check() is unnecessary complicated.

Consider instead

bool check(struct trieNode * root, const char* word)
{
    char currChar;

    for(int i = 0; (currChar = word[i]) != '\0'; i++)
    {
        if (currChar == '\'') {
            index = 26;
        } else if (isalpha(currChar)) {
            index = tolower(currChar) - 'a';
        } else {
            return false;
        }

        if ((node = node->childNodes[index]) == NULL)
            return false;
    }

    return (currChar == '\0' && node->wordComp);
}


You may notice that the loop body naturally disintegrates into 2 distinct operations: calculating the index and navigating the tree. I would seriously consider to factor out index calculation into a separate function. This would help immensely once you'd decide to change the alphabet (like, accept accents, or switch to Croatian).

PS: you may also notice that I decided to pass a root node as a parameter. This enables a complete separation of code from data.

-
Data; performance; etc.

Nothing is certain until it is measured; however the gut feeling is that too much space is wasted. Consider how many English words end with ing - and each ing takes 80+ pointers just to encode less than 3 bytes (ed and s are equally wasteful). Side note: English spelling is very regular; look at rispell for hoops a Russian speller has to jump through!

Code Snippets

read_a_word;
validate_the_word;
static struct trieNode * root;
bool check(struct trieNode * root, const char* word)
{
    char currChar;

    for(int i = 0; (currChar = word[i]) != '\0'; i++)
    {
        if (currChar == '\'') {
            index = 26;
        } else if (isalpha(currChar)) {
            index = tolower(currChar) - 'a';
        } else {
            return false;
        }

        if ((node = node->childNodes[index]) == NULL)
            return false;
    }

    return (currChar == '\0' && node->wordComp);
}

Context

StackExchange Code Review Q#55743, answer score: 2

Revisions (0)

No revisions yet.