HiveBrain v1.2.0
Get Started
← Back to all entries
patterncMinor

Substring searching with Knuth–Morris–Pratt

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
morrissearchingwithsubstringprattknuth

Problem

My program uses the Knuth–Morris–Pratt algorithm to search for a particular substring in a .txt file and then store it in a binary search tree.

Note that testfile.txt should be contain only spaces and normal characters (no newlines! - except maybe as the last byte).

There is a lot of code, so note that I am primarily concerned about the KMP algorithm in countWord.c, which I am new to.

wordCount.h

#ifndef WORDCOUNT_H
#define WORDCOUNT_H
#include
#include

typedef struct BSTnode{
    char data[20];
    struct BSTnode* leftchild;    //if less than, put it on the left
    struct BSTnode* rightchild;   //if greater than, on the right
    int num;

}BSTnode;

char* findWord(FILE*);       //This function is used to find out a new word again and again in the file.
int countWord(FILE*,char*);  //this is used to count out occurrences of the word found
int inputWord(BSTnode*,char*,int);    //this is used to insert the word into the binary search tree(but I don' try to loop through the tree)
int createTable(char*,int,int*);   //this function is for the KMP algorithm

#endif


main.c

#include
#include
#include
#include
#include "wordCount.h"

char arr[20]={0};

int main()
{
       /*initialize the firstNode*/
    BSTnode* firstNode=malloc(sizeof(BSTnode));
    firstNode->leftchild=NULL;
    firstNode->rightchild=NULL;
    strcpy(firstNode->data,"");
    firstNode->num=0;

    FILE* fs=fopen("testfile.txt","r");
    if(!fs){
        printf("Failed to open fiel!!\n");
        return 2;
    }

    int i=0;
    while(++i<=30){         //To keep it simple, I only find out 30 words and search it in the testfile.txt
            memset(arr,'\0',20);
            char* newWord=findWord(fs);

            int totalNumberOfTheWord=countWord(fs,newWord);

            inputWord(firstNode,newWord,totalNumberOfTheWord);
    }

        return 0;

}


findWord.c

```
#include
#include
#include
#include
#include "wordCount.h"

extern char arr[20];
char findWord(FILE fs)
{

Solution

I'll only focus on countWord.c.

Indentation/Spacing

I'm not sure what happened to your indentation in countWord() but it is all over the place. It was hard for me to figure out which code belonged in which blocks.

Your code could use some more spaces. It's hard to read code like this:

if(j==-1||word[i]==word[j]){


Better would be this:

if (j == -1 || word[i] == word[j]) {


Incorrect if

This line of code is incorrect:

if(!lengthOfBuf==fread(buf,1,lengthOfBuf,fs)){


Assuming a nonempty file, the line above translates to:

if(0==fread(buf,1,lengthOfBuf,fs)){


You probably meant this:

if (lengthOfBuf != fread(buf,1,lengthOfBuf,fs)) {


Unnecessary global

The next array is global, but it could easily be moved to be a local variable of the countWord() function.

Upper/lowercase comparison

You currently have a triple check to see if one letter is -32, 0, or +32 away from the other letter. You can do better with:

if (((c1 ^ c2) & ~32) == 0)


where c1 and c2 are two characters you are trying to compare.

Use const keyword

You should use const on arguments that are not written to. In your case, the word argument for both createTable() and countWord() could be marked const.

Code Snippets

if(j==-1||word[i]==word[j]){
if (j == -1 || word[i] == word[j]) {
if(!lengthOfBuf==fread(buf,1,lengthOfBuf,fs)){
if(0==fread(buf,1,lengthOfBuf,fs)){
if (lengthOfBuf != fread(buf,1,lengthOfBuf,fs)) {

Context

StackExchange Code Review Q#94441, answer score: 4

Revisions (0)

No revisions yet.