HiveBrain v1.2.0
Get Started
← Back to all entries
snippetcppMinor

How to search for a word in a sorted text file efficiently?

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
efficientlyfilesearchtextwordforhowsorted

Problem

I want to write my own program to search for a word in a sorted text file. In the text file each word is in one line. I want to write this myself because I did not find any simple program (without libraries such as boost) which can provide suggestion word which means a word with most similarity. Here is my code but I don't know if I am in the right or wrong track. Appreciate any suggestion.

#include
#include
#include
using namespace std;

int main(){

  ifstream fin;
  fin.open("/usr/dict/words");
  if(!fin.good()){cout>search;

        coutword)  
        {
            last=middle-1;
            cout<<"b";  //**
        }
        else
        {
            found=true;
            break;
        }
        cout<<first<<" "<<last<<" "<<" "<<middle<<endl;  //**

  }

  return 0;
}

Solution

Your biggest problem is the code does not take into account not finding the word. If it fails your code will loop forever. You need a not found exit case.

This is always a bad sign

while(!fin.eof())


That line is nearly always wrong (in every programming language I know). Especially in your case that should never be true (since you know where the end is).

Remember that the last successful read reads up-to but not past the EOF. The eof-flag is only set when you read past EOF.

So on the last good read you will read up-to the end of file processes the data. There is no more data left in the file but the eof-flag will not be set because you have not read past the EOF yet so you will re-enter the loop. The next attempt to read data will fail (setting the eof-flag), but the read has failed and thus left the variables you just read into in a questionable state.

Luckily you don't get into that state because you move after each search. But it is a sign of the wrong type of test. This should be:

while(searching) //   Set to false when you finish searching.
                 //   Which may be when you find the word
                 //   or there are no more word to search.


You are assuming that words start on the exact boundary of middle!

middle=(first+last)/2;
    fin.seekg(middle);
    fin>>search;


Not sure I believe that will hold. Middle is a good estimate. But you need to find the start of the word. So seek forward for the next word.

middle=(first+last)/2;
    fin.seekg(middle);
    fin.ignore(std::numeric_limits::max(), '\n');

    if (!fin.eof()) // That may have read past EOF
    {
    }


Correct typing in C++ is everything:

fin.seekg(0,ios::end);
int last = fin.tellg(); // This does not return an int.
                        // With todays big files this can be rather a large
                        // number.

// That line should be.
std::streamsize last = fin.tellg();


If you don't care about the actual type (you just want the correct one). Then let the compiler work it out for you.

// C++11 let the compiler deduct the correct type of last.
auto last = fin.tellg()


Why take two lines to open the file:

ifstream fin;
fin.open("/usr/dict/words");

// Much simpler to read and write:
ifstream fin("/usr/dict/words");


Sure you can test for good(). But its better to let the stream decide if it is a usable state.

if(!fin.good()){cout<<"No dictionary file!\n"; exit(1);}

// Prefer to let the stream decide:
if (fin)  // Stream used in a boolean context will convert itself into
          // something useful that is convertible to true if you can
          // use the stream or false if the stream is not usable.
{
     cout<<"No dictionary file!\n";
     exit(1);
}


If the size of the dictionary is not huge (couple of Meg). I personally would just load it into memory and let the standard data structures handle it. Alternatively You could map the file to memory and do a more standard search.

bool isWordInDict(std::string const& word);
{
    struct MyDict: std::set
    {
        typedef std::set::const_iterator const_iterator;

        MyDict()
        {
            std::ifstream fin("/usr/dict/words");

            std::copy(std::istream_iterator(fin), std::istream_iterator(),
                      std::inserter(*this, end()));
        }
    };
    static const MyDict  dict;

    MyDict::const_iterator find = dict.find(word);
    return find != dict.end();
}

Code Snippets

while(!fin.eof())
while(searching) //   Set to false when you finish searching.
                 //   Which may be when you find the word
                 //   or there are no more word to search.
middle=(first+last)/2;
    fin.seekg(middle);
    fin>>search;
middle=(first+last)/2;
    fin.seekg(middle);
    fin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');

    if (!fin.eof()) // That may have read past EOF
    {
    }
fin.seekg(0,ios::end);
int last = fin.tellg(); // This does not return an int.
                        // With todays big files this can be rather a large
                        // number.


// That line should be.
std::streamsize last = fin.tellg();

Context

StackExchange Code Review Q#52702, answer score: 6

Revisions (0)

No revisions yet.