Outputting an inverted index to a text file

Problem

When I run this function for outputting an inverted index to a text file in the Debug configuration, it takes about a minute and a half (96 seconds) on a comparatively tiny dataset: 1252 records, the longest having 76 entries.

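std::ostream& operator<<(std::ostream& out_stream, const InvertedIndex& rhs)
{
    // ... (output_buffer, a string stream, is declared here)
    auto& rhs_index = rhs.GetIndex();
    for(auto map_elem : rhs_index) {
        output_buffer << map_elem.first;
        auto& cur_postingset = map_elem.second;
        for(auto set_elem : cur_postingset) {
            output_buffer << " <" << set_elem.GetDocumentId() << ' ' << set_elem.GetTokenFrequency() << ">";
        }
        output_buffer << '\n';
    }
    output_buffer.flush();
    out_stream.write(output_buffer.str().c_str(), output_buffer.str().size());
    return out_stream;
}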


The index is a std::map<…> data type. The slow output makes debugging other portions of the program tedious and time-consuming. Is there any way to speed it up, or am I going to have to live with it?

Solution

In main, add:

std::ios_base::sync_with_stdio(false); // Do not keep the C and C++ streams synced;
                                       // skipping that extra work makes output quicker.


The quickest way to copy the contents of one stream into another is:

stream << otherStream.rdbuf();


So rather than this, which calls str() twice and therefore copies the whole buffer into a temporary string twice:

out_stream.write(output_buffer.str().c_str(), output_buffer.str().size());

// prefer
out_stream << output_buffer.rdbuf();
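
As a minimal sketch of the rdbuf() approach, the helper below fills a buffer and then hands it to the destination stream in one operation. The function name `write_buffered` and its `line_count` parameter are made up for illustration. It assumes the buffer is a std::stringstream: on common standard library implementations a std::ostringstream exposes no readable data through rdbuf(), so the copy would produce nothing.

```cpp
#include <ostream>
#include <sstream>

// Hypothetical helper: builds some text in an in-memory buffer, then
// copies the buffer's contents to `out` without creating a std::string.
std::ostream& write_buffered(std::ostream& out, int line_count)
{
    // std::stringstream (not std::ostringstream) so the buffer is readable.
    std::stringstream buffer;
    for (int i = 0; i < line_count; ++i) {
        buffer << "line " << i << '\n';
    }

    // operator<<(std::streambuf*) pulls characters straight from the
    // buffer until it is exhausted. Note: if the buffer is empty, this
    // sets failbit on `out`, so guard against that case if it can occur.
    out << buffer.rdbuf();
    return out;
}
```

Calling `write_buffered(std::cout, 3)` should print three lines, "line 0" through "line 2".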


The other thing that slows things down is excessive flushing. Don't flush your stream until you really want the output (so it's not good practice to flush the stream while you are writing to it). Do it explicitly afterwards:

out_stream << object << std::flush;
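
To make the cost of flushing visible, here is a rough sketch that counts flush requests. `CountingBuf` and the two functions are hypothetical names, and std::endl is used only as one common source of per-line flushes; the original code does not use it.

```cpp
#include <ostream>
#include <sstream>

// Hypothetical stream buffer that counts how often it is asked to flush.
class CountingBuf : public std::stringbuf {
public:
    int flush_count = 0;

protected:
    // std::ostream::flush() ends up calling sync() on the buffer.
    int sync() override
    {
        ++flush_count;
        return 0;
    }
};

// Writes `lines` lines and flushes after every one (std::endl = '\n' + flush).
int flushes_with_endl(int lines)
{
    CountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < lines; ++i) {
        out << i << std::endl;
    }
    return buf.flush_count;
}

// Writes the same lines with '\n' and flushes once, explicitly, at the end.
int flushes_with_newline(int lines)
{
    CountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < lines; ++i) {
        out << i << '\n';
    }
    out << std::flush;
    return buf.flush_count;
}
```

With a real file or console behind the stream, each of those flush requests can turn into a system call, which is where the time goes.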


But I see no reason for copying all the data into one stream then re-copying the data into another stream.

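std::ostream& operator<<(std::ostream& out_stream, const InvertedIndex& rhs)
{
    auto& rhs_index = rhs.GetIndex();
    for(auto map_elem : rhs_index) {
        out_stream << map_elem.first;
        auto& cur_postingset = map_elem.second;
        for(auto set_elem : cur_postingset) {
            out_stream << " <" << set_elem.GetDocumentId() << ' ' << set_elem.GetTokenFrequency() << ">";
        }
        out_stream << '\n';
    }
    return out_stream;
}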

Code Snippets

std::ios_base::sync_with_stdio(false); // Do not keep the C and C++ streams synced;
                                       // skipping that extra work makes output quicker.

stream << otherStream.rdbuf();

out_stream.write(output_buffer.str().c_str(), output_buffer.str().size());

// prefer
out_stream << output_buffer.rdbuf();

out_stream << object << std::flush;

std::ostream& operator<<(std::ostream& out_stream, const InvertedIndex& rhs)
{
    auto& rhs_index = rhs.GetIndex();
    for(auto map_elem : rhs_index) {
        out_stream << map_elem.first;
        auto& cur_postingset = map_elem.second;
        for(auto set_elem : cur_postingset) {
            out_stream << " <" << set_elem.GetDocumentId() << ' ' << set_elem.GetTokenFrequency() << ">";
        }
        out_stream << '\n';
    }
    return out_stream;
}

Context

StackExchange Code Review Q#80720, answer score: 2
