patterncppMinor
SAM mapped reads
Viewed 0 times
mappedreadssam
Problem
I write quite a bit of code in C, but haven't done much C++ since my college CS classes. I have been revisiting C++ recently, and thought I would re-implement a program I had previously written in C, but this time in C++. I particularly took advantage of containers and other features standardized recently by the C++11 standard.
I really enjoyed writing the code, and have verified that the C++ version of my program generates the same output as the original C version. However, there is a substantial performance margin--I'm talking like an order of magnitude and then some for the test I ran.
Many programmers claim that C++ performance should be close to C. Of course there's a bit more overhead, but from 48 seconds to 18 minutes--that's just unreasonable.
I'm wondering whether I've done something completely wrong or inefficient, which is causing the performance discrepancy. Any feedback would be welcome.
Here is the C code:
```
/*
Copyright (c) 2013, Daniel S. Standage
Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
#include
#include
#include "khash.h"
//------------------------------------------------------------------------------
// Definitions/prototypes/initializations for data structures, functions, etc.
//------------------------------------------------------------------------------
#define MAX_LINE_LENGTH 8192
I really enjoyed writing the code, and have verified that the C++ version of my program generates the same output as the original C version. However, there is a substantial performance margin--I'm talking like an order of magnitude and then some for the test I ran.
Many programmers claim that C++ performance should be close to C. Of course there's a bit more overhead, but from 48 seconds to 18 minutes--that's just unreasonable.
I'm wondering whether I've done something completely wrong or inefficient, which is causing the performance discrepancy. Any feedback would be welcome.
Here is the C code:
```
/*
Copyright (c) 2013, Daniel S. Standage
Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
#include
#include
#include "khash.h"
//------------------------------------------------------------------------------
// Definitions/prototypes/initializations for data structures, functions, etc.
//------------------------------------------------------------------------------
#define MAX_LINE_LENGTH 8192
Solution
As the comments alluded to, there's a lot of copying going on. Since I don't have profile data (wrong way to go after performance), I'll go over differences in the approach.
main()
In
smr_load_file()
Diving into the
smr_string_split()
Looking at
smr_print_matrix()
smr_collect_molids()
The for loops should be changed similarly, but at least here you use references to capture your iterator's referenced value. The translation between C and C++ here seems a little iffy, as I don't see what captures the
overview
Your code has taken a first stab at becoming idiomatic C++ code, but it has more steps to go. You've replaced the khash data structure with data structures from C++, but are still using them in a largely C fashion. There's nothing inherently wrong with this, although it may surprise many C++ developers, and may be unable to leverage many of the improvements in C++.
The biggest general thing to watch out for are invisible copies of large amounts of data. Some of this can be dodged by making certain to call methods that support
main()
In
main(), the C version pre-allocates an array of options.numfiles pointers, but the C++ version merely uses push_back() to build a vector. Since you know the size, you could make the C++ version pre-allocate similarly by calling rm2seqPerSample.reserve(options.numfiles).smr_load_file returns a pointer in the C version, and a std::unordered_map in the C++ version. While this probably uses RVO to avoid an initial copy or move, this is then pushed into the above vector, which must move or copy the map. It would be more idiomatic in C++ to wrap this into a class, and perhaps call emplace_back instead:struct smr_file : private std::unordered_map
{
smr_file(std::istream& instream, char delim)
{
// code from smr_load_file operating on *this here
}
};
: : :
int main(...)
{
std::vector rm2secPerSample;
: : :
rm2seqPerSample.emplace_back(ifs, options.delim);
: : :
}smr_load_file()
Diving into the
smr_load_file, there are small things and larger things. Starting small, you can replace rm2seq[molid] += 1 with keyvaluepair->second += 1 to avoid a second lookup. Or maybe you can replace the entire find/emplace/increment stanza with just rm2seq[molid] += 1, if you can ensure the key-value-pair instantiates with value 0. (The former could help with efficiencies; the latter could help with local code complexity, which is usually more important.) Going a bit larger, the call to smr_string_split worried me, as string splits are easy to get wrong, and it sounds like this could well be one of your bottlenecks.smr_string_split()
Looking at
smr_string_split, there are a few things I would likely change. First off, a badly formatted line could result in a crash in the calling code (if there's fewer than two delimiters found, only one or two elements will be in the vector, so the following tokens[2] will misbehave). Second, there are a lot of data copies in these few lines of code, such as into the stringstream, out into the string, and onward into the vector elems. I would be tempted to replace this with a more straightforward method that specifically parses out these items, either returning through "out" parameters (bool smr_parse_line(const std::string& line, char delim, int &bflag, std::string& molid)), with a tuple (std::tuple smr_parse_line(const std::string& line, char delim)), or even just implement it inline. As for implementing this method, I would tend towards std::string::find and substr calls, as it seems you only support a predefined number of tokens.smr_print_matrix()
smr_print_matrix starts with a copy or move, though probably elided due to RVO, but then loops suboptimally. The manual iterator bounds checking in the iterator faces a potential performance penalty due to its repeated calculation of end(). I would suggest replacing the for loops with a range for: for (auto& molid : molids), similarly for (auto& rm2seq : rm2seqPerSample). The only trick is handling your conditional output of options.delim. Here your use of rm2seq.find is correct, although I can't guess whether conditionally outputting a char instead of an int would have any effect. If it doesn't, something like options.outstream second : 0; might be a maintenance win (or loss if you hate ?:).smr_collect_molids()
The for loops should be changed similarly, but at least here you use references to capture your iterator's referenced value. The translation between C and C++ here seems a little iffy, as I don't see what captures the
else kh_value(ids, key) = 1.overview
Your code has taken a first stab at becoming idiomatic C++ code, but it has more steps to go. You've replaced the khash data structure with data structures from C++, but are still using them in a largely C fashion. There's nothing inherently wrong with this, although it may surprise many C++ developers, and may be unable to leverage many of the improvements in C++.
The biggest general thing to watch out for are invisible copies of large amounts of data. Some of this can be dodged by making certain to call methods that support
move. Other parts can be removed by changing the ownership of objects to only create them in their final place (such as my smr_file example). Still other parts can be removed by not changing between different representations of the same data (string to stringstream, and so forth). Finally, just because it is C++, don't feel you have to give up C tools that happen to be faster at certain tasks. Instead try to understand why they are faster, and see if there are alternatives in C++ that you didn't already consider.Code Snippets
struct smr_file : private std::unordered_map<std::string, unsigned>
{
smr_file(std::istream& instream, char delim)
{
// code from smr_load_file operating on *this here
}
};
: : :
int main(...)
{
std::vector<smr_file> rm2secPerSample;
: : :
rm2seqPerSample.emplace_back(ifs, options.delim);
: : :
}Context
StackExchange Code Review Q#33689, answer score: 5
Revisions (0)
No revisions yet.