patterncppMinor
C++ regex golf solver program much slower than original Python
Viewed 0 times
muchgolforiginalthanslowerprogrampythonregexsolver
Problem
I am rewriting Peter Norvig's Python regex golf solver in C++. I had originally planned to post the whole thing here at once when I was finished, but:
Here is what I have so far:
```
#include
#include
#include
#include
#include
#include
#include
#include
#include
using std::endl;
typedef std::pair> covers_pair;
typedef std::map> regex_covers_t;
std::string replacements(char c)
{
return c == '^' || c == '$'? std::string{c} : std::string{c} + '.';
}
std::set subparts(const std::string& word, size_t subpart_size=5)
{
std::set subparts;
for (size_t i = 0; i dotify(const std::string& part)
{
if (part.empty()) {
return std::set{""};
}
auto result = std::set();
for (const auto& rest: dotify(part.substr(1))) {
for (const auto& c: replacements(part.front())) {
result.insert(c + rest);
}
}
return result;
}
const auto rep_chars = {'*', '+', '?'};
std::set repetitions(const std::string& part)
{
const auto dots = std::string("..");
auto repetitions = std::set();
for (size_t i = 1, limit = part.size(); i container, const std::string& delim)
{
if (container.empty()) {
return "";
}
std::ostringstream joiner;
std::string joined_string;
std::copy(
container.cbegin(), container.cend(),
std::ostream_iterator(joiner, delim.c_str()));
joined_string = joiner.str();
// The std::copy call above will append the delimiter to the end, so
// we now remove it.
joined_string.erase(
joined_string.end() - delim.size(), joined_string.end());
- This program takes double the time of the Python version on the driver input shown in main(). And that's just to complete all the functions I have so far: in half the time, the Python program is able to generate solutions both ways. C++ is supposed to be faster than Python; is there a way to speed this up?
- The code is starting to smell in general and it needs a checkup.
Here is what I have so far:
```
#include
#include
#include
#include
#include
#include
#include
#include
#include
using std::endl;
typedef std::pair> covers_pair;
typedef std::map> regex_covers_t;
std::string replacements(char c)
{
return c == '^' || c == '$'? std::string{c} : std::string{c} + '.';
}
std::set subparts(const std::string& word, size_t subpart_size=5)
{
std::set subparts;
for (size_t i = 0; i dotify(const std::string& part)
{
if (part.empty()) {
return std::set{""};
}
auto result = std::set();
for (const auto& rest: dotify(part.substr(1))) {
for (const auto& c: replacements(part.front())) {
result.insert(c + rest);
}
}
return result;
}
const auto rep_chars = {'*', '+', '?'};
std::set repetitions(const std::string& part)
{
const auto dots = std::string("..");
auto repetitions = std::set();
for (size_t i = 1, limit = part.size(); i container, const std::string& delim)
{
if (container.empty()) {
return "";
}
std::ostringstream joiner;
std::string joined_string;
std::copy(
container.cbegin(), container.cend(),
std::ostream_iterator(joiner, delim.c_str()));
joined_string = joiner.str();
// The std::copy call above will append the delimiter to the end, so
// we now remove it.
joined_string.erase(
joined_string.end() - delim.size(), joined_string.end());
Solution
Your
With this fixed, the C++ version runs about 3 times faster than the Python version on my system using
eliminate_dominated function is not equivalent to the original. Try this version:regex_covers_t eliminate_dominated(const regex_covers_t& covers)
{
regex_covers_t new_covers;
const auto& sorted_parts = create_sorted_parts_list(covers);
for (const auto& r: sorted_parts) {
if (covers.at(r).empty()) {
// All remaining r must not cover anything
break;
}
auto is_dominated_lambda =
[&covers, &r](const covers_pair& pair) {
const auto& r2 = pair.first;
bool is_superset = std::includes(
pair.second.cbegin(), pair.second.cend(),
covers.at(r).cbegin(), covers.at(r).cend());
bool is_shorter = r2.size() <= r.size();
return is_superset && is_shorter;
};
bool is_dominated = std::any_of(
new_covers.cbegin(), new_covers.cend(),
is_dominated_lambda);
if (!is_dominated) {
new_covers[r] = get(covers,r);
}
}
return new_covers;
}With this fixed, the C++ version runs about 3 times faster than the Python version on my system using
g++ -O2.Code Snippets
regex_covers_t eliminate_dominated(const regex_covers_t& covers)
{
regex_covers_t new_covers;
const auto& sorted_parts = create_sorted_parts_list(covers);
for (const auto& r: sorted_parts) {
if (covers.at(r).empty()) {
// All remaining r must not cover anything
break;
}
auto is_dominated_lambda =
[&covers, &r](const covers_pair& pair) {
const auto& r2 = pair.first;
bool is_superset = std::includes(
pair.second.cbegin(), pair.second.cend(),
covers.at(r).cbegin(), covers.at(r).cend());
bool is_shorter = r2.size() <= r.size();
return is_superset && is_shorter;
};
bool is_dominated = std::any_of(
new_covers.cbegin(), new_covers.cend(),
is_dominated_lambda);
if (!is_dominated) {
new_covers[r] = get(covers,r);
}
}
return new_covers;
}Context
StackExchange Code Review Q#87987, answer score: 2
Revisions (0)
No revisions yet.