HiveBrain v1.2.0
Get Started
← Back to all entries
patterncppMinor

C++ regex golf solver program much slower than original Python

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
muchgolforiginalthanslowerprogrampythonregexsolver

Problem

I am rewriting Peter Norvig's Python regex golf solver in C++. I had originally planned to post the whole thing here at once when I was finished, but:

  • This program takes double the time of the Python version on the driver input shown in main(). And that's just to complete all the functions I have so far: in half the time, the Python program is able to generate solutions both ways. C++ is supposed to be faster than Python; is there a way to speed this up?



  • The code is starting to smell in general and it needs a checkup.



Here is what I have so far:

```
#include
#include
#include
#include
#include
#include
#include
#include
#include

using std::endl;

typedef std::pair> covers_pair;
typedef std::map> regex_covers_t;

std::string replacements(char c)
{
return c == '^' || c == '$'? std::string{c} : std::string{c} + '.';
}

std::set subparts(const std::string& word, size_t subpart_size=5)
{
std::set subparts;
for (size_t i = 0; i dotify(const std::string& part)
{
if (part.empty()) {
return std::set{""};
}
auto result = std::set();
for (const auto& rest: dotify(part.substr(1))) {
for (const auto& c: replacements(part.front())) {
result.insert(c + rest);
}
}
return result;
}

const auto rep_chars = {'*', '+', '?'};

std::set repetitions(const std::string& part)
{
const auto dots = std::string("..");
auto repetitions = std::set();
for (size_t i = 1, limit = part.size(); i container, const std::string& delim)
{
if (container.empty()) {
return "";
}
std::ostringstream joiner;
std::string joined_string;
std::copy(
container.cbegin(), container.cend(),
std::ostream_iterator(joiner, delim.c_str()));
joined_string = joiner.str();
// The std::copy call above will append the delimiter to the end, so
// we now remove it.
joined_string.erase(
joined_string.end() - delim.size(), joined_string.end());

Solution

Your eliminate_dominated function is not equivalent to the original. Try this version:

regex_covers_t eliminate_dominated(const regex_covers_t& covers)
{
    regex_covers_t new_covers;

    const auto& sorted_parts = create_sorted_parts_list(covers);
    for (const auto& r: sorted_parts) {
        if (covers.at(r).empty()) {
            // All remaining r must not cover anything
            break;
        }

        auto is_dominated_lambda =
            [&covers, &r](const covers_pair& pair) {
                const auto& r2 = pair.first;
                bool is_superset = std::includes(
                        pair.second.cbegin(), pair.second.cend(),
                        covers.at(r).cbegin(), covers.at(r).cend());
                bool is_shorter = r2.size() <= r.size();
                return is_superset && is_shorter;
            };

        bool is_dominated = std::any_of(
                new_covers.cbegin(), new_covers.cend(),
                is_dominated_lambda);

        if (!is_dominated) {
            new_covers[r] = get(covers,r);
        }
    }

    return new_covers;
}


With this fixed, the C++ version runs about 3 times faster than the Python version on my system using g++ -O2.

Code Snippets

regex_covers_t eliminate_dominated(const regex_covers_t& covers)
{
    regex_covers_t new_covers;

    const auto& sorted_parts = create_sorted_parts_list(covers);
    for (const auto& r: sorted_parts) {
        if (covers.at(r).empty()) {
            // All remaining r must not cover anything
            break;
        }

        auto is_dominated_lambda =
            [&covers, &r](const covers_pair& pair) {
                const auto& r2 = pair.first;
                bool is_superset = std::includes(
                        pair.second.cbegin(), pair.second.cend(),
                        covers.at(r).cbegin(), covers.at(r).cend());
                bool is_shorter = r2.size() <= r.size();
                return is_superset && is_shorter;
            };

        bool is_dominated = std::any_of(
                new_covers.cbegin(), new_covers.cend(),
                is_dominated_lambda);

        if (!is_dominated) {
            new_covers[r] = get(covers,r);
        }
    }

    return new_covers;
}

Context

StackExchange Code Review Q#87987, answer score: 2

Revisions (0)

No revisions yet.