HiveBrain v1.2.0

Checking the difference between old and new XML

Submitted by: @import:stackexchange-codereview

Problem

I have two XML files, an old one and a new one, each about 500 MB. I want to find out which items have been added to or removed from new.xml compared with old.xml.

I had some help with this code, and I'm fairly new to C++, so I was wondering whether this is the optimal way to approach a problem like this.

#include <algorithm>
#include <iostream>
#include <iterator>
#include <set>
#include <string>

#include "include/pugixml.hpp"

#define con(m) std::cout << m << '\n';

using str_set = std::set<std::string>;

int main()
{
    pugi::xml_document doc;

    str_set a;
    doc.load_file("old.xml");

    // fill set a with just the ids from file a
    for(auto&& node: doc.child("site_entries").children("entry"))
        a.emplace(node.child("id").text().as_string());

    str_set b;
    doc.load_file("new.xml");

    // fill set b with just the ids from file b
    for(auto&& node: doc.child("site_entries").children("entry"))
        b.emplace(node.child("id").text().as_string());

    // now use the <algorithm> library

    str_set b_from_a;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end()
        , std::inserter(b_from_a, b_from_a.begin()));

    str_set a_from_b;
    std::set_difference(b.begin(), b.end(), a.begin(), a.end()
        , std::inserter(a_from_b, a_from_b.begin()));

    str_set a_and_b;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end()
        , std::inserter(a_and_b, a_and_b.begin()));

    for(auto&& v: a)
        con("a       : " << v);

    con("");

    for(auto&& v: b)
        con("b       : " << v);

    con("");

    for(auto&& v: b_from_a)
        con("b_from_a: " << v);

    con("");

    for(auto&& v: a_from_b)
        con("a_from_b: " << v);

    con("");

    for(auto&& v: a_and_b)
        con("a_and_b : " << v);

    con("");
}

Solution

Your code is simple, and how it works is clear enough. I only have a few remarks:

- This line seems a little off:

#include "include/pugixml.hpp"


Generally speaking, C and C++ libraries provide an include directory that you add to the compiler's search paths (for example with the -I option of GCC or Clang), so that you can write:

#include "pugixml.hpp"


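This certainly does not make your code invalid, but hard-coding the library's location relative to your source file makes the code a little less idiomatic and ties it to one particular directory layout.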

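- Your code currently does not handle parsing errors: load_file returns a pugi::xml_parse_result, which converts to false when parsing fails and whose description() member explains what went wrong. See the pugixml documentation for more information.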

- Now, let's have a look at your set operations. Below, \$A\$ is the set of ids read from old.xml and \$B\$ is the set of ids read from new.xml.

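You are currently computing \$A \setminus B\$ (your b_from_a, the ids removed in new.xml), \$B \setminus A\$ (your a_from_b, the ids added in new.xml) and then \$A \cap B\$, where \$A \setminus B\$ is the set of elements of \$A\$ that are not in \$B\$. Technically speaking, it would be faster to compute \$A \cap B\$ first and then compute \$A \setminus (A \cap B)\$ and \$B \setminus (A \cap B)\$, which give the same results. Since \$A \cap B\$ has at most as many elements as \$A\$ or \$B\$, the second range passed to each std::set_difference call is smaller, and std::set_difference performs at most \$2(N_1 + N_2) - 1\$ comparisons, where \$N_1\$ and \$N_2\$ are the sizes of its two input ranges.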

In other words, your code should be:

str_set a_and_b;
std::set_intersection(a.begin(), a.end(), b.begin(), b.end()
    , std::inserter(a_and_b, a_and_b.begin()));

str_set b_from_a;
std::set_difference(a.begin(), a.end(), a_and_b.begin(), a_and_b.end()
    , std::inserter(b_from_a, b_from_a.begin()));

str_set a_from_b;
std::set_difference(b.begin(), b.end(), a_and_b.begin(), a_and_b.end()
    , std::inserter(a_from_b, a_from_b.begin()));
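The intersection-first approach above can be packaged as a small, self-contained function. This is only a sketch: the names diff_ids and id_diff are hypothetical (they come from neither the question nor pugixml), and it assumes str_set is std::set<std::string>, as the question's code implies. The sets are filled directly rather than from XML so that the example stands on its own.

```cpp
#include <algorithm> // std::set_intersection, std::set_difference
#include <iterator>  // std::inserter
#include <set>
#include <string>

// Assumed to match the question's alias: an ordered set of ids.
using str_set = std::set<std::string>;

// Hypothetical result type for this sketch.
struct id_diff
{
    str_set removed; // A \ B: ids in old.xml but not in new.xml
    str_set added;   // B \ A: ids in new.xml but not in old.xml
    str_set common;  // A intersect B: ids present in both files
};

// Hypothetical helper: computes A intersect B once, then subtracts it
// from A and from B, following the suggestion above.
id_diff diff_ids(const str_set& a, const str_set& b)
{
    id_diff d;

    // A intersect B
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(d.common, d.common.end()));

    // A \ (A intersect B), which equals A \ B
    std::set_difference(a.begin(), a.end(),
                        d.common.begin(), d.common.end(),
                        std::inserter(d.removed, d.removed.end()));

    // B \ (A intersect B), which equals B \ A
    std::set_difference(b.begin(), b.end(),
                        d.common.begin(), d.common.end(),
                        std::inserter(d.added, d.added.end()));

    return d;
}
```

For example, with a = {"1", "2", "3", "4"} (old ids) and b = {"3", "4", "5"} (new ids), diff_ids(a, b) yields removed = {"1", "2"}, added = {"5"} and common = {"3", "4"}. Returning all three sets together keeps the "removed" and "added" meanings explicit instead of relying on names like b_from_a.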

Context

StackExchange Code Review Q#86723, answer score: 8
