Checking the difference between old and new XML
Problem
I have two XML files of about 500 MB each, old.xml and new.xml. I want to find out which items have been added or removed in new.xml compared with old.xml.
I had some help on this code and I'm a bit new to C++ but I was wondering if this is the optimal way to approach a problem such as this.
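#include <algorithm>  // std::set_difference, std::set_intersection
#include <iostream>   // std::cout
#include <iterator>   // std::inserter
#include <set>
#include <string>
#include "include/pugixml.hpp"
// assumed definition of str_set: an ordered set of id strings
using str_set = std::set<std::string>;
#define con(m) std::cout << m << '\n'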
int main()
{
    pugi::xml_document doc;
    str_set a;
    doc.load_file("old.xml");
    // fill set a with just the ids from file a
    for(auto&& node: doc.child("site_entries").children("entry"))
        a.emplace(node.child("id").text().as_string());
    str_set b;
    doc.load_file("new.xml");
    // fill set b with just the ids from file b
    for(auto&& node: doc.child("site_entries").children("entry"))
        b.emplace(node.child("id").text().as_string());
    // now use the library
    str_set b_from_a;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end()
        , std::inserter(b_from_a, b_from_a.begin()));
    str_set a_from_b;
    std::set_difference(b.begin(), b.end(), a.begin(), a.end()
        , std::inserter(a_from_b, a_from_b.begin()));
    str_set a_and_b;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end()
        , std::inserter(a_and_b, a_and_b.begin()));
    for(auto&& v: a)
        con("a : " << v);
    con("");
    for(auto&& v: b)
        con("b : " << v);
    con("");
    for(auto&& v: b_from_a)
        con("b_from_a: " << v);
    con("");
    for(auto&& v: a_from_b)
        con("a_from_b: " << v);
    con("");
    for(auto&& v: a_and_b)
        con("a_and_b : " << v);
    con("");
}
Solution
Your code seems simple and how it works is clear enough. I only have a few remarks:
-
This line seems a little bit off:
#include "include/pugixml.hpp"
Generally speaking, in C and C++, libraries provide an include directory that has to be added to the compiler's search directories so that we can write:
#include "pugixml.hpp"
While the original form certainly does not make your code invalid, it is less conventional and ties the source file to one particular directory layout.
-
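Your code currently does not handle parsing errors: the return value of both load_file calls is ignored. load_file returns a pugi::xml_parse_result, which converts to false when loading failed and offers description() for a readable message. See pugi's documentation for more information about that.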
-
Now, let's have a look at your set operations, writing \$A\$ for the set a and \$B\$ for the set b.
You are currently computing \$A \setminus B\$ and \$B \setminus A\$, then \$A \cap B\$ (where \$A \setminus B\$ represents the difference between \$A\$ and \$B\$). Technically speaking, it would be faster to compute \$A \cap B\$ first, then \$A \setminus (A \cap B)\$ and \$B \setminus (A \cap B)\$, since the second range of each difference then has fewer elements to compare (\$A \cap B\$ never has more elements than \$A\$ or \$B\$). Both std::set_difference and std::set_intersection are linear in the sizes of their two input ranges, so the gain is at most a constant factor.
In other words, your code should be:
str_set a_and_b;
std::set_intersection(a.begin(), a.end(), b.begin(), b.end()
, std::inserter(a_and_b, a_and_b.begin()));
str_set b_from_a;
std::set_difference(a.begin(), a.end(), a_and_b.begin(), a_and_b.end()
, std::inserter(b_from_a, b_from_a.begin()));
str_set a_from_b;
std::set_difference(b.begin(), b.end(), a_and_b.begin(), a_and_b.end()
, std::inserter(a_from_b, a_from_b.begin()));
Code Snippets
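The intersection-first ordering suggested in the answer can also be wrapped in a small function that works on plain sets, without any XML involved. The following is only a sketch: the names id_diff and diff_ids are hypothetical, and str_set is assumed to be std::set<std::string>, since the post does not show its definition.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>

// Assumed alias: the question uses str_set without showing its definition.
using str_set = std::set<std::string>;

// Hypothetical result type for this sketch; not part of the original code.
struct id_diff
{
    str_set removed;   // ids only in the old set (A \ B)
    str_set added;     // ids only in the new set (B \ A)
    str_set unchanged; // ids in both sets (A and B)
};

// Computes the intersection first, then both differences against it.
// std::set iterates in sorted order, which the <algorithm> set operations require.
id_diff diff_ids(const str_set& old_ids, const str_set& new_ids)
{
    id_diff d;
    std::set_intersection(old_ids.begin(), old_ids.end(),
                          new_ids.begin(), new_ids.end(),
                          std::inserter(d.unchanged, d.unchanged.begin()));
    std::set_difference(old_ids.begin(), old_ids.end(),
                        d.unchanged.begin(), d.unchanged.end(),
                        std::inserter(d.removed, d.removed.begin()));
    std::set_difference(new_ids.begin(), new_ids.end(),
                        d.unchanged.begin(), d.unchanged.end(),
                        std::inserter(d.added, d.added.begin()));
    // Sanity check: each input set splits into its own part plus the shared part.
    assert(d.removed.size() + d.unchanged.size() == old_ids.size());
    assert(d.added.size() + d.unchanged.size() == new_ids.size());
    return d;
}
```

With the sets from the question, diff_ids(a, b).removed would hold the ids found only in old.xml, and .added the ids found only in new.xml, which is what the question actually asks for.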
#include "include/pugixml.hpp"
#include "pugixml.hpp"
str_set a_and_b;
std::set_intersection(a.begin(), a.end(), b.begin(), b.end()
, std::inserter(a_and_b, a_and_b.begin()));
str_set b_from_a;
std::set_difference(a.begin(), a.end(), a_and_b.begin(), a_and_b.end()
, std::inserter(b_from_a, b_from_a.begin()));
str_set a_from_b;
std::set_difference(b.begin(), b.end(), a_and_b.begin(), a_and_b.end()
, std::inserter(a_from_b, a_from_b.begin()));
Context
StackExchange Code Review Q#86723, answer score: 8