snippetcppMinor
Parse a file containing key-value pairs followed by coordinates of nodes in a graph
Viewed 0 times
containingnodesfilecoordinatesgraphfollowedvalueparsepairskey
Problem
I've a file which is divided into two parts:
The header has of course a different format than the data part. The header part contains lines that look like this
The data part starts with a delimiter, which in my specific case should always be
In the header, I only need to keep track of two key-value pairs: the
Each line or entry in the data section should be formatted as
INDEX X_COORDINATE Y_COORDINATE
There could be more than one space between the values.
I'm parsing the header and the data section differently. The first one is with
I want my code first to be correct, but I would like it to also be as efficient and fast as possible, because this parsing of the file, structured as I've been describing above, is part of a program which must be as fast as possible.
Note, I'm throwing exceptions whenever a line is not formatted as I was expecting. I think this a good way of handling these kinds of errors in general, but I don't know if in C++ this is actually a good idea. Notice that I've already read something about exception handling in C++.
I think this is the relevant code for you:
```
int dimension = -1;
int best_known = -1;
vector > nodes;
void read_file(const char *file_name) {
ifstream file(file_name);
if (file.fail()) {
throw runtime_error("file failed to open");
}
string line;
si
- The header
- The data part
The header has of course a different format than the data part. The header part contains lines that look like this
KEY: VALUE or KEY : VALUE or KEY:VALUE, in theory, clearly. The data part starts with a delimiter, which in my specific case should always be
NODE_COORD_SECTION, and ends with another delimiter, which should always be EOF. In the header, I only need to keep track of two key-value pairs: the
DIMENSION and the BEST_KNOWN.Each line or entry in the data section should be formatted as
INDEX X_COORDINATE Y_COORDINATE
There could be more than one space between the values.
INDEX should be an int, X_COORDINATE and Y_COORDINATE should be floats or doubles.I'm parsing the header and the data section differently. The first one is with
getline with the help of the find method of the string class, and the second with the overloaded operator >> of the ifstream object. I've read on a Stack Overflow's post that mixing these two ways of parsing lines could be dangerous and thus should be avoided, but I thought that for my case it would be the best way to do it.I want my code first to be correct, but I would like it to also be as efficient and fast as possible, because this parsing of the file, structured as I've been describing above, is part of a program which must be as fast as possible.
Note, I'm throwing exceptions whenever a line is not formatted as I was expecting. I think this a good way of handling these kinds of errors in general, but I don't know if in C++ this is actually a good idea. Notice that I've already read something about exception handling in C++.
I think this is the relevant code for you:
```
int dimension = -1;
int best_known = -1;
vector > nodes;
void read_file(const char *file_name) {
ifstream file(file_name);
if (file.fail()) {
throw runtime_error("file failed to open");
}
string line;
si
Solution
One problem we all know regarding speed is caching. A
the values
you lose that nice property, since we know have to access
and we really want to keep the number of
and create the
Next, you usually don't want to use the whole
If we follow all of these points, we end up with
Note that I've moved the parse of the single node coordinates after the initial
Concerning your additional questions: yes, exceptions provide a way to control your flow, but so would error return codes, e.g.
The choice is yours, but exception handling and stack unwinding is often a hassle. The compiler can use some optimizations if it knows that a certain part of your code will never throw. However, since we're using
Also note that you slightly violate the DRY principle in your
Other than that, well done. Your code doesn't fall into other pitfalls that usually happen, e.g.
is exactly how you should handle formatted extraction with
std::vector has a nice property, namely that &vec[i] + 1 == &vec[i+1], e.g. the memory is continuous. So whenever you have a loop likefor(size_t i = 0; i < values.size(); ++i) {
values[i] += 2;
}the values
values[i+1], values[i+2] … will already reside in your cache, and you don't have to access your main memory for those. However if you store only pointers in your vector, ie.std::vector > nodes;you lose that nice property, since we know have to access
vec[i], and &(vec[i]) != &(*vec[i+1]). The above code becomesfor(size_t i = 0; i < values.size(); ++i) {
*(values[i]) += 2;
}and we really want to keep the number of
* low. Luckily, unless you want to allow nodes[i] == nullptr for some i, there is no need to use a layer of indirection. Just use std::vector nodes;and create the
Node at the end of your vector with std::vector::emplace_back:while (file >> index >> x >> y) {
nodes.emplace_back(index, x, y);
}Next, you usually don't want to use the whole
namespace std. It's considered bad practice. And last but not least, you want to use variables as local as possible, to get rid of errors that are hard to debug (e.g. found_at should be declared/initialized in your if).If we follow all of these points, we end up with
int dimension = -1;
int best_known = -1;
std::vector nodes;
void read_file(const char *file_name) {
std::ifstream file(file_name);
if (file.fail()) {
throw std::runtime_error("file failed to open");
}
std::string line;
while (std::getline(file, line)) {
if (line.find("BEST_KNOWN") != std::string::npos) {
const auto found_at = line.find(':');
if (found_at == std::string::npos) {
throw std::runtime_error("line containing 'BEST_KNOWN' formatted not as expected");
}
best_known = std::stoi(line.substr(found_at + 1));
} else if (line.find("DIMENSION") != std::string::npos) {
const auto found_at = line.find(':');
if (found_at == std::string::npos) {
throw std::runtime_error("line containing 'DIMENSION' formatted not as expected");
}
dimension = std::stoi(line.substr(found_at + 1));
} else if (line.find("NODE_COORD_SECTION") != std::string::npos) {
break;
}
}
if (dimension == -1 || best_known == -1) {
throw std::runtime_error("dimension and best known result should have already been read");
}
unsigned index;
double x, y;
while (file >> index >> x >> y) {
nodes.emplace_back(index, x, y);
}
file.close();
}Note that I've moved the parse of the single node coordinates after the initial
getline approach.Concerning your additional questions: yes, exceptions provide a way to control your flow, but so would error return codes, e.g.
enum READ_ERROR { BEST_KNOWN_FORMAT, DIMENSION_FORMAT, BEST_DIM_NOT_READ, READ_OK };
....
if (dimension == -1 || best_known == -1) {
return BEST_DIM_NOT_READ;
}The choice is yours, but exception handling and stack unwinding is often a hassle. The compiler can use some optimizations if it knows that a certain part of your code will never throw. However, since we're using
nodes.emplace_back, we're already using at least one function that can possibly throw.Also note that you slightly violate the DRY principle in your
line.find("DIMENSION") and line.find("BEST_KNOWN") lines, but another layer of indirection would make things slightly more confusing (for the reader).Other than that, well done. Your code doesn't fall into other pitfalls that usually happen, e.g.
while(file >> index >> x >> y)is exactly how you should handle formatted extraction with
iostream.Code Snippets
for(size_t i = 0; i < values.size(); ++i) {
values[i] += 2;
}std::vector <std::unique_ptr<Node>> nodes;for(size_t i = 0; i < values.size(); ++i) {
*(values[i]) += 2;
}std::vector <Node> nodes;while (file >> index >> x >> y) {
nodes.emplace_back(index, x, y);
}Context
StackExchange Code Review Q#142800, answer score: 4
Revisions (0)
No revisions yet.