Large ASCII file data read
file, read, large, ascii, data
Problem
I am moving a project from Python to C++, partly to gain speed.
This part of the code reads large .txt data files (only about 2 MB per file, but there are quite a lot of files) and needs to convert the data to floating point.
This C++ code is not faster than the Python bytecode (.pyc). Regardless, my application requires faster processing. What are the main things I am doing wrong?
Below is a complete standalone representative example that will compile with cl.exe turtlereader.cpp (or with another compiler, I believe):
```
#include <fstream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

class TurtleFileReader {
public:
    // pointless constructor for this example
    TurtleFileReader(){};
    // Turtle read looks for zones in the input file and directs the fileread process.
    int TurtleRead() {
        std::ifstream readfile;
        readfile.open("sampleinput.txt");
        // records line we are reading on:
        int linenumber = 0;
        int data_starts_on_line = 0; // init to 0
        // find first zone = line
        std::string line;
        while (std::getline(readfile, line)) {
            linenumber += 1;
            if ( line.find("ZONE") > total_z_values;
            // at this point, we have come across a zone line in the file. And,
            // we know how many data lines to read next.
            for (size_t i = 0; i vec = LineSplit(line);
            if (vec.size() == 9) {
                a1_.push_back(vec[0]);
                a2_.push_back(vec[1]);
                a3_.push_back(vec[2]);
                a4_.push_back(vec[3]);
                a5_.push_back(vec[3]);
                a6_.push_back(vec[4]);
                a7_.push_back(vec[5]);
                a8_.push_back(vec[6]);
                a9_.push_back(vec[5]);
                // Do some check on the data here:
                if (vec[0] > 10 || vec[0]
    // splits a whitespace-separated line into values of type T
    template<typename T>
    std::vector<T> LineSplit(const std::string& line) {
        std::istringstream is(line);
        return std::vector<T>(std::istream_iterator<T>(is), std::istream_iterator<T>());
    }
```
Solution
The first principle of optimization is "measure, don't guess". So the first step is to use a profiler on your platform to find the most time-consuming steps in your algorithm. The result may depend on compilation options (optimization turned on or off). On my platform (x86-64 Linux, g++ 4.8.1 with -O3), the most time-consuming operation is:
```
template<typename T>
std::vector<T> LineSplit(const std::string& line) {
    std::istringstream is(line);
    return std::vector<T>(std::istream_iterator<T>(is), std::istream_iterator<T>());
}
```
I would first try to write a specialization of this method for double and parse the line manually (using pointer arithmetic and the strtod() function from the C standard library), then measure and optimize the next bottleneck.
Code Snippets
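The manual parsing suggested in the answer might be sketched as follows. This is only an illustration under stated assumptions: the name `LineSplitDouble` is hypothetical (it does not appear in the original post), the fields are assumed to be separated by whitespace, and C++11 is assumed for `nullptr`. Note that `strtod()` honors the current C locale, so the decimal separator is assumed to be a period.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): parses the
// whitespace-separated doubles at the start of a line using strtod()
// and a moving pointer, without constructing an istringstream.
std::vector<double> LineSplitDouble(const std::string& line) {
    std::vector<double> values;
    values.reserve(9);  // assumption: data lines hold 9 columns, as in the question

    const char* cursor = line.c_str();  // null-terminated, so strtod cannot overrun
    char* end = nullptr;
    while (true) {
        // strtod skips leading whitespace and sets `end` just past the number.
        const double value = std::strtod(cursor, &end);
        if (end == cursor) {
            break;  // nothing converted: end of line or non-numeric text
        }
        values.push_back(value);
        cursor = end;  // continue right after the number just parsed
    }
    return values;
}
```

This could stand in for the `LineSplit` call in the read loop. Parsing stops at the first token that is not a number, so a line such as a ZONE header yields an empty vector. Whether it actually removes the bottleneck should be confirmed by profiling again.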
```
template<typename T>
std::vector<T> LineSplit(const std::string& line) {
    std::istringstream is(line);
    return std::vector<T>(std::istream_iterator<T>(is), std::istream_iterator<T>());
}
```
Context
StackExchange Code Review Q#51381, answer score: 6