patterncppMinor
Counting weather events for visualization using Gnuplot
Viewed 0 times
countingeventsforusinggnuplotvisualizationweather
Problem
I wrote this code to analyze ~800 MB of weather data from the US. I am planning on visualizing the data with Gnuplot and Gimp. I have already made the images and a gif file. This code runs somewhat fast, although I do not suppose it is efficent. How can I improve it?
The idea behind my code is:
main code:
functions:
```
void getWeather(std::map& weather, std::ifstream& fin, std::string& city){
std::string line;
while(!fin.eof()){
getline(fin, line);
if(line.find(city) != std::string::npos){
if(line.find("Drought") != std::string::npos){
weather["Drought"]++;
}else if(line.find("Flood") != std::string::npos){
weather["Flood"]++;
}else if(line.find("Heavy Snow") != std::string::npos){
weather["Heavy Snow"]++;
... // I left out some else if statements from here to shorten the post
}else if(line.find("High Surf") != std::string::npos){
weather["High Surf"]++;
}
}
}
}
void outPut(std::map& weather, std::ofstream& fout,std::string& city, int date){
fout << city << "-" << date << std::endl;
for(auto i:weather){
The idea behind my code is:
- find keywords in the lines, such as state names and their weather conditions
- save the weather condition in a map according to the state and year
- get the yearly weather conditions to a file named after the analyzed state
main code:
int main()
{
std::string data[18] = {"stormdata_1996.csv",
... // I left out the files from her on purpose
"stormdata_2013.csv"};
for(int j=0;j weather;
std::ifstream fin(data[j].c_str(),std::ios::in);
getWeather(weather, fin, city[i]);
fin.close();
std::ofstream fout(city[i].c_str(),std::ios::app);
outPut(weather, fout, city[i],j+1996);
fout.close();
weather.clear();
}
}
return 0;
}functions:
```
void getWeather(std::map& weather, std::ifstream& fin, std::string& city){
std::string line;
while(!fin.eof()){
getline(fin, line);
if(line.find(city) != std::string::npos){
if(line.find("Drought") != std::string::npos){
weather["Drought"]++;
}else if(line.find("Flood") != std::string::npos){
weather["Flood"]++;
}else if(line.find("Heavy Snow") != std::string::npos){
weather["Heavy Snow"]++;
... // I left out some else if statements from here to shorten the post
}else if(line.find("High Surf") != std::string::npos){
weather["High Surf"]++;
}
}
}
}
void outPut(std::map& weather, std::ofstream& fout,std::string& city, int date){
fout << city << "-" << date << std::endl;
for(auto i:weather){
Solution
I see a number of things that may help you improve your program.
Use the required
The code uses
Use const where practical
The current
For the
Don't use
Using
Let the computer do the counting
The
What would make more sense is to omit the
Don't use
The code currently contains these lines:
However, that's almost never what you really want. Instead, write this:
This is not only shorter and more direct, but it actually functions in the way you'd want and expect.
Avoid repeating code
In the
This approach has a few problems. First, it's needlessly repetitive since it repeats the event type twice in each pair of lines. Second, it is subject to miscounting because it assumes that, for example, the word "Flood" appearing in the line always means that the event type was "Flood". Assuming you're using this NOAA data, then it's easy to find a counterexample. For instance, if you look at the 2014 data, there was an event on 3 August 2014 for which the event code is "Thunderstorm Wind" but the description contains the word "Flood". This leads to my next suggestion.
Parse the file
An alternative to searching the entirety of each line would be to parse the lines, since it's a CSV file. There are a number of ways to do so, and a number of third-party libraries for that, but I'd be inclined to use the
Think carefully about data structures
Right now the code uses a
Rethink the algorithm
Right now, each input file is scanned once for every state and files are opened and closed with every iteration through the inner loop. Wouldn't it make more sense just to make a single pass through each input file? Further, each state's output is probably small enough that all of the data could be generated in memory and then written out when everything is parsed.
Omit
When a C or C++ program reaches the end of
Note: when I make this suggestion, it's almost invariably followed by one of two kinds of comments: "I didn't know that." or "That's bad advice!" My rationale is that it's safe and useful to rely on compiler behavior explicitly supported by the standard. For C, since C99; see ISO/IEC 9899:1999 section 5.1.2.2.3:
[...] a return from the initial call to the
For C++, since the first standard in 1998; see ISO/IEC 14882:1998 section 3.6.1:
If control reaches the end of main without encountering a return statement, the effect is that of executing return 0;
All versions of both standards since then (C
Use the required
#includesThe code uses
std::map which means that it should #include . It was not difficult to infer, but it helps reviewers if the code is complete. I believe that the full set of required includes is:#include
#include
#include
#include Use const where practical
The current
getWeather() routine does not (and should not) modify the passed city string, and so it should be declared const:void getWeather(std::map& weather, std::ifstream& fin,
const std::string& city){For the
outPut routine, both city and weather should be const.Don't use
std::endl if '\n' will doUsing
std::endl emits a \n and flushes the stream. Unless you really need the stream flushed, you can improve the performance of the code by simply emitting '\n' instead of using the potentially more computationally costly std::endl.Let the computer do the counting
The
main() routine contains this:std::string city[36] = {"TEXAS",
"OREGON",
// ... presumably 33 other states
"MISSOURI"};
for(int i=0;i<36;i++){What would make more sense is to omit the
36 and simply iterate directly over the strings, using a "range for" since you're using C++11. It's a bit peculiar to have 36 states since the United States hasn't had 36 states since 1867, when Nebraska became the 37th state, and even more peculiar that such an array is named city! I'd write it like this:std::string states[] = {"TEXAS",
"OREGON",
// .. maybe 47 other states?
"MISSOURI"};
for(const auto &state : states) {Don't use
eof for the loop conditionThe code currently contains these lines:
while(!fin.eof()){
getline(fin, line);However, that's almost never what you really want. Instead, write this:
while (std::getline(fin, line)) {This is not only shorter and more direct, but it actually functions in the way you'd want and expect.
Avoid repeating code
In the
getWeather() routine, we have a long list of if statements that looks like this:if(line.find("Drought") != std::string::npos){
weather["Drought"]++;
}else if(line.find("Flood") != std::string::npos){
weather["Flood"]++;This approach has a few problems. First, it's needlessly repetitive since it repeats the event type twice in each pair of lines. Second, it is subject to miscounting because it assumes that, for example, the word "Flood" appearing in the line always means that the event type was "Flood". Assuming you're using this NOAA data, then it's easy to find a counterexample. For instance, if you look at the 2014 data, there was an event on 3 August 2014 for which the event code is "Thunderstorm Wind" but the description contains the word "Flood". This leads to my next suggestion.
Parse the file
An alternative to searching the entirety of each line would be to parse the lines, since it's a CSV file. There are a number of ways to do so, and a number of third-party libraries for that, but I'd be inclined to use the
regex part of the standard library. In particular, this can be done with a std::regex_token_iterator. Think carefully about data structures
Right now the code uses a
std::map but since maintaining any particular ordering appears to be unimportant for this code, it would likely confer a performance benefit to use a std::unordered_map instead.Rethink the algorithm
Right now, each input file is scanned once for every state and files are opened and closed with every iteration through the inner loop. Wouldn't it make more sense just to make a single pass through each input file? Further, each state's output is probably small enough that all of the data could be generated in memory and then written out when everything is parsed.
Omit
return 0When a C or C++ program reaches the end of
main the compiler will automatically generate code to return 0, so there is no need to put return 0; explicitly at the end of main. Note: when I make this suggestion, it's almost invariably followed by one of two kinds of comments: "I didn't know that." or "That's bad advice!" My rationale is that it's safe and useful to rely on compiler behavior explicitly supported by the standard. For C, since C99; see ISO/IEC 9899:1999 section 5.1.2.2.3:
[...] a return from the initial call to the
main function is equivalent to calling the exit function with the value returned by the main function as its argument; reaching the } that terminates the main function returns a value of 0.For C++, since the first standard in 1998; see ISO/IEC 14882:1998 section 3.6.1:
If control reaches the end of main without encountering a return statement, the effect is that of executing return 0;
All versions of both standards since then (C
Code Snippets
#include <string>
#include <iostream>
#include <fstream>
#include <map>void getWeather(std::map<std::string, int>& weather, std::ifstream& fin,
const std::string& city){std::string city[36] = {"TEXAS",
"OREGON",
// ... presumably 33 other states
"MISSOURI"};
for(int i=0;i<36;i++){std::string states[] = {"TEXAS",
"OREGON",
// .. maybe 47 other states?
"MISSOURI"};
for(const auto &state : states) {while(!fin.eof()){
getline(fin, line);Context
StackExchange Code Review Q#157557, answer score: 4
Revisions (0)
No revisions yet.