patterncppMinor
Finding nearest float values corresponding to a data point (of three variables) in a table with multiple columns using C++
Viewed 0 times
threecolumnspointfloatnearestwithusingvariablesfindingvalues
Problem
I have a data file which contains floating point data arranged in 4 columns. Each row represent a specific data point. The first column(X) is made up of 100 different values but the values are repeating. Fortunately they are sorted in ascending order. For each distinct value in first column the second column(Y) has a different number of values which are sorted but repetative in nature. Finally, for each distinct value in second column there are a no. of different values in thrid column(Z) which are sorted and non-repetative. Data file can be viewed here
I have to find the nearest data point corresponding to a specific value(x,y,z) from first three column and write the value present in the forth column of that data point. Following is the working code for this problem:
```
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
using namespace std;
#define Nc 381443
struct data { //Stores value of data points row-wise
float H;
float Ba;
float SO4;
float sup;
};
struct ioninfo //To store value of a specific variable and count the no. of times it is repeating
{
float val;
int count;
};
double write_absolute(double value)
{
if (value Hion) //To find value nearest to the first(x) variable in first column
{
vector diff(100);
for (int i = 0; i nH_Ba, vector Baion) //To find value nearest to the second(y) variable in second column
{
vector diffBa(nH_Ba[index]);
int temp=0;
for (int i = 0; i nH_Ba,vector Hion, vector Baion, vector v) //To find value nearest to the third(z) variable in third column
{
int sumA=0,sumB=0;
for (int i = 0; i diffz(Baion[temp+indexy].count);
int pos=0;
for (int i = (sumA+sumB); i v(Nc);
vector Hion(100);
ifstream fin( "data2.txt");
double X,Y,Z,S;
double X1;
int Hc[101];
X=0.0,Y=0.0,Z=0.0,X1=0.0;
int count=0;
int local_count=1;
int i=0;
//Reading data points from file
for(string line; getline( fin, l
I have to find the nearest data point corresponding to a specific value(x,y,z) from first three column and write the value present in the forth column of that data point. Following is the working code for this problem:
```
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
using namespace std;
#define Nc 381443
struct data { //Stores value of data points row-wise
float H;
float Ba;
float SO4;
float sup;
};
struct ioninfo //To store value of a specific variable and count the no. of times it is repeating
{
float val;
int count;
};
double write_absolute(double value)
{
if (value Hion) //To find value nearest to the first(x) variable in first column
{
vector diff(100);
for (int i = 0; i nH_Ba, vector Baion) //To find value nearest to the second(y) variable in second column
{
vector diffBa(nH_Ba[index]);
int temp=0;
for (int i = 0; i nH_Ba,vector Hion, vector Baion, vector v) //To find value nearest to the third(z) variable in third column
{
int sumA=0,sumB=0;
for (int i = 0; i diffz(Baion[temp+indexy].count);
int pos=0;
for (int i = (sumA+sumB); i v(Nc);
vector Hion(100);
ifstream fin( "data2.txt");
double X,Y,Z,S;
double X1;
int Hc[101];
X=0.0,Y=0.0,Z=0.0,X1=0.0;
int count=0;
int local_count=1;
int i=0;
//Reading data points from file
for(string line; getline( fin, l
Solution
I see some things that may help you improve your program.
Eliminate unused variables
Unused variables are a sign of poor code quality, so eliminating them should be a priority. In this code,
Don't abuse
Putting
Use `
public:
Ion(float h=0, float ba=0, float so4=0, fl
Eliminate unused variables
Unused variables are a sign of poor code quality, so eliminating them should be a priority. In this code,
X1 and count within main are set to a value, never used, and Hc is unused entirely. My compiler also tells me that. Your compiler is probably also smart enough to tell you that, if you ask it to do so. Don't abuse
using namespace stdPutting
using namespace std at the top of every program is a bad habit that you'd do well to avoid. It isn't necessarily wrong to use, but be aware of when you absolutely shouldn't do it (such as in header files).Use `
instead of
The difference between the two forms is that the former defines things within the std:: namespace versus into the global namespace. Language lawyers have lots of fun with this, but for daily use I'd recommend using . See this SO question for details.
Only #include necessary files
There is no good reason that this program should need both and or and . As with the previous point, just use the C++ versions if they're needed at all.
Use existing library functions
Your write_absolute seems to duplicate exactly, the function of std::abs() for no apparent benefit.
Fix array bounds problem
The program currently has the following few lines within main():
for (j = 0; j < Hion[i].count; ++j) {
if (v[total_count].Ba == v[total_count + 1].Ba) {
total_count++;
} else {
// etc.
}
But this and the other similar line below that have a problem because there is nothing that restricts total_count to only point within the valid range of v. In fact, on my machine, this causes a segfault and crash. I think the simplest way to fix it might be this:
for (j = 0; j < Hion[i].count && total_count+1 < Nc; ++j) {
But I haven't thoroughly dissected the program enough to determine if this is a valid fix or not. There is a similar issue in both find_Ba_ion() and find_SO4_ion() because the index and indexy values are not range checked and do go out of range in the program.
Pass by const reference where practical
Each of your find_xx_ion() routines takes one or more vectors as arguments. Because you're passing them by value, each vector is being duplicated and destroyed each time the functions are called, which is a considerable wast of time and memory. In this case, that's easily fixed. Use a const reference instead. For example, the declaration for find_SO4_ion() would look like this:
int find_SO4_ion(double z, int index, int indexy,
const std::vector &nH_Ba,
const std::vector &Hion,
const std::vector &Baion,
const std::vector &v);
Avoid scanf if you can
There are so many well known problems with scanf that you're usually better off avoiding it. Because of the possibility of a buffer overrun, code using scanf should at least specify an input buffer width:
scanf("%1004s", s);
Even better is to use io extractors. You can eliminate the loop and a number of temporary variables by doing this:
while (fin >> v[i].H >> v[i].Ba >> v[i].SO4 >> v[i].sup) {
++i;
}
fin.close();
Eliminate "magic numbers"
There are a few numbers in the code, such as 100 and 381443 that have a specific meaning in their particular context. By using named constants such as MaxNumberOfUniqueValues, the program becomes easier to read and maintain. For cases in which the constant only has sense with respect to a particular object, consider making that constant part of the object. In all cases in C++, prefer using a const int or constexpr int to a #define.
Think about the usage of the program
There are a couple of different possible ways this could be used, and it's not clear which one you have in mind. One is that you are going to read the data into memory from a file and then search it many times. The alternative usage pattern is that you might have many data files, but would only tend to search each one once for a particular set of numbers. Once you determine which is more likely to be the usage, you can optimize in a particular direction.
Use classes and objects
Right now you have a number of relatively complex and error-prone data structures and functions. They are not really tied together in any way, leading to some of the indexing errors I mentioned above. Better would be to use data classes to encapsulate behavior and data. For example, you have a struct named data. First, let's give it a decent name and then let's turn it into a class. I've called it Ion which may not be correct, but it's better than data which is not at all descriptive of what the thing actually contains. Also, I've made all of the data members private:
class Ion {
private:
float H;
float Ba;
float SO4;
float sup;
Now a constructor:
``public:
Ion(float h=0, float ba=0, float so4=0, fl
Code Snippets
for (j = 0; j < Hion[i].count; ++j) {
if (v[total_count].Ba == v[total_count + 1].Ba) {
total_count++;
} else {
// etc.
}for (j = 0; j < Hion[i].count && total_count+1 < Nc; ++j) {int find_SO4_ion(double z, int index, int indexy,
const std::vector<int> &nH_Ba,
const std::vector<ioninfo> &Hion,
const std::vector<ioninfo> &Baion,
const std::vector<data> &v);scanf("%1004s", s);while (fin >> v[i].H >> v[i].Ba >> v[i].SO4 >> v[i].sup) {
++i;
}
fin.close();Context
StackExchange Code Review Q#110930, answer score: 3
Revisions (0)
No revisions yet.