Merging the vector-group to get the common vector
Problem
Suppose I have a set of vectors (a vector group), \$\{\mathbf U_1,\mathbf U_2,\cdots,\mathbf U_n\}\$. Below is a simple instance:
```
U1 = {0.25, 0.25, 0.5, 0.5, 0.75, 0.75, 0.8};
U2 = {0.21, 0.25, 0.3, 0.6, 0.7, 0.8};
U3 = {0.25, 0.3, 0.7, 0.8};
```
Now I would like to calculate their common vector, i.e., the shortest vector that contains all of the elements of U1, U2, and U3. Here, containing all of the elements of U1 means containing {0.25, 0.25, 0.5, 0.5, 0.75, 0.75, 0.8}, rather than {0.25, 0.5, 0.75, 0.8}.
My friend Richard Xu gave me the following algorithm:
```
i  j  k  min_val  advance
1  1  1  0.21     j++
1  2  1  0.25     i++ j++ k++
2  3  2  0.25     i++
3  3  2  0.3      j++ k++
3  4  3  0.5      i++
4  4  3  0.5      i++
5  4  3  0.6      j++
5  5  3  0.7      j++ k++
5  6  4  0.75     i++
6  6  4  0.75     i++
7  6  4  0.8      i++ j++ k++
//===>common vector
U = {0.21, 0.25, 0.25, 0.3, 0.5, 0.5, 0.6, 0.7, 0.75, 0.75, 0.8}
```
C function for two vectors
```
#define min_two(a, b) (a) > (b) ? (b) :(a)
//count denotes the length of common vector of U1, U2
int two_vec_common(double *U1, double *U2, int len1, int len2, double *temp) {
    int i, j;
    double min_val;
    int count = 0;
    i = j = 0;
    while (i < len1 || j < len2) {
        min_val = min_two(U1[i], U2[j]);
        temp[count++] = min_val;
        if(U1[i] == min_val) i++;
        if(U2[j] == min_val) j++;
    }
    return count;
}
```
C function for three vectors
```
double min_three(double x, double y, double z) {
    if (x <= y && x <= z) return x;
    if (y <= x && y <= z) return y;
    if (z <= x && z <= y) return z;
}
/*count denotes the length of common vector of U1, U2, U3
len1, len2, len3 denote the length of vector U1, U2, U3, respectively
the pointer variable temp stores the values of common vector*/
int three_vec_common(double *U1, double *U2, double *U3,
                     int len1, int len2, int len3, double *temp) {
    ...
}
```
Solution
Here are some things that may help you improve your code.
Use `const` where practical
A number of places in the code should have the `const` keyword added. For example, instead of this:
```
int three_vec_common(double *U1, double *U2, double *U3,
                     int len1, int len2, int len3, double *temp) {
```
one might instead write this:
```
int three_vec_common(const double *U1, const double *U2, const double *U3,
                     int len1, int len2, int len3, double *temp) {
```
Doing so makes it clear that the inputs are not altered and that the `temp` pointer is an output.
Avoid buffer overflow
The code as written doesn't necessarily have a problem writing past the end of allocated memory, but it also doesn't have any way to avoid it. That is, the `temp` pointer passed in is assumed to have enough space for any result, but the actual allocated length is not passed in with the pointer, making it impossible for the code to avoid a buffer overflow if the buffer does not happen to have enough memory.
Consider checking for bad pointers
It may be that surrounding code checks explicitly for bad (`NULL`) pointers or has some other means of preventing them from being passed in, but it still might not be a bad idea to check for `NULL` pointers before dereferencing memory with either reads or writes.
Consider a better way of passing variables
It's not too bad to pass two arrays and their lengths, and even three is not completely onerous, but consider how one might pass fifty arrays into a similar function. It's obvious that one would not (and should not!) consider actually having one hundred named parameters, so the better way to do this would be to pass in a pointer to an array of such buffers instead. To make it simpler, I'd advocate adding a data structure to represent both the length and the array:
```
struct array {
    size_t size;
    double *data;
};
```
Try to use descriptive names
The variable names `count` and `min_val` are good because they give a good clue as to what they're for, but the name `temp` is not very descriptive or useful to the reader.
Think carefully about signed versus unsigned
What is the correct interpretation if, say, `len1` is a negative number? I'm guessing that there isn't a reasonable interpretation and that all of the lengths are expected to be non-negative. I'd recommend making them all of type `size_t`.
Make assumptions explicit
The algorithm will only work correctly if all input arrays are sorted in ascending order. That's not necessarily a bad assumption, but it does need to be explicitly stated in either the program description or the code comments or both.
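One way to make the sorted-input assumption checkable, rather than only documented, is a small helper that a caller (or an `assert`) can run before merging. This is only a minimal sketch; the name `is_sorted_ascending` is hypothetical and does not appear in the original code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of the reviewed code): returns true if
 * data[0..len-1] is in ascending (non-decreasing) order, which is what the
 * merge functions assume. Empty and one-element arrays count as sorted. */
bool is_sorted_ascending(const double *data, size_t len) {
    for (size_t i = 1; i < len; ++i) {
        if (data[i - 1] > data[i]) {
            return false;  /* found an adjacent pair that is out of order */
        }
    }
    return true;
}
```

A debug build could, for instance, call `assert(is_sorted_ascending(U1, len1));` at the top of each merge function, so that a violated assumption fails loudly instead of silently producing a wrong result.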
Generalize the algorithm
If we wanted to accept any number of arrays, we might use a function with this kind of template:
```
size_t merge(const struct array *arr, size_t arrlen, struct array *out);
```
If we consider the algorithm generally, it is:
1. set the current item to the head of each input array
2. find the minimum among the current items
3. add that minimum value to the output array
4. advance each input array whose current item is equal to the minimum value
5. if any input array is not yet exhausted, go back to step 2
It doesn't matter if there are 2 or 3 or 1000 input arrays. I'm hoping this is enough to suggest how one might write the remaining general code.
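For readers who want to see the steps spelled out, here is one possible sketch of such a function. It is not the only way to write it and rests on assumptions that are not in the original post: `out->size` is treated as the capacity of `out->data` on entry, a return value of 0 doubles as the error indicator, and the per-array cursors are allocated with `calloc`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct array {
    size_t size;
    double *data;
};

/* Sketch only. Assumes every input array is sorted in ascending order.
 * Assumption of this sketch: on entry out->size is the capacity of
 * out->data; on success it is set to the number of values written, which is
 * also returned. Returns 0 (leaving out->size unchanged) on a NULL pointer,
 * allocation failure, or insufficient output capacity. */
size_t merge(const struct array *arr, size_t arrlen, struct array *out) {
    if (arr == NULL || out == NULL || out->data == NULL || arrlen == 0) {
        return 0;
    }
    for (size_t a = 0; a < arrlen; ++a) {
        if (arr[a].size > 0 && arr[a].data == NULL) {
            return 0;
        }
    }
    /* Step 1: one cursor per input array, all starting at the head. */
    size_t *pos = calloc(arrlen, sizeof *pos);
    if (pos == NULL) {
        return 0;
    }
    const size_t capacity = out->size;
    size_t count = 0;
    for (;;) {
        /* Step 2: find the minimum among the current items. */
        bool found = false;
        double min_val = 0.0;
        for (size_t a = 0; a < arrlen; ++a) {
            if (pos[a] < arr[a].size) {
                double v = arr[a].data[pos[a]];
                if (!found || v < min_val) {
                    min_val = v;
                    found = true;
                }
            }
        }
        if (!found) {
            break;  /* Step 5: every input array is exhausted. */
        }
        /* Step 3: append the minimum, refusing to overflow the output. */
        if (count == capacity) {
            free(pos);
            return 0;
        }
        out->data[count++] = min_val;
        /* Step 4: advance every input whose current item equals the minimum. */
        for (size_t a = 0; a < arrlen; ++a) {
            if (pos[a] < arr[a].size && arr[a].data[pos[a]] == min_val) {
                ++pos[a];
            }
        }
    }
    free(pos);
    out->size = count;
    return count;
}
```

The sum of the input lengths is always a sufficient output capacity. With the three vectors from the question that is 17, and this sketch would write the 11 values of `U` shown there. Because 0 is also a legitimate result for all-empty inputs, production code would probably want a separate error code.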
Context
StackExchange Code Review Q#140377, answer score: 2