patterncMinor
An assignment algorithm in C
Viewed 0 times
algorithmassignmentstackoverflow
Problem
As a post processing step of a similarity matrix computation that leads to the non-negative matrix
Here
The code runs fine but my problem is that it takes too long (about 100 seconds on my compute node), and this is something I can't afford due to the large number of calls to this function.
I am new to C and it is very well probable that I am missing straightforward performance optimizations, so I would really appreciate if you would comment in case you see a window for improvement.
Background: Blondel et al., A measure of similarity between graph vertices
mt4_ in gpu, I am performing an assignment step to determine which element in rows/columns is most similar to which element in columns/rows. So this is more like a first-come-first-serve algorithm in which the largest similarity is picked up and then that particular row and column is excluded (by flagging the first element of that row and column --using -1.0f) and the process continues as such until all rows/columns are exhausted. The results are exported to two contiguous sub-arrays of sms one for each row-to-column and column-to-row assignment. Here is the function in which sZ is a const unsigned int equal to 5000 (mt4_, and by extension mt2, are of size sZ x sZ):void asn(float *mt2, float *mt4_, unsigned int *sms) {
unsigned int i, j;
unsigned int a, b;
register unsigned long k;
register float mx;
register unsigned int mm = 0, nn = 0;
for (j=0; j mx && mt2[b] != -1.0f) {
mx = mt2[a*sZ+b];
mm = a;
nn = b;
}
}
}
if (fabsf(mt2[mm*sZ+mm] - mt2[mm*sZ+nn]) <= thd)
nn = mm;
sms[k+mm] = nn+1;
mt2[mm*sZ] = mt2[nn] = -1.0f;
}
}
}Here
thd is the const float threshold that introduces a bias towards assigning a column index to its identical row index if their similarity is a within a tolerance thd of the best available score (this is to compensate for numerical errors).The code runs fine but my problem is that it takes too long (about 100 seconds on my compute node), and this is something I can't afford due to the large number of calls to this function.
I am new to C and it is very well probable that I am missing straightforward performance optimizations, so I would really appreciate if you would comment in case you see a window for improvement.
Background: Blondel et al., A measure of similarity between graph vertices
Solution
All of what I'm suggesting here will be done by a good optimizing compiler. Some of what I'm suggesting is computer architecture dependent so you should depend on the compiler rather than doing it if you can. Generate assembly code and look at it to better optimize.
First, register assignment isn't guaranteed, it is only a recommendation to the compiler. You want to move your register variable declarations up. The compiler ignores your register allocations when it runs out of registers.
You want to base your register allocations on what is being used most, since k is only used twice it shouldn't be a register. The variables mm, nn, a and b are used the most (sZ sZ sZ) so they should be registers.
On many computers counting down (auto decrement) is faster than counting up because decrement and test for zero is one opcode:
This does impact readability and maintainability so it should not be used unless the optimizing compiler doesn't do it for you.
Testing for equivalency of floating points is to be discouraged because of rounding errors (floating point errors).
Variable names and constants should be longer and clearer, sZ => MatrixSize.
First, register assignment isn't guaranteed, it is only a recommendation to the compiler. You want to move your register variable declarations up. The compiler ignores your register allocations when it runs out of registers.
void asn(float *mt2, float *mt4_, unsigned int *sms) {
register float mx;
register unsigned int mm = 0, nn = 0;
register unsigned int a, b;
unsigned int i, j;
unsigned long k;You want to base your register allocations on what is being used most, since k is only used twice it shouldn't be a register. The variables mm, nn, a and b are used the most (sZ sZ sZ) so they should be registers.
On many computers counting down (auto decrement) is faster than counting up because decrement and test for zero is one opcode:
for (i=sZ; --i; ) {
mx = -1.0f;
for (a=sZ; --a; ) {
if (mt2[a*sZ] != -1.0f)
for (b=sZ; --b; ) {
if (mt2[a*sZ+b] > mx && mt2[b] != -1.0f) {
mx = mt2[a*sZ+b];
mm = a;
nn = b;
}
}
}This does impact readability and maintainability so it should not be used unless the optimizing compiler doesn't do it for you.
Testing for equivalency of floating points is to be discouraged because of rounding errors (floating point errors).
Variable names and constants should be longer and clearer, sZ => MatrixSize.
Code Snippets
void asn(float *mt2, float *mt4_, unsigned int *sms) {
register float mx;
register unsigned int mm = 0, nn = 0;
register unsigned int a, b;
unsigned int i, j;
unsigned long k;for (i=sZ; --i; ) {
mx = -1.0f;
for (a=sZ; --a; ) {
if (mt2[a*sZ] != -1.0f)
for (b=sZ; --b; ) {
if (mt2[a*sZ+b] > mx && mt2[b] != -1.0f) {
mx = mt2[a*sZ+b];
mm = a;
nn = b;
}
}
}Context
StackExchange Code Review Q#123003, answer score: 4
Revisions (0)
No revisions yet.