patterncMinor
Increasing performance of OpenMP based advection equation solver for Xeon Phi
Viewed 0 times
phiequationopenmpperformancebasedincreasingforsolveradvectionxeon
Problem
I am solving linear advection equation in order to learn parallel programming. I am running this code on Xeon Phi co processor. Below is a plot of how it scales up with increasing number of threads. I am still in learning process and this is my first "parallel" code. I am also not very familiar with architecture specific optimization.
Comments on how to improve the performance of this code and what other stuff should I look into will be great. Also link for writeup or further tutorials to follow will be really helpful.
The plot is for
```
#include
#include
#include
#include
#include
int64_t TimeInMicros() {
struct timeval tv;
gettimeofday(&tv, NULL);
return tv.tv_sec*1000000 + tv.tv_usec;
}
struct Grid{
int nx;
double u, f, *res;
double a, cfl, dx, tf;
};
struct ThreadData{
int tid, maxthreads;
struct Grid *grid;
};
void solver(void args){
struct ThreadData td = (struct ThreadData )args;
int tid = td->tid;
struct Grid *grid = td->grid;
double *u = grid->u;
double *f = grid->f;
double *res = grid->res;
int nx = grid->nx;
double tf = grid->tf;
double cfl = grid->cfl;
double a = grid->a;
double dx = grid->dx;
double dt = cfl*dx/a;
double t = 0.0;
int chunk = nx/(td->maxthreads);
int start = tid*chunk;
int rc;
while(t < tf){
// sync here
#pragma omp barrier
if(start == 0){
f[start+1:chunk] = a*u[start:chunk];
}
else{
f[start:chunk+1] = a*u[start-1:chunk+1];
}
// sync here
#pragma omp barrier
if(start == 0){
f[start] = f[nx-1];
}
res[start:chunk] = -(f[start+1:chunk] - f[start:chunk])/dx;
u[start:chunk] += res[start:chunk]*dt;
t+=dt;
}
return NULL;
}
int main(int argc, char*argv[]){
int nx=atoi(argv[1]);
int nthreads=atoi(argv[2]);
if(nx%nthreads != 0){
printf("ERROR: Number of cells should be integral multiple of number of threads \n");
exit(1);
}
double *u = new double[nx]();
double
Comments on how to improve the performance of this code and what other stuff should I look into will be great. Also link for writeup or further tutorials to follow will be really helpful.
The plot is for
nx=100000.```
#include
#include
#include
#include
#include
int64_t TimeInMicros() {
struct timeval tv;
gettimeofday(&tv, NULL);
return tv.tv_sec*1000000 + tv.tv_usec;
}
struct Grid{
int nx;
double u, f, *res;
double a, cfl, dx, tf;
};
struct ThreadData{
int tid, maxthreads;
struct Grid *grid;
};
void solver(void args){
struct ThreadData td = (struct ThreadData )args;
int tid = td->tid;
struct Grid *grid = td->grid;
double *u = grid->u;
double *f = grid->f;
double *res = grid->res;
int nx = grid->nx;
double tf = grid->tf;
double cfl = grid->cfl;
double a = grid->a;
double dx = grid->dx;
double dt = cfl*dx/a;
double t = 0.0;
int chunk = nx/(td->maxthreads);
int start = tid*chunk;
int rc;
while(t < tf){
// sync here
#pragma omp barrier
if(start == 0){
f[start+1:chunk] = a*u[start:chunk];
}
else{
f[start:chunk+1] = a*u[start-1:chunk+1];
}
// sync here
#pragma omp barrier
if(start == 0){
f[start] = f[nx-1];
}
res[start:chunk] = -(f[start+1:chunk] - f[start:chunk])/dx;
u[start:chunk] += res[start:chunk]*dt;
t+=dt;
}
return NULL;
}
int main(int argc, char*argv[]){
int nx=atoi(argv[1]);
int nthreads=atoi(argv[2]);
if(nx%nthreads != 0){
printf("ERROR: Number of cells should be integral multiple of number of threads \n");
exit(1);
}
double *u = new double[nx]();
double
Solution
-
You should first check for the number of command line arguments via
In addition, your first existing check in
-
All of your erroneous outputs shouldn't be printed to
They should instead be printed to
-
This comment is misleading:
They're not actually being initialized; they're being assigned after declaration. Initialization involves giving a variable its type and its initial value. You should do that here, which will keep variables as close in scope as possible, which is generally preferred for maintenance concerns.
Either way, you don't really need that comment. It's already clear what's being done (even after making this correction), and comments shouldn't state the obvious.
-
Also regarding closest possible variable scope: you have
-
This line doesn't look right:
It appears that you're looking for
This will return the maximum number of available threads for a parallel environment. If this is not really your intent, then rename
You should first check for the number of command line arguments via
argc. If there are too few, print an error message and then terminate the program.if (argc < 3) {
// print an error message...
return EXIT_FAILURE;
}In addition, your first existing check in
main() should also return EXIT_FAILURE, instead of calling exit(1). The latter is non-portable, unlike the former, and it's already safe to return from main() if the program cannot continue execution. More info about that here.-
All of your erroneous outputs shouldn't be printed to
printf():printf("ERROR: Number of cells should be integral multiple of number of threads \n");They should instead be printed to
fprintf() and stderr:fprintf(stderr, "Number of cells should be integral multiple of number of threads \n");-
This comment is misleading:
// initialize
u[0:nx] = 0.0;
u[nx/4:nx/2] = 1.0;
f[0:nx+1] = 0.0;
res[0:nx] = 0.0;They're not actually being initialized; they're being assigned after declaration. Initialization involves giving a variable its type and its initial value. You should do that here, which will keep variables as close in scope as possible, which is generally preferred for maintenance concerns.
Either way, you don't really need that comment. It's already clear what's being done (even after making this correction), and comments shouldn't state the obvious.
-
Also regarding closest possible variable scope: you have
i declared towards the top of main(), but you don't use it until the for loop towards the end. If you don't have C99 (and thus cannot initialize i within the loop statement instead), declare i right before the loop.-
This line doesn't look right:
td.maxthreads = omp_get_num_threads();It appears that you're looking for
omp_get_max_threads():td.maxthreads = omp_get_max_threads();This will return the maximum number of available threads for a parallel environment. If this is not really your intent, then rename
maxthreads to something more accurate.Code Snippets
if (argc < 3) {
// print an error message...
return EXIT_FAILURE;
}printf("ERROR: Number of cells should be integral multiple of number of threads \n");fprintf(stderr, "Number of cells should be integral multiple of number of threads \n");// initialize
u[0:nx] = 0.0;
u[nx/4:nx/2] = 1.0;
f[0:nx+1] = 0.0;
res[0:nx] = 0.0;td.maxthreads = omp_get_num_threads();Context
StackExchange Code Review Q#54666, answer score: 3
Revisions (0)
No revisions yet.