HiveBrain v1.2.0
Get Started
← Back to all entries
patterncMinor

Increasing performance of OpenMP based advection equation solver for Xeon Phi

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
phiequationopenmpperformancebasedincreasingforsolveradvectionxeon

Problem

I am solving linear advection equation in order to learn parallel programming. I am running this code on Xeon Phi co processor. Below is a plot of how it scales up with increasing number of threads. I am still in learning process and this is my first "parallel" code. I am also not very familiar with architecture specific optimization.

Comments on how to improve the performance of this code and what other stuff should I look into will be great. Also link for writeup or further tutorials to follow will be really helpful.

The plot is for nx=100000.

```
#include
#include
#include
#include
#include

int64_t TimeInMicros() {
struct timeval tv;
gettimeofday(&tv, NULL);
return tv.tv_sec*1000000 + tv.tv_usec;
}

struct Grid{
int nx;
double u, f, *res;
double a, cfl, dx, tf;
};

struct ThreadData{
int tid, maxthreads;
struct Grid *grid;

};

void solver(void args){

struct ThreadData td = (struct ThreadData )args;

int tid = td->tid;

struct Grid *grid = td->grid;

double *u = grid->u;
double *f = grid->f;
double *res = grid->res;

int nx = grid->nx;
double tf = grid->tf;
double cfl = grid->cfl;
double a = grid->a;
double dx = grid->dx;

double dt = cfl*dx/a;
double t = 0.0;

int chunk = nx/(td->maxthreads);
int start = tid*chunk;

int rc;

while(t < tf){
// sync here
#pragma omp barrier

if(start == 0){
f[start+1:chunk] = a*u[start:chunk];
}
else{
f[start:chunk+1] = a*u[start-1:chunk+1];
}

// sync here
#pragma omp barrier

if(start == 0){
f[start] = f[nx-1];
}

res[start:chunk] = -(f[start+1:chunk] - f[start:chunk])/dx;
u[start:chunk] += res[start:chunk]*dt;
t+=dt;
}
return NULL;
}

int main(int argc, char*argv[]){
int nx=atoi(argv[1]);
int nthreads=atoi(argv[2]);

if(nx%nthreads != 0){
printf("ERROR: Number of cells should be integral multiple of number of threads \n");
exit(1);
}

double *u = new double[nx]();
double

Solution

-
You should first check for the number of command line arguments via argc. If there are too few, print an error message and then terminate the program.

if (argc < 3) {
    // print an error message...
    return EXIT_FAILURE;
}


In addition, your first existing check in main() should also return EXIT_FAILURE, instead of calling exit(1). The latter is non-portable, unlike the former, and it's already safe to return from main() if the program cannot continue execution. More info about that here.

-
All of your erroneous outputs shouldn't be printed to printf():

printf("ERROR: Number of cells should be integral multiple of number of threads \n");


They should instead be printed to fprintf() and stderr:

fprintf(stderr, "Number of cells should be integral multiple of number of threads \n");


-
This comment is misleading:

// initialize
u[0:nx] = 0.0;
u[nx/4:nx/2] = 1.0;
f[0:nx+1] = 0.0;
res[0:nx] = 0.0;


They're not actually being initialized; they're being assigned after declaration. Initialization involves giving a variable its type and its initial value. You should do that here, which will keep variables as close in scope as possible, which is generally preferred for maintenance concerns.

Either way, you don't really need that comment. It's already clear what's being done (even after making this correction), and comments shouldn't state the obvious.

-
Also regarding closest possible variable scope: you have i declared towards the top of main(), but you don't use it until the for loop towards the end. If you don't have C99 (and thus cannot initialize i within the loop statement instead), declare i right before the loop.

-
This line doesn't look right:

td.maxthreads = omp_get_num_threads();


It appears that you're looking for omp_get_max_threads():

td.maxthreads = omp_get_max_threads();


This will return the maximum number of available threads for a parallel environment. If this is not really your intent, then rename maxthreads to something more accurate.

Code Snippets

if (argc < 3) {
    // print an error message...
    return EXIT_FAILURE;
}
printf("ERROR: Number of cells should be integral multiple of number of threads \n");
fprintf(stderr, "Number of cells should be integral multiple of number of threads \n");
// initialize
u[0:nx] = 0.0;
u[nx/4:nx/2] = 1.0;
f[0:nx+1] = 0.0;
res[0:nx] = 0.0;
td.maxthreads = omp_get_num_threads();

Context

StackExchange Code Review Q#54666, answer score: 3

Revisions (0)

No revisions yet.