[CSC 435] ILS OpenMP

Andrew J. Pounds pounds_aj at mercer.edu
Sat Apr 2 06:14:44 EDT 2016


On 04/01/2016 09:36 PM,  wrote:
>
> Dr. Pounds:
>
>  I have changed the OpenMP to have the pragma:
>
> #pragma omp parallel shared(N,maxiter,tol,a,b,x,XT,i,k,err)
> private(err,olderr,S,j,l,m) reduction(+ : err). Would this remove the
> dependency upon S and keep the while loop and outer for loop
> parallelized while letting the processor vectorize the insides of the
> loops? Sorry about all the trouble, but data dependency and
> parallelization have been giving me lots of trouble.
> Sincerely,
>
Or you could try something like this...

/* Jacobi sweep: each thread owns a disjoint set of rows i, so there is
   no write sharing on S or XT.  Needs <math.h> for fmax/fabs. */
#pragma omp parallel for shared(A,B,S,X,XT,N) private(i,j)
         for (i=0; i<N; i++) {
             *(S+i) = 0.0;
             /* sum the off-diagonal terms A[i][j]*X[j], skipping j == i */
             for (j=0  ; j<i; j++ ) *(S+i) = *(S+i) + *(A+i*N+j) * *(X+j);
             for (j=i+1; j<N; j++ ) *(S+i) = *(S+i) + *(A+i*N+j) * *(X+j);
             /* solve row i for the updated unknown and save it */
             *(S+i) = (*(B+i) - *(S+i))/ *(A+i*N+i);
             *(XT+i) = *(S+i) ;
         }

         /* serial reduction for the convergence test; reset err first so
            the value from the previous sweep does not stick around */
         err = 0.0;
         for (i=0; i<N; i++) err = fmax(fabs(*(S+i)),err);


With the default static schedule this will divide the first loop's N
iterations into roughly equal contiguous chunks, one per thread, and it
completely removes any data dependence between the threads: each thread
writes only its own elements of S and XT.  The inner loops over j are not
parallelized; each thread runs them serially for its own values of i,
which leaves the compiler free to vectorize them.  The last for loop
(which computes err) runs serially.  You could parallelize that too with
a max reduction (see the sketch below), but I think the speedup would be
minimal.
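If you did want to try it, here is a minimal sketch, assuming your
compiler supports OpenMP 3.1 or later (which is when the max reduction
for C was added):

         /* parallel max reduction over the convergence measure;
            requires OpenMP 3.1+ for reduction(max:err) in C */
         err = 0.0;
#pragma omp parallel for shared(S,N) private(i) reduction(max:err)
         for (i=0; i<N; i++) err = fmax(fabs(*(S+i)),err);

Each thread keeps its own private running maximum and OpenMP combines
them at the end, so there is no race on err.  As for the vectorization
question, recent versions of gcc will tell you what it managed to
vectorize if you compile with -fopt-info-vec, e.g. something like
gcc -O3 -fopenmp -fopt-info-vec yourfile.c -lm.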

-- 
Andrew J. Pounds, Ph.D.  (pounds_aj at mercer.edu)
Professor of Chemistry and Computer Science
Mercer University,  Macon, GA 31207   (478) 301-5627
http://faculty.mercer.edu/pounds_aj
