[CSC 435] Plots

Andrew J. Pounds pounds_aj at mercer.edu
Thu Apr 7 09:16:50 EDT 2016


On 04/07/2016 04:01 AM, Vicenzo Abichequer wrote:
>
> Hi, Andrew!
>
> I understood from the description that I don’t need to plot every 
> single optimization flag that I turn on, right? I simply turn all of 
> them on and make the plots. Also, I should only plot the dot, mmm, and 
> the solver functions, all of them compared to the ATLAS library and 
> fit them to Amdahl’s law, when applicable. Is this correct?
>
> Thanks.
>
Here is what I recommend, as it will clearly demonstrate how your code 
is performing on the hardware:

 1. Turn on, for all your codes, every optimization that enhances
    single-threaded performance.  The last paper focused on
    single-processor optimization; our focus now is on shared-memory
    parallelization.  It is assumed that you have done all you can to
    optimize the single-processor components.
 2. First construct a single-threaded graph of megaflops vs. vector
    length.  On the same plot, place comparison timings for the
    single-threaded ATLAS function DDOT.  You can use gcov to figure
    out how many floating-point operations your code performs and
    assume that the ATLAS version requires the same number.  A table of
    actual times would also be useful.  Then, for one of your large
    vector lengths (above 10000), build the parallel speedup results
    and fit them to Amdahl's law.  On the same graph, also plot the
    speedup curve for the threaded ATLAS library.  You will need one
    parallel speedup plot for OpenMP and one for Pthreads.
 3. Repeat step 2 for Matrix Multiplication. Use DGEMM from the ATLAS
    library.
 4. Repeat step 2 for the Direct Linear Solver. Use SGESV from the
    ATLAS library.
 5. For the Iterative Linear Solver there is no comparable routine in
    the ATLAS library, and since it relies on convergence, determining
    its FLOP count is also a little more problematic.  For the
    iterative solver, I recommend that you reproduce only the speedup
    plots from step 4 (both OpenMP and Pthreads) and ADD another set
    of data and fit for the iterative linear solver.

By my count that is 11 different graphs.  To make sure that we get 
consistent results, we all need to use something similar to generate 
our A matrix.  I have started using the following.  It forces the 
iterative solver to do several iterations, but has not yet failed to 
converge.

! Build a strictly diagonally dominant DIM x DIM matrix: off-diagonal
! entries are random with random sign, and each diagonal entry exceeds
! the sum of the magnitudes of the other entries in its column by 1.
do i = 1, DIM
   vecx(i) = i
   sumrow = 0.0
   do j = 1, DIM
      if ( i .ne. j ) then
         call random_number(ranval)
         matrixa(j,i) = ranval
         sumrow = sumrow + ranval
         call random_number(ranval)
         if ( ranval < 0.5 ) matrixa(j,i) = -matrixa(j,i)
      endif
   enddo
   matrixa(i,i) = sumrow + 1.0
enddo



-- 
Andrew J. Pounds, Ph.D.  (pounds_aj at mercer.edu)
Professor of Chemistry and Computer Science
Mercer University,  Macon, GA 31207   (478) 301-5627
http://faculty.mercer.edu/pounds_aj
