[CSC 435] Plots
Andrew J. Pounds
pounds_aj at mercer.edu
Thu Apr 7 09:16:50 EDT 2016
On 04/07/2016 04:01 AM, Vicenzo Abichequer wrote:
>
> Hi, Andrew!
>
> I understood from the description that I don’t need to plot every
> single optimization flag that I turn on, right? I simply turn all of
> them on and make the plots. Also, I should only plot the dot, mmm, and
> the solver functions, all of them compared to the ATLAS library and
> fit to Amdahl’s law, when applicable. Is this correct?
>
> Thanks.
>
Here is what I recommend, as it will clearly demonstrate how your code is
performing on the hardware:
1. Turn on, for all of your codes, every optimization that enhances
single-threaded performance. The last paper focused on single-processor
optimization; our focus now is on shared-memory parallelization. It is
assumed that you have done all you can to optimize the single-processor
components.
2. I recommend that you first construct a single-threaded graph of
megaflops vs. vector length. On the same plot, place comparison
timings for the single-threaded ATLAS function DDOT. You can use
gcov to count how many floating-point operations your code performs
and assume that the ATLAS version requires the same number. A table
of actual times would also be useful. Then, for one of your large
vector lengths (above 10000), build the parallel speedup results
and fit them to Amdahl's law. On the same graph, plot the speedup
curve for the threaded ATLAS library. You will need one parallel
speedup plot for OpenMP and one for Pthreads.
3. Repeat step 2 for Matrix Multiplication. Use DGEMM from the ATLAS
library.
4. Repeat step 2 for the Direct Linear Solver. Use SGESV from the
ATLAS library.
5. For the Iterative Linear Solver, no similar routine exists in the
ATLAS library, and because it relies on convergence, determining its
FLOP count is also a little more problematic. I recommend that for
the iterative solver you reproduce only the speedup plots from
step 4 (both OpenMP and Pthreads) and ADD another set of data, and
a fit, for the iterative linear solver.
By my count that is 11 different graphs. To make sure that we get
consistent results, we all need to use something similar to generate
our A matrix. I have started using the following; it forces the
iterative solver to take several iterations, but it has not yet
failed to converge.
      ! Build a strictly diagonally dominant matrix A: each diagonal
      ! entry exceeds the sum of the magnitudes of the off-diagonal
      ! entries in its column, so the iterative solver must converge.
      do i = 1, DIM
         vecx(i) = i
         sumrow = 0.0
         do j = 1, DIM
            if ( i .ne. j ) then
               call random_number(ranval)   ! off-diagonal magnitude in [0,1)
               matrixa(j,i) = ranval
               sumrow = sumrow + ranval     ! accumulate before the sign flip
               call random_number(ranval)
               if ( ranval < 0.5 ) matrixa(j,i) = -matrixa(j,i)
            endif
         enddo
         matrixa(i,i) = sumrow + 1.0        ! diagonal dominates its column
      enddo
--
Andrew J. Pounds, Ph.D. (pounds_aj at mercer.edu)
Professor of Chemistry and Computer Science
Mercer University, Macon, GA 31207 (478) 301-5627
http://faculty.mercer.edu/pounds_aj