<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 04/07/2016 04:01 AM, Vicenzo
Abichequer wrote:<br>
</div>
<blockquote
cite="mid:85cdb230a2644373b0afb2bac5583623@spiderman.MercerU.local"
type="cite">
<div class="WordSection1">
<p class="MsoNormal">Hi, Andrew!<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I understood from the description that I
don’t need to plot every single optimization flag that I power
on, right? I simply turn all of them on and make the plots.
Also, I should only plot the dot, mmm, and the solver
functions, all of them compared to the ATLAS library and fit
it to Amdahl’s law, when applicable. Is this correct?<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Thanks.<o:p></o:p></p>
</div>
</blockquote>
<font face="serif">Here is what I recommend as it will clearly
demonstrate how your code is performing on the hardware:<br>
<br>
</font>
<ol>
<li><font face="serif">Turn on all the optimizations that improve
single-threaded performance, for all of your codes. The last
paper focused on single-processor optimization; our focus now
is shared-memory parallelization. It is assumed that you have
already done all you can to optimize the single-processor
components.</font></li>
<li><font face="serif">I recommend that you first construct a
single-threaded graph of megaflops vs. vector length. On the
same plot, also place comparison results for the
single-threaded ATLAS function DDOT. You can use gcov to figure
out how many floating-point operations your code performs, and
assume that the ATLAS version requires the same number. A
table of actual times would also be useful. Then, for one of
your large vector lengths (above 10000), build the parallel
speedup results and fit them to Amdahl's law. On the same
graph, also plot the speedup curve for the threaded ATLAS
library. You will need one parallel speedup plot for OpenMP
and one for Pthreads.<br>
</font></li>
<li><font face="serif">Repeat step 2 for Matrix Multiplication.
Use DGEMM from the ATLAS library.<br>
</font></li>
<li><font face="serif">Repeat step 2 for the Direct Linear Solver.
Use SGESV from the ATLAS library (or DGESV if your solver works
in double precision, as DDOT and DGEMM do).</font></li>
<li><font face="serif">For the Iterative Linear Solver, no similar
routine exists in the ATLAS library, and because the solver
relies on convergence, determining its FLOP rate is also a
little more problematic. I recommend that for the iterative
solver you reproduce only your speedup plots from step 4 (both
OpenMP and Pthreads) and add another set of data, with its
fit, for the iterative linear solver.</font></li>
</ol>
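<p><font face="serif">For the fits in steps 2 through 5, Amdahl's
law can be written as S(p) = 1 / ((1 - f) + f/p), where p is the
number of threads and f is the parallel fraction of the run time.
Rearranged, 1 - 1/S = f (1 - 1/p), which is a straight line through
the origin, so f can be estimated with a one-parameter least-squares
fit. The following is only a minimal sketch of one way to do that,
not a required method. The speedups in it are synthetic placeholders
generated from the law itself with an assumed f of 0.9, so the
program should recover a value close to 0.9; replace them with your
measured T(1)/T(p). All of the names are illustrative.</font></p>
<pre>
! Sketch: one-parameter least-squares fit of Amdahl's law,
!   S(p) = 1 / ((1 - f) + f/p)
! using the rearranged form  1 - 1/S = f * (1 - 1/p).
program amdahl_fit
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: npts = 5
  integer, parameter :: threads(npts) = (/ 1, 2, 4, 8, 16 /)
  real(dp), parameter :: ftrue = 0.9_dp   ! ASSUMED value, for the synthetic data only
  real(dp) :: speedup(npts), x, y, sxy, sxx, f
  integer :: k

  ! SYNTHETIC placeholder data generated from Amdahl's law itself.
  ! Replace with measured speedups T(1)/T(p).
  do k = 1, npts
     speedup(k) = 1.0_dp / ((1.0_dp - ftrue) + ftrue / real(threads(k), dp))
  enddo

  ! Least squares for a line through the origin: f = sum(x*y) / sum(x*x)
  sxy = 0.0_dp
  sxx = 0.0_dp
  do k = 1, npts
     x = 1.0_dp - 1.0_dp / real(threads(k), dp)
     y = 1.0_dp - 1.0_dp / speedup(k)
     sxy = sxy + x * y
     sxx = sxx + x * x
  enddo
  f = sxy / sxx

  print '(a, f8.5)', 'fitted parallel fraction f = ', f
  print '(a, f8.3)', 'limiting speedup 1/(1-f)  = ', 1.0_dp / (1.0_dp - f)
end program amdahl_fit
</pre>
<p><font face="serif">The point at p = 1 contributes nothing to the
fit (both sides are zero there), so you need timings at two or more
thread counts above one for the estimate to mean much. The limiting
speedup line assumes the fitted f is below 1.</font></p>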
<p><font face="serif">By my count that is 11 different graphs. To
make sure that we get consistent results, we all need to use
something similar to generate our A matrix. I have started
using the following. It forces the iterative solver to do
several iterations, but has not yet failed to converge.<br>
</font></p>
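<p><tt>do i = 1, DIM</tt><br>
<tt>&nbsp;&nbsp;&nbsp;vecx(i) = i</tt><br>
<tt>&nbsp;&nbsp;&nbsp;sumrow = 0.0</tt><br>
<tt>&nbsp;&nbsp;&nbsp;do j = 1, DIM</tt><br>
<tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if ( i .ne. j) then</tt><br>
<tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;! off-diagonal entry: random magnitude in [0,1)</tt><br>
<tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;call random_number(ranval)</tt><br>
<tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;matrixa(j,i) = ranval</tt><br>
<tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;sumrow = sumrow + ranval</tt><br>
<tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;! flip the sign about half of the time</tt><br>
<tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;call random_number(ranval)</tt><br>
<tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if ( ranval &lt; 0.5 ) matrixa(j,i) = -matrixa(j,i)</tt><br>
<tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;endif</tt><br>
<tt>&nbsp;&nbsp;&nbsp;enddo</tt><br>
<tt>&nbsp;&nbsp;&nbsp;! diagonal = sum of off-diagonal magnitudes + 1, set once per i</tt><br>
<tt>&nbsp;&nbsp;&nbsp;matrixa(i,i) = sumrow + 1.0</tt><br>
<tt>enddo</tt><font face="serif"><br>
<br>
</font></p>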
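<p><font face="serif">One likely reason this construction has not
failed to converge: each diagonal entry is set to one plus the sum
of the magnitudes of the other entries that share its second index,
so the matrix is strictly diagonally dominant, which is a standard
sufficient condition for Jacobi-type iterations to converge. If you
want to confirm that your own generator has the same property, a
self-contained check could look like the sketch below. The names
ndim, offdiag, and dominant are illustrative and not part of the
loop above, and ndim = 200 is just an assumed test size standing in
for DIM.</font></p>
<pre>
! Sketch: build the test matrix and verify strict diagonal dominance.
program check_dominance
  implicit none
  integer, parameter :: ndim = 200   ! ASSUMED size, stands in for DIM
  real :: matrixa(ndim,ndim), vecx(ndim)
  real :: ranval, sumrow, offdiag
  integer :: i, j
  logical :: dominant

  ! Same construction as the generator loop above.
  do i = 1, ndim
     vecx(i) = i
     sumrow = 0.0
     do j = 1, ndim
        if ( i .ne. j ) then
           call random_number(ranval)
           matrixa(j,i) = ranval
           sumrow = sumrow + ranval
           call random_number(ranval)
           if ( ranval .lt. 0.5 ) matrixa(j,i) = -matrixa(j,i)
        endif
     enddo
     matrixa(i,i) = sumrow + 1.0
  enddo

  ! Each diagonal entry must exceed the sum of the magnitudes of the
  ! other entries that share its second index.
  dominant = .true.
  do i = 1, ndim
     offdiag = 0.0
     do j = 1, ndim
        if ( i .ne. j ) offdiag = offdiag + abs(matrixa(j,i))
     enddo
     if ( abs(matrixa(i,i)) .le. offdiag ) dominant = .false.
  enddo

  print *, 'strictly diagonally dominant: ', dominant
end program check_dominance
</pre>
<p><font face="serif">Because the diagonal exceeds the off-diagonal
sum by only 1.0, the dominance is fairly weak for large DIM, which
fits with the solver needing several iterations.</font></p>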
<br>
<br>
<pre class="moz-signature" cols="72">--
Andrew J. Pounds, Ph.D. (<a class="moz-txt-link-abbreviated" href="mailto:pounds_aj@mercer.edu">pounds_aj@mercer.edu</a>)
Professor of Chemistry and Computer Science
Mercer University, Macon, GA 31207 (478) 301-5627
<a class="moz-txt-link-freetext" href="http://faculty.mercer.edu/pounds_aj">http://faculty.mercer.edu/pounds_aj</a>
</pre>
</body>
</html>