<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
<p><font face="serif">Guys -- before I delete the class mailing
listserve for this semester, I wanted to thank you all for your
hard work. I think you all got a good "foundation" in high
performance computing that will hopefully serve you well in the
years to come. I say foundation because there are so many
topics that we did not get to cover. Most of you will never
utilize MPI again - but if you learn how to use it properly you
can really unlock the power of a cluster. If you ever get the
chance to work in this type of cluster arena again I encourage
you to do so and work on mastering the craft.<br>
</font></p>
<p><font face="serif">I didn't want to show you this before you got
your hybrid benchmarking project done because I didn't want you
focusing on trying to replicate my results -- but here are the
performance results for some of my MPI/OpenMP hybrid code
running over the GSC 218 cluster. In my compilations I
explicitly specified processor type, as well as L2 and L3 cache
sizes, provided additional information to MPI about how I wanted
to schedule processors, and also pinned threads to processor
sockets. In the mmm_mpi function ALL loops were completely
OpenMP parallelized (I removed any data dependencies of
bufIndex++ and re-coded for thread-safety). This sped up the
time between the MPI sends and receives. To ensure accuracy, I
constructed a well conditioned matrix, computed its inverse,
multiplied the matrix by its inverse, and summed up the
diagonal. The code worked perfectly for any dimension, cores,
and threads. </font><font face="serif"><font face="serif">For
the 8000x8000 case I got the following. </font></font></p>
<p><br>
</p>
<p><img moz-do-not-send="false"
src="cid:part1.ou4cxU8j.w4NP7w4K@mercer.edu" alt="mflops"
width="640" height="480"></p>
<p><img moz-do-not-send="false"
src="cid:part2.maw6ijDF.V3FCLv30@mercer.edu" alt="speedup"
width="640" height="480"></p>
<p>I was able to run matrices up to 20000x20000 and get similar
results. At dimensions above 20000 I started to hit machine
memory limits. There are some interesting things happening at 16
nodes that I want to investigate (I did not see those results when
I ran this code before, and I think they may be tied to the new
switching protocols in the MPI-4 build I made for the class a
couple of months ago - I did, however, verify that the timings
were correct). What most of you saw in your results were maxima
between 2 and 6 nodes and 15 to 18 processes per node, with peak
performance under 20 GFlops and speedups under 50. Some of
you saw significantly smaller performance numbers.<br>
</p>
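<p>To make the bufIndex++ comment concrete, here is a minimal sketch of
the kind of rewrite I mean. This is not my actual mmm_mpi code, and the
names (pack_strip, trace_of, sendBuf, rowStart, rowEnd) are placeholders:
the point is only that each iteration computes its own buffer offset from
the loop indices instead of incrementing a shared counter, so the loop
carries no dependency and OpenMP can divide it among threads. The second
little routine is the kind of trace check I described -- in exact
arithmetic the trace of A times its inverse is exactly n, so summing the
diagonal tells you right away whether the multiply went wrong.<br>
</p>
<pre>
/* Sketch only -- not the class code.  Names here (pack_strip, trace_of,
 * sendBuf, rowStart, rowEnd) are placeholders.                          */

/* Pack rows rowStart..rowEnd-1 of an n x n row-major matrix a into
 * sendBuf before an MPI send.  Each iteration computes its own offset,
 * so there is no shared bufIndex++ counter and the outer loop can be
 * split safely across OpenMP threads.                                   */
void pack_strip(const double *a, double *sendBuf,
                int n, int rowStart, int rowEnd)
{
    #pragma omp parallel for
    for (int i = rowStart; i < rowEnd; i++) {
        for (int j = 0; j < n; j++) {
            /* was: sendBuf[bufIndex++] = a[i*n + j];  (not thread-safe) */
            sendBuf[(i - rowStart) * n + j] = a[i * n + j];
        }
    }
}

/* Accuracy check: sum the diagonal of c = A * A^(-1).  For a
 * well-conditioned A the result should come out very close to n.        */
double trace_of(const double *c, int n)
{
    double t = 0.0;
    #pragma omp parallel for reduction(+:t)
    for (int i = 0; i < n; i++)
        t += c[i * n + i];
    return t;
}
</pre>
<p>On the launch side, the pinning I mentioned is the sort of thing you
control with the OMP_PROC_BIND and OMP_PLACES environment variables on
the OpenMP side and with your MPI launcher's mapping and binding options
(for example, --bind-to socket in Open MPI's mpirun) -- the exact flags
depend on which compiler and MPI stack you built against.<br>
</p>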
<p>I hope that the plots above convince you that there are numerous
ways to improve the scalability and performance of the codes you
wrote. Combining MPI with OpenMP adds complexity that most students
never have an opportunity to see. On top of that, new versions of
MPI come out often (I just got a notification from the developers
that they are ready for me to evaluate MPI 5 release candidate
7). What you need to remember is that HPC is a dynamic and
ever-changing field -- we only scratched the surface -- I
encourage you to NEVER STOP LEARNING! I've been a developer in
the field for almost 30 years - and I'm still learning new stuff!<br>
</p>
<p>Again, thank you all for a great semester.<br>
</p>
<p>All the best!<br>
</p>
<div class="moz-signature">-- <br>
<b><i>Andrew J. Pounds, Ph.D.</i></b><br>
<i>Professor of Chemistry and Computer Science</i><br>
<i>Director of the Computational Science Program</i><br>
<i>Mercer University, Macon, GA 31207 (478) 301-5627</i></div>
</body>
</html>