<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
<p><font face="serif">I am VERY CONCERNED about the lack of timing
tests that I have seen as well as the performance of the timing
tests that I have seen. Some of the tests that are running in
the queues never use beyond 2 threads on a node. So that you
all can have a calm week and not stress about the final I want
to break the Hybrid MPI project in two and have a portion of it
come in earlier in the week and then the second portion come in
for the actual final.</font></p>
<p><font face="serif">Following my theme that accuracy matters more
than speed, I want you to first verify that you are getting
ACCURATE RESULTS when you break up your matrix multiplication
problem across multiple computers and multiple threads. For
example -- what happens if I use an odd number of processors and
an odd number of threads, but a matrix dimension size that is
even -- do you still get accurate results. In this first paper
you have to PROVE that you are actually using the number or
processors, and number of threads per processor that you claim,
AND that you get accurate results for any combination of these.
I recommend doing SMALL tests here where you pick a large
matrix, but break it across 1 to 4 processors and then use 1 to
10 threads per processor. Don't go for "all the marbles" with
exhaustive tests until you are sure that your single jobs that
run in parallel are running as you expect. You can log into
one of the actual nodes that is being used to see if the code is
running in parallel like you think it should be working.
Checking for accuracy should be obvious -- the returned matrix
trace of the matrix should be the dimension of the matrix
(whether it is done serially on one node, done in parallel
across multiple threads on one node, done in parallel across
multiple nodes with one thread each, or done across multiple
nodes with each one using multiple threads).<br>
</font></p>
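<p><font face="serif">If it helps, a minimal sketch of the kind of
trace check I have in mind is below. It assumes the gathered result
sits in a row-major 1-D buffer (here called c, with dimension n --
placeholder names, not necessarily your variables) and simply
verifies that the trace comes back equal to n:</font></p>
<pre>
#include &lt;math.h&gt;
#include &lt;stdio.h&gt;

/* Hypothetical accuracy check -- c and n are placeholder names.
   The n x n result matrix, stored row-major in c on rank 0, should
   come back with a trace equal to n if the product is correct.    */
int check_trace(const double *c, int n) {
    double trace = 0.0;
    for (int i = 0; i &lt; n; i++)
        trace += c[(long)i * n + i];               /* diagonal element */
    printf("trace = %.6f (expected %d)\n", trace, n);
    return fabs(trace - (double)n) &lt; 1.0e-6 * n;   /* 1 == accurate */
}
</pre>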
<p><font face="serif">Let's have those papers come in by TUESDAY
NIGHT at midnight. I will modify the CANVAS dropbox and
assignment description. <br>
</font></p>
<p><font face="serif">I will check those AS THEY COME IN. Once I
have given you the "okay", you can proceed to complete the final
exam portion.</font></p>
<p><font face="serif">The final exam portion will be essentially
completing what is already shown for the Hybrid MPI project
where you demonstrate what combination of processors and threads
gives you the maximum performance. I will modify the project
description and post the Final exam "assignment" in CANVAS. If
these are parallelized correctly then they should run fast --
and not take 60+ hours in the queues. <br>
</font></p>
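<p><font face="serif">For the timing itself, one possible sketch is
to put barriers and MPI_Wtime() around your multiply and report the
wall time on rank 0; runHybridMultiply below is just a stand-in name
for your own driver routine:</font></p>
<pre>
#include &lt;mpi.h&gt;
#include &lt;omp.h&gt;
#include &lt;stdio.h&gt;

/* Hypothetical stand-in for your own multiply driver. */
extern void runHybridMultiply(int n);

/* Time one run for a given (processes x threads) combination.
   rank and nprocs come from MPI_Comm_rank / MPI_Comm_size.        */
void timeOneRun(int n, int rank, int nprocs) {
    MPI_Barrier(MPI_COMM_WORLD);         /* everyone starts together  */
    double t0 = MPI_Wtime();
    runHybridMultiply(n);
    MPI_Barrier(MPI_COMM_WORLD);         /* wait for the slowest rank */
    double t1 = MPI_Wtime();
    if (rank == 0)
        printf("n=%d  procs=%d  threads=%d  wall=%.3f s\n",
               n, nprocs, omp_get_max_threads(), t1 - t0);
}
</pre>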
<p><font face="serif">I know I am pushing a project into finals week
-- but based on what I was seeing there was going to be little
to no chance of you all finishing this by Sunday night with
correct results. By breaking it up and forcing your hand to
think about accuracy first, and then the levels of parallelism,
you all have a much higher chance of getting this done correctly
and finishing strong.<br>
</font></p>
<p><font face="serif">Stay safe out there... and play nice!<br>
</font></p>
<p>p.s. -- I know that you are going to hit some snags in this
coding -- and I'm pretty sure I know where those snags are. I'm
not going to give away the answers, but I will give you a hint and
tell you that if you want any chance of the fine-grained threading
working properly, you've got to dump the array access via the
sequentially incremented bufIndex variable in the worker
process.</p>
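<p>In purely generic terms (illustrative names only, not your actual
worker code), the difference between an index pattern that fights
fine-grained threading and one that cooperates with it looks
something like this:</p>
<pre>
#include &lt;omp.h&gt;

/* Purely illustrative -- generic names, not your worker code.
   A shared, sequentially incremented index (buf[bufIndex++]) both
   serializes the fill and races once threads touch it; computing
   the offset directly from the loop indices lets each thread work
   on its own elements independently.                              */
static double work(int i, int j) { return (i == j) ? 1.0 : 0.0; }  /* placeholder */

void fill_threaded(double *buf, int n) {
    #pragma omp parallel for collapse(2)
    for (int i = 0; i &lt; n; i++)
        for (int j = 0; j &lt; n; j++)
            buf[(long)i * n + j] = work(i, j);   /* offset from i, j */
}
</pre>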
<p>p.p.s. -- qstat -rn shows you the processors on which a specific
job number is running in PBS/Torque<br>
</p>
<pre class="moz-signature" cols="72">--
Andrew J. Pounds, Ph.D. (<a class="moz-txt-link-abbreviated" href="mailto:pounds_aj@mercer.edu">pounds_aj@mercer.edu</a>)
Professor of Chemistry and Computer Science
Director of the Computational Science Program
Mercer University, Macon, GA 31207 (478) 301-5627
</pre>
</body>
</html>