[CSC 435] Hybrid MPI Project and Final - READ ME NOW!

Fri Apr 24 05:12:11 EDT 2020

I am VERY CONCERNED about the lack of timing tests that I have seen as
well as the performance of the timing tests that I have seen.  Some of
the tests that are running in the queues never use more than 2 threads
on a node.  So that you all can have a calm week and not stress about the
final I want to break the Hybrid MPI project in two and have a portion
of it come in earlier in the week and then the second portion come in
for the actual final.

Following my theme that accuracy matters more than speed, I want you to
first verify that you are getting ACCURATE RESULTS when you break up
your matrix multiplication problem across multiple computers and
multiple threads.  For example -- what happens if I use an odd number of
processors and an odd number of threads, but a matrix dimension size
that is even -- do you still get accurate results?  In this first paper
you have to PROVE that you are actually using the number of processors,
and number of threads per processor, that you claim, AND that you get
accurate results for any combination of these.  I recommend doing SMALL
tests here where you pick a large matrix, but break it across 1 to 4
processors and then use 1 to 10 threads per processor.   Don't go for
"all the marbles" with exhaustive tests until you are sure that your
single jobs that run in parallel are running as you expect.   You can
log into one of the actual nodes that is being used to see if the code
is running in parallel like you think it should be.  Checking
for accuracy should be obvious -- the trace of the returned
matrix should equal the dimension of the matrix (whether it is done
serially on one node, done in parallel across multiple threads on one
node, done in parallel across multiple nodes with one thread each, or
done across multiple nodes with each one using multiple threads).
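
A minimal sketch of the accuracy check described above, written in
Python as a serial stand-in for the real MPI/threaded code.  The names
split_range, make_test_pair and trace_of_product are hypothetical, and
the test matrices (a diagonal matrix and its inverse, so the product is
the identity) are an assumption about one way to obtain a trace equal
to the dimension.  It only illustrates that an uneven split of rows
across processes and threads should not change the answer.

```python
def split_range(start, stop, parts, index):
    """Half-open sub-range of [start, stop) owned by piece `index` of `parts`.

    The first (total % parts) pieces get one extra row, so every row is
    owned exactly once even when the split is uneven.
    """
    total = stop - start
    base, extra = divmod(total, parts)
    lo = start + index * base + min(index, extra)
    hi = lo + base + (1 if index < extra else 0)
    return lo, hi


def make_test_pair(n):
    """Assumed test data: a diagonal matrix and its inverse (product = I)."""
    a = [[float(i + 1) if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [[1.0 / (i + 1) if i == j else 0.0 for j in range(n)] for i in range(n)]
    return a, b


def trace_of_product(a, b, nprocs, nthreads):
    """Multiply a*b row block by row block and return the trace of the result.

    The loops over `rank` and `tid` run serially here; they only mimic how
    rows would be handed to processes and then to threads within a process.
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for rank in range(nprocs):
        r_lo, r_hi = split_range(0, n, nprocs, rank)
        for tid in range(nthreads):
            t_lo, t_hi = split_range(r_lo, r_hi, nthreads, tid)
            for i in range(t_lo, t_hi):
                for j in range(n):
                    c[i][j] = sum(a[i][k] * b[k][j] for k in range(n))
    return sum(c[i][i] for i in range(n))


def all_combinations_accurate(n, max_procs=4, max_threads=10, tol=1e-9):
    """True if every (processes, threads) pair gives a trace equal to n."""
    a, b = make_test_pair(n)
    return all(
        abs(trace_of_product(a, b, p, t) - n) < tol
        for p in range(1, max_procs + 1)
        for t in range(1, max_threads + 1)
    )
```

Calling all_combinations_accurate(8) covers the case of an even
dimension with odd process and thread counts (for example 3 and 5),
where some threads receive no rows at all.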

Let's have those papers come in by TUESDAY NIGHT at midnight.  I will
modify the CANVAS dropbox and assignment description. 

I will check those AS THEY COME IN.   Once I have given you the "okay",
you can proceed to complete the final exam portion.

The final exam portion will be essentially completing what is already
shown for the Hybrid MPI project where you demonstrate what combination
of processors and threads gives you the maximum performance.  I will
modify the project description and post the Final exam "assignment" in
CANVAS.  If these are parallelized correctly then they should run fast
-- and not take 60+ hours in the queues.  

I know I am pushing a project into finals week -- but based on what I
was seeing there was going to be little to no chance of you all
finishing this by Sunday night with correct results.  By breaking it up
and forcing your hand to think about accuracy first, and then the levels
of parallelism,  you all have a much higher chance of getting this done
correctly and finishing strong.

Stay safe out there... and play nice!

p.s. -- I know that you are going to hit some snags in this coding --
and I'm pretty sure I know where those snags are.  I'm not going to give
away the answers, but I will give you a hint and tell you that if you
want the fine-grained threading to have any chance of working properly,
you've got to dump the array access via the sequentially incremented
bufIndex variable in the worker process.
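
A generic illustration of why a sequentially incremented index is a
problem once iterations no longer run in a fixed order.  This is not
the project code: buf_index, pack_with_counter and
pack_with_computed_index are hypothetical names, and Python is used
only as a serial stand-in.  The `order` argument mimics rows being
processed in whatever order the threads happen to reach them.

```python
def pack_with_counter(rows, order):
    """Fill a flat buffer using a running counter.

    Each write depends on how many writes came before it, so the layout
    of the buffer changes with the order in which rows are visited.
    """
    n = len(rows[0])
    buf = [None] * (len(rows) * n)
    buf_index = 0
    for i in order:
        for j in range(n):
            buf[buf_index] = rows[i][j]
            buf_index += 1
    return buf


def pack_with_computed_index(rows, order):
    """Fill a flat buffer using an offset computed from the loop indices.

    Element (i, j) always lands at i*n + j, so iterations are independent
    of one another and can be visited in any order.
    """
    n = len(rows[0])
    buf = [None] * (len(rows) * n)
    for i in order:
        for j in range(n):
            buf[i * n + j] = rows[i][j]
    return buf
```

With rows visited in reverse, the counter version scrambles the buffer
while the computed-index version does not.  In real threaded code a
shared counter would additionally be a data race.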

p.p.s. -- qstat -rn shows you the processors on which a specific job
number is running in PBS/Torque

-- 
Andrew J. Pounds, Ph.D.  (pounds_aj at mercer.edu)
Professor of Chemistry and Computer Science
Director of the Computational Science Program
Mercer University,  Macon, GA 31207   (478) 301-5627
