[CSC 435] Debugging MPI code...

Andrew J. Pounds pounds_aj at mercer.edu
Sat Apr 12 11:18:17 EDT 2014


Well, apparently J.T. is the only one who found the small bug that I
introduced into the MPI matrix multiplication code.  I was really
confused in class on Thursday because Steve and Tanner were apparently
getting the correct results, while J.T. was getting the error. If you got
the code to run, you should have noticed the following.

 1. The initial code (with the symmetric matrices) ran correctly on:
      * Single processor
      * Single node, multiple processors
      * Multiple nodes, single thread per node
 2. When you used the "accuracy check" matrices you should have found that
      * The code runs fine on a single processor
      * The code does not run correctly on a single node with multiple
        processes
      * The code does not run correctly across multiple nodes


The fact that you get a correct result on a single processor (and
everywhere with the symmetric matrices), but a broken product when you
use the non-symmetric matrices across multiple processors, should make
you question the correctness of your WORKER PROCESS.
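
To see why symmetric test matrices can hide a bug like this, here is a
minimal sketch in plain Python (no MPI). The swapped-index kernel below
is a hypothetical example of the kind of error that symmetry masks; it is
not necessarily the bug in the course code, and all function and variable
names are made up for illustration.

```python
def matmul_reference(a, b):
    """Textbook product: c[i][j] = sum over k of a[i][k] * b[k][j]."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def matmul_swapped_index(a, b):
    """Hypothetical buggy kernel for square matrices.

    It reads a[k][i] instead of a[i][k], so it really computes
    transpose(A) * B. When A is symmetric the two are identical.
    """
    n = len(a)
    return [[sum(a[k][i] * b[k][j] for k in range(n))
             for j in range(n)]
            for i in range(n)]


# A symmetric matrix: equal to its own transpose.
symmetric = [[2.0, 1.0, 0.0],
             [1.0, 3.0, 4.0],
             [0.0, 4.0, 5.0]]

# A non-symmetric matrix, standing in for an "accuracy check" matrix.
nonsymmetric = [[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0],
                [7.0, 8.0, 10.0]]

# The bug is invisible with the symmetric input ...
print(matmul_swapped_index(symmetric, symmetric)
      == matmul_reference(symmetric, symmetric))
# ... and shows up as soon as the input is not symmetric.
print(matmul_swapped_index(nonsymmetric, nonsymmetric)
      == matmul_reference(nonsymmetric, nonsymmetric))
```

The first comparison agrees and the second does not, which is the same
pattern as the two test cases above: a kernel can pass the symmetric
test and still be wrong.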

Now, there are lots of possible causes for this.  There could be a
problem transferring data to the worker process, there could be memory
issues or computational problems in the worker process, or there could
even be issues with retrieving the data and putting it back in the
matrix on the master node.  However, since the problem only occurs when
a worker is involved, I would start by checking the matrix
multiplication code in the worker process.
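
One way to check the worker's arithmetic separately from the data
transfer is to run its multiply routine on row blocks in a single
process and compare against a serial product. The sketch below does that
in plain Python without MPI. The row-block split, the tolerance, and
every name in it (including the deliberately broken kernel) are
assumptions for illustration, not the layout of the course code.

```python
def serial_product(a, b):
    """Reference product computed in one piece."""
    return [[sum(row[k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for row in a]


def worker_kernel(a_rows, b):
    """Stand-in for the worker's multiply: a block of rows of A times B."""
    return [[sum(row[k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for row in a_rows]


def broken_kernel(a_rows, b):
    """Deliberately wrong kernel (drops the last term of each sum)."""
    return [[sum(row[k] * b[k][j] for k in range(len(b) - 1))
             for j in range(len(b[0]))]
            for row in a_rows]


def check_worker(a, b, n_workers, kernel, tol=1e-12):
    """Return the row indices where the block-wise result is wrong."""
    expected = serial_product(a, b)
    n = len(a)
    chunk = (n + n_workers - 1) // n_workers  # rows per simulated worker
    result = []
    for start in range(0, n, chunk):
        # Each slice plays the role of the rows sent to one worker.
        result.extend(kernel(a[start:start + chunk], b))
    return [i for i in range(n)
            if any(abs(x - y) > tol
                   for x, y in zip(result[i], expected[i]))]


# Use a NON-symmetric test matrix so index mistakes cannot hide.
a = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 10.0]]
b = [[1.0, 0.0, 2.0],
     [3.0, 1.0, 4.0],
     [5.0, 6.0, 7.0]]

print(check_worker(a, b, 2, worker_kernel))  # rows that differ, if any
print(check_worker(a, b, 2, broken_kernel))
```

If the kernel passes a check like this on non-symmetric input, the
remaining suspects are the send/receive steps and the reassembly on the
master.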


See if you can spot the error, fix it, and start benchmarking. When I
checked a few minutes ago, all but one node in lab 100 were up.  I'll try
to swing by later today and get machine 2 up.


Let me know if you need help.

-- 
Andrew J. Pounds, Ph.D.  (pounds_aj at mercer.edu)
Professor of Chemistry and Computer Science
Mercer University,  Macon, GA 31207   (478) 301-5627
http://faculty.mercer.edu/pounds_aj
