[CSC 435] What you should be seeing...

Andrew J. Pounds pounds_aj at mercer.edu
Tue Apr 16 11:33:00 EDT 2024


In class I had you run MPI from the command line across random nodes of 
your own choosing.  I wanted you to see what happened when you ran a 
calculation and ran into other people sharing the node and/or the 
network.  Several of you got what seemed to be random results, with the 
runs on a higher number of nodes actually taking significantly longer.

By utilizing PBS/Torque "correctly", we eliminate the chance of 
multiple people using the same node and also -- if the code is written 
correctly -- minimize the network traffic.  To help you understand this 
phenomenon, I ran some test jobs last night on the cluster using both 
the "allcopy" version and the improved mmm_mpi version of the code that 
only passes portions of the matrices.  I then graphed megaflops vs. 
number of processors.

[plot: megaflops vs. number of processors for the allcopy and mmm_mpi 
versions; attachment: 
<http://theochem.mercer.edu/pipermail/csc435/attachments/20240416/5929c1ca/attachment.png>]

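As a rough illustration (not the actual course code), the two 
communication patterns being compared might look something like the 
sketch below.  It assumes a row-wise decomposition with N divisible by 
the number of ranks; the allcopy approach would broadcast all of A as 
well, as noted in the comments.

/* mmm_sketch.c -- a minimal sketch (not the course code) contrasting two ways
 * of distributing an N x N matrix-matrix multiply C = A*B over MPI ranks.
 * Compile with:  mpicc -O2 mmm_sketch.c -o mmm_sketch
 * Run with:      mpirun -np 4 ./mmm_sketch
 * Assumes N is divisible by the number of ranks for simplicity.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 512

int main(int argc, char **argv) {
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int rows = N / nprocs;              /* rows of A (and C) owned per rank   */
    double *B = malloc(N * N * sizeof(double));        /* every rank needs B  */
    double *Aloc = malloc(rows * N * sizeof(double));  /* my slab of A        */
    double *Cloc = calloc(rows * N, sizeof(double));   /* my slab of C        */
    double *A = NULL, *C = NULL;

    if (rank == 0) {                    /* root owns the full matrices        */
        A = malloc(N * N * sizeof(double));
        C = malloc(N * N * sizeof(double));
        for (int i = 0; i < N * N; i++) { A[i] = 1.0; B[i] = 2.0; }
    }

    /* An "allcopy" style would broadcast ALL of A as well, so every rank
     * holds two full N x N matrices and the data on the wire grows with
     * the number of ranks:
     *     MPI_Bcast(A, N*N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
     * The pattern below ships each rank only the rows of A it needs.        */
    MPI_Scatter(A, rows * N, MPI_DOUBLE,
                Aloc, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(B, N * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* Local compute: each rank multiplies its slab of A against B.          */
    for (int i = 0; i < rows; i++)
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                Cloc[i * N + j] += Aloc[i * N + k] * B[k * N + j];

    /* Collect the result slabs back on the root.                            */
    MPI_Gather(Cloc, rows * N, MPI_DOUBLE,
               C, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0) printf("C[0] = %f\n", C[0]);

    free(B); free(Aloc); free(Cloc);
    if (rank == 0) { free(A); free(C); }
    MPI_Finalize();
    return 0;
}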

Notice what happens -- the allcopy version is faster when there are 
fewer than 6 nodes -- but it scales POORLY, and beyond six nodes the 
performance actually starts to decrease due to the increase in network 
traffic.  In contrast, the mmm_mpi code is still increasing in 
performance up to 10 nodes.  Recognize that this graph will vary with 
the dimension of the matrix -- but it demonstrates a key trade-off in 
distributed HPC computing -- processor power vs. network bandwidth -- 
and the necessity of thinking about it when designing HPC algorithms.
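To put rough numbers on that trade-off, here is a small back-of-envelope 
model (assumed sizes, not the measured data from last night's runs) 
showing how the work per rank shrinks like 2N^3/p while the bytes an 
allcopy-style distribution pushes across the network grow with p.  The 
"partitioned" column guesses at a scheme like the sketch above (scatter 
rows of A, broadcast B), which may not match the actual mmm_mpi code 
exactly.

/* traffic_model.c -- a rough back-of-envelope model (assumed numbers, not
 * measured data) of the processor-power vs. network-bandwidth trade-off.
 * Compile with:  cc -O2 traffic_model.c -o traffic_model
 */
#include <stdio.h>

int main(void) {
    const double N = 2000.0;                 /* assumed matrix dimension      */
    const double total_flops = 2.0 * N * N * N;

    printf("%4s %14s %20s %20s\n",
           "p", "flops/rank", "allcopy wire bytes", "partitioned bytes");
    for (int p = 2; p <= 12; p += 2) {
        double flops_per_rank = total_flops / p;
        /* allcopy: every rank is sent both full matrices (8-byte doubles),
         * so the total traffic grows linearly with p.                        */
        double allcopy_wire = p * 2.0 * N * N * 8.0;
        /* partitioned (a guess matching the sketch above): each rank is sent
         * only its N/p rows of A plus a copy of B.                           */
        double part_wire = p * ((N / p) * N + N * N) * 8.0;
        printf("%4d %14.3e %20.3e %20.3e\n",
               p, flops_per_rank, allcopy_wire, part_wire);
    }
    return 0;
}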


-- 
Andrew J. Pounds, Ph.D.
Professor of Chemistry and Computer Science
Director of the Computational Science Program
Mercer University, Macon, GA 31207  (478) 301-5627