[CSC 435] What you should be seeing...
Andrew J. Pounds
pounds_aj at mercer.edu
Tue Apr 16 11:33:00 EDT 2024
In class I had you run MPI from the command line across random nodes of
your own choosing. I wanted you to see what happens when you run a
calculation while potentially sharing the node and/or the network with
others. Several of you got what seemed to be random results, with the
higher node counts actually taking significantly longer.
By utilizing PBS/Torque "correctly", we eliminate the chance of
multiple people using the same node and also -- if the code is written
correctly -- we minimize the network traffic. To help you
understand this phenomenon, I ran some test jobs last night on the
cluster using both the "allcopy" version and the improved mmm_mpi
version of the code that passes only portions of the matrices. I then
graphed megaflops vs. number of processors.
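For concreteness, here is a minimal MPI sketch in C of the two
communication patterns. This is my own reconstruction for illustration,
not the actual course code -- the file name mmm_compare.c, the matrix
size N, and the row-block decomposition are all my assumptions:

/* mmm_compare.c -- a minimal sketch contrasting the two communication
 * patterns for C = A * B with N x N matrices:
 *   "allcopy":  every rank receives full copies of A and B
 *   "partial":  each rank receives only its block of rows of A
 * Compile:  mpicc -O2 mmm_compare.c -o mmm_compare
 * Run:      mpirun -np 4 ./mmm_compare
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 512   /* assumed size; must be divisible by the rank count */

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int rows = N / size;                 /* rows of A owned by each rank */
    double *A = NULL, *C = NULL;
    double *B     = malloc(N * N * sizeof(double));
    double *Apart = malloc(rows * N * sizeof(double));
    double *Cpart = malloc(rows * N * sizeof(double));

    if (rank == 0) {                     /* root initializes full matrices */
        A = malloc(N * N * sizeof(double));
        C = malloc(N * N * sizeof(double));
        for (int i = 0; i < N * N; i++) { A[i] = 1.0; B[i] = 1.0; }
    }

    double t0 = MPI_Wtime();

    /* B is needed in full by every rank in this row decomposition. */
    MPI_Bcast(B, N * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* "partial" scheme: scatter only the row block each rank needs.
     * An allcopy scheme would instead MPI_Bcast all of A as well, so
     * its total network traffic grows with the node count. */
    MPI_Scatter(A, rows * N, MPI_DOUBLE,
                Apart, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* Local block multiply: Cpart = Apart * B. */
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += Apart[i * N + k] * B[k * N + j];
            Cpart[i * N + j] = sum;
        }

    /* Collect the result rows back on the root. */
    MPI_Gather(Cpart, rows * N, MPI_DOUBLE,
               C, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    double t1 = MPI_Wtime();
    if (rank == 0) {
        double mflops = 2.0 * N * N * N / (t1 - t0) / 1.0e6;
        printf("%d ranks: %.2f s, %.1f megaflops\n", size, t1 - t0, mflops);
    }

    free(B); free(Apart); free(Cpart);
    if (rank == 0) { free(A); free(C); }
    MPI_Finalize();
    return 0;
}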
[plot.png: megaflops vs. number of processors for the allcopy and
mmm_mpi codes]
Notice what happens -- the allcopy version is faster when there are
fewer than 6 nodes -- but it scales POORLY, and after six nodes its
performance actually starts to decrease due to the increase in network
traffic. In contrast, the mmm_mpi code is still increasing in
performance up to 10 nodes. Recognize that this graph will vary with
the dimension of the matrix -- but it demonstrates a key tradeoff in
distributed HPC computing -- processor power vs. network bandwidth --
and the necessity of thinking about it when designing HPC algorithms.
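To see why there has to be a crossover, here is a back-of-the-envelope
model (my own caricature with idealized assumptions, not something
measured on our cluster). For N x N matrices on p nodes, the arithmetic
is fixed at about 2N^3 flops, so the compute per node shrinks like
2N^3/p. If allcopy ships full copies of both matrices to every node
while the improved code ships each node only the portion it needs, the
total words on the wire are roughly

    allcopy:  ~ 2*p*N^2   (grows with p)
    mmm_mpi:  ~ 2*N^2     (roughly fixed)

Adding nodes keeps shrinking the compute term, but for allcopy it
inflates the communication term, so past some node count (about six in
the plot above) the network dominates and the megaflops fall.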
--
*Andrew J. Pounds, Ph.D.*
/Professor of Chemistry and Computer Science/
/Director of the Computational Science Program/
/Mercer University, Macon, GA, 31207 (478) 301-5627 /