<!DOCTYPE html>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
<p>I had to kill a few jobs this evening because they were hung or
were running on henry. Remember -- don't run on henry -- just
submit jobs from there. As for hung jobs, there is a safety net
built in to PBS/Torque for this -- a limit on the job's runtime.</p>
<p>In your PBS script you can set the maximum amount of walltime.</p>
<p>#PBS -l nodes=1:lab218:ppn=10<br>
#PBS -l walltime=2:00:00<br>
</p>
<p>or</p>
<p>#PBS -l nodes=1:lab218:ppn=10,walltime=2:00:00<br>
</p>
<p>Either form will set the amount of time your job is allowed to run
to 2 hours. On these problems I can't imagine your benchmarking times
taking more than, say, 6 hours per test -- but some of you have
set your times to multiple days. Protect yourself and keep those
times short so that if you do have a problem you are not waiting
multiple days to discover it.</p>
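<p>For reference, here is a minimal sketch of what a complete submission
script with a walltime limit might look like. The job name, the
<code>-j oe</code> option, and the <code>./benchmark</code> executable
are placeholders I am assuming for illustration; substitute whatever
your own script already uses.</p>
<pre>#!/bin/bash
# Job name (hypothetical; pick your own)
#PBS -N blas_bench
# One lab218 node, 10 cores, and at most 2 hours of walltime
#PBS -l nodes=1:lab218:ppn=10,walltime=2:00:00
# Merge stdout and stderr into a single output file
#PBS -j oe

# Start in the directory the job was submitted from
cd "$PBS_O_WORKDIR"

# Placeholder for your actual benchmark command
./benchmark
</pre>
<p>If the job is still running when the walltime expires, the scheduler
will kill it for you instead of letting it sit hung on a node.</p>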
<p>To give context, I ran 200 OpenBLAS benchmarking jobs in under 10
minutes.</p>
<p><br>
</p>
<div class="moz-signature">-- <br>
<b><em>Andrew J. Pounds, Ph.D.</em></b><br>
<em>Professor of Chemistry and Computer Science</em><br>
<em>Director of the Computational Science Program</em><br>
<em>Mercer University</em><br>
<em>1501 Mercer University Drive, Macon, GA 31207 </em><br>
<em>(478) 301-5627</em><br>
</div>
</body>
</html>