<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<font face="serif">Okay -- so progress was made today. Hopefully by
the end of class on Thursday we all have CUDA code running and can
work across multiple machines. <br>
<br>
</font><br>
<font face="serif"><font face="serif">When I looked at the errors
you all sent me last week and when I went through the torque
system log files I was able to replicate the errors you were
seeing -- and also determined the reason why none of the TORQUE
output files were showing up in your directories. It was due to
errors in file transmissions. Misconfigured authorization files
can be the cause of this. </font></font><font face="serif"><font
face="serif"><font face="serif">In a nutshell, while we can use
the short names (e.g. -- csc100com01) to log into each of the
machines remotely, TORQUE wants to see the fully qualified
domain name (e.g. -- csc100com01.cs.mercer.edu). While you
may have all of the short names in your authorized_keys file
and in your known_hosts file, the known_hosts file needs to
have BOTH the short and long version of the machine name (I
may fix my generate keys file later to do both automatically
later). <br>
<br>
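A quick way to take care of that (and to check the logins at the
same time) is a little loop like the one below. This is just a
sketch on my part -- adjust the machine list to whichever nodes you
are using, and skip the ssh-keyscan line if you would rather accept
the host keys by hand:<br>
<pre># Record BOTH the short and fully qualified host keys in known_hosts,
# then confirm that a passwordless login works with each form of the name.
for m in csc100com20 csc100com21 ; do
    ssh-keyscan $m $m.cs.mercer.edu >> ~/.ssh/known_hosts
    ssh $m hostname
    ssh $m.cs.mercer.edu hostname
done
</pre>
If either ssh command prompts you for a password, that machine still
needs its keys fixed.<br>
<br>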
The other piece of the puzzle is that zeus sometimes has to be
rebooted (or the TORQUE server restarted) to enable you to submit
jobs from the client machines. This was not a problem with older
versions of TORQUE, but when I upgraded it to handle GPUs I think
the patch "broke" some of the older pieces. The easy fix is to
submit your jobs from zeus.<br>
<br>
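In practice that just means something like the following (I am
assuming here that you saved the batch script below as nodemap.pbs
in your nodemapper directory -- use whatever file name you actually
gave it):<br>
<pre>ssh zeus              # submit from the server itself
cd ~/nodemapper
qsub nodemap.pbs      # qsub prints the job id, e.g. 123.zeus
qstat                 # check on the job while it runs
</pre>
<br>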
<br>
What I need you to do:<br>
<br>
1. Make sure that you can log into all of the machines using both
the short name and the fully qualified domain name without using a
password (the little loop above will shake this out quickly). If
you cannot, then we need to fix the keys for the machines where
this is a problem.<br>
<br>
2. Verify, using two or three of the machines that you confirmed
work in part 1, that you can run a batch job across them. You
might want to use a TORQUE file like this....<br>
<br>
</font><pre>#!/bin/sh
#PBS -N NODEMAP
#PBS -m abe
#PBS -M <a class="moz-txt-link-abbreviated" href="mailto:pounds_aj@mercer.edu">pounds_aj@mercer.edu</a>
#PBS -j oe
#PBS -k n
#PBS -l nodes=1:csc100com21:ppn=2+1:csc100com20:ppn=2,walltime=8:00:00
#PBS -V
#
# Restrict Open MPI to the tcp and self transports.
OMPI_MCA_btl=self,tcp
export OMPI_MCA_btl
cat $PBS_NODEFILE
cd /home/chemist/nodemapper
# The nodefile lists each node once per processor (ppn=2), so divide
# by 2 to get one MPI process per node.
n=`wc -l &lt; $PBS_NODEFILE`
n=`expr $n / 2`
mpirun -np $n --hostfile $PBS_NODEFILE --pernode nodemapper
</pre><font face="serif"><br>
<br>
Notice that in this example I used csc100com20 and csc100com21
(the two machines I was working at today).<br>
<br>
The only modifications you should have to make are the nodes, the
directory, and the EMAIL address. You should already have this file
in your nodemapper directory. Anyway -- give it a shot and let me
know if it creates an output file in your directory with the correct
output.<br>
<br>
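For reference, here is roughly what a successful run looks like from
your end (the job number will differ, of course):<br>
<pre>qstat                 # wait for the job to finish and leave the queue
ls NODEMAP.o*         # the joined output/error file lands in the directory
                      # you submitted the job from
cat NODEMAP.o123      # should show the contents of $PBS_NODEFILE followed
                      # by the nodemapper output
</pre>
You should also get the "begin" and "end" e-mails that the -m abe
line requests.<br>
<br>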
Stay safe out there. Hopefully on Thursday we can get this all
straightened out and also figure out what was going on with the
CUDA card on Steve's computer today.<br>
<br>
<br>
</font>
<pre class="moz-signature" cols="72">--
Andrew J. Pounds, Ph.D. (<a class="moz-txt-link-abbreviated" href="mailto:pounds_aj@mercer.edu">pounds_aj@mercer.edu</a>)
Professor of Chemistry and Computer Science
Mercer University, Macon, GA 31207 (478) 301-5627
<a class="moz-txt-link-freetext" href="http://faculty.mercer.edu/pounds_aj">http://faculty.mercer.edu/pounds_aj</a>
</pre>
</body>
</html>