<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
<p><font face="serif">I did not give you a dot.c function in the
initial repo, so here is a serial dot product code:</font></p>
<p><font face="serif"><br>
</font></p>
<p><tt>#ifdef __cplusplus</tt><tt><br>
</tt><tt>extern "C" {</tt><tt><br>
</tt><tt>#endif</tt><tt><br>
</tt><tt> double dot_( int *threads, int *len, double *vec1,
double *vec2);</tt><tt><br>
</tt><tt>#ifdef __cplusplus</tt><tt><br>
</tt><tt> }</tt><tt><br>
</tt><tt>#endif</tt><tt><br>
</tt><tt><br>
</tt><tt>/* S E R I A L C O D E */</tt><tt><br>
</tt><tt><br>
</tt><tt>double dot_( int *threads, int *len, double *vec1, double
*vec2) {</tt><tt><br>
</tt><tt><br>
</tt><tt> int k;</tt><tt><br>
</tt><tt> int veclen = *len;</tt><tt><br>
</tt><tt> int mod;</tt><tt><br>
</tt><tt> double product;</tt><tt><br>
</tt><tt><br>
</tt><tt> // Compute the dot product</tt><tt><br>
</tt><tt><br>
</tt><tt> product = 0.0;</tt><tt><br>
</tt><tt> for (k=0;k&lt;veclen;k++){</tt><tt><br>
</tt><tt> product += *(vec1+k) * *(vec2+k);</tt><tt><br>
</tt><tt> }</tt><tt><br>
</tt><tt> return(product);</tt><tt><br>
</tt><tt>}</tt><br>
</p>
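<p>If you want a quick sanity check of the function from C before wiring it into Fortran, something like the following minimal sketch should work. The helper name check_dot and the test vectors are my own illustration, not part of the repo. The point is that every argument is passed by address, which is how a Fortran caller hands its arguments to dot_.</p>

```c
/* Prototype copied from the listing above. */
double dot_(int *threads, int *len, double *vec1, double *vec2);

/* Serial implementation as given above, repeated so this sketch compiles on its own. */
double dot_(int *threads, int *len, double *vec1, double *vec2) {
    double product = 0.0;
    int k;

    (void)threads;  /* unused in the serial version */
    for (k = 0; k < *len; k++) {
        product += vec1[k] * vec2[k];
    }
    return product;
}

/* Hypothetical helper: calls dot_ the way a Fortran caller does,
   passing every argument by address. */
double check_dot(void) {
    int threads = 1;
    int len = 4;
    double a[4] = {1.0, 2.0, 3.0, 4.0};
    double b[4] = {4.0, 3.0, 2.0, 1.0};

    return dot_(&threads, &len, a, b);  /* 1*4 + 2*3 + 3*2 + 4*1 = 20 */
}
```

<p>Calling check_dot() from a small main and printing the result should give 20 if the function is linked correctly.</p>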
<p><br>
</p>
<p>If you want to test this independently of your other code, then I
recommend that you make a second driver. I am including a
"dotdriver.f90" piece of code so you can see how to actually call
the dot product function in Fortran. Because it returns a value, it
is declared and called as a function, just like cputime and walltime.</p>
<p><tt>program dotdriver </tt><tt><br>
</tt><tt><br>
</tt><tt>integer :: NDIM</tt><tt><br>
</tt><tt><br>
</tt><tt>real (kind=8) :: wall_start, wall_end</tt><tt><br>
</tt><tt>real (kind=8) :: cpu_start, cpu_end</tt><tt><br>
</tt><tt>real (kind=8) :: trace</tt><tt><br>
</tt><tt><br>
</tt><tt><br>
</tt><tt>integer :: startval, stopval, stepval</tt><tt><br>
</tt><tt>real (kind=8) :: walltime</tt><tt><br>
</tt><tt>real (kind=8) :: cputime </tt><tt><br>
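</tt><tt>real (kind=8) :: dot </tt><tt><br>
</tt><tt>! declared explicitly; implicit typing would make these integers</tt><tt><br>
</tt><tt>real (kind=8) :: mflops, mflops2</tt><tt><br>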
</tt><tt>external walltime, cputime, dot</tt><tt><br>
</tt><tt><br>
</tt><tt>character (len=8) :: carg1, carg2, carg3</tt><tt><br>
</tt><tt><br>
</tt><tt>real (kind=8), dimension(:), allocatable :: veca, vecb</tt><tt><br>
</tt><tt>real (kind=8), dimension(:,:), allocatable :: matrixa,
matrixb, matrixc</tt><tt><br>
</tt><tt><br>
</tt><tt>!modified to use command line arguments</tt><tt><br>
</tt><tt><br>
</tt><tt>call get_command_argument(1, carg1)</tt><tt><br>
</tt><tt>call get_command_argument(2, carg2)</tt><tt><br>
</tt><tt>call get_command_argument(3, carg3)</tt><tt><br>
</tt><tt><br>
</tt><tt>! Use Fortran internal files to convert command line
arguments to ints</tt><tt><br>
</tt><tt><br>
</tt><tt>read (carg1,'(i8)') startval</tt><tt><br>
</tt><tt>read (carg2,'(i8)') stopval</tt><tt><br>
</tt><tt>read (carg3,'(i8)') stepval</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt>do iter = startval, stopval, stepval</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt><br>
</tt><tt>NDIM = iter</tt><tt><br>
</tt><tt><br>
</tt><tt>allocate ( veca(NDIM), stat=ierr)</tt><tt><br>
</tt><tt>allocate ( vecb(NDIM), stat=ierr)</tt><tt><br>
</tt><tt><br>
</tt><tt><br>
</tt><tt>do i = 1, NDIM </tt><tt><br>
</tt><tt> veca(i) = sqrt(dble(NDIM)) </tt><tt><br>
</tt><tt> vecb(i) = 1.0 / sqrt( dble(NDIM))</tt><tt><br>
</tt><tt>enddo</tt><tt><br>
</tt><tt><br>
</tt><tt>wall_start = walltime()</tt><tt><br>
</tt><tt>cpu_start = cputime()</tt><tt><br>
</tt><tt><br>
</tt><tt>trace = dot(1, NDIM, veca, vecb)</tt><tt><br>
</tt><tt><br>
</tt><tt>cpu_end = cputime()</tt><tt><br>
</tt><tt>wall_end = walltime()</tt><tt><br>
</tt><tt><br>
</tt><tt>mflops = 2*dble(NDIM)/ (cpu_end-cpu_start) / 1.0e6</tt><tt><br>
</tt><tt>mflops2 = 2*dble(NDIM)/ (wall_end-wall_start)/ 1.0e6</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt>print *, NDIM, trace, cpu_end-cpu_start,
wall_end-wall_start, mflops, mflops2</tt><tt><br>
</tt><tt><br>
</tt><tt><br>
</tt><tt>deallocate(veca)</tt><tt><br>
</tt><tt>deallocate(vecb)</tt><tt><br>
</tt><tt><br>
</tt><tt>enddo</tt><tt><br>
</tt><tt><br>
</tt><tt><br>
</tt><tt>end program dotdriver </tt><tt><br>
</tt><tt> </tt><br>
</p>
<p>The dot product is SO FAST that it will be difficult to get
reasonable results unless you use extremely long vectors. I ran
on hammer with vector lengths of 1 billion and it completed in
milliseconds. This led to performance in the gigaflops range,
which is what we expect.</p>
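<p>The rate calculation in the driver is 2*N floating point operations (one multiply and one add per element) divided by the elapsed time. Here is a minimal sketch of that arithmetic in C. The function name dot_mflops and the choice to return -1.0 when the timer reports no elapsed time are my own assumptions for illustration. For short vectors the two timer readings can be identical, and dividing by that zero difference gives a meaningless rate.</p>

```c
/* Hypothetical helper: MFLOPS for a dot product of length n that took
   `seconds` to run. A dot product performs one multiply and one add
   per element, so the operation count is 2*n. */
double dot_mflops(long long n, double seconds) {
    if (seconds <= 0.0) {
        /* Timer resolution was too coarse to measure this run;
           signal that no meaningful rate is available. */
        return -1.0;
    }
    return 2.0 * (double)n / seconds / 1.0e6;
}
```

<p>For example, a vector of length 1000000 that takes 0.5 seconds works out to 4 MFLOPS.</p>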
<p>Be careful about simply going to extremely large integers for your
dimensions -- remember, these are discrete machines and there is a
limit on the size of integer values. Normal ints, according
to /usr/include/limits.h, have a maximum of 2147483647 (2^31 - 1).</p>
<p>You can go larger -- but you will have to change the integer
variable types.</p>
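<p>To make the limit concrete, here is a sketch of what changing the integer type might look like on the C side. The names fits_in_int, dot_bytes, and dot64 are hypothetical and are not in the repo, and the Fortran caller would need a matching 64-bit integer (for example integer (kind=8) on the usual compilers). The sketch also computes the memory needed, since two vectors of doubles cost 16 bytes per element, assuming the usual 8-byte double.</p>

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if n fits in a plain int, 0 otherwise. */
int fits_in_int(int64_t n) {
    return n >= 0 && n <= INT_MAX;
}

/* Hypothetical helper: bytes needed to hold the two input vectors. */
int64_t dot_bytes(int64_t n) {
    return 2 * n * (int64_t)sizeof(double);
}

/* Hypothetical 64-bit variant of the serial dot product: the length and
   the loop index are both int64_t, so lengths above INT_MAX do not overflow. */
double dot64(int64_t len, const double *vec1, const double *vec2) {
    double product = 0.0;
    int64_t k;

    for (k = 0; k < len; k++) {
        product += vec1[k] * vec2[k];
    }
    return product;
}
```

<p>With 8-byte doubles, dot_bytes(1000000000) comes to 16000000000 bytes, so memory will usually become the constraint before the integer limit does.</p>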
<p><br>
</p>
<p>As always, let me know if you have any questions.</p>
<p><br>
</p>
<p><br>
</p>
<pre class="moz-signature" cols="72">--
Andrew J. Pounds, Ph.D. (<a class="moz-txt-link-abbreviated" href="mailto:pounds_aj@mercer.edu">pounds_aj@mercer.edu</a>)
Professor of Chemistry and Computer Science
Director of the Computational Science Program
Mercer University, Macon, GA 31207 (478) 301-5627
</pre>
</body>
</html>