[CSC 435] Getting the CUDA cards to run after failure

Andrew J. Pounds pounds_aj at mercer.edu
Sat Mar 22 09:31:54 EDT 2014


Gentlemen -- as a few of you had issues with the CUDA cards dying on 
you last week and then not letting you run subsequent codes on the 
cards, I took a few minutes Friday morning and came up with a solution 
that will at least let you do subsequent calculations.

In this code I check the status returned by each cudaMalloc call and 
print the error (the devices themselves are working fine; they just 
can't allocate the memory), then free up the device pointer(s) and 
reset the card before terminating.


// Allocate memory on the card to store the matrices.
// Assumes d_A, d_B and d_C were initialized to NULL, so that calling
// cudaFree on a pointer that was never allocated does nothing.
     if ( (cudaStat = cudaMalloc(&d_A, DIM*DIM * sizeof(double))) != cudaSuccess ){
           // Report the status returned by the failed cudaMalloc call
           errorstring = cudaGetErrorString(cudaStat);
           printf("Device memory allocation failed with err: %s.\n", errorstring);
           cudaFree(d_A);
           cudaDeviceReset();
           exit(1);
           }
     if ( (cudaStat = cudaMalloc(&d_B, DIM*DIM * sizeof(double))) != cudaSuccess ){
           errorstring = cudaGetErrorString(cudaStat);
           printf("Device memory allocation failed with err: %s.\n", errorstring);
           cudaFree(d_A);
           cudaFree(d_B);
           cudaDeviceReset();
           exit(1);
           }
     if ( (cudaStat = cudaMalloc(&d_C, DIM*DIM * sizeof(double))) != cudaSuccess ){
           errorstring = cudaGetErrorString(cudaStat);
           printf("Device memory allocation failed with err: %s.\n", errorstring);
           cudaFree(d_A);
           cudaFree(d_B);
           cudaFree(d_C);
           cudaDeviceReset();
           exit(1);
           }
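
If you would rather not repeat the cleanup three times, the same idea can be written as a loop. This is only a sketch under a few assumptions: the value of DIM, the NMAT constant and the dev[] array are made up for illustration and are not in the code above. It prints the status returned by cudaMalloc, frees only the allocations that succeeded earlier, and resets the card before exiting.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define DIM  1024   /* assumed size; use whatever DIM your code defines */
#define NMAT 3      /* hypothetical constant: the matrices A, B and C */

int main(void)
{
    /* Start every device pointer at NULL so its state is always known */
    double *dev[NMAT] = { NULL, NULL, NULL };
    size_t bytes = (size_t)DIM * DIM * sizeof(double);
    cudaError_t cudaStat;
    int i, j;

    for (i = 0; i < NMAT; i++) {
        cudaStat = cudaMalloc((void **)&dev[i], bytes);
        if (cudaStat != cudaSuccess) {
            printf("Device memory allocation %d failed with err: %s.\n",
                   i, cudaGetErrorString(cudaStat));
            /* Free only the earlier allocations that succeeded */
            for (j = 0; j < i; j++)
                cudaFree(dev[j]);
            cudaDeviceReset();   /* leave the card clean for the next run */
            exit(1);
        }
    }

    double *d_A = dev[0], *d_B = dev[1], *d_C = dev[2];
    /* ... copy data to the card and launch kernels with d_A, d_B, d_C ... */
    (void)d_A; (void)d_B; (void)d_C;

    /* Normal exit path: release everything and reset the card */
    for (i = 0; i < NMAT; i++)
        cudaFree(dev[i]);
    cudaDeviceReset();
    return 0;
}
```

Save it as a .cu file and build it with nvcc. Keeping the pointers in an array means the cleanup is written once, and nothing is ever freed that was not successfully allocated.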

-- 
Andrew J. Pounds, Ph.D.  (pounds_aj at mercer.edu)
Professor of Chemistry and Computer Science
Mercer University,  Macon, GA 31207   (478) 301-5627
http://faculty.mercer.edu/pounds_aj
