[CSC 435] Getting the CUDA cards to run after failure
Andrew J. Pounds
pounds_aj at mercer.edu
Sat Mar 22 09:31:54 EDT 2014
Gentlemen -- as a few of you had issues with the CUDA cards dying on
you last week and then not letting you run subsequent codes on the
cards, I took a few minutes Friday morning and came up with a solution
that will at least let you do subsequent calculations.
In this code I get the device error from the card (which should not
indicate a fault in the device; the devices are working fine, they
just can't allocate the memory), then free up the device pointer(s)
and reset the card before terminating.
// Allocate memory on the card to store the matrices
if ( (cudaStat = cudaMalloc(&d_A, DIM*DIM * sizeof(double)))
        != cudaSuccess ){
    // Report the status returned by the failed call
    errorstring = cudaGetErrorString(cudaStat);
    printf("Device memory allocation failed with err: %s.\n",
           errorstring);
    cudaFree(d_A);
    cudaDeviceReset();
    exit(1);
}
if ( (cudaStat = cudaMalloc(&d_B, DIM*DIM * sizeof(double)))
        != cudaSuccess ){
    errorstring = cudaGetErrorString(cudaStat);
    printf("Device memory allocation failed with err: %s.\n",
           errorstring);
    cudaFree(d_A);
    cudaFree(d_B);
    cudaDeviceReset();
    exit(1);
}
if ( (cudaStat = cudaMalloc(&d_C, DIM*DIM * sizeof(double)))
        != cudaSuccess ){
    errorstring = cudaGetErrorString(cudaStat);
    printf("Device memory allocation failed with err: %s.\n",
           errorstring);
    cudaFree(d_A);
    cudaFree(d_B);
    cudaFree(d_C);
    // Reset the card so that later runs can use it again
    cudaDeviceReset();
    exit(1);
}
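
The three checks above repeat the same report/free/reset/exit steps, so
they could be folded into one loop and one cleanup routine. What follows
is only a minimal sketch of that idea, not the code from the
assignment: the helper name cleanupAndExit, the ptrs array, and the
value of DIM are made up for illustration. It assumes the device
pointers start out as NULL, which makes cudaFree safe to call on
pointers that were never allocated.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define DIM 1024   /* hypothetical matrix dimension, for illustration */

/* Hypothetical helper: free whatever was allocated, reset the card, quit. */
static void cleanupAndExit(double *d_A, double *d_B, double *d_C)
{
    cudaFree(d_A);        /* cudaFree(NULL) performs no operation */
    cudaFree(d_B);
    cudaFree(d_C);
    cudaDeviceReset();    /* leave the card usable for the next run */
    exit(1);
}

int main(void)
{
    /* Start from NULL so cleanup is safe for pointers never allocated. */
    double *d_A = NULL, *d_B = NULL, *d_C = NULL;
    double **ptrs[3] = { &d_A, &d_B, &d_C };
    size_t bytes = (size_t)DIM * DIM * sizeof(double);
    cudaError_t cudaStat;
    int i;

    /* Allocate memory on the card to store the matrices */
    for (i = 0; i < 3; i++) {
        cudaStat = cudaMalloc((void **)ptrs[i], bytes);
        if (cudaStat != cudaSuccess) {
            printf("Device memory allocation failed with err: %s.\n",
                   cudaGetErrorString(cudaStat));
            *ptrs[i] = NULL;   /* do not rely on the pointer after a failure */
            cleanupAndExit(d_A, d_B, d_C);
        }
    }

    /* ... copy data to the card, launch kernels, copy results back ... */

    /* Normal exit: release the memory and reset the card as well. */
    cudaFree(d_A);
    cudaFree(d_B);
    cudaFree(d_C);
    cudaDeviceReset();
    return 0;
}
```

Something like this should build with nvcc (for example,
nvcc alloc_check.cu -o alloc_check, where the file name is arbitrary).
Keeping the cleanup in one place means a fourth matrix only needs one
more entry in the pointer list.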
--
Andrew J. Pounds, Ph.D. (pounds_aj at mercer.edu)
Professor of Chemistry and Computer Science
Mercer University, Macon, GA 31207 (478) 301-5627
http://faculty.mercer.edu/pounds_aj