Difference between revisions of "Keeneland"

From Tau Wiki

Revision as of 19:24, 25 January 2011

Guide for using TAU on Keeneland

Setting up environment

We'll have a module for TAU set up shortly, but for now, set up your environment this way:


   %> export PATH=/nics/c/home/biersdor/tau2/x86_64/bin/:$PATH
   %> export LD_LIBRARY_PATH=/nics/c/home/biersdor/tau2/x86_64/lib/:$LD_LIBRARY_PATH
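
To confirm the exports took effect, a small check along these lines can help. This is only a sketch and not part of TAU: the function name check_tau_tools is hypothetical, and it looks only for the three commands mentioned in this guide.

```shell
# Hypothetical helper: report whether each TAU tool resolves on the current PATH.
check_tau_tools() {
    missing=0
    for tool in tau_exec tau_cc.sh tau_f90.sh; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Return non-zero if at least one tool could not be found.
    return "$missing"
}

check_tau_tools || echo "TAU tools not on PATH; re-check the export lines above"
```

If any tool is reported missing, re-run the two export lines in the same shell session and try again.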

Compiling with

Details at:

The idea is to replace your compiler with the corresponding TAU wrapper compiler, e.g. gcc => tau_cc.sh, gfortran => tau_f90.sh, etc.
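
The substitution can be sketched as a small lookup. The helper name tau_wrapper_for and the file names are hypothetical; only the two mappings stated above are encoded, and any other compiler name is passed through unchanged. The compile lines are printed rather than run, since the TAU wrappers generally also expect a TAU makefile to be selected (through the TAU_MAKEFILE environment variable), and this guide does not give that path.

```shell
# Hypothetical helper: map a compiler name to the TAU wrapper named in this guide.
tau_wrapper_for() {
    case "$1" in
        gcc)      echo "tau_cc.sh" ;;
        gfortran) echo "tau_f90.sh" ;;
        *)        echo "$1" ;;   # no mapping listed here: leave unchanged
    esac
}

# Print the compile lines instead of running them (file names are placeholders).
echo "$(tau_wrapper_for gcc) -o app app.c"
echo "$(tau_wrapper_for gfortran) -o solver solver.f90"
```

In a Makefile-based build the same effect is usually achieved by overriding the compiler variables rather than editing each rule.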

Running with tau_exec

For a quick


Running CUDA applications

Both CUDA and OpenCL are instrumented dynamically through library preloading. Use the tau_exec script to run the CUDA application:

   %> tau_exec -T serial -cuda ./Stencil2D

The -T serial option specifies which TAU configuration to use. You can omit it for MPI applications and run:


Troubleshooting

  • CPU side looks fine but no GPU profile/trace generated.

This is likely because there is no cudaThreadExit() call at the end of the application. Placing one there signals to TAU that the application's CUDA-accelerated section has finished, so it can go ahead and write out the profile/trace.

Fix: Place cudaThreadExit() at the end of the application.
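
To see quickly whether a source file already makes this call, a plain text search is usually enough. The function name and the sample file below are hypothetical, and the check is textual only: it cannot tell whether the call is actually reached at the end of execution.

```shell
# Hypothetical helper: succeed if any of the given files contains a cudaThreadExit( call.
has_cuda_thread_exit() {
    grep -q 'cudaThreadExit[[:space:]]*(' "$@"
}

# Demo on a throwaway file standing in for real application source.
src=$(mktemp)
printf 'int main(void) {\n    /* kernels launched here */\n    return 0;\n}\n' > "$src"

if has_cuda_thread_exit "$src"; then
    echo "cudaThreadExit() found"
else
    echo "cudaThreadExit() missing: add it before the program exits"
fi
rm -f "$src"
```

On a real project, pass the application's own source files to has_cuda_thread_exit instead of the throwaway file.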

  • Receiving Error calculating kernel event [start|stop], error #: 33. during execution.

This means that CUDA could not retrieve the event object at synchronization. Try placing the synchronization point right after the kernel is launched. In some cases no arrangement of kernel launches and synchronization points will suffice. In those cases this one kernel cannot be tracked, but any other kernels in the application should still be tracked correctly.

Fix: Try placing a synchronization call right after the kernel launch.


Running OpenCL applications

Use

CUPTI and PAPI

Coming soon...