Difference between revisions of "Guide:TAUGPU"

From Tau Wiki

Latest revision as of 16:17, 22 April 2013

Configure TAU with:

./configure -cuda=<path to cuda toolkit> -bfd=download

or

./configure -opencl=<opencl headers/libraries> -bfd=download

(along with any other options you would normally give to TAU.)

Then:

make install

Add <arch>/bin to your PATH and <arch>/lib to your LD_LIBRARY_PATH.
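
As a minimal sketch of that step, the two variables could be set as follows. TAU_ROOT and TAU_ARCH are hypothetical helper variables introduced here for illustration only, and the default values ($HOME/tau and x86_64) are assumptions; substitute the directory where TAU was installed and the <arch> directory that make install created.

```shell
# Hypothetical install location and architecture directory; substitute your own.
TAU_ROOT="${TAU_ROOT:-$HOME/tau}"
TAU_ARCH="${TAU_ARCH:-x86_64}"

# Put the TAU tools (tau_exec, paraprof, ...) on the command search path.
export PATH="$TAU_ROOT/$TAU_ARCH/bin:$PATH"

# Let the dynamic loader find the TAU libraries; the ${VAR:+...} form avoids
# a trailing colon when LD_LIBRARY_PATH was previously empty.
export LD_LIBRARY_PATH="$TAU_ROOT/$TAU_ARCH/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

Placing these lines in a shell startup file makes them apply to every new session.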

To collect performance data, run your application with tau_exec, passing either -cupti (for CUDA applications) or -opencl (for OpenCL applications):

tau_exec -T serial,cupti <-cupti|-opencl> ./a.out

MPI applications can be run like this:

mpirun -np 4 tau_exec -T mpi,cupti <-cupti|-opencl> ./a.out

(For CUDA versions earlier than 4.1, use -cuda instead of -cupti.)

To collect traces, set:

export TAU_TRACE=1

before the tau_exec command.

Then post-process the trace files with:

tau_multimerge
tau2slog2 tau.trc tau.edf -o tau.slog2
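
Put together, the tracing steps might look like the sketch below for a serial CUDA application. It assumes the placeholder binary ./a.out used elsewhere on this page and the -cupti option; the check for tau_exec is an addition for illustration so that the snippet fails gracefully when TAU is not on the PATH.

```shell
# Enable trace output for subsequent tau_exec runs.
export TAU_TRACE=1

if command -v tau_exec >/dev/null 2>&1; then
    # Run the application, then merge the trace files and convert the
    # merged trace to slog2. Each step runs only if the previous one succeeded.
    tau_exec -T serial,cupti -cupti ./a.out &&
    tau_multimerge &&
    tau2slog2 tau.trc tau.edf -o tau.slog2
else
    echo "tau_exec not found; add <arch>/bin to your PATH first" >&2
fi
```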

Viewing data

To view profiles type:

paraprof

To view slog2 traces type:

jumpshot tau.slog2

CUPTI Counters

The CUPTI counters available on a given machine can be listed by typing:

tau_cupti_avail

Set the counters you wish to collect by exporting them as a colon-separated list in the TAU_METRICS variable, for example:

export TAU_METRICS=CUDA.GeForce_GT_240.domain_b.instructions

Then run the application with tau_exec.
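
When several counters are wanted, the colon-separated list can be assembled by a small helper, sketched here. join_metrics is a hypothetical function, not part of TAU, and the second counter name is a made-up placeholder; replace it with a name printed by tau_cupti_avail on your machine.

```shell
# Hypothetical helper: join all arguments with ':'.
# The body runs in a subshell, so the IFS change does not leak out.
join_metrics() (
    IFS=:
    printf '%s\n' "$*"
)

# The first name is the example from this page; the second is a placeholder.
TAU_METRICS=$(join_metrics \
    CUDA.GeForce_GT_240.domain_b.instructions \
    CUDA.GeForce_GT_240.domain_b.example_counter)
export TAU_METRICS
```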

PGI OpenACC compiler

PGI uses the driver API to generate CUDA code for its accelerated regions, so you need to set:

export TAU_CUPTI_API=driver

before running a PGI OpenACC application.
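
If the setting should apply only to OpenACC runs and not to the whole session, one option is a small wrapper, sketched below. run_openacc is a hypothetical name, and the -T serial,cupti and -cupti options are carried over from the serial example above on the assumption that they also suit a PGI build of TAU.

```shell
# Hypothetical wrapper: the body runs in a subshell, so TAU_CUPTI_API is
# set for this one tau_exec invocation and not for the calling shell.
run_openacc() (
    export TAU_CUPTI_API=driver
    tau_exec -T serial,cupti -cupti "$@"
)
```

It would be called as run_openacc ./a.out.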