Guide:TAUGPU
Latest revision as of 16:17, 22 April 2013
Configure TAU with:
./configure -cuda=<path to cuda toolkit> -bfd=download
or
./configure -opencl=<opencl headers/libraries> -bfd=download
(along with any other options you would normally give to TAU.)
Then:
make install
Add <arch>/bin to your PATH and <arch>/lib to your LD_LIBRARY_PATH.
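The environment setup above might look like the following sketch. The variable names TAU_ROOT and TAU_ARCH, the default install location, and the x86_64 directory name are assumptions made for illustration; substitute the directory that your own TAU build created.

```shell
# Hypothetical locations: adjust both to match your TAU installation.
TAU_ROOT="${TAU_ROOT:-$HOME/tau2}"   # assumed install directory
TAU_ARCH="${TAU_ARCH:-x86_64}"       # assumed <arch> directory name

# Put the TAU tools (tau_exec, paraprof, ...) on the search path.
export PATH="$TAU_ROOT/$TAU_ARCH/bin:$PATH"

# Make the TAU shared libraries visible to the dynamic loader.
# The ${VAR:+...} form avoids a trailing ':' when the variable was unset.
export LD_LIBRARY_PATH="$TAU_ROOT/$TAU_ARCH/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

Placing these lines in a shell startup file keeps the settings across sessions.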
To collect performance data, run your application with tau_exec, giving either the '-cupti' option (for CUDA applications) or the '-opencl' option (for OpenCL applications):
tau_exec -T serial,cupti <-cupti|-opencl> ./a.out
MPI applications can be run like this:
mpirun -np 4 tau_exec -T mpi,cupti <-cupti|-opencl> ./a.out
(For CUDA version < 4.1 use -cuda instead of -cupti.)
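The version rule above can be captured in a small helper. This is only a sketch: tau_gpu_flag is a hypothetical function name, not part of TAU, and it assumes you pass the CUDA toolkit version yourself as a "major.minor" string.

```shell
# Hypothetical helper: print the tau_exec option for a given CUDA version.
# Versions below 4.1 get -cuda; 4.1 and later get -cupti.
tau_gpu_flag() (
  ver="$1"
  major="${ver%%.*}"                 # text before the first '.'
  case "$ver" in
    *.*) rest="${ver#*.}"; minor="${rest%%.*}" ;;
    *)   minor=0 ;;                  # no minor part given
  esac
  if [ "$major" -lt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -lt 1 ]; }; then
    printf '%s\n' "-cuda"
  else
    printf '%s\n' "-cupti"
  fi
)
```

It could then be used as: tau_exec -T serial,cupti $(tau_gpu_flag 5.0) ./a.out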
To collect traces, type:
export TAU_TRACE=1
before the tau_exec command.
Then post-process the trace files with:
tau_multimerge
tau2slog2 tau.trc tau.edf -o tau.slog2
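The tracing steps can be chained into one function, sketched below for a serial CUDA application. The function name tau_trace_run is hypothetical; the commands and options inside it are the ones listed in this guide. For an OpenCL application, replace -cupti with -opencl.

```shell
# Hypothetical wrapper: trace one run, then merge and convert the trace.
tau_trace_run() {
  # Stop early if the TAU tools are not on the PATH.
  command -v tau_exec >/dev/null 2>&1 || {
    echo "tau_exec not found on PATH" >&2
    return 1
  }
  # Enable tracing for this run only, then execute the application.
  TAU_TRACE=1 tau_exec -T serial,cupti -cupti "$@" || return 1
  # Merge the per-thread trace files into tau.trc and tau.edf.
  tau_multimerge || return 1
  # Convert the merged trace to slog2 format for jumpshot.
  tau2slog2 tau.trc tau.edf -o tau.slog2
}
```

Calling tau_trace_run ./a.out should leave tau.slog2 in the current directory.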
Viewing data
To view profiles, type:
paraprof
To view slog2 traces, type:
jumpshot tau.slog2
CUPTI Counters
The CUPTI counters available on a given machine can be listed by typing:
tau_cupti_avail
Set the counters you wish to collect by exporting them as a colon-separated list in the TAU_METRICS variable, for example:
export TAU_METRICS=CUDA.GeForce_GT_240.domain_b.instructions
Then run the application with tau_exec.
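When several counters are needed, the colon-separated list can be built with a small helper, as in the sketch below. The function name join_metrics is hypothetical, and combining the TIME metric with a CUPTI counter is an assumption made for illustration; use the names reported by tau_cupti_avail on your machine.

```shell
# Hypothetical helper: join its arguments with ':' (runs in a subshell
# so the changed IFS does not leak into the calling shell).
join_metrics() (
  IFS=:
  printf '%s\n' "$*"
)

# Assumed example: wall-clock time plus the counter shown above.
export TAU_METRICS="$(join_metrics TIME CUDA.GeForce_GT_240.domain_b.instructions)"
```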
PGI OpenACC compiler
PGI uses the driver API to generate CUDA code for its accelerated regions, so you need to set:
export TAU_CUPTI_API=driver
before running a PGI OpenACC application.