
Matrix Multiply

TAU v2.25.1 supports the OpenACC directives available in PGI 12.3 and later. TAU instruments the PGI runtime library layer and records detailed source information. The simple matrix multiply application used here is written with OpenACC annotations and was compiled with the PGI -ta=nvidia flag to generate the executable. To profile this application with TAU, follow these steps:

Configure TAU:

./configure -c++=pgCC -cc=pgcc -fortran=pgi
make install
export TAU_MAKEFILE=<path to TAU>/x86_64/lib/Makefile.tau-pgi

Run the application under tau_exec:

tau_exec -T pgi -openacc ./mm

Use TAU's analysis tools to view the performance data:


Openacc profile1.png

Here we see the time spent in the PGI runtime library routines. The download time for variable a in the source code dominates the execution. The nature of each operation is shown in parentheses.

Openacc profile2.png

Next, this data is presented in ParaProf's thread statistics window.

Openacc profile3.png

The driver code is shown below.

Openacc profile4.png

Clicking on a runtime layer routine shows the function in the application where the kernel was invoked, along with the associated variable, the source line number, and the size of the array. Right-clicking and choosing 'Show Source Code' opens the source line where this transfer takes place. For the downloadxx_multiply_matrices routine with the variable 'a', the time is attributed on the host at the source location shown below. It represents the transfer time plus the time the host spends waiting for results to be returned from the GPU.

Openacc profile5.png

OpenACC example source code

Matrix Multiply using the OpenACC directives and the Makefile to run with TAU.
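
For readers without the linked source at hand, the following is a minimal sketch of what an OpenACC matrix multiply of this kind might look like. It is not the code from this example: the function name multiply_matrices and the variable a are taken from the profile described above, while the signature, the row-major layout, the choice of data clauses, and the assumption that a is the result array (suggested by the download attributed to it) are illustrative guesses.

```c
/* Hypothetical sketch, not the original example source.
 * Computes a = b * c for n x n matrices stored in row-major order. */
void multiply_matrices(int n, float *a, const float *b, const float *c)
{
    /* copyin:  b and c are transferred from the host to the device.
     * copyout: a is transferred from the device back to the host
     *          after the loop nest finishes (assumed to correspond to
     *          the download of 'a' seen in the profile). */
    #pragma acc parallel loop collapse(2) copyin(b[0:n*n], c[0:n*n]) copyout(a[0:n*n])
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            float sum = 0.0f;
            /* The dot product for one output element runs sequentially. */
            #pragma acc loop seq
            for (int k = 0; k < n; k++) {
                sum += b[i * n + k] * c[k * n + j];
            }
            a[i * n + j] = sum;
        }
    }
}
```

When built with an OpenACC compiler (for example with the PGI -ta=nvidia flag mentioned above), the loop nest is offloaded to the GPU. A compiler without OpenACC support ignores the pragmas and runs the same loops on the host, which is a convenient way to check the results.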




File:Mm openacc.ppk