A short demo of TAU for use with the POINT Live DVD version 2, built 5/18/09.
Open a terminal window and run:
cd workshop-point/NPB3.1
vi config/make.def
Notice that the MPIF77 and FLINK variables are set to tau_f90.sh. This enables TAU's automatic instrumentation with PDT.
Setting the TAU makefile
Close vi, then set the TAU makefile:
setenv TAU_MAKEFILE $TAU/Makefile.tau-mpi-pdt
This tells TAU to perform a basic instrumentation using PDT and the TAU MPI wrapper library. Now build the BT example program:
make bt CLASS=W NPROCS=16
Running the NPB example
cd bin
mpirun -np 16 ./bt.W.16
TAU profiles are generated automatically in the current directory, one profile file per thread (here, one per MPI process).
ls
bt.W.16         profile.11.0.0  profile.15.0.0  profile.5.0.0  profile.9.0.0
profile.0.0.0   profile.12.0.0  profile.2.0.0   profile.6.0.0
profile.1.0.0   profile.13.0.0  profile.3.0.0   profile.7.0.0
profile.10.0.0  profile.14.0.0  profile.4.0.0   profile.8.0.0
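As a quick sanity check on the listing above, the number of profile files can be compared with the number of MPI processes. The following is a minimal sketch in POSIX sh (not the csh used for the setenv commands in this demo); the function name count_profiles is hypothetical and not part of TAU.

```shell
#!/bin/sh
# count_profiles [DIR]: print how many files named profile.<n>.<c>.<t>
# exist in DIR (default: current directory).
# Hypothetical helper for illustration; not a TAU command.
count_profiles() {
    dir="${1:-.}"
    count=0
    for f in "$dir"/profile.*.*.*; do
        # When nothing matches, the pattern stays unexpanded; -f filters it out.
        if [ -f "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

Saved in a script and called as count_profiles . from the bin directory, this would be expected to print 16 for the run above, matching -np 16.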
Viewing TAU profiles
To get a simple summary of the TAU profiles, type:
pprof
This gives you a basic idea of how much time was spent in different NPB routines.
Let's view this profile in TAU's ParaProf profile viewer:
paraprof
ParaProf will load the profile and show a single bar representing Node 0. Each colored subsection represents a different routine in the NPB program. The length of a subsection is proportional to the exclusive time spent in that routine.
Right click on the "Node 0" label
Select "Show Thread Bar Chart"
A new window will pop up ordering each routine by the amount of exclusive time.
Click "Windows" -> "Group Legend"
Right click on "MPI"
Select "Hide This Group"
Now the MPI routines are excluded from all profile views.
Close the Group Legend window
Click "Options" -> "Select Metric..." -> "Inclusive"
Now the bars are ordered by inclusive time.
Click on the "TAU: ParaProf Manager" window
Right click on the trial name: "bin/NPB3.1/...."
Select "Create Selective Instrumentation File"
A window will pop up showing a number of routines. These routines have been flagged by TAU as lightweight. Lightweight routines are defined as routines that take less than 10 microseconds per call and are called more than 100,000 times (these parameters can be changed in the form at the top of the window). Excluding these routines from instrumentation helps lower the instrumentation overhead.
Click "Save"
Close ParaProf
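The lightweight-routine rule described above can be sketched as a small filter. This is an illustration only, written in POSIX sh with awk rather than csh: the function name lightweight_exclude_list and the input format (one routine per line as NAME CALLS TOTAL_USEC, with no spaces in the name) are assumptions made for the example, not something TAU or ParaProf produces. The BEGIN_EXCLUDE_LIST and END_EXCLUDE_LIST markers are the ones TAU selective instrumentation files use to delimit excluded routines.

```shell
#!/bin/sh
# lightweight_exclude_list [MAX_USEC_PER_CALL] [MIN_CALLS]
# Reads "NAME CALLS TOTAL_USEC" lines on stdin (hypothetical format) and
# prints an exclude list of routines that are called more than MIN_CALLS
# times and take less than MAX_USEC_PER_CALL microseconds per call.
# Defaults follow the thresholds quoted in the text: 10 usec, 100000 calls.
lightweight_exclude_list() {
    max_usec="${1:-10}"
    min_calls="${2:-100000}"
    echo "BEGIN_EXCLUDE_LIST"
    # The call-count test comes first, so the division is never
    # evaluated for a routine with zero calls.
    awk -v max_usec="$max_usec" -v min_calls="$min_calls" '
        $2 > min_calls && ($3 / $2) < max_usec { print $1 }
    '
    echo "END_EXCLUDE_LIST"
}
```

ParaProf applies its own version of this rule when it builds the list shown in the window; the sketch is only meant to make the two thresholds concrete.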
Tell TAU to use this newly created selective instrumentation file:
setenv TAU_OPTIONS -optTauSelectFile=`pwd`/select.tau
Now rebuild the BT program.
cd ..
make clean bt CLASS=W NPROCS=16
This time let's run the program with callpath profiling enabled:
setenv TAU_CALLPATH 1
setenv TAU_CALLPATH_DEPTH 10
cd bin
mpirun -np 16 ./bt.W.16
paraprof
Right click on "Node 0"
Select "Show Thread Call Graph"
Close the window
Right click on "Node 0"
Select "Show Thread Statistics Table"
Double click on "MPBT"
Here we can see that the MPBT routine calls INITIALIZE twice.
ParaProf is also very useful for comparing different runs of the same application. So far we have been compiling the NPB applications with a high level of optimization (-O3). Can we quantify the performance benefit of doing so?
Close ParaProf
cd ..
vi config/make.def
Comment out line 50, "FFLAGS = -O3"
Write the file and close vi
make clean bt CLASS=W NPROCS=16
Before running the experiment let's package the performance data we have already gathered:
cd bin
paraprof --pack bt-O3.ppk
mpirun -np 16 ./bt.W.16
paraprof --pack bt.ppk
Now open both profiles in ParaProf:
paraprof *.ppk
Click on "TAU: ParaProf Manager"
Right click on the "bt.ppk" trial
Select "Add Mean to Comparison Window"
Click on "TAU: ParaProf Manager"
Right click on the "bt-O3.ppk" trial
Select "Add Mean to Comparison Window"
In the Comparison Window we can see a comparison between these two runs. Each bar shows the exclusive time for each routine. Some routines show little variation (MPI_Init) while others show a huge speedup when compiled with -O3 (X_BACKSUBSTITUTE).
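To put a number on the benefit for a given routine, the two exclusive times read from the Comparison Window can be divided. The helper below is a minimal sketch in POSIX sh with awk; the name speedup is hypothetical, and the times are whatever values you read off the two bars (no measured values are assumed here).

```shell
#!/bin/sh
# speedup T_UNOPT T_OPT
# Prints T_UNOPT / T_OPT to two decimals: how many times faster the
# routine runs in the -O3 build than in the unoptimized build.
# Hypothetical helper for illustration; not part of TAU or ParaProf.
speedup() {
    awk -v slow="$1" -v fast="$2" 'BEGIN {
        # Guard against a zero or negative time in the denominator.
        if (fast <= 0) { print "n/a"; exit }
        printf "%.2f\n", slow / fast
    }'
}
```

A result close to 1.00 corresponds to the "little variation" case, and larger values to routines that gain the most from -O3. Both times must be in the same unit.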
TAU with PAPI
We need to change the TAU makefile to one built with PAPI:
setenv TAU_MAKEFILE $TAU/Makefile.tau-papi-mpi-pdt
cd ..
make bt CLASS=W NPROCS=16
cd bin
PAPI counters are selected using environment variables. First, find out which ones are available (for example, with PAPI's papi_avail utility).
Let's select FP instructions, total instructions and PAPI wall clock time:
setenv COUNTER1 PAPI_TOT_INS
setenv COUNTER2 PAPI_FP_INS
setenv COUNTER3 P_WALL_CLOCK_TIME
setenv TAU_CALLPATH 0
mpirun -np 16 ./bt.W.16
Remove old profiles:
paraprof
Click on the "TAU: ParaProf Manager" window
Click "Options" -> "Show Derived Metric Panel"
Click on "P_WALL_CLOCK_TIME"
Click on "PAPI_FP_INS"
Select "Divide"
Click "Apply operation"
Double click on "PAPI_FP_INS/P_WALL_CLOCK_TIME"
Right click on "Mean", select "Show Mean Bar Chart"
Routines are now sorted by FP instructions per second.
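The derived metric is a plain division of one counter by the other. The sketch below, in POSIX sh with awk, shows the arithmetic for a single routine. It assumes that P_WALL_CLOCK_TIME is reported in microseconds (check the units ParaProf displays for your build), in which case the raw ratio is FP instructions per microsecond and scaling by 10^6 gives a per-second rate. The function name fp_per_second is hypothetical.

```shell
#!/bin/sh
# fp_per_second FP_INS WALL_USEC
# Prints floating-point instructions per second, given a PAPI_FP_INS
# count and a wall clock time ASSUMED to be in microseconds.
# Hypothetical helper for illustration; not part of TAU or PAPI.
fp_per_second() {
    awk -v fp="$1" -v usec="$2" 'BEGIN {
        # Guard against a zero or negative time in the denominator.
        if (usec <= 0) { print "n/a"; exit }
        # Convert microseconds to seconds before dividing.
        printf "%.0f\n", fp / (usec / 1000000)
    }'
}
```

The scaling factor is the same for every routine, so it does not change the order in which ParaProf sorts them.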