Difference between revisions of "Keeneland"

From Tau Wiki
Latest revision as of 17:59, 25 August 2012

Guide for using TAU on Keeneland

Slides about TAU

TAU overview slides: http://nic.uoregon.edu/~scottb/tau-overview.pdf


Setting up environment

Set up your environment this way:

   module load tau
   export TAU_MAKEFILE=$TAUROOT/lib/Makefile.tau-cupti-pdt
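
Before compiling, it can help to confirm that the environment is actually in place. The following is a minimal sketch, not part of TAU: the function name check_tau_env is hypothetical, and it only assumes that loading the module puts tau_cc.sh on PATH and that TAU_MAKEFILE points at an existing file.

```shell
#!/bin/sh
# Sanity check for the TAU environment (a sketch; check_tau_env is a
# hypothetical helper, not a TAU command).
check_tau_env() {
    # TAU_MAKEFILE must be set and must point at an existing file.
    if [ -z "${TAU_MAKEFILE:-}" ]; then
        echo "TAU_MAKEFILE is not set" >&2
        return 1
    fi
    if [ ! -f "$TAU_MAKEFILE" ]; then
        echo "TAU_MAKEFILE does not exist: $TAU_MAKEFILE" >&2
        return 1
    fi
    # The compiler wrapper should be reachable through PATH
    # once the module is loaded.
    if ! command -v tau_cc.sh >/dev/null 2>&1; then
        echo "tau_cc.sh not found in PATH" >&2
        return 1
    fi
    echo "TAU environment looks usable: $TAU_MAKEFILE"
}
```

Calling check_tau_env after the two setup commands prints one line on success and returns non-zero with a message on stderr otherwise.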

Compiling SHOC 1.0.1 with TAU

After configuring SHOC, edit config/common.mk to read:


   # === Basics ===
   CC       = tau_cc.sh
   CXX      = tau_cxx.sh
   LD       = tau_cxx.sh
   AR       = /usr/bin/ar
   RANLIB   = ranlib
  
   CPPFLAGS += -I$(SHOC_ROOT)/src/common -I${SHOC_ROOT}/config
   CFLAGS   += -m64 -g -O2
   CXXFLAGS += -m64 -g -O2
   ARFLAGS  = rcv
   LDFLAGS  =
   LIBS     = -L$(SHOC_ROOT)/lib  -lrt -L/sw/keeneland/cuda/3.2RC/lib64/ -lcudart
  
   USE_MPI         = no
  
   OCL_CPPFLAGS    += -I${SHOC_ROOT}/src/opencl/common
   OCL_LIBS        =
  
   NVCC            = /sw/keeneland/cuda/3.2/bin/nvcc
   CUDA_CXX        = tau_cxx.sh
   CUDA_INC        = -I/sw/keeneland/cuda/3.2/include
   CUDA_CPPFLAGS   += -gencode=arch=compute_10,code=sm_10 \
   -gencode=arch=compute_11,code=sm_11  -gencode=arch=compute_13,code=sm_13 \
   -gencode=arch=compute_20,code=sm_20  -gencode=arch=compute_20,code=compute_20 \
   -I${SHOC_ROOT}/src/cuda/include $(TAU_LIBS)
  

Then make/install as you normally would.
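
The compiler substitutions in config/common.mk could also be scripted before running make. This is only a sketch under stated assumptions: use_tau_wrappers is a hypothetical helper, it assumes each variable is assigned on a single line with "=", and it does not append $(TAU_LIBS) to CUDA_CPPFLAGS, which still has to be done by hand.

```shell
#!/bin/sh
# Hypothetical helper: point CC, CXX, LD and CUDA_CXX in a SHOC common.mk
# at the TAU compiler wrappers, leaving every other line untouched.
use_tau_wrappers() {
    mk="$1"
    tmp="$mk.tmp.$$"
    # Each expression keeps the "NAME =" prefix and replaces only the value.
    # The patterns require "=" right after the name (plus optional spaces),
    # so LDFLAGS, CFLAGS and CPPFLAGS are not affected.
    sed -e 's|^\([[:space:]]*CC[[:space:]]*=\).*|\1 tau_cc.sh|' \
        -e 's|^\([[:space:]]*CXX[[:space:]]*=\).*|\1 tau_cxx.sh|' \
        -e 's|^\([[:space:]]*LD[[:space:]]*=\).*|\1 tau_cxx.sh|' \
        -e 's|^\([[:space:]]*CUDA_CXX[[:space:]]*=\).*|\1 tau_cxx.sh|' \
        "$mk" > "$tmp" && mv "$tmp" "$mk"
}
```

For example, use_tau_wrappers config/common.mk rewrites the file in place; keeping a copy of the original first makes it easy to switch back to the plain compilers.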

More info at TAU's user guide: http://www.cs.uoregon.edu/research/tau/docs/newguide/bk01ch01s02.html

Building SHOC with VampirTrace

In this case, edit config/common.mk to read:

   # === Basics ===
   CC       = vtcc --vt:cc mpicc
   CXX      = vtcxx --vt:cxx mpicxx
   LD       = vtcxx --vt:cxx mpicxx
   AR       = /usr/bin/ar
   RANLIB   = ranlib
  
   CPPFLAGS += -I$(SHOC_ROOT)/src/common -I${SHOC_ROOT}/config
   CFLAGS   += -m64 -g -O2
   CXXFLAGS += -m64 -g -O2
   ARFLAGS  = rcv
   LDFLAGS  =
   LIBS     = -L$(SHOC_ROOT)/lib  -lrt -L/sw/keeneland/cuda/3.2RC/lib64/ -lcudart
  
   USE_MPI         = no
  
   OCL_CPPFLAGS    += -I${SHOC_ROOT}/src/opencl/common
   OCL_LIBS        =
  
   NVCC            = vtnvcc
   CUDA_CXX        = vtnvcc
   CUDA_INC        = -I/sw/keeneland/cuda/3.2/include
   CUDA_CPPFLAGS   += -gencode=arch=compute_10,code=sm_10 \
   -gencode=arch=compute_11,code=sm_11  -gencode=arch=compute_13,code=sm_13 \
   -gencode=arch=compute_20,code=sm_20  -gencode=arch=compute_20,code=compute_20 \
   -I${SHOC_ROOT}/src/cuda/include $(TAU_LIBS)

Running CUDA applications

Both CUDA and OpenCL are instrumented dynamically through library preloading. Use the tau_exec script to run the CUDA application:

   %> tau_exec -T serial,cupti -cupti ./Stencil2D

The -T option specifies which TAU configuration to use (here serial,cupti). For MPI applications, change it and run:

   %> mpirun -np 4 tau_exec -T mpi,cupti -cupti ./SGEMM

This works with executables built with or without TAU.
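
After a profiling run it is worth checking that profile data was actually written. The sketch below assumes TAU's default behaviour of writing files named profile.<node>.<context>.<thread> into the working directory; count_profiles is a hypothetical helper, not a TAU tool.

```shell
#!/bin/sh
# Hypothetical helper: print how many TAU profile files exist in a directory
# (defaults to the current directory).
count_profiles() {
    dir="${1:-.}"
    n=0
    for f in "$dir"/profile.*; do
        # When nothing matches, the pattern is left unexpanded; skip it.
        [ -f "$f" ] && n=$((n + 1))
    done
    echo "$n"
}
```

If count_profiles prints 0 after running tau_exec, no profile was produced. Otherwise the profiles can be inspected with TAU's pprof (text) or paraprof (graphical) from the same directory.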

Traces

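Traces can be recorded by first setting TAU_TRACE=1, then running the application, merging the trace files, and converting the result for Jumpshot: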

   %> export TAU_TRACE=1
   %> tau_exec -T serial,cupti -cupti ./Stencil2D
   %> tau_multimerge
   %> tau2slog2 tau.trc tau.edf -o stencil2d.slog2
   %> jumpshot stencil2d.slog2
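
The same sequence can be wrapped in a small function that stops at the first failing step. This is a sketch only: record_trace is a hypothetical name, the -T serial,cupti configuration is copied from the example above, and the check for tau.trc and tau.edf assumes tau_multimerge writes the merged files under those names in the working directory.

```shell
#!/bin/sh
# Hypothetical wrapper around the trace steps: run, merge, convert.
# Usage: record_trace ./Stencil2D stencil2d.slog2
record_trace() {
    app="$1"
    out="$2"
    # Enable tracing for this run only.
    TAU_TRACE=1 tau_exec -T serial,cupti -cupti "$app" || return 1
    # Merge the per-thread trace files.
    tau_multimerge || return 1
    # The merged trace and event files are needed for the conversion.
    [ -f tau.trc ] && [ -f tau.edf ] || return 1
    tau2slog2 tau.trc tau.edf -o "$out" || return 1
    echo "wrote $out; view it with: jumpshot $out"
}
```

Setting TAU_TRACE only on the tau_exec command line keeps tracing from leaking into later runs in the same shell.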

Running OpenCL applications

Use tau_exec as well:

   %> tau_exec -T serial -opencl ./SGEMM 


Performance Data

Some example performance data from S3D:

File:S3D-cuda.ppk and File:S3D-cuda.slog2