Keeneland

From Tau Wiki

Latest revision as of 17:59, 25 August 2012

Guide for using TAU on Keeneland

Slides about TAU

TAU overview slides: http://nic.uoregon.edu/~scottb/tau-overview.pdf


Setting up environment

Set up your environment this way:

   module load tau
   export TAU_MAKEFILE=$TAUROOT/lib/Makefile.tau-cupti-pdt
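
Before building, it can help to confirm that the environment above took effect. The following is a minimal sketch, assuming a POSIX shell; check_tau_env is a hypothetical helper name and is not part of TAU.

```shell
# Sketch: verify the TAU environment before building.
# check_tau_env is a hypothetical helper, not a TAU command.
check_tau_env() {
    # TAU_MAKEFILE selects the TAU configuration used by tau_cc.sh/tau_cxx.sh.
    if [ -z "${TAU_MAKEFILE:-}" ]; then
        echo "TAU_MAKEFILE is not set" >&2
        return 1
    fi
    # The makefile must exist, otherwise the compiler wrappers cannot use it.
    if [ ! -f "$TAU_MAKEFILE" ]; then
        echo "TAU_MAKEFILE does not exist: $TAU_MAKEFILE" >&2
        return 1
    fi
    echo "Using TAU makefile: $TAU_MAKEFILE"
}
```

Calling check_tau_env after the two setup commands prints the selected makefile, or returns a non-zero status with a message on stderr if the variable is unset or points to a missing file.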

Compiling SHOC 1.0.1 with TAU

After configuring SHOC, edit config/common.mk to read:


   # === Basics ===
   CC       = tau_cc.sh
   CXX      = tau_cxx.sh
   LD       = tau_cxx.sh
   AR       = /usr/bin/ar
   RANLIB   = ranlib
  
   CPPFLAGS += -I$(SHOC_ROOT)/src/common -I${SHOC_ROOT}/config
   CFLAGS   += -m64 -g -O2
   CXXFLAGS += -m64 -g -O2
   ARFLAGS  = rcv
   LDFLAGS  =
   LIBS     = -L$(SHOC_ROOT)/lib  -lrt -L/sw/keeneland/cuda/3.2RC/lib64/ -lcudart
  
   USE_MPI         = no
  
   OCL_CPPFLAGS    += -I${SHOC_ROOT}/src/opencl/common
   OCL_LIBS        =
  
   NVCC            = /sw/keeneland/cuda/3.2/bin/nvcc
   CUDA_CXX        = tau_cxx.sh
   CUDA_INC        = -I/sw/keeneland/cuda/3.2/include
   CUDA_CPPFLAGS   += -gencode=arch=compute_10,code=sm_10 \
   -gencode=arch=compute_11,code=sm_11  -gencode=arch=compute_13,code=sm_13 \
   -gencode=arch=compute_20,code=sm_20  -gencode=arch=compute_20,code=compute_20 \
   -I${SHOC_ROOT}/src/cuda/include $(TAU_LIBS)
  

Then make/install as you normally would.
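
The compiler substitutions in config/common.mk could also be scripted rather than made by hand. This is only a sketch, assuming the variables appear one per line as in the listing above; use_tau_wrappers is a hypothetical helper, and it leaves LIBS, NVCC and the CUDA flags untouched, so those still need to be checked manually.

```shell
# Sketch: point CC, CXX, LD and CUDA_CXX at the TAU compiler wrappers.
# use_tau_wrappers is a hypothetical helper; pass it the path to common.mk.
use_tau_wrappers() {
    mk="$1"
    # Each pattern matches the variable name followed only by spaces and '=',
    # so CFLAGS, CXXFLAGS, LDFLAGS and similar variables are not modified.
    sed -E \
        -e 's/^([[:space:]]*CC[[:space:]]*=).*/\1 tau_cc.sh/' \
        -e 's/^([[:space:]]*CXX[[:space:]]*=).*/\1 tau_cxx.sh/' \
        -e 's/^([[:space:]]*LD[[:space:]]*=).*/\1 tau_cxx.sh/' \
        -e 's/^([[:space:]]*CUDA_CXX[[:space:]]*=).*/\1 tau_cxx.sh/' \
        "$mk" > "$mk.tau" && mv "$mk.tau" "$mk"
}
```

For example, use_tau_wrappers config/common.mk rewrites the file in place; keeping a backup copy first is advisable.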

More info: TAU's user guide (http://www.cs.uoregon.edu/research/tau/docs/newguide/bk01ch01s02.html)

Building SHOC with VampirTrace

In this case edit the config/common.mk to read:

   # === Basics ===
   CC       = vtcc --vt:cc mpicc
   CXX      = vtcxx --vt:cxx mpicxx
   LD       = vtcxx --vt:cxx mpicxx
   AR       = /usr/bin/ar
   RANLIB   = ranlib
  
   CPPFLAGS += -I$(SHOC_ROOT)/src/common -I${SHOC_ROOT}/config
   CFLAGS   += -m64 -g -O2
   CXXFLAGS += -m64 -g -O2
   ARFLAGS  = rcv
   LDFLAGS  =
   LIBS     = -L$(SHOC_ROOT)/lib  -lrt -L/sw/keeneland/cuda/3.2RC/lib64/ -lcudart
  
   USE_MPI         = no
  
   OCL_CPPFLAGS    += -I${SHOC_ROOT}/src/opencl/common
   OCL_LIBS        =
  
   NVCC            = vtnvcc
   CUDA_CXX        = vtnvcc
   CUDA_INC        = -I/sw/keeneland/cuda/3.2/include
   CUDA_CPPFLAGS   += -gencode=arch=compute_10,code=sm_10 \
   -gencode=arch=compute_11,code=sm_11  -gencode=arch=compute_13,code=sm_13 \
   -gencode=arch=compute_20,code=sm_20  -gencode=arch=compute_20,code=compute_20 \
   -I${SHOC_ROOT}/src/cuda/include $(TAU_LIBS)

Running CUDA applications

Both CUDA and OpenCL are instrumented dynamically through library preloading. Use the tau_exec script to run a CUDA application:

   %> tau_exec -T serial,cupti -cupti ./Stencil2D

The -T option specifies which TAU configuration to use (here serial,cupti). For MPI applications, select an MPI configuration instead and run:

   %> mpirun -np 4 tau_exec -T mpi,cupti -cupti ./SGEMM

This works with executables built with or without TAU.
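
The choice between the serial and MPI invocations shown above can be wrapped in a small function. This is a sketch only: tau_cuda_cmd is a hypothetical helper that prints the command line instead of running it, and it assumes that a rank count of 1 means a non-MPI executable.

```shell
# Sketch: build the tau_exec command line for a CUDA application.
# tau_cuda_cmd is a hypothetical helper; usage: tau_cuda_cmd RANKS APP [ARGS...]
tau_cuda_cmd() {
    ranks="$1"
    shift
    if [ "$ranks" -gt 1 ]; then
        # MPI run: use the mpi,cupti configuration under mpirun.
        echo "mpirun -np $ranks tau_exec -T mpi,cupti -cupti $*"
    else
        # Serial run: use the serial,cupti configuration.
        echo "tau_exec -T serial,cupti -cupti $*"
    fi
}
```

Printing the command first makes it easy to inspect; it can then be pasted into a job script or run directly.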

Traces

Traces can be recorded by setting TAU_TRACE=1 before the run, then merging the trace files and converting them for viewing in Jumpshot:

   %> export TAU_TRACE=1
   %> tau_exec -T serial,cupti -cupti ./Stencil2D
   %> tau_multimerge
   %> tau2slog2 tau.trc tau.edf -o stencil2d.slog2
   %> jumpshot stencil2d.slog2
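
The tracing steps above can be collected into one function that stops at the first failing step. This is a sketch under assumptions: trace_to_slog2 and the DRY_RUN variable are hypothetical names introduced here, and the function uses the same serial,cupti configuration as the example.

```shell
# Sketch: record a trace, merge it, and convert it to SLOG2.
# trace_to_slog2 and DRY_RUN are hypothetical names, not part of TAU.
# Usage: trace_to_slog2 APP OUTPUT.slog2
trace_to_slog2() {
    app="$1"
    out="$2"
    # When DRY_RUN is set and non-empty, print each command instead of running it.
    run=${DRY_RUN:+echo}
    (
        # The subshell keeps TAU_TRACE from leaking into the calling shell.
        export TAU_TRACE=1
        $run tau_exec -T serial,cupti -cupti "$app" &&
        $run tau_multimerge &&
        $run tau2slog2 tau.trc tau.edf -o "$out"
    )
}
```

After trace_to_slog2 ./Stencil2D stencil2d.slog2 completes, the result can be opened with jumpshot stencil2d.slog2 as shown above.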

Running OpenCL applications

Use tau_exec as well:

   %> tau_exec -T serial -opencl ./SGEMM 


Performance Data

Some example performance data from S3D:

File:S3D-cuda.ppk and File:S3D-cuda.slog2