GTC SC11
Revision as of 01:24, 23 January 2012
Background
In the fall of 2011, several performance studies were conducted on a port of the GTC application to GPUs (full paper: http://dl.acm.org/citation.cfm?id=2063415).
1/22/2012 - Updated with more details by Chee Wai Lee.
Experiment setup
Keeneland was chosen as the site for running these experiments, and is accessible to the developers as well. This example uses a user-built version of TAU. Instructions will be added once we confirm that the system-installed version of TAU works.
The GNU compilers were chosen to minimize conflicts with the CUDA runtime (cudart), which does not support other compilers. Because Keeneland's default environment is "PE-intel", the following module switch is probably needed: "module switch PE-intel PE-gnu".
Building TAU:
PDT: As of this update (1/22/2012), pdtoolkit (PDT) is still not available as a system-installed piece of software on Keeneland. As such, please acquire a copy of PDT and build it as follows:
cd <path-to-PDT>
./configure -gnu
make && make install
TAU:
cd <path-to-TAU>
./configure -pdt=<path-to-PDT> -pdt_c++=g++ -cuda=/sw/keeneland/cuda/4.0/linux_binary/ -bfd=download -cc=gcc -c++=g++ -mpi -cupti=/sw/keeneland/cuda/4.0/linux_binary/CUDAToolsSDK/CUPTI/ -openmp -opari
make install
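The two build steps above can be collected into one script. The following is a minimal sketch, not the script used for these experiments: PDT_DIR, TAU_DIR, CUDA_DIR, the run helper, and the DRY_RUN switch are hypothetical names introduced here for illustration, while the configure flags are copied from the commands above. With DRY_RUN=1 (the default in this sketch) the commands are only printed, so the script can be checked on a machine other than Keeneland.

```shell
#!/bin/sh
# Sketch only: variable names and the DRY_RUN switch are assumptions;
# the configure flags are copied from the commands shown above.
PDT_DIR="${PDT_DIR:-/path/to/pdt}"   # placeholder for <path-to-PDT>
TAU_DIR="${TAU_DIR:-/path/to/tau}"   # placeholder for <path-to-TAU>
CUDA_DIR="${CUDA_DIR:-/sw/keeneland/cuda/4.0/linux_binary}"
DRY_RUN="${DRY_RUN:-1}"

# Print each command; execute it only when DRY_RUN is 0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Build PDT with the GNU compilers.
build_pdt() {
    run cd "$PDT_DIR" &&
    run ./configure -gnu &&
    run make &&
    run make install
}

# Build TAU against PDT, CUDA, CUPTI, MPI, and OpenMP (via Opari).
build_tau() {
    run cd "$TAU_DIR" &&
    run ./configure -pdt="$PDT_DIR" -pdt_c++=g++ \
        -cuda="$CUDA_DIR/" -bfd=download -cc=gcc -c++=g++ -mpi \
        -cupti="$CUDA_DIR/CUDAToolsSDK/CUPTI/" -openmp -opari &&
    run make install
}

build_pdt && build_tau
```

To perform the real build, set PDT_DIR and TAU_DIR to the actual source directories and run the script with DRY_RUN=0.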
Building GTC:
Here is the Makefile used:
# Define the following to 1 to enable build
BENCH_GTC_MPI = 1
BENCH_CHARGEI_PTHREADS = 0
BENCH_PUSHI_PTHREADS = 0
BENCH_SERIAL = 0
SDK_HOME = /nics/c/home/biersdor/NVIDIA_GPU_Computing_SDK/
CUDA_HOME = /sw/keeneland/cuda/4.0/linux_binary
NVCC_HOME = $(CUDA_HOME)
TAU_MAKEFILE=/nics/c/home/biersdor/tau2/x86_64/lib/Makefile.tau-cupti-mpi-pdt-openmp-opari
TAU_OPTIONS='-optPdtCOpts=-DPDT_PARSE -optVerbose -optShared -optTauSelectFile=select.tau'
TAU_FLAGS=-tau_makefile=$(TAU_MAKEFILE) -tau_options=$(TAU_OPTIONS)
CC = tau_cc.sh $(TAU_FLAGS)
MPICC = tau_cc.sh $(TAU_FLAGS)
NVCC = nvcc
NVCC_FLAGS = -gencode=arch=compute_20,code=\"sm_20,compute_20\" -gencode=arch=compute_20,code=\"sm_20,compute_20\" -m64 --compiler-options '-finstrument-functions -fno-strict-aliasing' -I$(NVCC_HOME)/include -I. -DUNIX -O3 -DGPU_ACCEL=1 -I./ -I$(SDK_HOME)/C/common/inc -I$(SDK_HOME)/shared/inc
NVCC_LINK_FLAGS = -fPIC -m64 -L$(NVCC_HOME)/lib64 -L$(SDK_HOME)/shared/lib -L$(SDK_HOME)/C/lib -L$(SDK_HOME)/C/common/lib/linux -lcudart -lstdc++
CFLAGS = -DUSE_MPI=1 -DGPU_ACCEL=1
CFLAGSOMP = -fopenmp
COPTFLAGS = -std=c99
#CFLAGSOMP = -mp=bind
#COPTFLAGS = -fast
CDEPFLAGS = -MD
CLDFLAGS = -limf $(NVCC_LINK_FLAGS)
MPIDIR =
CFLAGS += -I$(CUDA_HOME)/include/
EXEEXT = _keeneland_opt_gnu_tau_pdt
AR = ar
ARCRFLAGS = cr
RANLIB = ranlib
PDT was chosen to allow for event filtering. Here is the select file used:
BEGIN_EXCLUDE_LIST
double RngStream_RandU01(RngStream)
double U01(RngStream)
END_EXCLUDE_LIST
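For readers reproducing this setup, the select file can be created directly from the shell. This is a small sketch: the file name select.tau matches the -optTauSelectFile option in the Makefile above, the two excluded routines are the ones listed in this section, and the sanity check at the end is an addition for illustration only.

```shell
#!/bin/sh
# Write the TAU selective-instrumentation file referenced by
# -optTauSelectFile=select.tau in the Makefile.
cat > select.tau <<'EOF'
BEGIN_EXCLUDE_LIST
double RngStream_RandU01(RngStream)
double U01(RngStream)
END_EXCLUDE_LIST
EOF

# Sanity check (illustrative): count the routine signatures that sit
# between the BEGIN and END markers.
excluded=$(sed -n '/^BEGIN_EXCLUDE_LIST$/,/^END_EXCLUDE_LIST$/p' select.tau | grep -c '(')
echo "excluded routines: $excluded"
```

The file should be placed in the directory where the GTC build is run, since the Makefile refers to it by a relative name.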
Experiment simulation parameters
Along with the source code, three sets of simulation parameters were provided: A, B, and C (the largest). There is also a choice of m-cell size, 20 or 96; a size of 96 requires significantly more memory. Parameter sets A and B with an m-cell size of 20 were used for these performance results.
Performance Results
Here are some performance results that show the overall execution model:
This shows a trace of a single execution on one MPI process (multiple nodes/GPUs can be utilized as well; performance behavior is similar across processes).
This shows a representative profile for the GPU.