Cruft
Latest revision as of 23:08, 27 December 2012
Background
Link | Code | Version | Machine | Date |
---|---|---|---|---|
LLNL website | git repo | Kyle Spafford fork | Keeneland | March 2012 |
These instructions can also be used for CoMD
Building Cruft
For OpenCL:
export OPENCL_INCLUDE_DIR=<path to OpenCL include dir>
Modify CMakeLists.txt and add these lines:
set (CMAKE_CXX_COMPILER tau_cxx.sh)
set (CMAKE_C_COMPILER tau_cc.sh)
Then issue
cmake .
You can safely proceed when you encounter reversions.
Selective instrumentation of loops (the select.tau file referenced by TAU_OPTIONS):
BEGIN_INSTRUMENT_SECTION
loops file="eam.c" routine="eamForce#"
loops file="ljForce.c" routine="LJ#"
END_INSTRUMENT_SECTION
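The selection file above can also be generated from a script, which helps when the build is repeated on several machines. The following is a minimal sketch; the function name write_select_file is hypothetical, and the default file name select.tau is taken from the TAU_OPTIONS setting on this page.

```shell
# Hypothetical helper: write the loop-level selective instrumentation file.
# The optional first argument is the output path (default: select.tau).
write_select_file() {
    cat > "${1:-select.tau}" <<'EOF'
BEGIN_INSTRUMENT_SECTION
loops file="eam.c" routine="eamForce#"
loops file="ljForce.c" routine="LJ#"
END_INSTRUMENT_SECTION
EOF
}
```

Calling write_select_file with no argument in the build directory creates select.tau where `pwd`/select.tau expects it.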
For the OpenCL binary, edit src-ocl/eam_kernels.c and move this section above the line typedef CL_REAL_T real_t;
#if defined(cl_khr_fp64) // Khronos extension available?
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#elif defined(cl_amd_fp64) // AMD extension available?
#pragma OPENCL EXTENSION cl_amd_fp64 : enable
#endif
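After editing the kernel file, a quick check can confirm that the fp64 pragma now precedes the typedef. This is only a sketch: the function name pragma_before_typedef is hypothetical, and it assumes the pragma and typedef appear in the file with exactly the spelling shown on this page.

```shell
# Hypothetical check: succeed only if the cl_khr_fp64 pragma appears on an
# earlier line than the real_t typedef in the given file.
pragma_before_typedef() {
    file=$1
    # Line number of the first matching pragma and typedef (empty if absent).
    pragma_line=$(grep -n 'cl_khr_fp64 : enable' "$file" | head -n 1 | cut -d: -f1)
    typedef_line=$(grep -n 'typedef CL_REAL_T real_t' "$file" | head -n 1 | cut -d: -f1)
    [ -n "$pragma_line" ] && [ -n "$typedef_line" ] && [ "$pragma_line" -lt "$typedef_line" ]
}
```

For example, pragma_before_typedef src-ocl/eam_kernels.c && echo "order looks right".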
Then set:
export TAU_OPTIONS="-optShared -optVerbose -optTauSelectFile=`pwd`/select.tau"
export TAU_MAKEFILE=<path to TAU>/x86_64/lib/Makefile.tau-icpc-pdt
make
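A build with a wrong TAU_MAKEFILE path or a missing selection file can fail in confusing ways, so it may help to check both before running make. The sketch below is an assumption-laden convenience, not part of TAU: check_tau_env is a hypothetical name, and the parsing assumes the select file path contains no spaces.

```shell
# Hypothetical helper: verify the TAU build environment before running make.
check_tau_env() {
    # TAU_MAKEFILE must be set and name an existing file.
    if [ -z "${TAU_MAKEFILE:-}" ] || [ ! -f "$TAU_MAKEFILE" ]; then
        echo "TAU_MAKEFILE is unset or does not point to a file" >&2
        return 1
    fi
    # TAU_OPTIONS must carry a -optTauSelectFile=<path> entry.
    case "${TAU_OPTIONS:-}" in
        *-optTauSelectFile=*) ;;
        *) echo "TAU_OPTIONS has no -optTauSelectFile entry" >&2; return 1 ;;
    esac
    # Extract the path: drop everything up to '=', then anything after a space.
    select_file=${TAU_OPTIONS##*-optTauSelectFile=}
    select_file=${select_file%% *}
    if [ ! -f "$select_file" ]; then
        echo "select file not found: $select_file" >&2
        return 1
    fi
    return 0
}
```

With the exports above in place, check_tau_env && make runs the build only when both files are present.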
Running Cruft
./cruft -p ag -e -f data/8k.inp.gz
or
./cruft -f data/8k.inp.gz
And for the OpenCL accelerated version:
tau_exec -T serial -opencl ./cruftOCL -p ag -e -f data/8k.inp.gz
tau_exec -T serial -opencl ./cruftOCL -f data/8k.inp.gz
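By default TAU writes its results as profile.* files in the working directory, and ParaProf can pack them into a single .ppk file like the ones linked under Performance Data. The sketch below assumes that default profile output and that paraprof is on the PATH; the function name pack_profiles and its argument are hypothetical.

```shell
# Hypothetical helper: report the TAU profile files in the current directory
# and pack them into a .ppk archive when ParaProf is available.
pack_profiles() {
    out=$1                 # name of the packed profile to create
    set -- profile.*       # expand to the profile files, if any
    if [ ! -e "$1" ]; then
        echo "no profile.* files in $(pwd)" >&2
        return 1
    fi
    echo "found $# profile file(s)"
    if command -v paraprof >/dev/null 2>&1; then
        paraprof --pack "$out"
    else
        echo "paraprof not on PATH; skipping pack" >&2
    fi
}
```

For example, pack_profiles cruftOCL-EAM.ppk after the EAM run; the packed file can then be opened with paraprof.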
Performance Data
EAM method:
First, the serial version of Cruft shows that two loops in eam.c consume most of the time.
Image: cruft-EAM-profile.png
In comparison, in the OpenCL accelerated version two kernels dominate the runtime.
Image: cruftOCL-eam-profile.png
One thing you can check with an OpenCL application is the time spent in the command queue. Here is the table for each kernel:
Image: cruftOCL-eam-queue.png
Profile Data:
File:Cruft-EAM.ppk, File:CruftOCL-EAM.ppk
LJ method:
First, the serial version of Cruft shows that a single loop accounts for most of the runtime.
Image: cruft-LJ-profile.png
In comparison, in the OpenCL accelerated version the LJ_Force kernel dominates the runtime.
Image: cruftOCL-lj-profile.png
Once again, here is the time spent in the queue for this kernel.
Image: cruftOCL-lj-queue.png
Profile Data:
File:Cruft-LJ.ppk, File:CruftOCL-LJ.ppk