[[Category:Applications]]
 
I have had success building GTC (the Gyrokinetic Toroidal Code) on both jacquard.nersc.gov and ocracoke.renci.org. There are a number of publications on the GTC application; here are a few:

http://www.iop.org/EJ/article/-search=16435725.1/1742-6596/46/1/010/jpconf6_46_010.pdf
http://www.iop.org/EJ/article/-search=16435725.2/1742-6596/16/1/002/jpconf5_16_002.pdf
http://www.iop.org/EJ/article/-search=16435725.3/1742-6596/16/1/008/jpconf5_16_008.pdf
http://www.iop.org/EJ/article/-search=16435725.4/1742-6596/16/1/001/jpconf5_16_001.pdf

There is a README that comes with GTC. Here is the text of that README file:

12/16/2004    HOW TO BUILD AND RUN GTC    (S.Ethier, PPPL)
            ----------------------------

1. You need to use GNU make to compile GTC. The Makefile contains some
   gmake syntax, such as VARIABLE:=

2. The Makefile is fairly straightforward. It runs the command "uname -s"
   to determine the OS of the current computer. You will need to change
   the Makefile if you use a cross-compiler.

3. The executable is called "gtcmpi". GTC runs in single precision unless
   specified during the "make" by using "gmake DOUBLE_PRECISION=y".
   See the instructions at the beginning of the Makefile.

4. GTC reads an input file called "gtc.input". The distribution contains
   several input files for different problem sizes:

     All input files use 10 particles per cell (micell=10)

   name     # of particles         # of grid pts        approx. memory size
   ------   --------------------   ------------------   -------------------
   a125        20,709,760  (20M)     2,076,736   (2M)       5 GB
   a250        96,620,160 (100M)     9,674,304  (10M)      22 GB
   a500       385,479,040 (385M)    38,572,480  (39M)      80 GB
   a750       866,577,280 (866M)    86,694,592  (87M)     180 GB
   a1000    1,312,542,080 (1.3B)   131,300,288 (131M)     272 GB

   For a specific "device size" (fixed number of grid points), one can
   change the number of particles per grid cell by increasing or decreasing
   the input variable "micell", the number of particles per cell.

5. To run one of the cases, copy the chosen input file into "gtc.input".

6. For the given input files, the maximum number of processors that one
   can use for the grid-based 1D domain decomposition is 64. This limit
   comes from the mzetamax parameter in the input file. To access more
   processors, increase the "npartdom" parameter accordingly. "npartdom"
   controls the particle decomposition inside a domain. If, for example,
   npartdom=2, the particles in a domain will be split equally between
   2 processors. Here are some quick rules:

   mzetamax=64, npartdom=1 --> possible no. of processors = 1,2,4,8,16,32,64
   mzetamax=64, npartdom=2 --> use 128 processors
   mzetamax=64, npartdom=4 --> use 256 processors
   etc...

   When npartdom is large, it's a good idea to increase the number of
   particles per cell by changing the parameter "micell" in gtc.input.
   micell=100 is a decent number of particles, although the memory
   footprint is larger.
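
Putting the README's steps together, a plain (un-instrumented) build and a small test run look roughly like the sketch below. The input file name gtc.input.a125 and the mpirun launcher are assumptions; substitute whatever the distribution and your site actually provide.

# single-precision build (the default); use "gmake DOUBLE_PRECISION=y" for double
gmake

# pick a problem size and make it the active input file
cp gtc.input.a125 gtc.input     # file name is an assumption; see the size table above

# with mzetamax=64 and npartdom=1, valid process counts are 1,2,4,8,16,32,64
mpirun -np 64 ./gtcmpi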

For auto-instrumentation of GTC, it couldn't be easier. On jacquard.nersc.gov, configure TAU with the following options:

-c++=pathCC -cc=pathcc -fortran=pathscale -useropt=-O3 \
-pdt=/usr/common/homes/k/khuck/pdtoolkit \
-mpiinc=/usr/common/usg/mvapich/pathscale/mvapich-0.9.5-mlx1.0.3/include \
-mpilib=/usr/common/usg/mvapich/pathscale/mvapich-0.9.5-mlx1.0.3/lib \
-mpilibrary=-lmpich#-L/usr/local/ibgd/driver/infinihost/lib64#-lvapi \
-papi=/usr/common/usg/papi/3.1.0 -MULTIPLECOUNTERS \
-useropt=-I/usr/common/usg/mvapich/pathscale/mvapich-0.9.5-mlx1.0.3/include/f90base
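
These options are arguments to TAU's ./configure script. For reference, a complete invocation from the top of the TAU source tree, followed by the usual install step, looks like this (the $HOME/tau2 location is an assumption chosen to match the PATH setting in the batch script further down):

cd $HOME/tau2
./configure -c++=pathCC -cc=pathcc -fortran=pathscale -useropt=-O3 \
  -pdt=/usr/common/homes/k/khuck/pdtoolkit \
  -mpiinc=/usr/common/usg/mvapich/pathscale/mvapich-0.9.5-mlx1.0.3/include \
  -mpilib=/usr/common/usg/mvapich/pathscale/mvapich-0.9.5-mlx1.0.3/lib \
  -mpilibrary=-lmpich#-L/usr/local/ibgd/driver/infinihost/lib64#-lvapi \
  -papi=/usr/common/usg/papi/3.1.0 -MULTIPLECOUNTERS \
  -useropt=-I/usr/common/usg/mvapich/pathscale/mvapich-0.9.5-mlx1.0.3/include/f90base
make install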

Then add a section to the Linux build area of the GTC Makefile that looks like this (the extra include path lets the compiler find the MPI module files):

  ifeq ($(TAUF90),y)
    F90C:=tau_f90.sh -optTauSelectFile=select.tau
    CMP:=tau_f90.sh -optTauSelectFile=select.tau
    OPT:=-O -freeform  -I/usr/common/usg/mvapich/pathscale/mvapich-0.9.5-mlx1.0.3/include/f90base
    OPT2:=-O -freeform  -I/usr/common/usg/mvapich/pathscale/mvapich-0.9.5-mlx1.0.3/include/f90base
    LIB:=
  endif  
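
The tau_f90.sh wrapper above points to a selective instrumentation file, select.tau, which is not shown on this page. As a minimal sketch using TAU's selective instrumentation file syntax, a select.tau that simply enables outer-loop instrumentation in every routine ("#" is TAU's wildcard for routine names) would contain:

BEGIN_INSTRUMENT_SECTION
loops routine="#"
END_INSTRUMENT_SECTION

Exclude lists (BEGIN_EXCLUDE_LIST ... END_EXCLUDE_LIST) can be added later to keep small, frequently called routines from being instrumented.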

To build the TAU-instrumented version, use the following gmake command:

gmake TAUF90=y

Here is an example batch submission script to run GTC on 64 processors of jacquard (32 nodes, two tasks per node). First, copy gtc.input.64p to gtc.input:

#PBS -l nodes=32:ppn=2,walltime=00:30:00
#PBS -N gtcmpi
#PBS -o gtcmpi.64.out
#PBS -e gtcmpi.64.err
#PBS -q batch
#PBS -A m88
#PBS -V

setenv PATH $HOME/tau2/x86_64/bin:${PATH}
setenv TAU_CALLPATH_DEPTH 500
setenv COUNTER1 GET_TIME_OF_DAY
setenv COUNTER2 PAPI_FP_INS
setenv COUNTER3 PAPI_TOT_CYC
setenv COUNTER4 PAPI_L1_DCM
setenv COUNTER5 PAPI_L1_DCM

cd /u5/khuck/gtc_bench/test/64
mpiexec -np 64 gtcmpi
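
Assuming the script above is saved as gtc.pbs (the file name is arbitrary), submit it with qsub. Because TAU was configured with -MULTIPLECOUNTERS, each COUNTERn environment variable selects one metric to record, and the run writes one directory of profile files per metric (named MULTI__<counter name>), which can be examined with pprof or paraprof:

qsub gtc.pbs

# after the job finishes, in the run directory /u5/khuck/gtc_bench/test/64:
ls -d MULTI__*                         # one directory of profile.* files per metric
( cd MULTI__GET_TIME_OF_DAY && pprof ) # flat text profile for the wall-clock metric
paraprof &                             # GUI profile browser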

To run the test with fewer iterations, change the "mstep" parameter in the gtc.input file to something smaller than 100 (10 or 12 is fine).
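
For a quick smoke test this can be done by hand in an editor, or with a one-line substitution such as the following (it assumes the parameter appears literally as "mstep=100" in gtc.input; adjust the pattern to match the actual file):

sed -i 's/mstep=100/mstep=10/' gtc.input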

--Khuck 21:45, 7 March 2007 (PST)