MPAS-Ocean
Overview
This page describes profiling MPAS-Ocean with TAU.
The MPAS-Ocean code has been modified to use TAU timers in place of its internal timers. This makes it possible to measure MPI performance and to collect PAPI hardware counters.
The MPAS-Ocean developers have collected profiles on Hopper, with 192 to 16800 processes, using MPI only (no OpenMP yet). In addition, full callpath and communication matrix profiles with 128 processes on Hopper have been collected.
Those profiles are available through the ParaProf and PerfExplorer client applications. The clients can connect to the performance database only from specific domains and only with authenticated access. Please contact the TAU team to request access to the raw data.
Below is a brief analysis of the application performance.
Performance Analysis
Scaling behavior
As noted above, the application was run with 192 through 16800 processes in a strong scaling study (the total problem size was held fixed).
insert scaling figure here
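A strong scaling study is usually summarized as speedup and parallel efficiency relative to the smallest run. The sketch below shows that arithmetic in Python. The function name and the wall-clock times are hypothetical placeholders, not measured MPAS-Ocean results; only the process counts 192 and 16800 come from the study described above.

```python
def strong_scaling(base_procs, base_time, procs, time):
    """Return (speedup, efficiency) of a run relative to a baseline run.

    In strong scaling the problem size is fixed, so ideal speedup equals
    the ratio of process counts (procs / base_procs).
    """
    if min(base_procs, procs) <= 0 or min(base_time, time) <= 0:
        raise ValueError("process counts and times must be positive")
    speedup = base_time / time
    ideal_speedup = procs / base_procs
    efficiency = speedup / ideal_speedup
    return speedup, efficiency


if __name__ == "__main__":
    # HYPOTHETICAL wall-clock times in seconds, for illustration only.
    baseline = (192, 1000.0)
    largest = (16800, 20.0)
    speedup, efficiency = strong_scaling(*baseline, *largest)
    print(f"speedup: {speedup:.1f}x, efficiency: {efficiency:.1%}")
```

An efficiency well below 1.0 at high process counts indicates that some cost, such as communication or waiting, is not shrinking as processes are added.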
Broken down by timed regions, the scaling behavior is as follows:
insert scaling figure here
MPI_Wait clearly dominates the runtime and, as we shall see in the per-trial analysis, the time spent in it varies considerably across processes.
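One way to quantify how much a timer such as MPI_Wait varies across processes is to summarize its per-process times and compare the maximum against the mean. The following Python sketch assumes the per-process times have already been exported from the profile as a plain list of seconds. The function name and the sample values are hypothetical, not taken from the Hopper profiles.

```python
from statistics import mean, pstdev


def imbalance_summary(per_process_seconds):
    """Summarize the spread of one timer's values across processes.

    max_over_mean is 1.0 when every process spends the same time in the
    timer and grows as a few processes wait much longer than the rest.
    """
    values = list(per_process_seconds)
    if not values:
        raise ValueError("need at least one per-process time")
    avg = mean(values)
    if avg <= 0:
        raise ValueError("mean time must be positive")
    return {
        "min": min(values),
        "max": max(values),
        "mean": avg,
        "stddev": pstdev(values),  # population standard deviation
        "max_over_mean": max(values) / avg,
    }


if __name__ == "__main__":
    # HYPOTHETICAL MPI_Wait times in seconds for four processes.
    sample = [12.0, 15.5, 30.2, 14.1]
    for key, value in imbalance_summary(sample).items():
        print(f"{key}: {value:.3f}")
```

A large max_over_mean for MPI_Wait usually points to load imbalance elsewhere: processes that finish their computation early spend the difference waiting for messages.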
Detailed analysis of 128 process callpath profile
Detailed analysis of 128 process flat profile with communication matrix
The functions in the profile view are, from left to right (also visible in the mean profile, below):
- MPI_Wait()
- se btr vel
- adv
- se timestep
- se implicit vert mix
- coriolis
- se halo ubtr
- ocn_fuperp