petebachant/clima-gpu-profiling

CliMA GPU profiling

Profiling GPU performance for CliMA.

To run this project, first install Calkit on the clima machine:

curl -LsSf install.calkit.org | sh

Next, configure a token for interacting with calkit.io (where we store version-controlled Nsight reports).

If you don't already have an SSH key added to GitHub, either follow their documentation or run:

calkit config github-ssh

Then clone the project:

calkit clone --ssh petebachant/clima-gpu-profiling

Lastly, call:

calkit run

This will run all pipeline stages in the order they're defined in calkit.yaml. If you'd like to run a single stage (in a reproducible way), you can use its name as the first positional argument to calkit run. For example:

calkit run amip-clima-nsys

Note that, by default, calkit run only executes stages whose Nsight reports are out of date, i.e., stages whose inputs have changed since the last run.
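To run a few selected stages back to back, the single-stage form above can be wrapped in a small loop. The following is only a sketch: run_stages and the CALKIT variable are hypothetical helpers introduced here for illustration (CALKIT is not a Calkit feature; it just lets you substitute a dry-run command), and amip-clima-nsys is the only stage name taken from this README.

```shell
# Run the named pipeline stages one at a time, stopping at the first failure.
# CALKIT is a hypothetical override hook (an assumption, not part of Calkit):
# set CALKIT="echo calkit" to print the commands instead of executing them.
run_stages() {
    local calkit="${CALKIT:-calkit}"
    local stage
    for stage in "$@"; do
        echo "==> $stage"
        # Stage name passed as the first positional argument to calkit run.
        # $calkit is intentionally unquoted so "echo calkit" splits into words.
        $calkit run "$stage" || return 1
    done
}

# Dry run using the stage named in this README
CALKIT="echo calkit" run_stages amip-clima-nsys
```

Dropping the CALKIT override runs the real command for each stage in turn.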

Running a Jupyter server on clima with srun

srun --gpus=1 --mpi=none --pty bash
calkit jupyter lab --ip=0.0.0.0 --no-browser

Then copy the server URL that Jupyter prints (it starts with http://127.0.0.1) and, in VS Code, enter it when selecting a kernel for the notebook.
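If the startup output is long, the URL can be picked out with grep rather than by eye. This is a minimal sketch under assumptions: extract_jupyter_url is a hypothetical helper, and the sample log text below is made up for illustration rather than captured from the clima machine, so the exact log format and port may differ.

```shell
# Print the first loopback server URL found in Jupyter's startup output,
# read from stdin.
extract_jupyter_url() {
    grep -oE 'http://127\.0\.0\.1:[0-9]+/[^[:space:]]*' | head -n 1
}

# Illustrative log text (made up), standing in for real Jupyter output.
sample_log='[I ServerApp] Jupyter Server is running at:
[I ServerApp]     http://127.0.0.1:8888/lab?token=abc123'

printf '%s\n' "$sample_log" | extract_jupyter_url
```

In practice the function would be fed saved server output, for example a log file you captured yourself.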

Submodule branches

  • ClimaCore.jl -->
  • ClimaCore.jl-mod -->
  • ClimaCoupler.jl --> pb/investigate-atmos-4231
  • ClimaCoupler.jl-mod --> pb/investigate-atmos-4231-mod
  • ClimaAtmos.jl -->
  • ClimaAtmos.jl-mod -->

Experiment results

Commit (super-repo) | Change summary | Result
--- | --- | ---
130baab | | Occupancy for run_field_matrix_solver increased and registers per thread were reduced, but the run slowed down overall.
e5845b7 | | Similar to the above, but not quite as slow.
ff26f4b1 | Use PCR (parallel cyclic reduction) for the tridiagonal matrix solve. | Seems to be 3% faster, but with higher error. The change may not have been isolated properly, though.
e6099c2 | Try capping all threads to 256. | 1% slower on flagship.
23c9104 | Attempt to coalesce memory access in solvers. | 5% slowdown.
7614ca6 | Thread block restructuring and LocalGeometry caching. | No significant change.
f9eb67a | Tr/mem access patterns | 9% speedup.
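The percentages above are relative changes against a baseline run. How they were measured is not stated here, so the sketch below simply assumes two wall-clock timings are being compared; pct_change is a hypothetical helper and the numbers are invented for illustration, not taken from any Nsight report.

```shell
# Signed percent change in run time from a baseline to a candidate.
# Negative means the candidate took less time (faster).
pct_change() {
    awk -v b="$1" -v c="$2" 'BEGIN { printf "%+.1f\n", (c - b) / b * 100 }'
}

# Hypothetical timings in seconds, for illustration only
pct_change 100 91    # candidate takes less time than the baseline
pct_change 100 105   # candidate takes more time than the baseline
```

Reporting the sign explicitly avoids ambiguity between "slowdown" and "speedup" when comparing rows.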
