Running the ClimateMachine
The ClimateMachine is composed of three models for the Earth system, a dynamical core, and a number of other components. A driver puts these together to set up a simulation, for example the Held-Suarez atmospheric GCM or the Rising Bubble atmospheric LES. The driver specifies:
- the dimensions and resolution of the simulation domain,
- the duration of the simulation,
- boundary conditions,
- source terms,
- a reference state,
- the turbulence model,
- the moisture model,
- diagnostics of interest,
- initial conditions,
- etc.
Additionally, the driver chooses the time integrator to be used to run the simulation and may specify the Courant number used to compute the timestep.
Thus, running the ClimateMachine requires a driver. For example, the Held-Suarez atmospheric GCM is run with:
$ julia --project experiments/AtmosGCM/heldsuarez.jl
Simpler driver examples can be found in the tutorials; the driver files in the experiments directory show more complex setups.
Input and output
The ClimateMachine uses ArtifactWrappers.jl to assist a driver in sourcing input data for a simulation, but any mechanism may be used.
Output takes the form of various groups of diagnostic variables that the ClimateMachine writes to NetCDF files at user-specified intervals, when a driver configures it to do so (see the how-to guide).
The ClimateMachine can also output prognostic and auxiliary state variables to VTK files at specified intervals.
Whether output is generated, and if so at what interval, is controlled by a ClimateMachine setting.
More information on output data formats and diagnostics can be found here.
ClimateMachine settings
Some aspects of the ClimateMachine's behavior, such as use of the GPU, diagnostics output and frequency, and checkpointing/restarting, can be controlled via its settings. There are three ways to change these settings:
- Command line arguments have the highest precedence. However, the ClimateMachine only parses them when a driver requests it; if a driver does not, only the next two ways can be used to change settings.
- Programmatic settings have the next highest precedence.
- Environment variables have the lowest precedence.
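The precedence order can be made concrete with a small, self-contained Julia sketch. The helper resolve_setting is hypothetical and not part of the ClimateMachine API; it assumes settings are keyed by symbols such as :vtk, and that the environment variable name is the upper-cased setting name after the CLIMATEMACHINE_SETTINGS_ prefix.

```julia
# Hypothetical helper (not part of ClimateMachine) illustrating the precedence:
# command line > init() keyword argument > environment variable > default.
function resolve_setting(
    name::Symbol,
    default;
    clargs::Union{Nothing, AbstractDict{Symbol}} = nothing,  # nothing => parsing disabled
    kwargs::AbstractDict{Symbol} = Dict{Symbol, Any}(),
    env::AbstractDict{String, String} = ENV,
)
    # 1. Command line arguments, but only if the driver enabled parsing.
    if clargs !== nothing && haskey(clargs, name)
        return clargs[name]
    end
    # 2. Programmatic settings passed to ClimateMachine.init().
    haskey(kwargs, name) && return kwargs[name]
    # 3. Environment variable, e.g. CLIMATEMACHINE_SETTINGS_VTK for :vtk
    #    (returned as a raw string; no type conversion is attempted here).
    var = "CLIMATEMACHINE_SETTINGS_" * uppercase(String(name))
    haskey(env, var) && return env[var]
    # 4. Nothing overrides the setting.
    return default
end

env = Dict("CLIMATEMACHINE_SETTINGS_VTK" => "1000steps")
kw = Dict(:vtk => "5smins")
resolve_setting(:vtk, "never"; env = env)                 # "1000steps"
resolve_setting(:vtk, "never"; kwargs = kw, env = env)    # "5smins"
resolve_setting(:vtk, "never"; clargs = Dict(:vtk => "never"), kwargs = kw, env = env)  # "never"
```

Passing clargs = nothing models a driver that never asked for command line parsing, in which case only the keyword and environment layers can change the value.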
Command line arguments
If a driver configures the ClimateMachine to parse command line arguments (by passing parse_clargs = true to ClimateMachine.init()), you can list the arguments it understands with --help, for example:
$ julia --project experiments/AtmosGCM/heldsuarez.jl --help
usage: experiments/AtmosGCM/heldsuarez.jl [--disable-gpu]
[--show-updates <interval>]
[--diagnostics <interval>] [--no-overwrite]
[--vtk <interval>]
[--vtk-number-sample-points <number>]
[--monitor-timestep-duration <interval>]
[--monitor-courant-numbers <interval>]
[--adapt-timestep <interval>]
[--checkpoint <interval>]
[--checkpoint-keep-all] [--checkpoint-at-end]
[--checkpoint-dir <path>]
[--restart-from-num <number>] [--fix-rng-seed]
[--disable-custom-logger]
[--log-level <level>] [--output-dir <path>]
[--debug-init] [--integration-testing]
[--sim-time <number>]
[--fixed-number-of-steps <number>]
[--degree <horizontal>,<vertical>]
[--nelems <nelem_1>[,<nelem_2>[,<nelem_3>]]]
[--domain-height <number>]
[--resolution <Δx>,<Δy>,<Δz>]
[--domain-min <xmin>,<ymin>,<zmin>]
[--domain-max <xmax>,<ymax>,<zmax>]
[--number-of-tracers <number>] [-h]
Climate Machine: an Earth System Model that automatically learns from data
optional arguments:
--number-of-tracers <number>
Number of dummy tracers (type: Int64, default:
0)
-h, --help show this help message and exit
ClimateMachine:
--disable-gpu do not use the GPU
--show-updates <interval>
interval at which to show simulation updates
(default: "60secs")
--diagnostics <interval>
interval at which to collect diagnostics
(default: "never")
--no-overwrite throw an error if an output file would be
overwritten
--vtk <interval> interval at which to output VTK (default:
"never")
--vtk-number-sample-points <number>
number of sampling points in each element for
VTK output (type: Int64, default: 0)
--monitor-timestep-duration <interval>
interval in time-steps at which to output
wall-clock time per time-step (default:
"never")
--monitor-courant-numbers <interval>
interval at which to output acoustic,
advective, and diffusive Courant numbers
(default: "never")
--adapt-timestep <interval>
interval at which to update the timestep
(default: "never")
--checkpoint <interval>
interval at which to create a checkpoint
(default: "never")
--checkpoint-keep-all
keep all checkpoints (instead of just the most
recent)
--checkpoint-at-end create a checkpoint at the end of the
simulation
--checkpoint-on-crash
create a checkpoint on a kernel crash (hurts
performance!)
--checkpoint-dir <path>
the directory in which to store checkpoints
(default: "checkpoint")
--restart-from-num <number>
checkpoint number from which to restart (in
<checkpoint-dir>) (type: Int64, default: -1)
--fix-rng-seed set RNG seed to a fixed value for
reproducibility
--disable-custom-logger
do not use a custom logger
--log-level <level> set the log level to one of
debug/info/warn/error (default: "INFO")
--output-dir <path> directory for output data (default: "output")
--debug-init fill state arrays with NaNs and dump them
post-initialization
--integration-testing
enable integration testing
--sim-time <number> run for the specified time (in simulation
seconds) (type: Float64, default: NaN)
--fixed-number-of-steps <number>
if `≥0` perform specified number of steps
(type: Int64, default: -1)
--degree <horizontal>,<vertical>
tuple of horizontal and vertical polynomial
degrees for spatial discretization order (no
space before/after comma) (type:
Tuple{Int64,Int64}, default: (-1, -1))
--nelems <nelem_1>[,<nelem_2>[,<nelem_3>]]
number of elements in each direction: 3 for
Ocean GCM, 2 for Atmos GCM or 1 for Atmos
single-stack (no space before/after comma)
(type: Tuple{Int64,Int64,Int64}, default: (-1,
-1, -1))
--domain-height <number>
domain height (in meters) for GCM or
single-stack configurations (type: Float64,
default: -1.0)
--resolution <Δx>,<Δy>,<Δz>
tuple of three element resolutions (in meters)
for LES and MultiColumnLandModel
configurations (type:
Tuple{Float64,Float64,Float64}, default:
(-1.0, -1.0, -1.0))
--domain-min <xmin>,<ymin>,<zmin>
tuple of three minima for the domain size (in
meters) for LES and MultiColumnLandModel
configurations (type:
Tuple{Float64,Float64,Float64}, default:
(-1.0, -1.0, -1.0))
--domain-max <xmax>,<ymax>,<zmax>
tuple of three maxima for the domain size (in
meters) for LES and MultiColumnLandModel
configurations (type:
Tuple{Float64,Float64,Float64}, default:
(-1.0, -1.0, -1.0))
Any <interval> unless otherwise stated may be specified as:
- 2hours or 10mins or 30secs => wall-clock time
- 9.5smonths or 3.3sdays or 1.5shours => simulation time
- 1000steps => simulation steps
- never => disable
- default => use experiment specified interval (only for diagnostics at present)
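To make the <interval> syntax concrete, here is a small Julia sketch that classifies an interval string by its suffix. The function parse_interval and the unit tables are hypothetical; the suffixes come from the list above (plus smins, which appears in the vtk = "5smins" example on this page), and no conversion to seconds is attempted because the length the ClimateMachine assigns to a simulated month is not stated here.

```julia
# Hypothetical parser for the <interval> syntax listed above; an illustration
# only, not the ClimateMachine's own implementation.
const WALLCLOCK_UNITS = ("hours", "mins", "secs")
const SIMTIME_UNITS = ("smonths", "sdays", "shours", "smins")

function parse_interval(s::AbstractString)
    # Keywords that carry no number.
    if s == "never" || s == "default"
        return (kind = Symbol(s), value = nothing, unit = nothing)
    end
    # A number (optionally with a fractional part) followed by a unit suffix.
    m = match(r"^(\d+(?:\.\d+)?)([a-z]+)$", s)
    m === nothing && throw(ArgumentError("malformed interval: \"$s\""))
    value = parse(Float64, m.captures[1])
    unit = String(m.captures[2])
    kind = if unit == "steps"
        :steps
    elseif unit in WALLCLOCK_UNITS
        :wallclock
    elseif unit in SIMTIME_UNITS
        :simtime
    else
        throw(ArgumentError("unknown interval unit: \"$unit\""))
    end
    return (kind = kind, value = value, unit = unit)
end

parse_interval("30secs")      # (kind = :wallclock, value = 30.0, unit = "secs")
parse_interval("3.3sdays")    # (kind = :simtime, value = 3.3, unit = "sdays")
parse_interval("1000steps")   # (kind = :steps, value = 1000.0, unit = "steps")
```

The leading "s" is what separates simulation time (shours) from wall-clock time (hours), so the suffix must be matched exactly rather than by substring.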
There may also be driver-specific command line arguments.
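The --number-of-tracers option in the help output above is one such driver-specific argument. The sketch below shows how a driver might declare a comparable option with ArgParse.jl, whose help-text layout appears to match that output. Handing the table to ClimateMachine.init() through a custom_clargs keyword is an assumption about the init signature, so that call is shown only as a comment; the runnable part parses a sample argument vector with ArgParse alone.

```julia
using ArgParse

# Driver-specific option table, mirroring the --number-of-tracers entry in
# the help output above. With autofix_names = true, the parsed value is stored
# under "number_of_tracers" (dashes become underscores).
driver_args = ArgParseSettings(autofix_names = true)
@add_arg_table! driver_args begin
    "--number-of-tracers"
    help = "Number of dummy tracers"
    metavar = "<number>"
    arg_type = Int
    default = 0
end

# In a driver, the table would be parsed together with the built-in options.
# The `custom_clargs` keyword is an ASSUMPTION, hence commented out:
#   cl_args = ClimateMachine.init(parse_clargs = true, custom_clargs = driver_args)
#   number_of_tracers = cl_args["number_of_tracers"]

# Standalone check, parsing a sample argument vector with ArgParse alone:
parsed = parse_args(["--number-of-tracers", "3"], driver_args)
number_of_tracers = parsed["number_of_tracers"]   # 3
```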
Programmatic control
Every ClimateMachine setting can also be controlled via keyword arguments to the ClimateMachine initialization function, ClimateMachine.init(). For instance, a driver can specify that VTK output should occur every 5 simulation minutes with:
ClimateMachine.init(vtk = "5smins")
This can be overridden by passing --vtk=never on the command line, if the ClimateMachine is parsing command line arguments.
The ClimateMachine will only process command line arguments if a driver requests that it do so with:
ClimateMachine.init(parse_clargs = true)
Environment variables
Every ClimateMachine command line argument has an equivalent environment variable of the form CLIMATEMACHINE_SETTINGS_<SETTING_NAME>; however, command line arguments and programmatic settings take precedence over environment variables.
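The mapping from a command line flag to its environment variable can be sketched in a few lines of Julia. The helper env_var_name is hypothetical, and the rule it encodes (strip the leading dashes, turn the remaining dashes into underscores, upper-case the result) is an assumption inferred from the CLIMATEMACHINE_SETTINGS_<SETTING_NAME> pattern rather than a documented guarantee.

```julia
# Hypothetical helper (not a ClimateMachine function): derive the environment
# variable for a command line flag, ASSUMING dashes become underscores and the
# name is upper-cased after the CLIMATEMACHINE_SETTINGS_ prefix.
env_var_name(flag::AbstractString) =
    "CLIMATEMACHINE_SETTINGS_" * uppercase(replace(lstrip(flag, '-'), '-' => '_'))

env_var_name("--disable-gpu")               # "CLIMATEMACHINE_SETTINGS_DISABLE_GPU"
env_var_name("--vtk-number-sample-points")  # "CLIMATEMACHINE_SETTINGS_VTK_NUMBER_SAMPLE_POINTS"

# Such a variable would typically be set in the shell that launches the
# driver, for example:
#   CLIMATEMACHINE_SETTINGS_VTK=1000steps julia --project experiments/AtmosGCM/heldsuarez.jl
```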
Running with MPI
Use MPI to start a distributed run of the ClimateMachine. For example:
mpiexec -np 4 julia --project experiments/AtmosGCM/heldsuarez.jl
will run the Held-Suarez experiment with four MPI ranks. If you are running on a cluster, you would use this command within a Slurm batch script (or equivalent) that allocates four tasks. On a stand-alone machine, MPI will likely require at least four cores.
Note that unless GPU use is disabled (by changing the setting in one of the ways described above), each ClimateMachine process will use GPU acceleration. If fewer GPUs are available than there are processes (four in the example above), the ClimateMachine processes will share the available GPU resources.
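One simple way several processes can share fewer GPUs is a round-robin assignment of ranks to devices. The Julia sketch below illustrates that idea only; gpu_for_rank is a hypothetical helper, and this is not necessarily the policy the ClimateMachine itself uses.

```julia
# Hypothetical round-robin mapping of MPI ranks to GPUs, illustrating how
# several processes can end up sharing one device. NOT necessarily the
# ClimateMachine's own policy. In a real run, `rank` could come from
# MPI.Comm_rank(MPI.COMM_WORLD) via MPI.jl.
gpu_for_rank(rank::Integer, ngpus::Integer) =
    ngpus > 0 ? rank % ngpus : error("no GPUs available; disable GPU use instead")

# `mpiexec -np 4` on a machine with only two GPUs:
assignment = [gpu_for_rank(r, 2) for r in 0:3]            # [0, 1, 0, 1]
ranks_per_gpu = [count(==(g), assignment) for g in 0:1]   # [2, 2]
```

With two GPUs and four ranks, each device serves two processes, which is the kind of sharing that can reduce per-process throughput compared with one GPU per rank.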
Scripts for end-to-end runs, logging and visualization
The ClimateMachine wiki contains detailed examples of Slurm scripts that run the ClimateMachine, record specified performance metrics, and produce basic visualization output.