Running the ClimateMachine

The ClimateMachine is composed of three models for the Earth system, a dynamical core, and a number of other components. A driver puts these together to set up a simulation; examples are the Held-Suarez atmospheric GCM and the Rising Bubble atmospheric LES. The driver specifies:

  • the dimensions and resolution of the simulation domain,
  • the duration of the simulation,
  • boundary conditions,
  • source terms,
  • a reference state,
  • the turbulence model,
  • the moisture model,
  • diagnostics of interest,
  • initial conditions,
  • etc.

Additionally, the driver chooses the time integrator to be used to run the simulation and may specify the Courant number used to compute the timestep.

Thus, running the ClimateMachine requires a driver. For example, the Held-Suarez atmospheric GCM is run with:

$ julia --project experiments/AtmosGCM/heldsuarez.jl

Simpler examples of driver files can be found in the tutorials. Driver files in experiments show more complex examples.

Input and output

The ClimateMachine uses ArtifactWrappers.jl to assist a driver in sourcing input data for a simulation, but any mechanism may be used.

Output takes the form of groups of diagnostic variables. When a driver configures it to do so, the ClimateMachine writes these to NetCDF files at user-specified intervals (see the how-to guide).

The ClimateMachine can also output prognostic and auxiliary state variables to VTK files at specified intervals.

Whether output is generated, and at what interval, is controlled by ClimateMachine settings.

More information on output data formats and diagnostics can be found here.

ClimateMachine settings

Some aspects of the ClimateMachine's behavior can be controlled via its settings, such as use of the GPU, diagnostics output and frequency, and checkpointing/restarting. There are three ways in which these settings can be changed:

  1. Command line arguments have the highest precedence, but it is possible for a driver to disable parsing of command line arguments. In such a case, only the next two ways can be used to change settings.

  2. Programmatic settings have the next highest precedence.

  3. Environment variables have the lowest precedence.
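The precedence order above can be illustrated with a small shell sketch. The function resolve_setting is a hypothetical helper written for this illustration, not part of the ClimateMachine; it only mimics the ordering by returning the first non-empty value among the command line, programmatic, and environment sources. A driver that disables command line parsing corresponds to an empty first argument.

```shell
# resolve_setting CLI_VALUE PROGRAMMATIC_VALUE ENV_VALUE DEFAULT
# Hypothetical helper (not a ClimateMachine function): prints the first
# non-empty value, checked in order of decreasing precedence.
resolve_setting() {
    for candidate in "$1" "$2" "$3"; do
        if [ -n "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    # No source supplied a value: fall back to the default.
    printf '%s\n' "$4"
}

# A command line value wins over everything else.
resolve_setting "never" "5smins" "10smins" "never"   # prints: never
# Without a command line value, the programmatic value wins.
resolve_setting "" "5smins" "10smins" "never"        # prints: 5smins
# With neither, the environment value is used.
resolve_setting "" "" "10smins" "never"              # prints: 10smins
```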

Command line arguments

If a driver configures the ClimateMachine to parse command line arguments (by passing parse_clargs = true to ClimateMachine.init()), you can query the list of arguments understood, for example:

$ julia --project experiments/AtmosGCM/heldsuarez.jl --help
usage: experiments/AtmosGCM/heldsuarez.jl [--disable-gpu]
                        [--show-updates <interval>]
                        [--diagnostics <interval>] [--no-overwrite]
                        [--vtk <interval>]
                        [--vtk-number-sample-points <number>]
                        [--monitor-timestep-duration <interval>]
                        [--monitor-courant-numbers <interval>]
                        [--adapt-timestep <interval>]
                        [--checkpoint <interval>]
                        [--checkpoint-keep-all] [--checkpoint-at-end]
                        [--checkpoint-dir <path>]
                        [--restart-from-num <number>] [--fix-rng-seed]
                        [--disable-custom-logger]
                        [--log-level <level>] [--output-dir <path>]
                        [--debug-init] [--integration-testing]
                        [--sim-time <number>]
                        [--fixed-number-of-steps <number>]
                        [--degree <horizontal>,<vertical>]
                        [--nelems <nelem_1>[,<nelem_2>[,<nelem_3>]]]
                        [--domain-height <number>]
                        [--resolution <Δx>,<Δy>,<Δz>]
                        [--domain-min <xmin>,<ymin>,<zmin>]
                        [--domain-max <xmax>,<ymax>,<zmax>]
                        [--number-of-tracers <number>] [-h]

Climate Machine: an Earth System Model that automatically learns from data

optional arguments:
  --number-of-tracers <number>
                        Number of dummy tracers (type: Int64, default:
                        0)
  -h, --help            show this help message and exit

ClimateMachine:
  --disable-gpu         do not use the GPU
  --show-updates <interval>
                        interval at which to show simulation updates
                        (default: "60secs")
  --diagnostics <interval>
                        interval at which to collect diagnostics
                        (default: "never")
  --no-overwrite        throw an error if an output file would be
                        overwritten
  --vtk <interval>      interval at which to output VTK (default:
                        "never")
  --vtk-number-sample-points <number>
                        number of sampling points in each element for
                        VTK output (type: Int64, default: 0)
  --monitor-timestep-duration <interval>
                        interval in time-steps at which to output
                        wall-clock time per time-step (default:
                        "never")
  --monitor-courant-numbers <interval>
                        interval at which to output acoustic,
                        advective, and diffusive Courant numbers
                        (default: "never")
  --adapt-timestep <interval>
                        interval at which to update the timestep
                        (default: "never")
  --checkpoint <interval>
                        interval at which to create a checkpoint
                        (default: "never")
  --checkpoint-keep-all
                        keep all checkpoints (instead of just the most
                        recent)
  --checkpoint-at-end   create a checkpoint at the end of the
                        simulation
  --checkpoint-on-crash 
                        create a checkpoint on a kernel crash (hurts
                        performance!)
  --checkpoint-dir <path>
                        the directory in which to store checkpoints
                        (default: "checkpoint")
  --restart-from-num <number>
                        checkpoint number from which to restart (in
                        <checkpoint-dir>) (type: Int64, default: -1)
  --fix-rng-seed        set RNG seed to a fixed value for
                        reproducibility
  --disable-custom-logger
                        do not use a custom logger
  --log-level <level>   set the log level to one of
                        debug/info/warn/error (default: "INFO")
  --output-dir <path>   directory for output data (default: "output")
  --debug-init          fill state arrays with NaNs and dump them
                        post-initialization
  --integration-testing
                        enable integration testing
  --sim-time <number>   run for the specified time (in simulation
                        seconds) (type: Float64, default: NaN)
  --fixed-number-of-steps <number>
                        if `≥0` perform specified number of steps
                        (type: Int64, default: -1)
  --degree <horizontal>,<vertical>
                        tuple of horizontal and vertical polynomial
                        degrees for spatial discretization order (no
                        space before/after comma) (type:
                        Tuple{Int64,Int64}, default: (-1, -1))
  --nelems <nelem_1>[,<nelem_2>[,<nelem_3>]]
                        number of elements in each direction: 3 for
                        Ocean GCM, 2 for Atmos GCM or 1 for Atmos
                        single-stack (no space before/after comma)
                        (type: Tuple{Int64,Int64,Int64}, default: (-1,
                        -1, -1))
  --domain-height <number>
                        domain height (in meters) for GCM or
                        single-stack configurations (type: Float64,
                        default: -1.0)
  --resolution <Δx>,<Δy>,<Δz>
                        tuple of three element resolutions (in meters)
                        for LES and MultiColumnLandModel
                        configurations (type:
                        Tuple{Float64,Float64,Float64}, default:
                        (-1.0, -1.0, -1.0))
  --domain-min <xmin>,<ymin>,<zmin>
                        tuple of three minima for the domain size (in
                        meters) for LES and MultiColumnLandModel
                        configurations (type:
                        Tuple{Float64,Float64,Float64}, default:
                        (-1.0, -1.0, -1.0))
  --domain-max <xmax>,<ymax>,<zmax>
                        tuple of three maxima for the domain size (in
                        meters) for LES and MultiColumnLandModel
                        configurations (type:
                        Tuple{Float64,Float64,Float64}, default:
                        (-1.0, -1.0, -1.0))

Any <interval> unless otherwise stated may be specified as:
    - 2hours or 10mins or 30secs => wall-clock time
    - 9.5smonths or 3.3sdays or 1.5shours => simulation time
    - 1000steps => simulation steps
    - never => disable
    - default => use experiment specified interval (only for diagnostics at present)

There may also be driver-specific command line arguments.
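As a rough illustration of the <interval> formats listed at the end of the help text, the sketch below classifies an interval string by its suffix. The function classify_interval is hypothetical and handles only the units that appear on this page (smonths, sdays, shours, smins, steps, hours, mins, secs); the ClimateMachine's own parser may accept other forms.

```shell
# classify_interval INTERVAL
# Hypothetical helper (not the ClimateMachine's parser): prints the kind of
# interval a string denotes, based on the units shown in the help text.
classify_interval() {
    case "$1" in
        never)   echo "disabled"; return 0 ;;
        default) echo "experiment default"; return 0 ;;
    esac
    # Simulation-time units (leading "s") are tried before wall-clock units,
    # so "1.5shours" is not mistaken for a wall-clock "hours" value.
    for unit in smonths sdays shours smins steps hours mins secs; do
        number="${1%$unit}"
        [ "$number" = "$1" ] && continue   # suffix did not match
        case "$number" in
            ''|*[!0-9.]*) break ;;         # prefix is not a plain number
        esac
        case "$unit" in
            smonths|sdays|shours|smins) echo "simulation time" ;;
            steps)                      echo "simulation steps" ;;
            *)                          echo "wall-clock time" ;;
        esac
        return 0
    done
    echo "unrecognized"
    return 1
}

classify_interval 2hours      # prints: wall-clock time
classify_interval 3.3sdays    # prints: simulation time
classify_interval 1000steps   # prints: simulation steps
classify_interval never       # prints: disabled
```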

Programmatic control

Every ClimateMachine setting can also be controlled via keyword arguments to the ClimateMachine initialization function, ClimateMachine.init(). For instance, a driver can specify that VTK output should occur every 5 simulation minutes with:

ClimateMachine.init(vtk = "5smins")

This can be overridden by passing --vtk=never on the command line, if the ClimateMachine is parsing command line arguments.

Note

The ClimateMachine will only process command line arguments if a driver requests that it do so with:

ClimateMachine.init(parse_clargs = true)

Environment variables

Every ClimateMachine command line argument has an equivalent environment variable of the form CLIMATEMACHINE_SETTINGS_<SETTING_NAME>. Command line arguments and programmatic control both take precedence over environment variables.
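The sketch below shows one plausible way to derive the environment variable name from a command line flag. The only part taken from the text above is the CLIMATEMACHINE_SETTINGS_ prefix; the mapping itself (drop the leading dashes, uppercase the name, replace dashes with underscores) is an assumption, and setting_env_name is a hypothetical helper.

```shell
# setting_env_name FLAG
# Hypothetical helper: maps a flag such as "--disable-gpu" to an environment
# variable name. The dash-to-underscore and uppercase rules are assumptions.
setting_env_name() {
    name="${1#--}"                                             # drop leading "--"
    name=$(printf '%s' "$name" | tr 'a-z' 'A-Z' | sed 's/-/_/g')
    printf 'CLIMATEMACHINE_SETTINGS_%s\n' "$name"
}

setting_env_name --vtk           # prints: CLIMATEMACHINE_SETTINGS_VTK
setting_env_name --disable-gpu   # prints: CLIMATEMACHINE_SETTINGS_DISABLE_GPU
```

Under that assumption, exporting CLIMATEMACHINE_SETTINGS_VTK=never before launching a driver would have the same effect as passing --vtk=never, provided neither the command line nor the driver sets the value.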

Running with MPI

Use MPI to start a distributed run of the ClimateMachine. For example:

mpiexec -np 4 julia --project experiments/AtmosGCM/heldsuarez.jl

will run the Held-Suarez experiment with four MPI ranks. On a cluster, place this command in a Slurm batch script (or the equivalent) that allocates four tasks. On a stand-alone machine, MPI will likely require that you have at least four cores.

Note that unless GPU use is disabled (by changing the setting in one of the ways described above), each ClimateMachine process will use GPU acceleration. If fewer GPUs are available than there are processes (four in the example above), the ClimateMachine processes will share the available GPUs.
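For the cluster case mentioned above, a minimal batch script might look like the output of the sketch below. The function print_batch_script is a hypothetical helper that only prints the script text; the single #SBATCH directive requests the number of tasks, and anything site-specific (partitions, time limits, module loads, GPU requests) is deliberately omitted and would need to be added for a real cluster.

```shell
# print_batch_script NTASKS DRIVER
# Hypothetical helper: prints a minimal Slurm batch script that launches
# DRIVER with NTASKS MPI ranks. Site-specific directives are omitted.
print_batch_script() {
    cat <<EOF
#!/bin/bash
#SBATCH --ntasks=$1

mpiexec -np $1 julia --project $2
EOF
}

# Print a script for the four-rank Held-Suarez run; redirect it to a file
# and pass that file to sbatch to submit it.
print_batch_script 4 experiments/AtmosGCM/heldsuarez.jl
```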

Scripts for end-to-end runs, logging and visualization

The ClimateMachine wiki contains detailed examples of Slurm scripts that run the ClimateMachine, record specified performance metrics and produce basic visualization output.