HPC interfacing example: ClimateMachine

Overview

This examples uses EnsembleKalmanProcesses.jl to calibrate a climate model, showcasing a workflow which is compatible with HPC resources managed with the SLURM workload manager. The workflow is based on read-write input/output files, and as such it is capable of interfacing with dynamical models in different code languages, or with complicated processing stages. The dynamical model for this example is ClimateMachine.jl, an Earth system model currently under development at CliMA.

The calibration example makes use of a simple single atmospheric column model configuration with two learnable parameters that control turbulent mixing processes in the lower troposphere. It is also a perfect model experiment, in the sense that the ground truth is generated using the same model and a prescribed combination of parameters. The parameters used to generate the ground truth are (C_smag, C_drag) = (0.21, 0.0011). The evolution of the atmosphere for this setup is strongly influenced by C_smag, and very weakly by C_drag. Thus, we expect the EKP to recover C_smag from observations.

Prerequisites

This example requires ClimateMachine.jl to be installed in the same parent directory as EnsembleKalmanProcesses.jl. You may install ClimateMachine.jl directly from GitHub,

$ git clone https://github.com/CliMA/ClimateMachine.jl.git

Change into the ClimateMachine.jl directory with

$ cd ClimateMachine.jl

and install all the required packages with:

$ julia --project -e 'using Pkg; pkg"instantiate";'

Pre-compile the packages to allow the ClimateMachine.jl to start faster:

$ julia --project -e 'using Pkg; pkg"precompile"'

You can find more information about ClimateMachine.jl here. ClimateMachine.jl is a rapidly evolving software and this example may stop working in the future, please open an issue if you find that to be the case!

Structure

The example makes use of julia and bash scripts for interactions with the workload manager and running multiple forward model evaluations in parallel. The user-triggered script ekp_calibration.sbatch initializes and controls the flow of the calibration process, which in this case is a SLURM queue with job dependencies. The calibration bash scripts, in order of execution and with their associated julia scripts, are

  1. ekp_init_calibration: Calls init_calibration.jl, which samples the initial parameter ensemble from a specified prior. The initial ensemble is stored in a set of parameter files.
  2. ekp_single_cm_run: Script called in parallel by the workload manager. Each copy of the script submits a single forward model run (i.e., a ClimateMachine.jl run) given a specific pair of parameters (C_smag, C_drag) read from a corresponding file. The output of each forward model run is stored in a separate NetCDF file.
  3. ekp_cont_calibration: Calls sstep_calibration.jl, which reads from a NetCDF file the output generated by ClimateMachine.jl in step 2 and performs an iteration of the Ensemble Kalman Inversion algorithm, updating the parameter ensemble. The new parameters are stored in new parameter files.

This flow follows steps 1->2->3->2->3->... for a user-specified number of iterations.

Running the Example

From the parent directory of ClimateMachine.jl and EnsembleKalmanProcesses.jl, change into the example directory with

$ cd EnsembleKalmanProcesses.jl/examples/ClimateMachine

and install all the required packages for the example with:

$ julia --project -e 'using Pkg; pkg"instantiate";'

To run the example using a SLURM workload manager, simply do:

$ sbatch ekp_calibration.sbatch

The dynamical model outputs (i.e, $\Psi(\phi)$) for all runs of ClimateMachine.jl will be stored in NetCDF format in directories identifiable by their version number. Refer to the files version_XX.txt to identify each run with each ensemble member within the XX iteration of the Ensemble Kalman Process.

In this example, the parameters are defined through a Gaussian prior in init_calibration.jl. Hence, the transform from constrained to unconstrained space is the identity map, and we have $\phi=\theta$. Overall the forward map $\mathcal{G}(\theta)$ is given by applying an observation map $\mathcal{H}$ to the dynamical model output $\Psi$. In this case, $\mathcal{H}$ returns the time average of the horizontal velocity over a specified 30 min interval after initialization. Therefore, the observation vector $y$ contains a time-averaged vertical profile of the horizontal velocity.

Calibration Solution

To aggregate the parameter ensembles $\theta^{(1)}, \theta^{(2)}, \dots, \theta^{(J)}$ generated during the calibration process, you may use the agg_clima_ekp(...) function located in helper_funcs.jl,

$ julia --project
include(joinpath(@__DIR__, "helper_funcs.jl"))

agg_clima_ekp(2) # This generates the output containing the ensembles for each iteration, input is the number of parameters

This will create the JLD file ekp_clima.jld. We may read the file as follows

using JLD

θ = load("ekp_clima.jld")["ekp_u"]
println(typeof(θ)) # Array{Array{Float64,2},1}, outer dimension is N_iter, inner Array{Float64,2} of size = (J, p)

The optimal parameter vector determined by the ensemble Kalman inversion is the ensemble mean of the particles after the last iteration. Following the previous script,

using Statistics

θ_opt = mean(θ[end], dims=1)