API

Model Interface

ClimaCalibrate.forward_model - Function
forward_model(iteration, member)

Execute the forward model simulation with the given configuration.

This function must be overridden by a component's model interface. The override should configure member-specific settings, such as the parameter path, before running the simulation.

ClimaCalibrate.observation_map - Function
observation_map(iteration)

Runs the observation map for the specified iteration. This function must be implemented for each calibration experiment.
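
As a rough illustration of what an observation map might produce, the sketch below collects one column of model output per ensemble member into a matrix, following the EnsembleKalmanProcesses convention of one row per observation and one column per member. The names `member_output` and `toy_observation_map`, the ensemble size, and the fake diagnostics are all hypothetical. A real implementation would add a method to ClimaCalibrate.observation_map and read each member's output from disk.

```julia
# Hypothetical per-member diagnostic; stands in for reading model output from disk.
member_output(iteration, member) = [member + 0.1 * iteration, 2.0 * member]

# Build the ensemble output matrix: one row per observation, one column per member.
function toy_observation_map(iteration; ensemble_size = 4)
    n_obs = length(member_output(iteration, 1))
    G_ensemble = Array{Float64}(undef, n_obs, ensemble_size)
    for member in 1:ensemble_size
        G_ensemble[:, member] .= member_output(iteration, member)
    end
    return G_ensemble
end

G = toy_observation_map(0)
```

Each column of `G` then holds the simulated observations for one ensemble member.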

Worker Interface

ClimaCalibrate.SlurmManager - Type
SlurmManager(ntasks=get(ENV, "SLURM_NTASKS", 1))

The ClusterManager for Slurm clusters, taking in the number of tasks to request with srun.

To execute the srun command, run addprocs(SlurmManager(ntasks))

Keyword arguments can be passed to srun: addprocs(SlurmManager(ntasks), gpus_per_task=1)

By default, the workers will inherit the running Julia environment.

To run a calibration, call calibrate(WorkerBackend, ...)

To run functions on a worker, call remotecall(func, worker_id, args...)
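
The remotecall pattern can be tried without a cluster. In this minimal sketch, a local worker started with `addprocs(1)` stands in for workers launched through a cluster manager; that substitution is an assumption made only so the example runs anywhere.

```julia
using Distributed

# A local worker stands in for one launched via a cluster manager.
pids = addprocs(1)
worker_id = first(pids)

# remotecall returns a Future immediately; fetch blocks until the result is ready.
future = remotecall(+, worker_id, 1, 2)
result = fetch(future)

# remotecall_fetch combines both steps; here the worker reports its own process id.
remote_id = remotecall_fetch(myid, worker_id)

# Release the worker once it is no longer needed.
rmprocs(pids)
```

Only functions that are defined on the worker process can be called this way, so user-defined functions generally need to be loaded on every worker first.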

ClimaCalibrate.PBSManager - Type
PBSManager(ntasks)

The ClusterManager for PBS/Torque clusters, taking in the number of tasks to request with qsub.

To execute the qsub command, run addprocs(PBSManager(ntasks)). Unlike the SlurmManager, this will not nest scheduled jobs, but will acquire new resources.

Keyword arguments can be passed to qsub: addprocs(PBSManager(ntasks), nodes=2)

By default, the workers will inherit the running Julia environment.

To run a calibration, call calibrate(WorkerBackend, ...)

To run functions on a worker, call remotecall(func, worker_id, args...)

Backend Interface

ClimaCalibrate.calibrate - Function
calibrate(backend, ensemble_size, n_iterations, observations, noise, prior, output_dir; ekp_kwargs...)
calibrate(backend, ekp::EnsembleKalmanProcess, ensemble_size, n_iterations, prior, output_dir)
calibrate(backend, config::ExperimentConfig; ekp_kwargs...)

Run a full calibration on the given backend.

If the EKP struct is not given, it will be constructed upon initialization. The experiment configuration (ensemble size, prior, observations, etc.) can be wrapped in an ExperimentConfig or passed in as arguments to the function.

Available Backends: WorkerBackend, CaltechHPCBackend, ClimaGPUBackend, DerechoBackend, JuliaBackend

Derecho, ClimaGPU, and CaltechHPC backends are designed to run on a specific high-performance computing cluster. WorkerBackend uses Distributed.jl to run the forward model on workers.

Keyword Arguments for HPC backends

  • model_interface: Path to the model interface file.
  • hpc_kwargs: Dictionary of resource arguments for HPC clusters, passed to the job scheduler.
  • verbose::Bool: Enable verbose logging.
  • Any keyword arguments for the EnsembleKalmanProcess constructor, such as scheduler
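
Conceptually, an ensemble calibration alternates between running the forward model for every member, mapping the outputs to observation space, and updating the parameters. The toy loop below sketches that control flow in plain Julia. Every function in it is hypothetical, and the relaxation step only stands in for the ensemble Kalman update, which it does not implement.

```julia
# Hypothetical forward model: the output is twice the parameter.
toy_forward_model(theta) = 2.0 * theta

# Hypothetical observation map: one observation per member, stored as a 1 x N matrix.
toy_observation_map(outputs) = reshape(outputs, 1, :)

# Stand-in for the ensemble update: nudge each parameter toward the value
# that would reproduce the observation y under the toy model.
function toy_update(params, G, y; step = 0.5)
    return params .+ step .* (y .- vec(G)) ./ 2.0
end

function toy_calibrate(params, y, n_iterations)
    for _ in 1:n_iterations
        outputs = [toy_forward_model(theta) for theta in params]  # one run per member
        G = toy_observation_map(outputs)
        params = toy_update(params, G, y)
    end
    return params
end

# Three ensemble members, observation y = 6.0, so the target parameter is 3.0.
final_params = toy_calibrate([0.0, 1.0, 4.0], 6.0, 20)
```

In this toy setting each iteration halves every member's distance to the target value, so the ensemble collapses onto 3.0 after a few iterations.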

ClimaCalibrate.get_backend - Function
get_backend()

Get the ideal backend for deploying forward model runs. The backend is selected based on the hostname returned by gethostname(), defaulting to JuliaBackend if no match is found.
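
Hostname-based selection can be sketched as a pattern lookup with a fallback. The function `pick_backend` and the hostname patterns below are hypothetical; the patterns that get_backend actually matches are not listed in this reference, and the backends are represented here by plain symbols rather than the real types.

```julia
# Illustrative patterns only; these are assumptions, not the package's real rules.
function pick_backend(
    hostname::AbstractString = gethostname();
    patterns = [r"^derecho" => :DerechoBackend, r"^clima" => :ClimaGPUBackend],
)
    for (pattern, backend) in patterns
        occursin(pattern, hostname) && return backend
    end
    return :JuliaBackend  # fallback when no pattern matches
end

on_cluster = pick_backend("derecho3")
on_laptop = pick_backend("my-laptop")
```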

ClimaCalibrate.model_run - Function
model_run(backend, iter, member, output_dir, experiment_dir; model_interface, verbose, hpc_kwargs)

Construct and execute a command to run a single forward model on a given job scheduler.

Dispatches on backend to run slurm_model_run or pbs_model_run.

Arguments:

  • iter: Iteration number
  • member: Member number
  • output_dir: Calibration experiment output directory
  • experiment_dir: Directory containing the experiment's Project.toml
  • model_interface: Model interface file
  • module_load_str: Commands which load the necessary modules
  • hpc_kwargs: Dictionary containing the resources for the job. Easily generated using kwargs.

Job Scheduler

ClimaCalibrate.wait_for_jobs - Function
wait_for_jobs(jobids, output_dir, iter, experiment_dir, model_interface, module_load_str, model_run_func; verbose, hpc_kwargs, reruns=1)

Wait for a set of jobs to complete. If a job fails, it will be rerun up to reruns times.

This function monitors the status of multiple jobs and handles failures by rerunning the failed jobs up to the specified number of reruns. It logs errors and job completion status, ensuring all jobs are completed before proceeding.

Arguments:

  • jobids: Vector of job IDs.
  • output_dir: Directory for output.
  • iter: Iteration number.
  • experiment_dir: Directory for the experiment.
  • model_interface: Interface to the model.
  • module_load_str: Commands to load necessary modules.
  • model_run_func: Function to run the model.
  • verbose: Print detailed logs if true.
  • hpc_kwargs: HPC job parameters.
  • reruns: Number of times to rerun failed jobs.
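
The rerun behaviour can be illustrated with a simplified, sequential sketch. The helper `run_with_reruns` and the toy job below are hypothetical. Unlike wait_for_jobs, this version runs jobs one at a time instead of polling a scheduler, so it shows only the retry accounting.

```julia
# `run_job(member)` is a hypothetical callable that returns true on success.
function run_with_reruns(run_job, members; reruns = 1)
    failed = Int[]
    for member in members
        attempts = 0
        succeeded = run_job(member)
        while !succeeded && attempts < reruns
            attempts += 1
            @warn "Member $member failed; rerun $attempts of $reruns"
            succeeded = run_job(member)
        end
        succeeded || push!(failed, member)  # still failing after all reruns
    end
    return failed
end

# Toy job: member 2 fails on its first attempt only, member 3 always fails.
attempt_counts = Dict{Int,Int}()
function flaky_job(member)
    n = attempt_counts[member] = get(attempt_counts, member, 0) + 1
    member == 3 && return false
    return !(member == 2 && n == 1)
end

still_failed = run_with_reruns(flaky_job, 1:3; reruns = 1)
```

With `reruns = 1`, each member gets at most two attempts in total: the original run plus one rerun.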

ClimaCalibrate.log_member_error - Function
log_member_error(output_dir, iteration, member, verbose=false)

Log a warning message when an error occurs. If verbose, includes the ensemble member's output.

ClimaCalibrate.kill_job - Function
kill_job(jobid::SlurmJobID)
kill_job(jobid::PBSJobID)

End a running job, catching errors in case the job cannot be ended.

ClimaCalibrate.job_status - Function
job_status(job_id)

Parse the Slurm job_id's state and return one of three status symbols: :PENDING, :RUNNING, or :COMPLETED.
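
A three-way classification of scheduler state strings might look like the sketch below. The helper `classify_state` and the grouping of states are assumptions. In particular, this sketch treats every state that is neither pending nor running (including FAILED) as completed, which leaves the success check to a separate step.

```julia
# Map a raw Slurm state string onto one of three status symbols.
function classify_state(state::AbstractString)
    s = uppercase(strip(state))
    s in ("PENDING", "CONFIGURING") && return :PENDING
    s in ("RUNNING", "COMPLETING") && return :RUNNING
    return :COMPLETED  # anything else is no longer queued or running
end

status = classify_state(" running\n")
```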

ClimaCalibrate.slurm_model_run - Function
slurm_model_run(iter, member, output_dir, experiment_dir, model_interface, module_load_str; hpc_kwargs)

Construct and execute a command to run a single forward model on Slurm. Helper function for model_run.

ClimaCalibrate.submit_slurm_job - Function
submit_slurm_job(sbatch_filepath; env=deepcopy(ENV))

Submit a job to the Slurm scheduler using sbatch, removing unwanted environment variables.

Unset variables: "SLURM_MEM_PER_CPU", "SLURM_MEM_PER_GPU", "SLURM_MEM_PER_NODE"
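
Removing variables from a copy of the environment before building a command can be sketched as follows. The helper `scrubbed_env` and the path `job.sbatch` are hypothetical, and the command is only constructed, never executed.

```julia
function scrubbed_env(
    env::AbstractDict = Dict(ENV);
    unset = ("SLURM_MEM_PER_CPU", "SLURM_MEM_PER_GPU", "SLURM_MEM_PER_NODE"),
)
    cleaned = Dict{String,String}(env)  # copy, so the caller's environment is untouched
    for name in unset
        delete!(cleaned, name)          # no-op when the variable is not set
    end
    return cleaned
end

example_env = Dict("HOME" => "/home/user", "SLURM_MEM_PER_CPU" => "4G")
cleaned = scrubbed_env(example_env)

# Attach the cleaned environment to the command; nothing is run here.
cmd = setenv(`sbatch job.sbatch`, cleaned)
```

Working on a copy keeps the calling process's own environment intact, so later submissions start from the same state.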

ClimaCalibrate.pbs_model_run - Function
pbs_model_run(iter, member, output_dir, experiment_dir, model_interface, module_load_str; hpc_kwargs)

Construct and execute a command to run a single forward model on PBS Pro. Helper function for model_run.

ClimaCalibrate.generate_pbs_script - Function
generate_pbs_script(iter, member, output_dir, experiment_dir, model_interface; module_load_str, hpc_kwargs)

Generate a string containing a PBS script to run the forward model.

Returns:

  • qsub_contents::Function: A function generating the content of the PBS script based on the provided arguments. This will run the contents of the julia_script, which have to be run from a file due to Derecho's set_gpu_rank.
  • julia_script::String: The Julia script string to be executed by the PBS job.

Helper function for pbs_model_run.

ClimaCalibrate.submit_pbs_job - Function
submit_pbs_job(sbatch_filepath; env=deepcopy(ENV))

Submit a job to the PBS Pro scheduler using qsub, removing unwanted environment variables.

Unset variables: "PBS_MEM_PER_CPU", "PBS_MEM_PER_GPU", "PBS_MEM_PER_NODE"

EnsembleKalmanProcesses Interface

ClimaCalibrate.initialize - Function
initialize(ensemble_size, observations, noise, prior, output_dir)
initialize(eki::EKP.EnsembleKalmanProcess, prior, output_dir)
initialize(config)

Initialize a calibration, saving the initial parameter ensemble to a folder within output_dir.

If no EKP struct is given, construct an EKP struct and return it.

ClimaCalibrate.save_G_ensemble - Function
save_G_ensemble(config::ExperimentConfig, iteration, G_ensemble)
save_G_ensemble(output_dir::AbstractString, iteration, G_ensemble)

Saves the ensemble's observation map output to the correct directory based on the provided configuration. Takes an output directory, either extracted from an ExperimentConfig or passed directly.

ClimaCalibrate.update_ensemble - Function
update_ensemble(output_dir::AbstractString, iteration, prior)
update_ensemble(config::ExperimentConfig, iteration)
update_ensemble(config_file::AbstractString, iteration)

Updates the EnsembleKalmanProcess object and saves the parameters for the next iteration.

ClimaCalibrate.update_ensemble! - Function
update_ensemble!(ekp, G_ens, output_dir, iteration, prior)

Updates an EKP object with data G_ens, saving the object and final parameters to disk.

ClimaCalibrate.ExperimentConfig - Type
ExperimentConfig(
    n_iterations::Integer,
    ensemble_size::Integer,
    observations,
    noise,
    prior::ParameterDistribution,
    output_dir,
)
ExperimentConfig(filepath::AbstractString; kwargs...)

Construct an ExperimentConfig from a given YAML file or directory containing 'experiment_config.yml'.

ExperimentConfig holds the configuration for a calibration experiment. This can be constructed from a YAML configuration file or directly using individual parameters.

ClimaCalibrate.get_prior - Function
get_prior(param_dict::AbstractDict; names = nothing)
get_prior(prior_path::AbstractString; names = nothing)

Constructs the combined prior distribution from a param_dict or a TOML configuration file specified by prior_path. If names is provided, only those parameters are used.

ClimaCalibrate.get_param_dict - Function
get_param_dict(distribution; names)

Generates a dictionary for parameters based on the specified distribution, assumed to be of floating-point type. If names is not provided, the distribution's names will be used.

ClimaCalibrate.path_to_model_log - Function
path_to_model_log(output_dir, iteration, member)

Return the path to an ensemble member's forward model log for a given iteration and member number.
