API

Model Interface

ClimaCalibrate.forward_model (Function)
forward_model(iteration, member)

Execute the forward model simulation with the given configuration.

This function must be overridden by a component's model interface and should set things like the parameter path and other member-specific settings.

source
ClimaCalibrate.observation_map (Function)
observation_map(iteration)

Runs the observation map for the specified iteration. This function must be implemented for each calibration experiment.

source
ClimaCalibrate.analyze_iteration (Function)
analyze_iteration(ekp, g_ensemble, prior, output_dir, iteration)

After each evaluation of the observation map and before updating the ensemble, analyze_iteration is evaluated.

This function is optional to implement.

For example, one may want to print information from the ekp object or plot g_ensemble.

source
ClimaCalibrate.postprocess_g_ensemble (Function)
postprocess_g_ensemble(ekp, g_ensemble, prior, output_dir, iteration)

Postprocess g_ensemble after evaluating the observation map and before updating the ensemble.

source

Worker Interface

ClimaCalibrate.add_workers (Function)
add_workers(
    nworkers;
    device = :gpu,
    cluster = :auto,
    time = DEFAULT_WALLTIME,
    kwargs...
)

Add nworkers worker processes to the current Julia session, automatically detecting and configuring for the available computing environment.

Arguments

  • nworkers::Int: The number of worker processes to add.
  • device::Symbol = :gpu: The target compute device type, either :gpu (1 GPU, 4 CPU cores) or :cpu (1 CPU core).
  • cluster::Symbol = :auto: The cluster management system to use. Options:
    • :auto: Auto-detect available cluster environment (SLURM, PBS, or local)
    • :slurm: Force use of SLURM scheduler
    • :pbs: Force use of PBS scheduler
    • :local: Force use of local processing (standard addprocs)
  • time::Int = DEFAULT_WALLTIME: Walltime in minutes; it is formatted appropriately for the cluster system.
  • kwargs: Other kwargs can be passed directly through to addprocs.
source
ClimaCalibrate.SlurmManager (Type)
SlurmManager(ntasks=get(ENV, "SLURM_NTASKS", 1))

The ClusterManager for Slurm clusters, taking in the number of tasks to request with srun.

To execute the srun command, run addprocs(SlurmManager(ntasks))

Keyword arguments can be passed to srun: addprocs(SlurmManager(ntasks), gpus_per_task=1)

By default the workers will inherit the running Julia environment.

To run a calibration, call calibrate(WorkerBackend, ...)

To run functions on a worker, call remotecall(func, worker_id, args...)
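
As a minimal sketch of the remotecall pattern, using only the Distributed standard library: process 1 (the main process) stands in for a worker ID so that the snippet runs without a cluster. On a real allocation, a worker ID returned by addprocs would be used instead.

```julia
using Distributed

# Process 1 is the main Julia process; it is a stand-in for a worker id
# obtained from addprocs(SlurmManager(ntasks)).
worker_id = 1

# remotecall returns a Future immediately; fetch blocks until the result is ready.
future = remotecall(+, worker_id, 2, 3)
result = fetch(future)
```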

source
ClimaCalibrate.PBSManager (Type)
PBSManager(ntasks)

The ClusterManager for PBS/Torque clusters, taking in the number of tasks to request with qsub.

To execute the qsub command, run addprocs(PBSManager(ntasks)). Unlike the SlurmManager, this will not nest scheduled jobs, but will acquire new resources.

Keyword arguments can be passed to qsub: addprocs(PBSManager(ntasks), nodes=2)

By default, the workers will inherit the running Julia environment.

To run a calibration, call calibrate(WorkerBackend, ...)

To run functions on a worker, call remotecall(func, worker_id, args...)

source

Backend Interface

ClimaCalibrate.calibrate (Function)
calibrate(backend, ekp::EnsembleKalmanProcess, ensemble_size, n_iterations, prior, output_dir)
calibrate(backend, ensemble_size, n_iterations, observations, noise, prior, output_dir; ekp_kwargs...)

Run a full calibration on the given backend.

If the EKP struct is not given, it is constructed upon initialization. EKP keyword arguments are passed through to the EKP constructor, but when many keywords are needed it is recommended to construct the EKP object directly and pass it into calibrate.

Available Backends: WorkerBackend, CaltechHPCBackend, ClimaGPUBackend, DerechoBackend, JuliaBackend

Derecho, ClimaGPU, and CaltechHPC backends are designed to run on a specific high-performance computing cluster. WorkerBackend uses Distributed.jl to run the forward model on workers.

Keyword Arguments for HPC backends

  • model_interface: Path to the model interface file.
  • hpc_kwargs: Dictionary of resource arguments for HPC clusters, passed to the job scheduler.
  • verbose::Bool: Enable verbose logging.
  • Any keyword arguments for the EnsembleKalmanProcess constructor, such as scheduler
source
ClimaCalibrate.get_backend (Function)
get_backend()

Get the ideal backend for deploying forward model runs. The backend is detected from gethostname(). Defaults to JuliaBackend if no matching backend is found.

source
ClimaCalibrate.model_run (Function)
model_run(backend, iter, member, output_dir, experiment_dir; model_interface, verbose, hpc_kwargs)

Construct and execute a command to run a single forward model on a given job scheduler.

Uses the given backend to run slurm_model_run or pbs_model_run.

Arguments:

  • iter: Iteration number
  • member: Member number
  • output_dir: Calibration experiment output directory
  • experiment_dir: Directory containing the experiment's Project.toml
  • model_interface: Model interface file
  • module_load_str: Commands which load the necessary modules
  • hpc_kwargs: Dictionary containing the resources for the job. Easily generated using kwargs.
source

Job Scheduler

ClimaCalibrate.wait_for_jobs (Function)
wait_for_jobs(jobids, output_dir, iter, experiment_dir, model_interface, module_load_str, model_run_func; verbose, hpc_kwargs, reruns=1)

Wait for a set of jobs to complete. If a job fails, it will be rerun up to reruns times.

This function monitors the status of multiple jobs and handles failures by rerunning the failed jobs up to the specified number of reruns. It logs errors and job completion status, ensuring all jobs are completed before proceeding.
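
The rerun behavior described above can be sketched as a polling loop. This is an illustration under stated assumptions, not the library's implementation: status_of and resubmit are hypothetical stand-ins for the scheduler queries, and the :FAILED status is assumed for the purpose of the example.

```julia
# Hypothetical stand-ins: `status_of` reports a job's state and `resubmit`
# starts a replacement job. Neither is part of ClimaCalibrate.
function wait_with_reruns(jobids, status_of, resubmit; reruns = 1, poll_seconds = 0.0)
    jobids = copy(jobids)
    attempts = zeros(Int, length(jobids))   # reruns used per ensemble member
    done = falses(length(jobids))
    while !all(done)
        for (member, jobid) in enumerate(jobids)
            done[member] && continue
            status = status_of(jobid)
            if status == :COMPLETED
                done[member] = true
            elseif status == :FAILED        # assumed failure marker
                if attempts[member] < reruns
                    attempts[member] += 1
                    jobids[member] = resubmit(member)
                else
                    @warn "Member $member failed after $(attempts[member]) rerun(s)"
                    done[member] = true     # give up on this member
                end
            end
        end
        all(done) || sleep(poll_seconds)
    end
    return attempts
end

# Toy scheduler: job 2 fails once and its replacement (job 102) succeeds.
states = Dict(1 => :COMPLETED, 2 => :FAILED, 102 => :COMPLETED)
attempts = wait_with_reruns([1, 2], id -> states[id], member -> 100 + member)
```

Here attempts records how many reruns each member consumed, which is the quantity bounded by reruns.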

Arguments:

  • jobids: Vector of job IDs.
  • output_dir: Directory for output.
  • iter: Iteration number.
  • experiment_dir: Directory for the experiment.
  • model_interface: Interface to the model.
  • module_load_str: Commands to load necessary modules.
  • model_run_func: Function to run the model.
  • verbose: Print detailed logs if true.
  • hpc_kwargs: HPC job parameters.
  • reruns: Number of times to rerun failed jobs.
source
ClimaCalibrate.log_member_error (Function)
log_member_error(output_dir, iteration, member, verbose=false)

Log a warning message when an error occurs. If verbose, includes the ensemble member's output.

source
ClimaCalibrate.kill_job (Function)
kill_job(jobid::SlurmJobID)
kill_job(jobid::PBSJobID)

End a running job, catching errors in case the job cannot be ended.

source
ClimaCalibrate.job_status (Function)
job_status(job_id)

Parse the state of the Slurm job job_id and return one of three status symbols: :PENDING, :RUNNING, or :COMPLETED.

source
ClimaCalibrate.slurm_model_run (Function)
slurm_model_run(iter, member, output_dir, experiment_dir, model_interface, module_load_str; hpc_kwargs)

Construct and execute a command to run a single forward model on Slurm. Helper function for model_run.

source
ClimaCalibrate.generate_sbatch_script (Function)
generate_sbatch_script(iter, member, output_dir, experiment_dir, model_interface; module_load_str, hpc_kwargs, exeflags="")

Generate a string containing an sbatch script to run the forward model. hpc_kwargs is turned into a series of sbatch directives using generate_sbatch_directives. module_load_str is used to load the necessary modules and can be obtained via module_load_string. exeflags is a string of flags to pass to the Julia executable (defaults to empty string).

source
ClimaCalibrate.submit_slurm_job (Function)
submit_slurm_job(sbatch_filepath; env=deepcopy(ENV))

Submit a job to the Slurm scheduler using sbatch, removing unwanted environment variables.

Unset variables: "SLURM_MEM_PER_CPU", "SLURM_MEM_PER_GPU", "SLURM_MEM_PER_NODE"
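
A minimal sketch of the environment cleanup, assuming the unset variables are the standard Slurm names SLURM_MEM_PER_CPU, SLURM_MEM_PER_GPU, and SLURM_MEM_PER_NODE. A plain Dict stands in for a copy of ENV, and job.sbatch is a placeholder path; the command is built but not run.

```julia
# Variables to remove before submission (assumed names, see above).
unset_vars = ("SLURM_MEM_PER_CPU", "SLURM_MEM_PER_GPU", "SLURM_MEM_PER_NODE")

# A plain Dict stands in for a copy of ENV so the example is reproducible.
env = Dict("SLURM_MEM_PER_CPU" => "4096", "HOME" => "/home/user")

# delete! is a no-op for keys that are absent.
for name in unset_vars
    delete!(env, name)
end

# Attach the cleaned environment to the command; nothing is executed here.
cmd = setenv(`sbatch job.sbatch`, env)
```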

source
ClimaCalibrate.pbs_model_run (Function)
pbs_model_run(iter, member, output_dir, experiment_dir, model_interface, module_load_str; hpc_kwargs)

Construct and execute a command to run a single forward model on PBS Pro. Helper function for model_run.

source
ClimaCalibrate.generate_pbs_script (Function)
generate_pbs_script(iter, member, output_dir, experiment_dir, model_interface; module_load_str, hpc_kwargs)

Generate a string containing a PBS script to run the forward model.

Returns:

  • qsub_contents::Function: A function generating the content of the PBS script based on the provided arguments. This will run the contents of the julia_script, which have to be run from a file due to Derecho's set_gpu_rank.
  • julia_script::String: The Julia script string to be executed by the PBS job.

Helper function for pbs_model_run.

source
ClimaCalibrate.submit_pbs_job (Function)
submit_pbs_job(sbatch_filepath; env=deepcopy(ENV))

Submit a job to the PBS Pro scheduler using qsub, removing unwanted environment variables.

Unset variables: "PBS_MEM_PER_CPU", "PBS_MEM_PER_GPU", "PBS_MEM_PER_NODE", "PYTHONHOME", "PYTHONPATH", "PYTHONUSERBASE"

source

EnsembleKalmanProcesses Interface

ClimaCalibrate.initialize (Function)
initialize(eki::EKP.EnsembleKalmanProcess, prior, output_dir)
initialize(ensemble_size, observations, noise, prior, output_dir)

Initialize a calibration, saving the initial parameter ensemble to a folder within output_dir.

If no EKP struct is given, construct an EKP struct and return it.

source
ClimaCalibrate.save_G_ensemble (Function)
save_G_ensemble(output_dir::AbstractString, iteration, G_ensemble)

Saves the ensemble's observation map output to the correct directory based on the provided configuration. Takes an output directory, iteration number, and the ensemble output to save.

source
ClimaCalibrate.update_ensemble (Function)
update_ensemble(output_dir::AbstractString, iteration, prior)

Updates the EnsembleKalmanProcess object and saves the parameters for the next iteration.

source
ClimaCalibrate.update_ensemble! (Function)
update_ensemble!(ekp, G_ens, output_dir, iteration, prior)

Updates an EKP object with data G_ens, saving the object and final parameters to disk.

source
ClimaCalibrate.get_prior (Function)
get_prior(param_dict::AbstractDict; names = nothing)
get_prior(prior_path::AbstractString; names = nothing)

Constructs the combined prior distribution from a param_dict or a TOML configuration file specified by prior_path. If names is provided, only those parameters are used.

source
ClimaCalibrate.get_param_dict (Function)
get_param_dict(distribution; names)

Generates a dictionary for parameters based on the specified distribution, assumed to be of floating-point type. If names is not provided, the distribution's names will be used.

source
ClimaCalibrate.path_to_model_log (Function)
path_to_model_log(output_dir, iteration, member)

Return the path to an ensemble member's forward model log for a given iteration and member number.

source
ClimaCalibrate.minibatcher_over_samples (Function)
minibatcher_over_samples(n_samples, batch_size)

Create a FixedMinibatcher that divides n_samples into batches of size batch_size.

If n_samples is not divisible by batch_size, the remaining samples will be dropped.
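
The arithmetic behind dropping the remainder can be sketched as follows. This does not construct a FixedMinibatcher; the contiguous index layout is an assumption made for illustration.

```julia
n_samples, batch_size = 10, 3

# Number of complete batches; the remainder (here one sample) is dropped.
n_batches = div(n_samples, batch_size)

# Assumed layout: consecutive sample indices grouped into batches.
batches = [collect(((b - 1) * batch_size + 1):(b * batch_size)) for b in 1:n_batches]
dropped = collect((n_batches * batch_size + 1):n_samples)
```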

source
minibatcher_over_samples(samples, batch_size)

Create a FixedMinibatcher that divides a vector of samples into batches of size batch_size.

If the number of samples is not divisible by batch_size, the remaining samples will be dropped.

source
ClimaCalibrate.observation_series_from_samples (Function)
observation_series_from_samples(samples, batch_size, names = nothing)

Create an EKP.ObservationSeries from a vector of EKP.Observation samples.

If the number of samples is not divisible by batch_size, the remaining samples will be dropped.

source
ClimaCalibrate.load_latest_ekp (Function)
load_latest_ekp(output_dir)

Return the most recent EnsembleKalmanProcess struct from the given output directory.

Returns nothing if no EKP structs are found.

source

Observation Recipe Interface

ClimaCalibrate.ObservationRecipe.AbstractCovarianceEstimator (Type)
abstract type AbstractCovarianceEstimator end

An object that estimates the noise covariance matrix from observational data that is appropriate for a sample between start_date and end_date.

Subtypes of AbstractCovarianceEstimator must implement one function, ObservationRecipe.covariance.

The function must have the signature

ObservationRecipe.covariance(
    covar_estimator::AbstractCovarianceEstimator,
    vars,
    start_date,
    end_date,
)

and return a noise covariance matrix.

source
ClimaCalibrate.ObservationRecipe.ScalarCovariance (Method)
ScalarCovariance(;
    scalar = 1.0,
    use_latitude_weights = false,
    min_cosd_lat = 0.1,
)

Create a ScalarCovariance which specifies how the covariance matrix should be formed. When used with ObservationRecipe.observation or ObservationRecipe.covariance, return a Diagonal matrix.

Keyword arguments

  • scalar: Scalar value to multiply the identity matrix by.

  • use_latitude_weights: If true, then latitude weighting is applied to the covariance matrix. Latitude weighting is multiplying the values along the diagonal of the covariance matrix by (1 / max(cosd(lat), min_cosd_lat)). See the keyword argument min_cosd_lat for more information.

  • min_cosd_lat: Control the minimum latitude weight when use_latitude_weights is true. The value for min_cosd_lat must be greater than zero as values close to zero along the diagonal of the covariance matrix can lead to issues when taking the inverse of the covariance matrix.
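
The latitude weighting described above can be sketched with the LinearAlgebra standard library. The latitudes are hypothetical values, and the snippet does not call ScalarCovariance itself.

```julia
using LinearAlgebra

scalar = 2.0
min_cosd_lat = 0.1
lats = [0.0, 60.0, 90.0]   # hypothetical latitude of each observation, in degrees

# scalar * I, with each diagonal entry multiplied by 1 / max(cosd(lat), min_cosd_lat)
weights = 1 ./ max.(cosd.(lats), min_cosd_lat)
covariance = Diagonal(scalar .* weights)
```

At the pole, cosd(lat) is zero, so the floor min_cosd_lat is what keeps the weight finite.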

source
ClimaCalibrate.ObservationRecipe.SeasonalDiagonalCovariance (Method)
SeasonalDiagonalCovariance(;
    model_error_scale = 0.0,
    regularization = 0.0,
    ignore_nan = true,
    use_latitude_weights = false,
    min_cosd_lat = 0.1,
)

Create a SeasonalDiagonalCovariance which specifies how the covariance matrix should be formed. When used with ObservationRecipe.observation or ObservationRecipe.covariance, return a Diagonal matrix.

Keyword arguments

  • model_error_scale: Noise from the model error added to the covariance matrix. This is (model_error_scale * seasonal_mean).^2, where seasonal_mean is the seasonal mean of each quantity for each season (DJF, MAM, JJA, SON).

  • regularization: A diagonal matrix of the form regularization * I is added to the covariance matrix.

  • ignore_nan: If true, then NaNs are ignored when computing the covariance matrix. Otherwise, NaNs are included in the intermediate calculation of the covariance matrix. Note that all NaNs are removed in the last step of forming the covariance matrix even if ignore_nan is false.

  • use_latitude_weights: If true, then latitude weighting is applied to the covariance matrix. Latitude weighting is multiplying the values along the diagonal of the covariance matrix by (1 / max(cosd(lat), min_cosd_lat)). See the keyword argument min_cosd_lat for more information.

  • min_cosd_lat: Control the minimum latitude weight when use_latitude_weights is true. The value for min_cosd_lat must be greater than zero as values close to zero along the diagonal of the covariance matrix can lead to issues when taking the inverse of the covariance matrix.
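
A rough sketch of how the model_error_scale and regularization terms could combine with the seasonal variances for a single quantity. The data are made up, and both the use of the bias-corrected variance and the simple sum of the three terms are assumptions, not a description of the library's internals.

```julia
using LinearAlgebra, Statistics

# Hypothetical data: rows are years, columns are seasons (DJF, MAM, JJA, SON).
seasonal_values = [1.0 2.0 3.0 2.0;
                   3.0 2.0 5.0 4.0]
model_error_scale = 0.1
regularization = 0.5

seasonal_mean = vec(mean(seasonal_values, dims = 1))
seasonal_var = vec(var(seasonal_values, dims = 1))   # bias-corrected by default

# Variance plus the model error term plus the regularization on the diagonal.
diagonal = seasonal_var .+ (model_error_scale .* seasonal_mean) .^ 2 .+ regularization
covariance = Diagonal(diagonal)
```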

source
ClimaCalibrate.ObservationRecipe.SVDplusDCovariance (Method)
SVDplusDCovariance(sample_date_ranges;
                   model_error_scale = 0.0,
                   regularization = 0.0,
                   use_latitude_weights = false,
                   min_cosd_lat = 0.1)

Create an SVDplusDCovariance which specifies how the covariance matrix should be formed. When used with ObservationRecipe.observation or ObservationRecipe.covariance, return an EKP.SVDplusD covariance matrix.

Positional arguments

  • sample_date_ranges: The start and end dates of each sample. These are used to extract the samples from the time series data of the OutputVars. These dates must be present in all the OutputVars.

Keyword arguments

  • model_error_scale: Noise from the model error added to the covariance matrix. This is (model_error_scale * mean(samples, dims = 2)).^2, where mean(samples, dims = 2) is the mean of the samples.

  • regularization: A diagonal matrix of the form regularization * I is added to the covariance matrix.

  • use_latitude_weights: If true, then latitude weighting is applied to the covariance matrix. Latitude weighting is multiplying the columns of the matrix of samples by 1 / sqrt(max(cosd(lat), min_cosd_lat)). See the keyword argument min_cosd_lat for more information.

  • min_cosd_lat: Control the minimum latitude weight when use_latitude_weights is true. The value for min_cosd_lat must be greater than zero as values close to zero along the diagonal of the covariance matrix can lead to issues when taking the inverse of the covariance matrix.
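
The square root in the sample weighting can be motivated with a short derivation. It assumes that each row of the sample matrix $X$ corresponds to one data point with latitude $\phi_i$, and writes $c_{\min}$ for the value of min_cosd_lat.

```latex
w_i = \frac{1}{\max(\cos\phi_i,\; c_{\min})}, \qquad
W = \operatorname{diag}(w_1, \dots, w_n), \qquad
\tilde{X} = W^{1/2} X

\operatorname{Cov}(\tilde{X}) = W^{1/2}\, \operatorname{Cov}(X)\, W^{1/2}
\quad\Longrightarrow\quad
\operatorname{Cov}(\tilde{X})_{ii} = w_i \, \operatorname{Cov}(X)_{ii}
```

Under these assumptions, scaling the samples by $\sqrt{w_i}$ scales the diagonal of the covariance by $w_i$, the same factor that the diagonal estimators apply directly.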

source
ClimaCalibrate.ObservationRecipe.covariance (Function)
covariance(covar_estimator::ScalarCovariance,
           vars::Union{OutputVar, Iterable{OutputVar}},
           start_date,
           end_date)

Compute the scalar covariance matrix.

Data from vars will not be used to compute the covariance matrix.

source
covariance(covar_estimator::SeasonalDiagonalCovariance,
           vars::Union{OutputVar, Iterable{OutputVar}},
           start_date,
           end_date)

Compute the noise covariance matrix of seasonal quantities from vars that is appropriate for a sample of seasonal quantities across time for seasons between start_date and end_date.

The diagonal is computed from the variances of the seasonal quantities.

source
covariance(covar_estimator::SVDplusDCovariance,
           vars::Union{OutputVar, Iterable{OutputVar}},
           start_date,
           end_date)

Compute the EKP.SVDplusD covariance matrix appropriate for a sample with times between start_date and end_date.

source
ClimaCalibrate.ObservationRecipe.observation (Function)
observation(covar_estimator::AbstractCovarianceEstimator,
            vars,
            start_date,
            end_date;
            name = nothing)

Return an EKP.Observation with a sample between the dates start_date and end_date, a covariance matrix defined by covar_estimator, name determined from the short names of vars, and metadata.

Metadata

Metadata in EKP.Observation is only added with versions of EnsembleKalmanProcesses later than v2.4.2.

source
ClimaCalibrate.ObservationRecipe.reconstruct_g_mean_final (Function)
reconstruct_g_mean_final(ekp::EKP.EnsembleKalmanProcess,
                         observation::EKP.Observation)

Reconstruct the mean forward model evaluation at the last iteration as a vector of OutputVars.

This function assumes observation contains the necessary metadata to reconstruct the OutputVars. Note that the metadata comes from the observations.

source
ClimaCalibrate.ObservationRecipe.seasonally_aligned_yearly_sample_date_ranges (Function)
seasonally_aligned_yearly_sample_date_ranges(var::OutputVar)

Generate sample dates that conform to a seasonally aligned year from dates(var).

A seasonally aligned year is defined to be from December to November of the following year.
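
To make the definition concrete, here is a small sketch using the Dates standard library. The helper seasonally_aligned_year is hypothetical and only illustrates the December-to-November window; the choice of the first and last calendar days as boundaries is an assumption.

```julia
using Dates

# Hypothetical helper: the December-to-November window containing `date`.
function seasonally_aligned_year(date::Date)
    # December belongs to the window that starts in the same calendar year.
    start_year = month(date) == 12 ? year(date) : year(date) - 1
    return (Date(start_year, 12, 1), Date(start_year + 1, 11, 30))
end

window = seasonally_aligned_year(Date(2010, 7, 15))
```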

This function is useful for finding the sample dates of samples consisting of all four seasons in a single year. For example, one can use this function to find the sample_date_ranges when constructing SVDplusDCovariance.

All four seasons in a year are not guaranteed

This function does not check whether the start and end dates of each sample contain all four seasons. A sample may be missing a season, especially at the beginning or end of the time series.

source

Ensemble Builder Interface

ClimaAnalysisExt.GEnsembleBuilder (Type)
GEnsembleBuilder{FT <: AbstractFloat}

An object to help build the G ensemble matrix by using the metadata stored in the EKP.EnsembleKalmanProcess object. Metadata must come from ClimaAnalysis.

GEnsembleBuilder takes in preprocessed OutputVars and automatically constructs the corresponding G ensemble matrix for the current iteration of the calibration.

source
ClimaCalibrate.EnsembleBuilder.fill_g_ens_col! (Function)
EnsembleBuilder.fill_g_ens_col!(g_ens_builder::GEnsembleBuilder,
                                col_idx,
                                var::OutputVar;
                                checkers = (),
                                verbose = false)

Fill the col_idxth column of the G ensemble matrix from the OutputVar var and ekp. Return true if the fill was successful and false otherwise.

It is assumed that the times or dates of a single OutputVar are a superset of the times or dates of one or more metadata in the minibatch.

This function relies on the short names in the metadata. This function will not behave correctly if the short names are mislabeled or not present.

Furthermore, this function assumes that all observations are generated using ObservationRecipe.observation, which guarantees that the metadata exists and is placed correctly.

source
EnsembleBuilder.fill_g_ens_col!(g_ens_builder::GEnsembleBuilder,
                                col_idx,
                                val::AbstractFloat)

Fill the col_idxth column of the G ensemble matrix with val.

This returns true.

This is useful if you want to completely fill a column of a G ensemble matrix with NaNs if a simulation crashed.

source

Checker Interface

ClimaCalibrate.Checker.AbstractChecker (Type)
abstract type AbstractChecker end

An object that performs validation checks between the simulation data and metadata from observational data. This is used by GEnsembleBuilder to validate OutputVars from simulation data against the Metadata in the observations in the EnsembleKalmanProcess object.

An AbstractChecker must implement the Checker.check function.

The function must have the signature:

import ClimaCalibrate.Checker
Checker.check(::YourChecker,
              var::OutputVar,
              metadata::Metadata;
              data = nothing,
              verbose = false)

and return true or false.

What are var and metadata?

For more information about OutputVar and Metadata, see the ClimaAnalysis documentation.

source
ClimaCalibrate.Checker.SignChecker (Type)
struct SignChecker{FT <: AbstractFloat} <: AbstractChecker

A struct that checks that the proportion of positive values in the simulation data and observational data is roughly the same.

To change the default threshold of 0.05, you can pass a float to SignChecker.

import ClimaCalibrate
sign_checker = ClimaCalibrate.Checker.SignChecker(0.01)
source
ClimaCalibrate.Checker.check (Function)
check(checker::AbstractChecker,
      var,
      metadata;
      data = nothing,
      verbose = false)

Return true if the check passes, false otherwise.

If verbose = true, report why a check did not succeed.

source
Checker.check(
    ::ShortNameChecker,
    var::OutputVar,
    metadata::Metadata;
    data = nothing,
    verbose = false,
)

Return true if var and metadata have the same short name, false otherwise.

source
Checker.check(
    ::DimNameChecker,
    var::OutputVar,
    metadata::Metadata;
    data = nothing,
    verbose = false,
)

Return true if var and metadata have the same dimensions, false otherwise.

source
Checker.check(
    ::DimUnitsChecker,
    var::OutputVar,
    metadata::Metadata;
    data = nothing,
    verbose = false,
)

Return true if the units of the dimensions in var and metadata are the same, false otherwise. This function assumes var and metadata have the same dimensions.

source
Checker.check(
    ::UnitsChecker,
    var::OutputVar,
    metadata::Metadata;
    data = nothing,
    verbose = false,
)

Return true if var and metadata have the same units, false otherwise.

source
Checker.check(
    ::DimValuesMatch,
    var::OutputVar,
    metadata::Metadata;
    data = nothing,
    verbose = false,
)

Return true if the values of the dimensions in var and metadata are compatible for the purpose of filling out the G ensemble matrix, false otherwise.

The nontemporal dimensions are compatible if the values are approximately the same. The temporal dimensions are compatible if the temporal dimension of metadata is a subset of the temporal dimension of var.

source
Checker.check(
    ::SequentialIndicesChecker,
    var::OutputVar,
    metadata::Metadata;
    data = nothing,
    verbose = false,
)

Return true if the dates of var map to sequential indices of the dates of metadata, false otherwise.

Use this check

It is recommended to always enable this check when possible.

Why use this check?

This check is helpful in ensuring that the dates are matched correctly between var and metadata. For example, without this check, if the simulation data contain monthly averages and metadata track seasonal averages, then no error is thrown, because every date in metadata also appears among the dates in var.
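
The monthly-versus-seasonal case can be sketched with plain date vectors. The helper maps_to_sequential_indices is hypothetical and is not the library's implementation; it assumes the check locates each metadata date among the simulation dates and requires consecutive positions.

```julia
using Dates

# Hypothetical helper: find where each metadata date sits among the
# simulation dates and require those positions to be consecutive.
function maps_to_sequential_indices(var_dates, metadata_dates)
    idxs = indexin(metadata_dates, var_dates)
    any(isnothing, idxs) && return false
    return all(==(1), diff(Int.(idxs)))
end

monthly = [Date(2010, m, 1) for m in 1:12]             # simulation dates
seasonal = [Date(2010, m, 1) for m in (3, 6, 9, 12)]   # metadata dates

# A subset test alone passes, while the sequential test catches the mismatch.
subset_ok = issubset(seasonal, monthly)
sequential_ok = maps_to_sequential_indices(monthly, seasonal)
```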

source
Checker.check(
    ::SignChecker,
    var::OutputVar,
    metadata::Metadata;
    data,
    verbose = false,
)

Return true if the absolute difference of the proportion of positive values in var.data and the proportion of positive values in data is less than the threshold defined in SignChecker, false otherwise.
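
The comparison can be sketched in a few lines. The helper names are hypothetical, the default threshold of 0.05 is taken from the SignChecker description, and how the library treats zeros or NaNs is not addressed here.

```julia
# Hypothetical helpers mirroring the description above.
proportion_positive(x) = count(>(0), x) / length(x)

function signs_roughly_match(sim_data, obs_data; threshold = 0.05)
    return abs(proportion_positive(sim_data) - proportion_positive(obs_data)) < threshold
end

sim = [1.0, 2.0, -1.0, 3.0]    # 75% positive
obs = [0.5, -2.0, -1.0, 4.0]   # 50% positive
passes = signs_roughly_match(sim, obs)
```

With these values the proportions differ by 0.25, so the check fails under the default threshold.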

source