API

Model Interface

ClimaCalibrate.AbstractModelInterface — Type

AbstractModelInterface

Abstract supertype for user-defined calibration experiments.

Users subtype this to define their experiment-specific configuration, and the calibration interface functions then dispatch on that subtype.

Required interface

Subtypes must implement:

forward_model(interface, iteration, member) which runs the forward model for a single ensemble member.
observation_map(interface, iteration) which processes model output and returns a G_ensemble matrix.

To use the HPCBackend, the subtypes must also implement:

model_interface_filepath(interface) which returns the path to the file that defines the model interface. The HPCBackend job script includes this file so that all interface functions defined on the subtype are available in the worker process.

Optional interface

analyze_iteration(interface, ekp, g_ensemble, prior, iteration) which inspects results after each ensemble update. By default, it logs the mean constrained parameters and the covariance-weighted error.
postprocess_g_ensemble(interface, ekp, g_ensemble, prior, iteration) which transforms g_ensemble before the ensemble update. By default, it returns g_ensemble unchanged.

For HPCBackend, the subtypes can also implement:

experiment_dir(interface) which returns the Julia project directory passed as --project to the job script. By default, it returns project_dir().
exeflags(interface) which returns additional flags (e.g. --threads 4) passed to the Julia executable in the job script. By default, it returns the empty string.

Example

struct MyModelInterface <: ClimaCalibrate.AbstractModelInterface
    config::String
end

function ClimaCalibrate.forward_model(interface::MyModelInterface, iteration, member)
    # Run the model using interface.config
end

function ClimaCalibrate.observation_map(interface::MyModelInterface, iteration)
    # Read model outputs and return G_ensemble matrix
end

source

ClimaCalibrate.forward_model — Function

forward_model(interface::AbstractModelInterface, iteration, member)

Execute the forward model simulation with the given configuration.

This function must be overridden by the user's model interface, dispatching on their subtype of AbstractModelInterface.

source

ClimaCalibrate.observation_map — Function

observation_map(interface::AbstractModelInterface, iteration)

Runs the observation map for the specified iteration. This function must be implemented for each calibration experiment, dispatching on the user's subtype of AbstractModelInterface.

source

ClimaCalibrate.analyze_iteration — Function

analyze_iteration(interface::AbstractModelInterface, ekp, g_ensemble, prior, iteration)

After updating the ensemble and before starting the next iteration, analyze_iteration is evaluated.

This function is optional to implement.

For example, one may want to print information from the ekp object or plot g_ensemble.

source

ClimaCalibrate.postprocess_g_ensemble — Function

postprocess_g_ensemble(
    interface::AbstractModelInterface,
    ekp,
    g_ensemble,
    prior,
    iteration
)

Postprocess g_ensemble after evaluating the observation map and before updating the ensemble.

source

ClimaCalibrate.model_interface_filepath — Function

model_interface_filepath(interface::AbstractModelInterface)

Return the path to the file that defines the model interface.

The HPCBackend job script includes this file so that all interface functions defined on the AbstractModelInterface subtype (e.g. forward_model, observation_map, and any optional overrides) are available in the worker process, along with their required packages.

source

ClimaCalibrate.experiment_dir — Function

experiment_dir(interface::AbstractModelInterface)

Return the path to the experiment's Julia project directory.

The HPCBackend uses this to construct the job script command:

julia --project=$experiment_dir $exeflags -e '...'

so that each ensemble member's forward model job runs with the correct project environment. By default, returns project_dir() (the currently active project).

You should override this in your AbstractModelInterface subtype if your experiment lives in a separate project directory.

source

ClimaCalibrate.exeflags — Function

exeflags(::AbstractModelInterface)

Return additional flags passed to the Julia executable in the HPCBackend job script.

The HPCBackend constructs each ensemble member's job command as:

julia --project=$experiment_dir $exeflags -e '...'

Override this in your AbstractModelInterface subtype to pass extra flags such as --threads or -O0. By default, returns "" (no extra flags).
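A note on how a flag string such as "--threads 4" ends up in that command: each flag has to become a separate command-line argument. The sketch below assembles the documented command line as a Julia Cmd; the helper job_command and the code string are hypothetical, and ClimaCalibrate's real job script may build and quote the command differently.

```julia
# Hypothetical helper: build `julia --project=$experiment_dir $exeflags -e '...'`
# as a Cmd. exeflags is split on whitespace so that "--threads 4" becomes two
# arguments instead of one "--threads 4" argument.
function job_command(experiment_dir::AbstractString, exeflags::AbstractString,
                     code::AbstractString)
    args = String["julia", "--project=$experiment_dir"]
    append!(args, split(exeflags))   # the default "" contributes no arguments
    push!(args, "-e", code)
    return Cmd(args)
end

cmd = job_command("/path/to/experiment", "--threads 4 -O0", "println(\"hello\")")
# cmd.exec == ["julia", "--project=/path/to/experiment", "--threads", "4", "-O0", "-e", "println(\"hello\")"]
```

Splitting on whitespace is a simplification: it does not handle flags whose values themselves contain quoted spaces.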

source

Calibration Interface

ClimaCalibrate.Calibration.calibrate — Function

calibrate(
    backend::HPCBackend,
    ekp::EKP.EnsembleKalmanProcess,
    interface::AbstractModelInterface,
    n_iterations,
    prior,
    output_dir,
)

Run a full calibration with ekp and prior for n_iterations on the given backend, storing the results of the calibration in output_dir.

Each ensemble member's forward model run is submitted as a job to the backend. Together, the file returned by ClimaCalibrate.model_interface_filepath and the project directory returned by experiment_dir must provide all the dependencies needed to run the forward model. Each job starts by running julia --project=$experiment_dir -e 'include($model_interface_filepath())' (with any flags returned by exeflags) and then runs the forward model.

The interface is serialized to interface.jld2 in output_dir, so that HPC job scripts can load and pass it to forward_model.

For more information about the HPCBackend, see HPCBackend.

source

calibrate(
    backend::WorkerBackend,
    ekp::EKP.EnsembleKalmanProcess,
    interface::AbstractModelInterface,
    n_iterations,
    prior,
    output_dir,
)

Run a full calibration with ekp and prior for n_iterations on the given backend, storing the results of the calibration in output_dir.

For more information about the WorkerBackend, see WorkerBackend.

source

calibrate(
    backend::JuliaBackend,
    ekp::EKP.EnsembleKalmanProcess,
    interface::AbstractModelInterface,
    n_iterations,
    prior,
    output_dir,
)

Run a full calibration with ekp and prior for n_iterations on the given backend, storing the results of the calibration in output_dir.

Calibration with the JuliaBackend does not support restarts.

For more information about the JuliaBackend, see JuliaBackend.

source

Config Interface

ClimaCalibrate.Backend.AbstractHPCConfig — Type

abstract type AbstractHPCConfig end

An abstract type for high-performance computing job configuration objects used by HPCBackends when creating job scripts.

Interface

All subtypes of AbstractHPCConfig must have the following fields:

directives::OrderedDict{Symbol, Any}: Scheduler directives (e.g., resource requests, time limits, etc.).
modules::Vector{String}: List of modules to load in the job environment.
env_vars::OrderedDict{String, Any}: Environment variables to set for the job environment.

Subtypes must also provide the methods:

generate_directives(config): Return a string of scheduler directives for the job script.
generate_modules(config): Return a string of module load commands for the job script.
generate_env_vars(config): Return a string of environment variable export commands for the job script.
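As an illustration only, a Slurm-flavored generate_directives could render each directive as an #SBATCH line, converting underscores to the hyphens that Slurm options use (--gpus-per-task, --cpus-per-task). The function below is a hypothetical sketch, not ClimaCalibrate's implementation, and takes a plain vector of pairs instead of an OrderedDict to stay within Julia's standard library.

```julia
# Hypothetical sketch: render Slurm directives as job-script lines.
function sketch_slurm_directives(directives::AbstractVector{<:Pair{Symbol}})
    lines = String[]
    for (key, value) in directives
        flag = replace(string(key), "_" => "-")   # :gpus_per_task -> "gpus-per-task"
        push!(lines, "#SBATCH --$flag=$value")
    end
    return join(lines, "\n")
end

println(sketch_slurm_directives([:ntasks => 1, :gpus_per_task => 1, :time => 720]))
# prints:
# #SBATCH --ntasks=1
# #SBATCH --gpus-per-task=1
# #SBATCH --time=720
```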

source

ClimaCalibrate.Backend.SlurmConfig — Type

SlurmConfig <: AbstractHPCConfig

A configuration holding Slurm directives, modules, and environment variables used when creating job scripts for the Slurm-based backends (CaltechHPCBackend, ClimaGPUBackend, and GCPBackend).

source

ClimaCalibrate.Backend.SlurmConfig — Method

SlurmConfig(;
    directives = Pair{Symbol, Any}[],
    modules = String[],
    env_vars = Pair{String, Any}[],
)

Create a SlurmConfig specifying the directives, modules, and env_vars for the Slurm-based backends.

Defaults

The default directive is

:gpus_per_task: 0.

The default environment variables are

CLIMACOMMS_DEVICE: "CPU" or "GPU" depending on the job directives,
CLIMACOMMS_CONTEXT: "MPI".

Examples

This example creates a Slurm configuration for a job with a single task, using 12 CPUs and 1 GPU, and a runtime of 720 minutes. It loads the latest version of climacommon and explicitly sets environment variables for ClimaComms.

ClimaCalibrate.SlurmConfig(;
    directives = [
        :ntasks => 1,
        :gpus_per_task => 1,
        :cpus_per_task => 12,
        :time => 720,
    ],
    modules = ["climacommon"],
    env_vars = [
        "CLIMACOMMS_CONTEXT" => "SINGLETON",
        "CLIMACOMMS_DEVICE" => "CUDA",
    ],
)

source

ClimaCalibrate.Backend.PBSConfig — Type

PBSConfig <: AbstractHPCConfig

A configuration holding PBS directives, modules, and environment variables used when creating job scripts for the DerechoBackend.

source

ClimaCalibrate.Backend.PBSConfig — Method

PBSConfig(;
    directives = Pair{Symbol, Any}[],
    modules = String[],
    env_vars = Pair{String, Any}[],
)

Create a PBSConfig specifying the directives, modules, and env_vars for the DerechoBackend.

The supported directives are: time, queue, ntasks, cpus_per_task, gpus_per_task, and job_priority. These directive names follow the Slurm naming convention (e.g., time instead of walltime). Any other directives provided will be ignored.
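Because time follows the Slurm convention (a number of minutes, as in the example below with 720), it has to be converted to a PBS-style HH:MM:SS walltime at some point. The helper below is a hypothetical sketch of that arithmetic; how the DerechoBackend actually formats the value is not shown here.

```julia
# Hypothetical helper: convert a Slurm-style time in minutes to HH:MM:SS.
function minutes_to_walltime(minutes::Integer)
    minutes >= 0 || throw(ArgumentError("time must be non-negative"))
    h, m = divrem(minutes, 60)   # whole hours and leftover minutes
    return string(lpad(h, 2, '0'), ":", lpad(m, 2, '0'), ":00")
end

minutes_to_walltime(720)  # "12:00:00", e.g. for a line like `#PBS -l walltime=12:00:00`
```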

Defaults

The default directives are

queue: "main",
ntasks: 1,
cpus_per_task: 1,
gpus_per_task: 0,
job_priority: "regular".

The default environment variables are

CLIMACOMMS_DEVICE: "CPU" or "GPU" depending on the job directives,
CLIMACOMMS_CONTEXT: "MPI".

Examples

This example creates a PBS configuration for a job with a single task, using 12 CPUs and 1 GPU, and a runtime of 720 minutes. It loads the latest version of climacommon and explicitly sets environment variables for ClimaComms.

ClimaCalibrate.PBSConfig(;
    directives = [
        :ntasks => 1,
        :gpus_per_task => 1,
        :cpus_per_task => 12,
        :time => 720,
    ],
    modules = ["climacommon"],
    env_vars = [
        "CLIMACOMMS_CONTEXT" => "SINGLETON",
        "CLIMACOMMS_DEVICE" => "CUDA",
    ],
)

source

Backend Interface

ClimaCalibrate.Backend.JuliaBackend — Type

JuliaBackend

The simplest backend to use.

This is a singleton type and is meant for use in dispatch.

source

ClimaCalibrate.Backend.HPCBackend — Type

HPCBackend <: AbstractBackend

An abstract type for high-performance computing (HPC) cluster backends.

source

ClimaCalibrate.Backend.DerechoBackend — Type

DerechoBackend

Used for NSF NCAR's Derecho supercomputing system.

source

ClimaCalibrate.Backend.DerechoBackend — Method

DerechoBackend(config::PBSConfig)

Construct a DerechoBackend for submitting jobs to the Derecho supercomputing system.

See PBSConfig.

source

ClimaCalibrate.Backend.DerechoBackend — Method

DerechoBackend(; directives, modules, env_vars)

Construct a PBSConfig from the keyword arguments and use it to construct a DerechoBackend.

source

ClimaCalibrate.Backend.CaltechHPCBackend — Type

CaltechHPCBackend

Used for Caltech's high-performance computing cluster.

source

ClimaCalibrate.Backend.CaltechHPCBackend — Method

CaltechHPCBackend(config::SlurmConfig)

Construct a CaltechHPCBackend for submitting jobs to Caltech's high-performance computing cluster.

See SlurmConfig.

source

ClimaCalibrate.Backend.CaltechHPCBackend — Method

CaltechHPCBackend(; directives, modules, env_vars)

Construct a SlurmConfig from the keyword arguments and use it to construct a CaltechHPCBackend.

source

ClimaCalibrate.Backend.ClimaGPUBackend — Type

ClimaGPUBackend

Used for CliMA's private GPU server.

source

ClimaCalibrate.Backend.ClimaGPUBackend — Method

ClimaGPUBackend(config::SlurmConfig)

Construct a ClimaGPUBackend for submitting jobs to CliMA's private GPU server.

See SlurmConfig.

source

ClimaCalibrate.Backend.ClimaGPUBackend — Method

ClimaGPUBackend(; directives, modules, env_vars)

Construct a SlurmConfig from the keyword arguments and use it to construct a ClimaGPUBackend.

source

ClimaCalibrate.Backend.GCPBackend — Type

GCPBackend

Used for CliMA's private GCP server.

source

ClimaCalibrate.Backend.GCPBackend — Method

GCPBackend(config::SlurmConfig)

Construct a GCPBackend for submitting jobs to CliMA's private GCP server.

See SlurmConfig.

source

ClimaCalibrate.Backend.GCPBackend — Method

GCPBackend(; directives, modules, env_vars)

Construct a SlurmConfig from the keyword arguments and use it to construct a GCPBackend.

source

ClimaCalibrate.Backend.WorkerBackend — Type

WorkerBackend

Used to run calibrations on Distributed.jl's workers. For use on a Slurm cluster, see SlurmManager and for use on a PBS cluster, see PBSManager.

Keyword Arguments for WorkerBackend

failure_rate::Float64: The fraction of workers that may fail before an iteration is stopped. The default is 0.5.
worker_pool: The worker pool to use, created from the available workers.
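One way to picture failure_rate is as a threshold on the fraction of failed forward model runs in an iteration. The check below is a hypothetical sketch; in particular, the strict comparison (exactly 50% failures with the default does not stop the iteration) is an assumption, not confirmed behavior.

```julia
# Hypothetical check: stop the iteration when the fraction of failures
# exceeds failure_rate (strict comparison assumed).
function exceeds_failure_rate(n_failed::Integer, n_total::Integer; failure_rate = 0.5)
    n_total > 0 || throw(ArgumentError("n_total must be positive"))
    return n_failed / n_total > failure_rate
end

exceeds_failure_rate(3, 10)   # false: 30% failed
exceeds_failure_rate(6, 10)   # true: 60% failed
```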

source

ClimaCalibrate.Backend.get_backend — Function

get_backend()

Return the most suitable backend for the current machine.

The backend is detected from the hostname returned by gethostname(). If no known system matches, JuliaBackend is returned.

source

Worker Interface

ClimaCalibrate.Backend.SlurmManager — Type

SlurmManager(ntasks=get(ENV, "SLURM_NTASKS", 1))

The ClusterManager for Slurm clusters, taking in the number of tasks to request with srun.

To execute the srun command, run addprocs(SlurmManager(ntasks)).

Keyword arguments can be passed to srun: addprocs(SlurmManager(ntasks), gpus_per_task=1).

By default the workers will inherit the running Julia environment.

To run a calibration, call calibrate(WorkerBackend(), ...).

To run functions on a worker, call remotecall(func, worker_id, args...).

source

ClimaCalibrate.Backend.PBSManager — Type

PBSManager(ntasks)

The ClusterManager for PBS/Torque clusters, taking in the number of tasks to request with qsub.

To execute the qsub command, run addprocs(PBSManager(ntasks)). Unlike the SlurmManager, this will not nest scheduled jobs, but will acquire new resources.

Keyword arguments can be passed to qsub: addprocs(PBSManager(ntasks), nodes=2).

By default, the workers will inherit the running Julia environment.

To run a calibration, call calibrate(WorkerBackend(), ...).

To run functions on a worker, call remotecall(func, worker_id, args...).

source

ClimaCalibrate.Backend.add_workers — Function

add_workers(
    nworkers;
    device = :gpu,
    cluster = :auto,
    time = DEFAULT_WALLTIME,
    kwargs...
)

Add nworkers worker processes to the current Julia session, automatically detecting and configuring for the available computing environment.

Arguments

nworkers::Int: The number of worker processes to add.
device::Symbol = :gpu: The target compute device type, either :gpu (1 GPU, 4 CPU cores) or :cpu (1 CPU core).
cluster::Symbol = :auto: The cluster management system to use. Options:
- :auto: Auto-detect available cluster environment (SLURM, PBS, or local)
- :slurm: Force use of SLURM scheduler
- :pbs: Force use of PBS scheduler
- :local: Force use of local processing (standard addprocs)
time::Int = DEFAULT_WALLTIME: Walltime in minutes, which is formatted appropriately for the cluster system.
kwargs: Other kwargs can be passed directly through to addprocs.

source

ClimaCalibrate.Backend.set_worker_logger — Function

set_worker_logger()

Loads Logging and sets the global logger to log to worker_$worker_id.log. This function should be called from the worker process.

source

ClimaCalibrate.Backend.set_worker_loggers — Function

set_worker_loggers(workers = workers())

Set the global logger to a simple file logger for the given workers.

source

ClimaCalibrate.Backend.map_remotecall_fetch — Function

map_remotecall_fetch(f::Function, args...; workers = workers())

Call f(args...) on each worker, wait for the calls to finish, and return their results.

source

ClimaCalibrate.Backend.foreach_remotecall_wait — Function

foreach_remotecall_wait(f::Function, args...; workers = workers())

Call f(args...) on each worker and wait for all calls to finish, discarding the results.
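The behavior of map_remotecall_fetch and foreach_remotecall_wait can be approximated with Distributed's own primitives. The functions below are hypothetical sketches, not the library's implementations; asyncmap and @async are used so that the remote calls run concurrently rather than one worker at a time.

```julia
using Distributed

# Hypothetical equivalent of map_remotecall_fetch: call f(args...) on every
# worker concurrently and collect the results in worker order.
sketch_map_remotecall_fetch(f, args...; workers = Distributed.workers()) =
    asyncmap(w -> remotecall_fetch(f, w, args...), workers)

# Hypothetical equivalent of foreach_remotecall_wait: call f(args...) on every
# worker and block until all calls have finished, discarding the results.
function sketch_foreach_remotecall_wait(f, args...; workers = Distributed.workers())
    @sync for w in workers
        @async remotecall_wait(f, w, args...)
    end
    return nothing
end

sketch_map_remotecall_fetch(+, 1, 2)   # one result per worker
```

When no workers have been added, workers() returns [1] (the main process), so both sketches simply run f locally.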

source

Cluster Management Interface

ClimaCalibrate.Backend.JobInfo — Type

JobInfo

A struct containing the backend, job ID, and the job script that was run.

source

ClimaCalibrate.Backend.JobStatus — Type

JobStatus

An enum representing the current status of a job.

Values

PENDING: The job is queued and waiting to be scheduled.
RUNNING: The job is currently executing.
COMPLETED: The job finished running.
FAILED: The job terminated with an error as reported by the scheduler.

Use ispending, isrunning, issuccess, isfailed, and iscompleted to query the status of a JobInfo.

EnsembleKalmanProcesses Interface

ClimaCalibrate.Calibration.initialize — Function

initialize(eki::EKP.EnsembleKalmanProcess, prior, output_dir)

Initialize a calibration, saving the initial parameter ensemble to a folder within output_dir.

source

ClimaCalibrate.Calibration.last_completed_iteration — Function

last_completed_iteration(output_dir)

Determines the last completed iteration given an output_dir containing a calibration run.

If no iteration has been completed yet, return 0.

source

ClimaCalibrate.Calibration.save_G_ensemble — Function

save_G_ensemble(output_dir::AbstractString, iteration, G_ensemble)

Saves the ensemble's observation map output (G_ensemble) for the given iteration to the corresponding iteration directory within output_dir.

source

ClimaCalibrate.Calibration.update_ensemble — Function

update_ensemble(output_dir::AbstractString, iteration, prior)

Updates the EnsembleKalmanProcess object and saves the parameters for the next iteration.

source

ClimaCalibrate.Calibration.update_ensemble! — Function

update_ensemble!(ekp, G_ens, output_dir, iteration, prior)

Updates an EKP object with data G_ens, saving the object and final parameters to disk.

source

ClimaCalibrate.Calibration.observation_map_and_update! — Function

observation_map_and_update!(
    ekp,
    output_dir,
    iteration,
    prior,
    interface,
)

Compute the observation map and update the given EKP object.

source

ClimaCalibrate.Calibration.get_prior — Function

get_prior(param_dict::AbstractDict; names = nothing)
get_prior(prior_path::AbstractString; names = nothing)

Constructs the combined prior distribution from a param_dict or a TOML configuration file specified by prior_path. If names is provided, only those parameters are used.

source

ClimaCalibrate.Calibration.get_param_dict — Function

get_param_dict(distribution; names)

Generates a dictionary for parameters based on the specified distribution, assumed to be of floating-point type. If names is not provided, the distribution's names will be used.

source

ClimaCalibrate.Calibration.path_to_iteration — Function

path_to_iteration(output_dir, iteration)

Return the path to the directory for a given iteration within the specified output directory.

source

ClimaCalibrate.Calibration.path_to_ensemble_member — Function

path_to_ensemble_member(output_dir, iteration, member)

Return the path to an ensemble member's directory for a given iteration and member number.

source

ClimaCalibrate.Calibration.path_to_model_log — Function

path_to_model_log(output_dir, iteration, member)

Return the path to an ensemble member's forward model log for a given iteration and member number.

source

ClimaCalibrate.Calibration.parameter_path — Function

parameter_path(output_dir, iteration, member)

Return the path to an ensemble member's parameter file.

source

ClimaCalibrate.Calibration.checkpoint_path — Function

checkpoint_path(output_dir, iteration, member)

Return the path to an ensemble member's checkpoint file.

source

ClimaCalibrate.Calibration.load_latest_ekp — Function

load_latest_ekp(output_dir)

Return the most recent EnsembleKalmanProcess struct from the given output directory.

Returns nothing if no EKP structs are found.

source

ClimaCalibrate.Calibration.load_ekp_struct — Function

load_ekp_struct(output_dir, iteration)

Return the EnsembleKalmanProcess struct for a completed iteration.

source

ClimaCalibrate.Calibration.ekp_path — Function

ekp_path(output_dir, iteration)

Return the path to the serialized EnsembleKalmanProcess struct file for a given iteration.

source

ClimaCalibrate.Calibration.save_eki_and_parameters — Function

save_eki_and_parameters(eki, output_dir, iteration, prior)

Save the EKI state and parameters. This is a helper function for initialize and update_ensemble.

source

EKP Utilities

ClimaCalibrate.EKPUtils.minibatcher_over_samples — Function

minibatcher_over_samples(n_samples, batch_size)

Create a FixedMinibatcher that divides n_samples into batches of size batch_size.

If n_samples is not divisible by batch_size, the remaining samples will be dropped.
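The batching rule can be pictured with plain index ranges. This sketch assumes the batches are consecutive and non-overlapping; it illustrates only the rule, since the real function returns an EKP.FixedMinibatcher rather than a vector of ranges.

```julia
# Sketch: split 1:n_samples into consecutive batches of batch_size,
# dropping trailing samples that do not fill a whole batch.
function sketch_batches(n_samples::Integer, batch_size::Integer)
    batch_size > 0 || throw(ArgumentError("batch_size must be positive"))
    n_batches = n_samples ÷ batch_size   # integer division drops the remainder
    return [((k - 1) * batch_size + 1):(k * batch_size) for k in 1:n_batches]
end

sketch_batches(10, 3)  # [1:3, 4:6, 7:9]; sample 10 is dropped
```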

source

minibatcher_over_samples(samples, batch_size)

Create a FixedMinibatcher that divides a vector of samples into batches of size batch_size.

If the number of samples is not divisible by batch_size, the remaining samples will be dropped.

source

ClimaCalibrate.EKPUtils.observation_series_from_samples — Function

observation_series_from_samples(samples, batch_size, names = nothing)

Create an EKP.ObservationSeries from a vector of EKP.Observation samples.

If the number of samples is not divisible by batch_size, the remaining samples will be dropped.

source

ClimaCalibrate.EKPUtils.get_observations_for_nth_iteration — Function

get_observations_for_nth_iteration(obs_series::EKP.ObservationSeries, N)

For the Nth iteration, return a vector of the observation(s) being processed.

source

ClimaCalibrate.EKPUtils.get_metadata_for_nth_iteration — Function

get_metadata_for_nth_iteration(obs_series::EKP.ObservationSeries, N)

For the Nth iteration, return a vector of the metadata of the observation(s) being processed.

source

ClimaCalibrate.EKPUtils.g_ens_matrix — Function

g_ens_matrix(eki::EKP.EnsembleKalmanProcess{FT}) where {FT <: AbstractFloat}

Construct a G ensemble matrix with element type FT, filled with NaNs, for the current iteration.
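In plain Julia, this amounts to preallocating a NaN-filled matrix. The helper below is a hypothetical stand-in that takes the output dimension and ensemble size directly instead of reading them from the EKP object; the output_dim × ensemble_size layout (one column per member) is an assumption based on EKP's usual convention.

```julia
# Hypothetical stand-in for g_ens_matrix: an output_dim × ensemble_size matrix
# of element type FT, filled with NaN so that unset columns remain detectable.
nan_g_ensemble(::Type{FT}, output_dim::Integer, ensemble_size::Integer) where {FT <: AbstractFloat} =
    fill(FT(NaN), output_dim, ensemble_size)

G = nan_g_ensemble(Float32, 4, 3)
G[:, 1] .= 1.0f0     # store the observation map result for member 1
count(isnan, G)      # 8: the columns of members 2 and 3 are still unset
```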

source

Observation Recipe Interface

ClimaCalibrate.ObservationRecipe.AbstractCovarianceEstimator — Type

abstract type AbstractCovarianceEstimator end

An object that estimates, from observational data, a noise covariance matrix appropriate for a sample between start_date and end_date.

Subtypes of AbstractCovarianceEstimator must implement one function, ObservationRecipe.covariance.

The function must have the signature

ObservationRecipe.covariance(
    covar_estimator::AbstractCovarianceEstimator,
    vars,
    start_date,
    end_date,
)

and return a noise covariance matrix.

source

ClimaCalibrate.ObservationRecipe.ScalarCovariance — Type

ScalarCovariance <: AbstractCovarianceEstimator

Contain the necessary information to construct the scalar covariance matrix.

source

ClimaCalibrate.ObservationRecipe.ScalarCovariance — Method

ScalarCovariance(;
    scalar = 1.0,
    use_latitude_weights = false,
    min_cosd_lat = 0.1,
)

Create a ScalarCovariance which specifies how the covariance matrix should be formed. When used with ObservationRecipe.observation or ObservationRecipe.covariance, return a Diagonal matrix.

Keyword arguments

scalar: Scalar value to multiply the identity matrix by.
use_latitude_weights: If true, then latitude weighting is applied to the covariance matrix: the values along the diagonal of the covariance matrix are multiplied by 1 / max(cosd(lat), min_cosd_lat). See the keyword argument min_cosd_lat for more information.
min_cosd_lat: The lower bound on cosd(lat) used when use_latitude_weights is true, so each weight is at most 1 / min_cosd_lat. The value for min_cosd_lat must be greater than zero: cosd(lat) approaches zero near the poles, so without a positive lower bound the weights become arbitrarily large (infinite at ±90°), which can make the covariance matrix ill-conditioned or impossible to invert.
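The resulting diagonal can be sketched for a vector of latitudes, one per data point. The function and argument names are hypothetical; only the weighting formula 1 / max(cosd(lat), min_cosd_lat) comes from the description above.

```julia
using LinearAlgebra

# Hypothetical sketch of a ScalarCovariance-style diagonal with optional
# latitude weighting 1 / max(cosd(lat), min_cosd_lat).
function sketch_scalar_cov(lats::AbstractVector{<:Real}; scalar = 1.0,
                           use_latitude_weights = false, min_cosd_lat = 0.1)
    min_cosd_lat > 0 || throw(ArgumentError("min_cosd_lat must be greater than zero"))
    weights = use_latitude_weights ? 1 ./ max.(cosd.(lats), min_cosd_lat) :
              ones(length(lats))
    return Diagonal(scalar .* weights)
end

D = sketch_scalar_cov([0.0, 60.0, 90.0]; scalar = 2.0, use_latitude_weights = true)
# diagonal ≈ [2.0, 4.0, 20.0]: cosd(90) = 0 is replaced by min_cosd_lat = 0.1 at the pole
```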

source

ClimaCalibrate.ObservationRecipe.SeasonalDiagonalCovariance — Type

SeasonalDiagonalCovariance <: AbstractCovarianceEstimator

Contain the necessary information to construct a diagonal covariance matrix whose entries represent the variances of seasonal quantities from ClimaAnalysis.OutputVars.

source

ClimaCalibrate.ObservationRecipe.SeasonalDiagonalCovariance — Method

SeasonalDiagonalCovariance(;
    model_error_scale = 0.0,
    regularization = 0.0,
    ignore_nan = true,
    use_latitude_weights = false,
    min_cosd_lat = 0.1,
)

Create a SeasonalDiagonalCovariance which specifies how the covariance matrix should be formed. When used with ObservationRecipe.observation or ObservationRecipe.covariance, return a Diagonal matrix.

Keyword arguments

model_error_scale: Noise from the model error added to the covariance matrix. This is (model_error_scale * seasonal_mean).^2, where seasonal_mean is the mean of each quantity for each season (DJF, MAM, JJA, SON).
regularization: A diagonal matrix of the form regularization * I is added to the covariance matrix.
ignore_nan: If true, then NaNs are ignored when computing the covariance matrix. Otherwise, NaNs are included in the intermediate calculations of the covariance matrix. Note that all NaNs are removed in the last step of forming the covariance matrix, even if ignore_nan is false.
use_latitude_weights: If true, then latitude weighting is applied to the covariance matrix: the values along the diagonal of the covariance matrix are multiplied by 1 / max(cosd(lat), min_cosd_lat). See the keyword argument min_cosd_lat for more information.
min_cosd_lat: The lower bound on cosd(lat) used when use_latitude_weights is true, so each weight is at most 1 / min_cosd_lat. The value for min_cosd_lat must be greater than zero: cosd(lat) approaches zero near the poles, so without a positive lower bound the weights become arbitrarily large (infinite at ±90°), which can make the covariance matrix ill-conditioned or impossible to invert.
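For a single quantity at one location and one season, a diagonal entry can be sketched as the variance across years plus the model error and regularization terms above. The helper and its input are hypothetical, the variance estimator (Statistics.var with its default n - 1 normalization) is an assumption, and latitude weighting is omitted.

```julia
using Statistics

# Hypothetical sketch: one diagonal entry for one season at one location.
# `values` holds that season's value in each year and may contain NaNs.
function sketch_seasonal_entry(values::AbstractVector{<:Real};
                               model_error_scale = 0.0, regularization = 0.0,
                               ignore_nan = true)
    v = ignore_nan ? filter(!isnan, values) : values
    seasonal_mean = mean(v)
    return var(v) + (model_error_scale * seasonal_mean)^2 + regularization
end

sketch_seasonal_entry([1.0, NaN, 2.0, 3.0]; model_error_scale = 0.1, regularization = 0.01)
# var = 1.0, (0.1 * 2.0)^2 = 0.04, plus 0.01, so ≈ 1.05
```

With ignore_nan = false, the NaN propagates into this intermediate value, matching the note above that NaNs are only removed in the final step.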

source

ClimaCalibrate.ObservationRecipe.SVDplusDCovariance — Type

SVDplusDCovariance <: AbstractCovarianceEstimator

Contain the necessary information to construct an EKP.SVDplusD covariance matrix from ClimaAnalysis.OutputVars.

source

ClimaCalibrate.ObservationRecipe.SVDplusDCovariance — Method

SVDplusDCovariance(
    sample_date_ranges;
    model_error_scale = 0.0,
    regularization = 0.0,
    use_latitude_weights = false,
    min_cosd_lat = 0.1,
    rank = nothing
)

Create an SVDplusDCovariance which specifies how the covariance matrix should be formed. When used with ObservationRecipe.observation or ObservationRecipe.covariance, return an EKP.SVDplusD covariance matrix.

Recommended sample size

For sample_date_ranges, it is recommended that each sample contain data from a single year. For example, if the samples are created from time series data of seasonal averages, then each sample should contain all four seasons. Otherwise, the covariance matrix may not be meaningful. Suppose instead that each sample contains two years of seasonally averaged data, such as DJF of both 2010 and 2011. The sample mean then stacks two separate seasonal means, each taken over every other year: the mean of DJF 2010, 2012, and so on, and the mean of DJF 2011, 2013, and so on. As a result, if this covariance matrix is used with model_error_scale, the model error term will not be meaningful.

Positional arguments

sample_date_ranges: The start and end dates of each sample, used to select the samples from the time series data of the OutputVars. These dates must be present in all the OutputVars.

Keyword arguments

model_error_scale: Noise from the model error added to the covariance matrix. This is (model_error_scale * mean(samples, dims = 2)).^2, where mean(samples, dims = 2) is the mean of the samples.
regularization: If a scalar is used, a diagonal matrix of the form regularization * I is added to the covariance matrix. See QuantileRegularization for another option for regularization.
use_latitude_weights: If true, then latitude weighting is applied to the covariance matrix: the columns of the matrix of samples are multiplied by 1 / sqrt(max(cosd(lat), min_cosd_lat)). See the keyword argument min_cosd_lat for more information.
min_cosd_lat: The lower bound on cosd(lat) used when use_latitude_weights is true. The value for min_cosd_lat must be greater than zero: cosd(lat) approaches zero near the poles, so without a positive lower bound the weights become arbitrarily large (infinite at ±90°), which can make the covariance matrix ill-conditioned or impossible to invert.
rank: Rank of the singular value decomposition (SVD). If nothing is passed in, then the rank is automatically inferred from the data.
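The square root in 1 / sqrt(max(cosd(lat), min_cosd_lat)) is what makes sample-based weighting agree with the diagonal weighting used by the other estimators: scaling column j of the samples by s_j scales covariance entry (i, j) by s_i * s_j, so each variance is multiplied by s_j^2 = 1 / max(cosd(lat_j), min_cosd_lat). The check below demonstrates this with Statistics.cov on random data, assuming samples are stored as rows and data points as columns; it does not construct an EKP.SVDplusD.

```julia
using LinearAlgebra, Statistics, Random

Random.seed!(1)
lats = [0.0, 45.0, 80.0]
min_cosd_lat = 0.1
X = randn(8, length(lats))                         # rows: samples, columns: data points
s = 1 ./ sqrt.(max.(cosd.(lats), min_cosd_lat))    # per-column scaling factors

C = cov(X)           # covariance between columns (data points)
Cw = cov(X .* s')    # covariance after weighting the columns

# Weighting the samples conjugates the covariance by Diagonal(s); on the
# diagonal this is the 1 / max(cosd(lat), min_cosd_lat) weighting.
@assert Cw ≈ Diagonal(s) * C * Diagonal(s)
@assert diag(Cw) ≈ diag(C) ./ max.(cosd.(lats), min_cosd_lat)
```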

source

ClimaCalibrate.ObservationRecipe.QuantileRegularization — Type

QuantileRegularization

Regularization using the quantile of the model error scale for each OutputVar.

The same quantile is used for each OutputVar when making the observation.

This is used for the SVDplusDCovariance matrix.

Examples

In the example below, a regularization using the 0.05 quantile of the model error scale for each variable is initialized.

qtl_regularization = QuantileRegularization(0.05)

source

ClimaCalibrate.ObservationRecipe.covariance — Function

covariance(
    covar_estimator::ScalarCovariance,
    vars::Union{OutputVar, Iterable{OutputVar}},
    start_date,
    end_date
)

Compute the scalar covariance matrix.

Data from vars will not be used to compute the covariance matrix.

source

covariance(
    covar_estimator::SeasonalDiagonalCovariance,
    vars::Union{OutputVar, Iterable{OutputVar}},
    start_date,
    end_date
)

From vars, compute the noise covariance matrix of seasonal quantities appropriate for a sample of seasonal quantities across time, for the seasons between start_date and end_date.

The diagonal is computed from the variances of the seasonal quantities.

source

covariance(
    covar_estimator::SVDplusDCovariance,
    vars::Union{OutputVar, Iterable{OutputVar}},
    start_date,
    end_date
)

Compute the EKP.SVDplusD covariance matrix appropriate for a sample with times between start_date and end_date.

source

ClimaCalibrate.ObservationRecipe.observation — Function

observation(
    covar_estimator::AbstractCovarianceEstimator,
    vars,
    start_date,
    end_date;
    name = nothing
)

Return an EKP.Observation with a sample between the dates start_date and end_date, a covariance matrix defined by covar_estimator, name determined from the short names of vars, and metadata.

Metadata

Metadata in EKP.Observation is only added with versions of EnsembleKalmanProcesses later than v2.4.2.

source

ClimaCalibrate.ObservationRecipe.short_names — Function

short_names(obs::EKP.Observation)

Get the short names of the variables from the metadata in the EKP.Observation.

If the short name is not available, then nothing is returned instead.

source

ClimaCalibrate.ObservationRecipe.reconstruct_g_mean_final — Function

reconstruct_g_mean_final(ekp::EKP.EnsembleKalmanProcess)

Reconstruct the mean forward model evaluation at the last iteration as a vector of OutputVars.

source

ClimaCalibrate.ObservationRecipe.reconstruct_diag_cov — Function

reconstruct_diag_cov(obs::EKP.Observation)

Reconstruct the diagonal of the covariance matrix in obs as a vector of OutputVars.

This function only supports observations that contain diagonal covariance matrices.

source

ClimaCalibrate.ObservationRecipe.reconstruct_vars — Function

reconstruct_vars(obs::EKP.Observation)

Reconstruct the OutputVars from the samples in obs.

source

ClimaCalibrate.ObservationRecipe.seasonally_aligned_yearly_sample_date_ranges — Function

seasonally_aligned_yearly_sample_date_ranges(var::OutputVar)

Generate sample dates that conform to a seasonally aligned year from dates(var).

A seasonally aligned year is defined to be from December to November of the following year.

This function is useful for finding the sample dates of samples consisting of all four seasons in a single year. For example, one can use this function to find the sample_date_ranges when constructing SVDplusDCovariance.

All four seasons in a year are not guaranteed

This function does not check whether the start and end dates of each sample contain all four seasons. A sample may be missing a season, especially at the beginning or end of the time series.
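The December-to-November grouping, and how it can leave an incomplete year at the end of a record, can be sketched with the Dates standard library. The helpers below are hypothetical; the input assumes each seasonal average is dated by the first day of its first month, and the real function may return its ranges in a different form.

```julia
using Dates

# Hypothetical helper: December counts toward the following year's DJF,
# so a seasonally aligned year runs from December to November.
seasonal_year(d::Date) = month(d) == 12 ? year(d) + 1 : year(d)

# Sketch: group dates by seasonally aligned year and return [first, last] per year.
function sketch_yearly_date_ranges(dates::AbstractVector{Date})
    groups = Dict{Int, Vector{Date}}()
    for d in dates
        push!(get!(groups, seasonal_year(d), Date[]), d)
    end
    return [[minimum(groups[y]), maximum(groups[y])] for y in sort!(collect(keys(groups)))]
end

dates = [Date(2009, 12, 1), Date(2010, 3, 1), Date(2010, 6, 1), Date(2010, 9, 1),
         Date(2010, 12, 1), Date(2011, 3, 1)]
ranges = sketch_yearly_date_ranges(dates)
# [[2009-12-01, 2010-09-01], [2010-12-01, 2011-03-01]]; the second year lacks JJA and SON
```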

ClimaCalibrate.ObservationRecipe.change_data_type — Function

change_data_type(var::OutputVar, data_type)

Return an OutputVar with data of type data_type.

This is useful if you want to make a covariance matrix whose element type is data_type.

SVD Residual Analysis

ClimaCalibrate.analyze_residual — Function

analyze_residual(ekp, iter; n_eigenvectors = 3)

Analyze the model-data residual (y - G(u)) at iteration iter of an EKP calibration using the top eigenvectors of the noise covariance.

The noise covariance is obtained via EKP.get_obs_noise_cov with build = false, so it works with any StructuredMatrix type supported by EKP (SVD, Diagonal, SVDplusD), not only SVDplusD.

Returns a named tuple with:

normalized_projections: (n_eigenvectors × n_variables) matrix of z-scores per variable (values >> 1 indicate mismatch beyond noise)
structured_energy: normalized whitened energy across all variables (≈ 1 under the noise model)
structured_energy_by_variable: per-variable whitened energy
residual_norm_by_variable: norm(diff[rᵥ]) for each variable v, where diff = y - G(u) and rᵥ is the index range of variable v in the residual
metadata: vector of ClimaAnalysis.Var.Metadata for each variable, in the same order as the columns of normalized_projections and elements of structured_energy_by_variable and residual_norm_by_variable

Requires ClimaAnalysis to be loaded.
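The following sketch shows one plausible way to compute normalized projections for a dense noise covariance using only the LinearAlgebra standard library. It assumes that entry [i, v] is variable v's share of the residual's projection onto the i-th leading eigenvector, divided by √λᵢ, so that summing a row over variables gives aᵢ / √λᵢ. The function normalized_projections_sketch and its ranges argument are hypothetical; ClimaCalibrate obtains the covariance via EKP.get_obs_noise_cov and the variable ranges from the observation metadata.

```julia
using LinearAlgebra

# Hypothetical sketch, not ClimaCalibrate's implementation.
# `resid` plays the role of diff = y - G(u), `Γ` is a noise covariance,
# and `ranges[v]` holds the indices of variable v within `resid`.
function normalized_projections_sketch(resid, Γ, ranges; n_eigenvectors = 3)
    F = eigen(Symmetric(Matrix(Γ)))                 # ascending eigenvalues
    idx = sortperm(F.values; rev = true)[1:n_eigenvectors]
    λ = F.values[idx]                               # largest eigenvalues
    E = F.vectors[:, idx]                           # matching eigenvectors
    P = zeros(n_eigenvectors, length(ranges))
    for (v, r) in enumerate(ranges), i in 1:n_eigenvectors
        # Variable v's share of the whitened projection onto eigenvector i
        P[i, v] = dot(E[r, i], resid[r]) / sqrt(λ[i])
    end
    return P
end

Γ = Diagonal([4.0, 2.0, 1.0, 0.5])
resid = [2.0, -1.0, 0.5, 0.25]
ranges = [1:2, 3:4]
P = normalized_projections_sketch(resid, Γ, ranges; n_eigenvectors = 2)
```

With this diagonal covariance, the two leading eigenvectors only touch the first variable, so the second column of P is zero. Eigenvector signs are arbitrary, so individual entries may come out with either sign.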

ClimaCalibrate.compute_structured_energy — Function

compute_structured_energy(projections)

Given the matrix of normalized projections from compute_normalized_projections, compute the total structured energy in the whitened space:

energy = (1/n_eig) * ∑ᵢ zᵢ²,   where zᵢ = ∑ᵥ projections[i, v] = aᵢ / √λᵢ

Here aᵢ is the projection of the residual onto the i-th eigenvector of the noise covariance, λᵢ is the corresponding eigenvalue, and zᵢ is the global whitened projection onto eigenvector i. Under the noise model, zᵢ ~ N(0, 1), so energy ≈ 1 is consistent with noise. Values >> 1 indicate mismatch beyond what the structured noise explains. Values << 1 suggest overfitting to noise or an overestimated noise covariance.
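The formula above can be written directly in plain Julia. This is a minimal sketch, assuming projections is an (n_eig × n_variables) matrix; structured_energy_sketch is a hypothetical name, not the package's function.

```julia
# Hypothetical sketch of the total structured energy.
function structured_energy_sketch(projections::AbstractMatrix)
    n_eig = size(projections, 1)
    z = vec(sum(projections; dims = 2))  # zᵢ = ∑ᵥ projections[i, v]
    return sum(abs2, z) / n_eig          # (1/n_eig) * ∑ᵢ zᵢ²
end

projections = [0.5 0.5;
               1.0 -1.0;
               2.0 0.0]
structured_energy_sketch(projections)    # z = [1.0, 0.0, 2.0], energy = 5/3
```

In the second row, the two variables contribute with opposite signs and cancel, so that eigenvector adds nothing to the total energy even though each variable deviates on its own. This is one reason to also inspect the per-variable energy.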

ClimaCalibrate.compute_structured_energy_by_variable — Function

compute_structured_energy_by_variable(projections)

Given the matrix of normalized projections from compute_normalized_projections, compute the per-variable structured energy in the whitened space:

energy_v = (1/n_eig) * ∑ᵢ projections[i, v]²

Returns a vector of length n_variables. Values >> 1 for variable v indicate that variable's contribution to the eigenvector projections exceeds noise-model predictions. Values ≈ 1 are consistent with noise, and values << 1 suggest overfitting or an overestimated noise covariance for that variable.
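A minimal plain-Julia sketch of the per-variable formula, assuming the same (n_eig × n_variables) layout; structured_energy_by_variable_sketch is a hypothetical name.

```julia
# Hypothetical sketch: mean of the squared projections down each column,
# giving one energy value per variable.
function structured_energy_by_variable_sketch(projections::AbstractMatrix)
    n_eig = size(projections, 1)
    return vec(sum(abs2, projections; dims = 1)) ./ n_eig
end

projections = [0.5 0.5;
               1.0 -1.0;
               2.0 0.0]
structured_energy_by_variable_sketch(projections)  # [5.25 / 3, 1.25 / 3]
```

Because each entry is squared before summing, contributions with opposite signs from different variables cannot cancel here, unlike in the total structured energy.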

Ensemble Builder Interface

ClimaCalibrateClimaAnalysisExt.GEnsembleBuilder — Type

GEnsembleBuilder{FT <: AbstractFloat}

An object that helps build the G ensemble matrix using the metadata stored in the EKP.EnsembleKalmanProcess object. The metadata must come from ClimaAnalysis.

GEnsembleBuilder takes in preprocessed OutputVars and automatically constructs the corresponding G ensemble matrix for the current iteration of the calibration.

ClimaCalibrate.EnsembleBuilder.GEnsembleBuilder — Function

GEnsembleBuilder(ekp::EKP.EnsembleKalmanProcess{FT})
    where {FT <: AbstractFloat}

Construct a GEnsembleBuilder where the element type of the G ensemble matrix is FT.

ClimaCalibrate.EnsembleBuilder.fill_g_ens_col! — Function

EnsembleBuilder.fill_g_ens_col!(
    g_ens_builder::GEnsembleBuilder,
    col_idx,
    var::OutputVar;
    checkers = (),
    verbose = false
)

Fill the col_idxth column of the G ensemble matrix from the OutputVar var, using the metadata from the EKP.EnsembleKalmanProcess that g_ens_builder was constructed with. Return true if successful and false otherwise.

It is assumed that the times or dates of a single OutputVar are a superset of the times or dates of one or more metadata in the minibatch.

This function relies on the short names in the metadata and will not behave correctly if the short names are mislabeled or missing.

Furthermore, this function assumes that all observations are generated using ObservationRecipe.Observation, which guarantees that the metadata exists and is placed correctly.

EnsembleBuilder.fill_g_ens_col!(
    g_ens_builder::GEnsembleBuilder,
    col_idx,
    val::AbstractFloat
)

Fill the col_idxth column of the G ensemble matrix with val.

This returns true.

This is useful if you want to completely fill a column of a G ensemble matrix with NaNs if a simulation crashed.

ClimaCalibrate.EnsembleBuilder.is_complete — Function

EnsembleBuilder.is_complete(g_ens_builder::GEnsembleBuilder)

Return true if all the entries of the G ensemble matrix are filled out and false otherwise.

ClimaCalibrate.EnsembleBuilder.get_g_ensemble — Function

EnsembleBuilder.get_g_ensemble(g_ens_builder::GEnsembleBuilder)

Return the G ensemble matrix from g_ens_builder.

If the G ensemble matrix is not complete, a warning is emitted. Use ClimaCalibrate.EnsembleBuilder.is_complete to check whether the G ensemble matrix is completely filled out.

ClimaCalibrate.EnsembleBuilder.ranges_by_short_name — Function

ranges_by_short_name(g_ens_builder::GEnsembleBuilder, short_name)

Return a vector of ranges of the G ensemble matrix that correspond to short_name.

ClimaCalibrate.EnsembleBuilder.metadata_by_short_name — Function

metadata_by_short_name(g_ens_builder::GEnsembleBuilder, short_name)

Return a vector of the metadata that corresponds to short_name.

ClimaCalibrate.EnsembleBuilder.missing_short_names — Function

missing_short_names(g_ens_builder::GEnsembleBuilder, col_idx)

Return a set of the short names of the metadata that are not filled out for the col_idxth column of g_ens_builder.

Checker Interface

ClimaCalibrate.Checker.AbstractChecker — Type

abstract type AbstractChecker end

An object that performs validation checks between the simulation data and metadata from observational data. This is used by GEnsembleBuilder to validate OutputVars from simulation data against the Metadata in the observations in the EnsembleKalmanProcess object.

An AbstractChecker must implement the Checker.check function.

The function must have the signature:

import ClimaCalibrate.Checker
Checker.check(::YourChecker,
              var::OutputVar,
              metadata::Metadata;
              data = nothing,
              verbose = false)

and return true or false.

What are var and metadata?

For more information about OutputVar and Metadata, see the ClimaAnalysis documentation.

ClimaCalibrate.Checker.ShortNameChecker — Type

struct ShortNameChecker <: AbstractChecker end

A struct that checks that the simulation data and metadata have the same short name.

ClimaCalibrate.Checker.DimNameChecker — Type

struct DimNameChecker <: AbstractChecker end

A struct that checks that the simulation data and metadata have the same dimension names.

ClimaCalibrate.Checker.DimUnitsChecker — Type

struct DimUnitsChecker <: AbstractChecker end

A struct that checks that the units of the dimensions of the simulation data and metadata match.

ClimaCalibrate.Checker.UnitsChecker — Type

struct UnitsChecker <: AbstractChecker end

A struct that checks that the simulation data and metadata have the same units.

ClimaCalibrate.Checker.DimValuesChecker — Type

struct DimValuesChecker <: AbstractChecker end

A struct that checks that the values of the dimensions of the simulation data and metadata are compatible.

ClimaCalibrate.Checker.SequentialIndicesChecker — Type

struct SequentialIndicesChecker <: AbstractChecker end

A struct that checks that the indices of the simulation dates corresponding to the dates in the metadata are sequential.

ClimaCalibrate.Checker.SignChecker — Type

struct SignChecker{FT <: AbstractFloat} <: AbstractChecker

A struct that checks that the proportion of positive values in the simulation data and observational data is roughly the same.

To change the default threshold of 0.05, you can pass a float to SignChecker.

import ClimaCalibrate
sign_checker = ClimaCalibrate.Checker.SignChecker(0.01)

ClimaCalibrate.Checker.check — Function

check(checker::AbstractChecker,
      var,
      metadata;
      data = nothing,
      verbose = false)

Return true if the check passes, false otherwise.

If verbose = true, information about why the check failed is provided.

Checker.check(
    ::ShortNameChecker,
    var::OutputVar,
    metadata::Metadata;
    data = nothing,
    verbose = false,
)

Return true if var and metadata have the same short name, false otherwise.

Checker.check(
    ::DimNameChecker,
    var::OutputVar,
    metadata::Metadata;
    data = nothing,
    verbose = false,
)

Return true if var and metadata have the same dimensions, false otherwise.

Checker.check(
    ::DimUnitsChecker,
    var::OutputVar,
    metadata::Metadata;
    data = nothing,
    verbose = false,
)

Return true if the units of the dimensions in var and metadata are the same, false otherwise. This function assumes var and metadata have the same dimensions.

Checker.check(
    ::UnitsChecker,
    var::OutputVar,
    metadata::Metadata;
    data = nothing,
    verbose = false,
)

Return true if var and metadata have the same units, false otherwise.

Checker.check(
    ::DimValuesChecker,
    var::OutputVar,
    metadata::Metadata;
    data = nothing,
    verbose = false,
)

Return true if the values of the dimensions in var and metadata are compatible for the purpose of filling out the G ensemble matrix, false otherwise.

The nontemporal dimensions are compatible if the values are approximately the same. The temporal dimensions are compatible if the temporal dimension of metadata is a subset of the temporal dimension of var.
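These two compatibility rules can be sketched in plain Julia, assuming dimension values are plain vectors and temporal values are Dates. The helper names are hypothetical, and using the default isapprox tolerance is an assumption about what "approximately the same" means.

```julia
using Dates

# Hypothetical helper: non-temporal dimensions must have the same length
# and approximately equal values (default isapprox tolerance assumed).
nontemporal_compatible(sim_vals, obs_vals) =
    length(sim_vals) == length(obs_vals) && all(isapprox.(sim_vals, obs_vals))

# Hypothetical helper: every observed date must appear in the simulation dates.
temporal_compatible(sim_dates, obs_dates) = issubset(obs_dates, sim_dates)

sim_lats = [-45.0, 15.0, 45.0]
obs_lats = [-45.0, 15.0 + 1e-10, 45.0]
nontemporal_compatible(sim_lats, obs_lats)                            # true

sim_dates = collect(Date(2010, 1, 1):Month(1):Date(2010, 12, 1))
temporal_compatible(sim_dates, [Date(2010, 3, 1), Date(2010, 4, 1)])  # true
temporal_compatible(sim_dates, [Date(2011, 1, 1)])                    # false
```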

Checker.check(
    ::SequentialIndicesChecker,
    var::OutputVar,
    metadata::Metadata;
    data = nothing,
    verbose = false,
)

Return true if the dates of metadata map to sequential indices of the dates of var, false otherwise.

Use this check

It is recommended to always enable this check when possible.

Why use this check?

This check helps ensure that the dates are matched correctly between var and metadata. For example, if the simulation data contain monthly averages and metadata track seasonal averages, then without this check no error is thrown, because every date in metadata also appears among the dates of var.
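The monthly-versus-seasonal pitfall can be reproduced with a small plain-Julia sketch. The function sequential_indices is hypothetical and only illustrates the idea of locating each metadata date among the simulation dates and requiring consecutive positions; it is not ClimaCalibrate's implementation.

```julia
using Dates

# Hypothetical sketch: find the index of each metadata date among the
# simulation dates and require those indices to be consecutive.
function sequential_indices(sim_dates, obs_dates)
    idx = [findfirst(==(d), sim_dates) for d in obs_dates]
    any(isnothing, idx) && return false   # some observed date is missing
    return all(==(1), diff(idx))
end

sim_dates = collect(Date(2010, 1, 1):Month(1):Date(2010, 12, 1))  # monthly
seasonal = [Date(2010, 3, 1), Date(2010, 6, 1), Date(2010, 9, 1)]
issubset(seasonal, sim_dates)             # true: a subset test alone passes
sequential_indices(sim_dates, seasonal)   # false: indices 3, 6, 9
sequential_indices(sim_dates, [Date(2010, 3, 1), Date(2010, 4, 1)])  # true
```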

Checker.check(
    ::SignChecker,
    var::OutputVar,
    metadata::Metadata;
    data,
    verbose = false,
)

Return true if the absolute difference of the proportion of positive values in var.data and the proportion of positive values in data is less than the threshold defined in SignChecker, false otherwise.
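The comparison described above can be sketched in plain Julia. The function sign_check is hypothetical, and treating "positive" as strictly greater than zero is an assumption; the default threshold of 0.05 matches the default stated for SignChecker.

```julia
# Hypothetical sketch of the sign comparison, not ClimaCalibrate's code.
# "Positive" is taken to mean strictly greater than zero (an assumption).
function sign_check(sim_data, obs_data; threshold = 0.05)
    frac_pos(x) = count(>(0), x) / length(x)
    return abs(frac_pos(sim_data) - frac_pos(obs_data)) < threshold
end

sim = [1.0, 2.0, -0.5, 3.0]              # 3 of 4 values positive
sign_check(sim, [0.5, -1.0, 2.0, 4.0])   # true: 0.75 vs 0.75
sign_check(sim, [-1.0, -2.0, 0.5, 1.0])  # false: 0.75 vs 0.5
```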
