Diagnostics

This module defines and implements diagnostics of the training process. Diagnostics are organized into groups (defined as dictionaries io_dictionary_...) and written to file in netCDF4 format. All implemented dictionaries, as well as all their entries, are defined below. These include error metrics, parameter ensembles and their statistical moments, validation diagnostics, and more.

The diagnostics are meant to evolve organically as users require more information regarding the training process. If you can think of diagnostics that may be useful to a wide range of users, consider opening an issue or pull request on GitHub.

Data within the resulting Diagnostics.nc file can be analyzed using the Julia package NCDatasets.jl. The file can also be processed using software written in other popular programming languages like python through netCDF4-python or xarray.

CalibrateEDMF.Diagnostics.io_dictionary_metricsFunction
io_dictionary_metrics()
io_dictionary_metrics(ekp::EnsembleKalmanProcess, mse_full::Vector{FT}) where {FT <: Real}

Scalar metrics dictionary.

Evaluations of the data-model mismatch in inverse problem (i.e., latent) space are denoted loss. Errors computed in normalized physical (i.e., full) space are denoted mse_full. Differences between these two metrics include:

  • Covariance matrix defining the inner product (covariance weighting in loss vs L2 norm in mse_full),
  • Treatment of trailing eigenvalues (truncation and regularization vs considering all eigenmodes).
  • The loss includes the L2 penalty term, mse_full does not.

Elements:

  • loss_mean_g :: (ḡ - y)'Γ_inv(ḡ - y). This is the ensemble mean loss seen by the Kalman inversion process.
  • loss_mean :: Ensemble mean of (g - y)'Γ_inv(g - y).
  • loss_min :: Ensemble min of (g - y)'Γ_inv(g - y).
  • loss_max :: Ensemble max of (g - y)'Γ_inv(g - y).
  • loss_var :: Variance estimate of (g - y)'Γ_inv(g - y), empirical (EKI/EKS) or quadrature (UKI).
  • loss_nn_mean :: (g_nn - y)'Γ_inv(nn - y), where g_nn is the forward model output at the particle closest to the mean in parameter space.
  • mse_full_mean :: Ensemble mean of MSE(g_full, y_full).
  • mse_full_min :: Ensemble min of MSE(g_full, y_full).
  • mse_full_max :: Ensemble max of MSE(g_full, y_full).
  • mse_full_var :: Variance estimate of MSE(g_full, y_full), empirical (EKI/EKS) or quadrature (UKI).
  • mse_full_nn_mean :: MSE(g_full, y_full) of particle closest to the mean in parameter space. The mean in parameter space is the solution to the particle-based inversion.
  • failures :: Number of particle failures per iteration. If the calibration is run with the "high_loss" failure handler, this diagnostic will not capture the failures due to parameter mapping.
  • timestep :: EKP timestep in current iteration.
  • nn_mean_index :: Particle index of the nearest neighbor to the ensemble mean in parameter space. This index is used to construct ..._nn_mean metrics.
source
CalibrateEDMF.Diagnostics.io_dictionary_val_metricsFunction
io_dictionary_val_metrics()
io_dictionary_val_metrics(ekp::EnsembleKalmanProcess, mse_full::Vector{FT}) where {FT <: Real}

Dictionary of scalar validation metrics.

Evaluations of the data-model mismatch in inverse problem (i.e., latent) space are denoted loss. Errors computed in normalized physical (i.e., full) space are denoted mse_full. Differences between these two metrics include:

  • Covariance matrix defining the inner product (covariance weighting in loss vs L2 norm in mse_full),
  • Treatment of trailing eigenvalues (truncation and regularization vs considering all eigenmodes).
  • The loss includes the L2 penalty term, mse_full does not.

Elements:

  • val_loss_mean :: Ensemble mean of validation (g - y)'Γ_inv(g - y).
  • val_loss_min :: Ensemble min of validation (g - y)'Γ_inv(g - y).
  • val_loss_max :: Ensemble max of validation (g - y)'Γ_inv(g - y).
  • val_loss_var :: Variance estimate of validation (g - y)'Γ_inv(g - y), empirical (EKI/EKS) or quadrature (UKI).
  • val_loss_nn_mean :: Validation (g_nn - y)'Γ_inv(nn - y), where g_nn is the validation forward model output at the particle closest to the mean in parameter space.
  • val_mse_full_mean :: Ensemble mean of MSE(g_full_val, y_full_val).
  • val_mse_full_min :: Ensemble min of MSE(g_full_val, y_full_val).
  • val_mse_full_max :: Ensemble max of MSE(g_full_val, y_full_val).
  • val_mse_full_var :: Variance estimate of MSE(g_full_val, y_full_val), empirical (EKI/EKS) or quadrature (UKI).
  • val_mse_full_nn_mean :: MSE(g_full_val, y_full_val) of particle closest to the mean in parameter space. The mean in parameter space is the solution to the particle-based inversion.
source
CalibrateEDMF.Diagnostics.io_dictionary_ensembleFunction
io_dictionary_ensemble()
io_dictionary_ensemble(ekp::EnsembleKalmanProcess, priors::ParameterDistribution)

Dictionary of ensemble parameter diagnostics.

Elements:

  • u_mean :: Ensemble mean parameter in unconstrained (inverse problem) space.
  • phi_mean :: Ensemble mean parameter in constrained (physical) space.
  • u_cov :: Sample parameter covariance in unconstrained (inverse problem) space.
  • phi_cov :: Sample parameter covariance in constrained (physical) space.
  • phi_low_unc :: Lower uncertainty bound (μ-1σ) of the parameter value in constrained (physical) space.
  • phi_upp_unc :: Upper uncertainty bound (μ+1σ) of the parameter value in constrained (physical) space.
source
CalibrateEDMF.Diagnostics.io_dictionary_particle_stateFunction
io_dictionary_particle_state()
io_dictionary_particle_state(ekp::EnsembleKalmanProcess, priors::ParameterDistribution)

Dictionary of particle-wise parameter diagnostics, not involving forward model evaluations.

Elements:

  • u :: Parameter ensemble in unconstrained (inverse problem) space.
  • phi :: Parameter ensemble in constrained (physical) space.
source
CalibrateEDMF.Diagnostics.io_dictionary_particle_evalFunction
io_dictionary_particle_eval()
io_dictionary_particle_eval(
    ekp::EnsembleKalmanProcess,
    g_full::Matrix{FT},
    mse_full::Vector{FT},
    d::IT,
    d_full::IT,
    batch_indices::Vector{IT},
) where {FT <: Real, IT <: Integer}

Dictionary of particle-wise diagnostics involving forward model evaluations.

Elements:

  • g :: Forward model evaluation in inverse problem space.
  • g_full :: Forward model evaluation in primitive output space, normalized using the pooled field covariance.
  • mse_full :: Particle-wise evaluation of MSE(g_full, y_full).
  • batch_indices :: Indices of ReferenceModels evaluated per iteration.
source
CalibrateEDMF.Diagnostics.io_dictionary_val_particle_evalFunction
io_dictionary_val_particle_eval()
io_dictionary_val_particle_eval(
    g::Matrix{FT},
    g_full::Matrix{FT},
    mse_full::Vector{FT},
    d::IT,
    d_full::IT,
    batch_indices::Vector{IT},
) where {FT <: Real, IT <: Integer}

Dictionary of particle-wise validation diagnostics involving forward model evaluations.

Elements:

  • val_g :: Validation forward model evaluation in reduced space.
  • val_g_full :: Validation forward model evaluation in primitive output space, normalized using the pooled field covariance.
  • val_mse_full :: Particle-wise evaluation of MSE(val_g_full, val_y_full).
  • val_batch_indices :: Indices of validation ReferenceModels evaluated per iteration.
source
CalibrateEDMF.Diagnostics.io_dictionary_referenceFunction
io_dictionary_reference()
io_dictionary_reference(
    ref_stats::ReferenceStatistics,
    ref_models::Vector{ReferenceModel},
    write_full_stats::Bool = true,
)

Dictionary of diagnostics pertaining to the ReferenceModels and ReferenceStatistics that define the inverse problem.

Elements:

  • Gamma :: Covariance matrix in the inverse problem latent space (regularized low-dimensional encoding).
  • Gamma_full :: Covariance matrix of normalized observed variables in full space (possibly ill-conditioned). Only written to file if write_full_stats is true.
  • Gamma_full_diag :: Diagonal of Gamma_full, useful when Gamma_full is not written to file.
  • y :: Observations in the inverse problem latent space (low-dimensional encoding).
  • y_full :: Normalized observations in full space.
  • P_pca :: PCA projection matrix from full space to low-dimensional latent space.
  • num_vars :: Maximum number of observed fields (not dimensions) per ReferenceModel.
  • var_dof :: Maximum number of degrees of freedom of each field per ReferenceModel.
  • config_pca_dim :: Dimensionality of the latent space associated with each ReferenceModel.
  • config_name :: Name of each ReferenceModel used to construct the inverse problem.
  • config_z_obs :: Vertical locations of the observations of each ReferenceModel.
  • norm_factor :: Pooled variance used to normalize each field of each ReferenceModel.
source
CalibrateEDMF.Diagnostics.io_dictionary_val_referenceFunction
io_dictionary_val_reference()
io_dictionary_val_reference(
    ref_stats::ReferenceStatistics,
    ref_models::Vector{ReferenceModel},
    write_full_stats::Bool = true,
)

Dictionary of diagnostics pertaining to the ReferenceModels and ReferenceStatistics in the validation set.

Elements:

  • Gamma_val :: Covariance matrix in latent space, using the same truncation as for the training set.
  • Gamma_full_val :: Covariance matrix of normalized observed variables in full space. Only written to file if write_full_stats is true.
  • Gamma_full_diag_val :: Diagonal of Gamma_full_val, useful when Gamma_full_val is not written to file.
  • y_val :: Observations in latent space, for observed fields in the validation set.
  • y_full_val :: Normalized observations in full space, for the validation set.
  • P_pca_val :: PCA projection matrix from full space to low-dimensional latent space, for the validation set.
  • num_vars_val :: Maximum number of observed fields (not dimensions) per validation ReferenceModel.
  • var_dof_val :: Maximum number of degrees of freedom of each field per validation ReferenceModel.
  • config_pca_dim_val :: Dimensionality of the latent space associated with each validation ReferenceModel.
  • config_name_val :: Name of each ReferenceModel in the validation set.
  • config_z_obs_val :: Vertical locations of the observations of each validation ReferenceModel.
  • norm_factor_val :: Pooled variance used to normalize each field of each validation ReferenceModel.
source
CalibrateEDMF.Diagnostics.io_dictionary_priorFunction
io_dictionary_prior()
io_dictionary_prior(priors::ParameterDistribution)

Parameter prior diagnostics dictionary.

Elements:

  • u_mean_prior :: Prior mean in unconstrained parameter space.
  • phi_mean_prior :: Prior mean in constrained parameter space.
  • u_var_prior :: Diagonal of the prior covariance in unconstrained space.
  • phi_low_unc_prior :: Lower uncertainty bound (μ-1σ_prior) of prior in constrained space.
  • phi_upp_unc_prior :: Upper uncertainty bound (μ+1σ_prior) of prior in constrained space.
  • phi_low_std_prior :: Lower standard bound (μ-1) of prior in constrained space. Useful measure of minimum allowed values for bounded parameters.
  • phi_upp_std_prior :: Upper standard bound (μ+1) of prior in constrained space. Useful measure of maximum allowed values for bounded parameters.
source
CalibrateEDMF.Diagnostics.get_ϕ_covFunction
get_ϕ_cov(ekp::EnsembleKalmanProcess, priors::ParameterDistribution)

Get the last parameter covariance estimate in constrained (physical) space.

For ensemble methods, the covariance of the transformed parameters is returned. For unscented methods, the covariance is computed through a quadrature on the transformed quadrature points. The covariance of the transformed parameters returned here is equal to the transformed covariance only under a first order Taylor approximation, which is consistent with other approximations underlying the calibration method.

Inputs:

  • ekp :: The EnsembleKalmanProcess.
  • priors :: The priors defining transformations between constrained and unconstrained space.

Outputs:

  • The parameter covariance in constrained space.
source
CalibrateEDMF.Diagnostics.get_metric_varFunction
get_metric_var(ekp::EnsembleKalmanProcess, metric::Vector{FT}) where {FT <: Real}

Computes the ensemble variance of a scalar metric.

For ensemble methods, the sample variance of the metric is returned. For unscented methods, the variance is computed through a quadrature. Ensemble members where the metric is NaN are filtered out of the computation.

Inputs:

  • ekp :: The EnsembleKalmanProcess.
  • metric :: A vector containing the value of the metric for each ensemble member.

Outputs:

  • The ensemble variance of metric.
source
CalibrateEDMF.Diagnostics.compute_ensemble_lossFunction
compute_ensemble_loss(
    g::AbstractMatrix{FT},
    y::AbstractVector{FT},
    Γ::Union{AbstractMatrix{FT}, UniformScaling{FT}},
) where {FT <: Real}
compute_ensemble_loss(ekp::EnsembleKalmanProcess)

Computes the covariance-weighted error (g - y)'Γ_inv(g - y) for each ensemble member.

source