Diagnostics
This module defines and implements diagnostics of the training process. Diagnostics are organized into groups (defined as dictionaries io_dictionary_...
) and written to file in netCDF4 format. All implemented dictionaries, as well as all their entries, are defined below. These include error metrics, parameter ensembles and their statistical moments, validation diagnostics, and more.
The diagnostics are meant to evolve organically as users require more information regarding the training process. If you can think of diagnostics that may be useful to a wide range of users, consider opening an issue or pull request on GitHub.
Data within the resulting Diagnostics.nc
file can be analyzed using the Julia package NCDatasets.jl. The file can also be processed using software written in other popular programming languages like python through netCDF4-python or xarray.
CalibrateEDMF.Diagnostics.io_dictionary_metrics
— Functionio_dictionary_metrics()
io_dictionary_metrics(ekp::EnsembleKalmanProcess, mse_full::Vector{FT}) where {FT <: Real}
Scalar metrics dictionary.
Evaluations of the data-model mismatch in inverse problem (i.e., latent) space are denoted loss
. Errors computed in normalized physical (i.e., full) space are denoted mse_full
. Differences between these two metrics include:
- Covariance matrix defining the inner product (covariance weighting in
loss
vs L2 norm inmse_full
), - Treatment of trailing eigenvalues (truncation and regularization vs considering all eigenmodes).
- The
loss
includes the L2 penalty term,mse_full
does not.
Elements:
loss_mean_g
::(ḡ - y)'Γ_inv(ḡ - y)
. This is the ensemble mean loss seen by the Kalman inversion process.loss_mean
:: Ensemble mean of(g - y)'Γ_inv(g - y)
.loss_min
:: Ensemble min of(g - y)'Γ_inv(g - y)
.loss_max
:: Ensemble max of(g - y)'Γ_inv(g - y)
.loss_var
:: Variance estimate of(g - y)'Γ_inv(g - y)
, empirical (EKI/EKS) or quadrature (UKI).loss_nn_mean
::(g_nn - y)'Γ_inv(nn - y)
, whereg_nn
is the forward model output at the particle closest to the mean in parameter space.mse_full_mean
:: Ensemble mean of MSE(g_full
,y_full
).mse_full_min
:: Ensemble min of MSE(g_full
,y_full
).mse_full_max
:: Ensemble max of MSE(g_full
,y_full
).mse_full_var
:: Variance estimate of MSE(g_full
,y_full
), empirical (EKI/EKS) or quadrature (UKI).mse_full_nn_mean
:: MSE(g_full
,y_full
) of particle closest to the mean in parameter space. The mean in parameter space is the solution to the particle-based inversion.failures
:: Number of particle failures per iteration. If the calibration is run with the "high_loss" failure handler, this diagnostic will not capture the failures due to parameter mapping.timestep
:: EKP timestep in current iteration.nn_mean_index
:: Particle index of the nearest neighbor to the ensemble mean in parameter space. This index is used to construct..._nn_mean
metrics.
CalibrateEDMF.Diagnostics.io_dictionary_val_metrics
— Functionio_dictionary_val_metrics()
io_dictionary_val_metrics(ekp::EnsembleKalmanProcess, mse_full::Vector{FT}) where {FT <: Real}
Dictionary of scalar validation metrics.
Evaluations of the data-model mismatch in inverse problem (i.e., latent) space are denoted loss
. Errors computed in normalized physical (i.e., full) space are denoted mse_full
. Differences between these two metrics include:
- Covariance matrix defining the inner product (covariance weighting in
loss
vs L2 norm inmse_full
), - Treatment of trailing eigenvalues (truncation and regularization vs considering all eigenmodes).
- The
loss
includes the L2 penalty term,mse_full
does not.
Elements:
val_loss_mean
:: Ensemble mean of validation(g - y)'Γ_inv(g - y)
.val_loss_min
:: Ensemble min of validation(g - y)'Γ_inv(g - y)
.val_loss_max
:: Ensemble max of validation(g - y)'Γ_inv(g - y)
.val_loss_var
:: Variance estimate of validation(g - y)'Γ_inv(g - y)
, empirical (EKI/EKS) or quadrature (UKI).val_loss_nn_mean
:: Validation(g_nn - y)'Γ_inv(nn - y)
, whereg_nn
is the validation forward model output at the particle closest to the mean in parameter space.val_mse_full_mean
:: Ensemble mean of MSE(g_full_val
,y_full_val
).val_mse_full_min
:: Ensemble min of MSE(g_full_val
,y_full_val
).val_mse_full_max
:: Ensemble max of MSE(g_full_val
,y_full_val
).val_mse_full_var
:: Variance estimate of MSE(g_full_val
,y_full_val
), empirical (EKI/EKS) or quadrature (UKI).val_mse_full_nn_mean
:: MSE(g_full_val
,y_full_val
) of particle closest to the mean in parameter space. The mean in parameter space is the solution to the particle-based inversion.
CalibrateEDMF.Diagnostics.io_dictionary_ensemble
— Functionio_dictionary_ensemble()
io_dictionary_ensemble(ekp::EnsembleKalmanProcess, priors::ParameterDistribution)
Dictionary of ensemble parameter diagnostics.
Elements:
u_mean
:: Ensemble mean parameter in unconstrained (inverse problem) space.phi_mean
:: Ensemble mean parameter in constrained (physical) space.u_cov
:: Sample parameter covariance in unconstrained (inverse problem) space.phi_cov
:: Sample parameter covariance in constrained (physical) space.phi_low_unc
:: Lower uncertainty bound (μ-1σ) of the parameter value in constrained (physical) space.phi_upp_unc
:: Upper uncertainty bound (μ+1σ) of the parameter value in constrained (physical) space.
CalibrateEDMF.Diagnostics.io_dictionary_particle_state
— Functionio_dictionary_particle_state()
io_dictionary_particle_state(ekp::EnsembleKalmanProcess, priors::ParameterDistribution)
Dictionary of particle-wise parameter diagnostics, not involving forward model evaluations.
Elements:
u
:: Parameter ensemble in unconstrained (inverse problem) space.phi
:: Parameter ensemble in constrained (physical) space.
CalibrateEDMF.Diagnostics.io_dictionary_particle_eval
— Functionio_dictionary_particle_eval()
io_dictionary_particle_eval(
ekp::EnsembleKalmanProcess,
g_full::Matrix{FT},
mse_full::Vector{FT},
d::IT,
d_full::IT,
batch_indices::Vector{IT},
) where {FT <: Real, IT <: Integer}
Dictionary of particle-wise diagnostics involving forward model evaluations.
Elements:
g
:: Forward model evaluation in inverse problem space.g_full
:: Forward model evaluation in primitive output space, normalized using the pooled field covariance.mse_full
:: Particle-wise evaluation of MSE(g_full
,y_full
).batch_indices
:: Indices ofReferenceModel
s evaluated per iteration.
CalibrateEDMF.Diagnostics.io_dictionary_val_particle_eval
— Functionio_dictionary_val_particle_eval()
io_dictionary_val_particle_eval(
g::Matrix{FT},
g_full::Matrix{FT},
mse_full::Vector{FT},
d::IT,
d_full::IT,
batch_indices::Vector{IT},
) where {FT <: Real, IT <: Integer}
Dictionary of particle-wise validation diagnostics involving forward model evaluations.
Elements:
val_g
:: Validation forward model evaluation in reduced space.val_g_full
:: Validation forward model evaluation in primitive output space, normalized using the pooled field covariance.val_mse_full
:: Particle-wise evaluation of MSE(val_g_full
,val_y_full
).val_batch_indices
:: Indices of validationReferenceModel
s evaluated per iteration.
CalibrateEDMF.Diagnostics.io_dictionary_reference
— Functionio_dictionary_reference()
io_dictionary_reference(
ref_stats::ReferenceStatistics,
ref_models::Vector{ReferenceModel},
write_full_stats::Bool = true,
)
Dictionary of diagnostics pertaining to the ReferenceModel
s and ReferenceStatistics
that define the inverse problem.
Elements:
Gamma
:: Covariance matrix in the inverse problem latent space (regularized low-dimensional encoding).Gamma_full
:: Covariance matrix of normalized observed variables in full space (possibly ill-conditioned). Only written to file ifwrite_full_stats
is true.Gamma_full_diag
:: Diagonal ofGamma_full
, useful whenGamma_full
is not written to file.y
:: Observations in the inverse problem latent space (low-dimensional encoding).y_full
:: Normalized observations in full space.P_pca
:: PCA projection matrix from full space to low-dimensional latent space.num_vars
:: Maximum number of observed fields (not dimensions) perReferenceModel
.var_dof
:: Maximum number of degrees of freedom of each field perReferenceModel
.config_pca_dim
:: Dimensionality of the latent space associated with eachReferenceModel
.config_name
:: Name of eachReferenceModel
used to construct the inverse problem.config_z_obs
:: Vertical locations of the observations of eachReferenceModel
.norm_factor
:: Pooled variance used to normalize each field of eachReferenceModel
.
CalibrateEDMF.Diagnostics.io_dictionary_val_reference
— Functionio_dictionary_val_reference()
io_dictionary_val_reference(
ref_stats::ReferenceStatistics,
ref_models::Vector{ReferenceModel},
write_full_stats::Bool = true,
)
Dictionary of diagnostics pertaining to the ReferenceModel
s and ReferenceStatistics
in the validation set.
Elements:
Gamma_val
:: Covariance matrix in latent space, using the same truncation as for the training set.Gamma_full_val
:: Covariance matrix of normalized observed variables in full space. Only written to file ifwrite_full_stats
is true.Gamma_full_diag_val
:: Diagonal ofGamma_full_val
, useful whenGamma_full_val
is not written to file.y_val
:: Observations in latent space, for observed fields in the validation set.y_full_val
:: Normalized observations in full space, for the validation set.P_pca_val
:: PCA projection matrix from full space to low-dimensional latent space, for the validation set.num_vars_val
:: Maximum number of observed fields (not dimensions) per validationReferenceModel
.var_dof_val
:: Maximum number of degrees of freedom of each field per validationReferenceModel
.config_pca_dim_val
:: Dimensionality of the latent space associated with each validationReferenceModel
.config_name_val
:: Name of eachReferenceModel
in the validation set.config_z_obs_val
:: Vertical locations of the observations of each validationReferenceModel
.norm_factor_val
:: Pooled variance used to normalize each field of each validationReferenceModel
.
CalibrateEDMF.Diagnostics.io_dictionary_prior
— Functionio_dictionary_prior()
io_dictionary_prior(priors::ParameterDistribution)
Parameter prior diagnostics dictionary.
Elements:
u_mean_prior
:: Prior mean in unconstrained parameter space.phi_mean_prior
:: Prior mean in constrained parameter space.u_var_prior
:: Diagonal of the prior covariance in unconstrained space.phi_low_unc_prior
:: Lower uncertainty bound (μ-1σ_prior) of prior in constrained space.phi_upp_unc_prior
:: Upper uncertainty bound (μ+1σ_prior) of prior in constrained space.phi_low_std_prior
:: Lower standard bound (μ-1) of prior in constrained space. Useful measure of minimum allowed values for bounded parameters.phi_upp_std_prior
:: Upper standard bound (μ+1) of prior in constrained space. Useful measure of maximum allowed values for bounded parameters.
CalibrateEDMF.Diagnostics.get_ϕ_cov
— Functionget_ϕ_cov(ekp::EnsembleKalmanProcess, priors::ParameterDistribution)
Get the last parameter covariance estimate in constrained (physical) space.
For ensemble methods, the covariance of the transformed parameters is returned. For unscented methods, the covariance is computed through a quadrature on the transformed quadrature points. The covariance of the transformed parameters returned here is equal to the transformed covariance only under a first order Taylor approximation, which is consistent with other approximations underlying the calibration method.
Inputs:
ekp
:: The EnsembleKalmanProcess.priors
:: The priors defining transformations between constrained and unconstrained space.
Outputs:
- The parameter covariance in constrained space.
CalibrateEDMF.Diagnostics.get_metric_var
— Functionget_metric_var(ekp::EnsembleKalmanProcess, metric::Vector{FT}) where {FT <: Real}
Computes the ensemble variance of a scalar metric.
For ensemble methods, the sample variance of the metric is returned. For unscented methods, the variance is computed through a quadrature. Ensemble members where the metric is NaN
are filtered out of the computation.
Inputs:
ekp
:: The EnsembleKalmanProcess.metric
:: A vector containing the value of the metric for each ensemble member.
Outputs:
- The ensemble variance of
metric
.
CalibrateEDMF.Diagnostics.get_mean_nearest_neighbor
— Functionget_mean_nearest_neighbor(ekp::EnsembleKalmanProcess)
Returns the index of the nearest neighbor to the ensemble mean parameter, in unconstrained space.
CalibrateEDMF.Diagnostics.compute_ensemble_loss
— Functioncompute_ensemble_loss(
g::AbstractMatrix{FT},
y::AbstractVector{FT},
Γ::Union{AbstractMatrix{FT}, UniformScaling{FT}},
) where {FT <: Real}
compute_ensemble_loss(ekp::EnsembleKalmanProcess)
Computes the covariance-weighted error (g - y)'Γ_inv(g - y)
for each ensemble member.