Global Sensitiviy Analysis for an emulated Ishigami function
The full code is found in the examples/Emulator/
directory of the github repository
In this example, we assess directly the performance of our machine learning emulators. The task is to learn a function for use in a global sensitivity analysis. In particular, we learn the Ishigami function
\[f(x; a, b) = (1 + bx_3^4)\sin(x_1) + a \sin(x_2), \forall x\in [-\pi,\pi]^3\]
with $a=7, b=0.1$. In this example, global sensitivity analysis refers to calculation of two Sobol indices. The first index collects proportions $V_i$ (a.k.a firstorder
) of the variance of $f$ attributable to the input $x_i$, and the second index collects proportions $TV_i$ (a.k.a totalorder
) of the residual variance having removed contributions attributable to inputs $x_j$ $\forall j\neq i$. The Ishigami function has an analytic formula for these Sobol indices, it is also known that one can obtain numerical approximation through quasi-Monto-Carlo methods by evaluating the Ishigami function on a special quasi-random Sobol sequence.
To emulate the Ishigami function, the data consists of 300 pairs $\{x,f(x)+\eta\}$ where $\eta \sim N(0,\Sigma)$ is additive noise, and the x are sampled from the Sobol sequence. The emulators are validated by evaluating the posterior mean function on the full 16000 points of the Sobol sequence and the Sobol indices are estimated. We rerun the experiment for many repeats of the random feature hyperparameter optimization and present the statistics of these indices, as well as plotting a realization of the emulator.
We use the package GlobalSensitivityAnalysis.jl for many of the GSA tools.
Walkthrough of the code
We first load some standard packages
using Distributions
using DataStructures # for `OrderedDict`
using Random
using LinearAlgebra
using CairoMakie, ColorSchemes
and the packages for providing the Ishigami function, Sobol sequence, and evaluation of the indices
using GlobalSensitivityAnalysis # for `SobolData`
const GSA = GlobalSensitivityAnalysis
then the CES packages for the emulators
using CalibrateEmulateSample.Emulators # for `SKLJL`, `GaussianProcess`, `SeparableKernel`, `LowRankFactor`, `OneDimFactor`, `ScalarRandomFeatureInterface`, `Emulator`
using CalibrateEmulateSample.DataContainers # for `PairedDataContainer`
using CalibrateEmulateSample.EnsembleKalmanProcesses # for `DataMisfitController`
We set up the sampling procedure, evaluate the true ishigami function on these points, and calculate the sobol indices
n_data_gen = 2000
data = SobolData(
params = OrderedDict(:x1 => Uniform(-π, π), :x2 => Uniform(-π, π), :x3 => Uniform(-π, π)),
N = n_data_gen,
)
# To perform global analysis,
# one must generate samples using Sobol sequence (i.e. creates more than N points)
samples = GSA.sample(data)
n_data = size(samples, 1) # [n_samples x 3]
# run model (example)
y = GSA.ishigami(samples)
# perform Sobol Analysis
result = analyze(data, y)
Next we create the noisy training data from the sequence samples
n_train_pts = 300
ind = shuffle!(rng, Vector(1:n_data))[1:n_train_pts]
# now subsample the samples data
n_tp = length(ind)
input = zeros(3, n_tp)
output = zeros(1, n_tp)
Γ = 1e-2
noise = rand(rng, Normal(0, Γ), n_tp)
for i in 1:n_tp
input[:, i] = samples[ind[i], :]
output[i] = y[ind[i]] + noise[i]
end
iopairs = PairedDataContainer(input, output)
We have a few cases for the user to investigate
cases = [
"Prior", # Scalar random feature emulator with no hyperparameter learning
"GP", # Trained Sci-kit learn Gaussian process emulator
"RF-scalar", # Trained scalar random feature emulator
]
Each case sets up a different machine learning configuration in the Emulator
object.
For the random feature case, RF-scalar
, we use a rank-3 kernel in the input space, and 500 features for prediction, though for efficiency we use only 200 when learning the hyperparameters. The relevant code snippets are
nugget = Float64(1e-12)
overrides = Dict(
"scheduler" => DataMisfitController(terminate_at = 1e4),
"n_features_opt" => 200,
)
kernel_structure = SeparableKernel(LowRankFactor(3, nugget), OneDimFactor())
n_features = 500
mlt = ScalarRandomFeatureInterface(
n_features,
3,
rng = rng,
kernel_structure = kernel_structure,
optimizer_options = overrides,
)
For the gaussian process case GP
we use the sci-kit learn package, a default squared exponential kernel with lengthscale learnt in each input dimensions. We do not learn an additional white kernel for the noise.
gppackage = Emulators.SKLJL()
pred_type = Emulators.YType()
mlt = GaussianProcess(
gppackage;
prediction_type = pred_type,
noise_learn = false,
)
We finish by building the emulator object
emulator = Emulator(mlt, iopairs; obs_noise_cov = Γ * I, decorrelate = decorrelate)
optimize_hyperparameters!(emulator) # (only needed for some Emulator packages)
Results and plots
We validate the emulator by evaluating it on the entire Sobol sequence, and calculating the Sobol indices (presenting mean and std if using multiple repeats.
# predict on all Sobol points with emulator (example)
y_pred, y_var = predict(emulator, samples', transform_to_real = true)
# obtain emulated Sobol indices
result_pred = analyze(data, y_pred')
Gaussian Process Emulator (sci-kit learn GP
)
Here is the plot for one emulation
and the outputted table of Sobol indices
True Sobol Indices
******************
firstorder: [0.31390519114781146, 0.44241114479004084, 0.0]
totalorder: [0.5575888552099592, 0.44241114479004084, 0.24368366406214775]
Sampled truth Sobol Indices (# points 16000)
***************************
firstorder: [0.31261591941512257, 0.441887746224135, -0.005810964687365922]
totalorder: [0.5623611180844434, 0.44201284296404386, 0.24465876318633062]
Sampled Emulated Sobol Indices (# obs 300, noise var 0.01)
***************************************************************
firstorder: [0.3094638183079643, 0.4518400892052567, -0.007351344957260407]
totalorder: [0.5502469909342245, 0.4587734278791574, 0.23542404141319245]
Random feature emulator (Separable Low-rank kernel RF-scalar
)
Here is the plot for one emulation
Table for 20 repeats of the algorithm
True Sobol Indices
******************
firstorder: [0.31390519114781146, 0.44241114479004084, 0.0]
totalorder: [0.5575888552099592, 0.44241114479004084, 0.24368366406214775]
Sampled truth Sobol Indices (# points 16000)
***************************
firstorder: [0.31261591941512257, 0.441887746224135, -0.005810964687365922]
totalorder: [0.5623611180844434, 0.44201284296404386, 0.24465876318633062]
Sampled Emulated Sobol Indices (# obs 300, noise var 0.01)
***************************************************************
(mean) firstorder: [0.33605548545490044, 0.41116050093679196, -0.0012213648484969539]
(std) firstorder: [0.05909336956162543, 0.11484966121124164, 0.012908533302492602]
(mean) totalorder: [0.5670345355855254, 0.4716028261179354, 0.24108222433317]
(std) totalorder: [0.10619345801872732, 0.1041023777237331, 0.07200225781785778]