CalibrateEmulateSample.jl
CalibrateEmulateSample.jl solves parameter estimation problems using accelerated (and approximate) Bayesian inversion. The framework can currently be applied to learn the joint distribution of a flexible number of parameters; it is not inherently restricted to unimodal distributions. Please see the example below!
It can be used with computer models that:
- can be noisy or chaotic,
- are non-differentiable,
- can only be treated as a black box (interfaced only through parameter files).
The computer model is supplied by the user, as a parameter-to-data map $\mathcal{G}(\theta): \mathbb{R}^p \rightarrow \mathbb{R}^d$. For example, $\mathcal{G}$ could be a map from any given parameter configuration $\theta$ to a collection of statistics of a dynamical system trajectory. $\mathcal{G}$ is referred to as the forward model in the Bayesian inverse problem setting.
The data produced by the forward model are compared to observations $y$, which are assumed to be corrupted by additive noise $\eta$, such that
\[y = \mathcal{G}(\theta) + \eta,\]
where the noise $\eta$ is drawn from a $d$-dimensional Gaussian distribution, $\eta \sim \mathcal{N}(0, \Gamma_y)$.
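For concreteness, here is a minimal sketch in Julia (not part of the package API) of such a setup: a hypothetical two-parameter forward map, in the spirit of the sinusoid example referenced below, and a noisy observation generated according to the model above.

```julia
using LinearAlgebra, Random, Statistics, Distributions

rng = MersenneTwister(41)

# Hypothetical forward map G: (amplitude, vertical shift) -> (signal mean, signal range)
function G(θ)
    amplitude, vert_shift = θ
    t = range(0, 2π, length = 200)
    signal = amplitude .* sin.(t) .+ vert_shift
    return [mean(signal), maximum(signal) - minimum(signal)]
end

θ_true = [2.0, 6.0]                    # "true" parameters, to be recovered
Γy = 0.1 * Matrix(I, 2, 2)             # observational noise covariance Γ_y
η = rand(rng, MvNormal(zeros(2), Γy))  # noise draw η ~ N(0, Γ_y)
y = G(θ_true) + η                      # noisy observation y = G(θ) + η
```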
Quick links
- How do I build prior distributions?
- How do I build good observational noise covariances?
- What ensemble size should I take? Which process should I use? What is the recommended configuration?
- Where can I walk through the simple example?
- What is the EnsembleKalmanProcesses.jl package?
- What are the recommendations/defaults for dimension reduction or data processing?
- How do I plot or interpret the posterior distribution?
The inverse problem
Given an observation $y$, the computer model $\mathcal{G}$, the observational noise covariance $\Gamma_y$, and some broad prior information on $\theta$, we return the data-informed joint (posterior) distribution of "$\theta$ given $y$".
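Concretely, under the additive Gaussian noise model above, this posterior density combines the likelihood implied by the noise model with the prior $\pi_0$:
\[\pi(\theta \,|\, y) \propto \exp\!\left(-\tfrac{1}{2}\left\|y - \mathcal{G}(\theta)\right\|^2_{\Gamma_y}\right)\pi_0(\theta), \qquad \|v\|^2_{\Gamma_y} := v^\top \Gamma_y^{-1} v.\]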
As the name suggests, CalibrateEmulateSample.jl breaks this problem into a sequence of three steps: calibration, emulation, and sampling. A comprehensive treatment of the calibrate-emulate-sample approach to Bayesian inverse problems can be found in Cleary et al. (2020).
The three steps of the algorithm: see our walkthrough of the Sinusoid Example
*Figure: learn the vertical shift and amplitude of the signal given the noisy observation.*

The calibrate step of the algorithm consists of an application of Ensemble Kalman Processes, which generates input-output pairs $\{\theta, \mathcal{G}(\theta)\}$ in high density around an optimal parameter $\theta^*$. Here, $\theta$ are amplitude and vertical shift pairs, and $\mathcal{G}(\theta)$ are the resulting signal mean and range. This $\theta^*$ will be near a mode of the posterior distribution (Note: This is the only time we interface with the forward model $\mathcal{G}$).
*Figure: calibrate with EKP to generate data pairs.*
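Continuing the toy setup above, a sketch of the calibrate step using the EnsembleKalmanProcesses.jl package (see the quick links above) might look as follows; the prior choices, ensemble size, and iteration count are illustrative assumptions.

```julia
using EnsembleKalmanProcesses
using EnsembleKalmanProcesses.ParameterDistributions

# Broad priors on the two parameters (illustrative choices)
prior = combine_distributions([
    constrained_gaussian("amplitude", 2.0, 1.0, 0.0, Inf),   # positive amplitude
    constrained_gaussian("vert_shift", 0.0, 5.0, -Inf, Inf), # unbounded shift
])

N_ensemble, N_iterations = 10, 5
initial_ensemble = construct_initial_ensemble(rng, prior, N_ensemble)
ekp = EnsembleKalmanProcess(initial_ensemble, y, Γy, Inversion(); rng = rng)

# Iteratively run the forward model on the ensemble and update it towards θ*
for _ in 1:N_iterations
    θ_ens = get_ϕ_final(prior, ekp)      # p × N_ensemble, constrained space
    G_ens = hcat([G(θ_ens[:, j]) for j in 1:N_ensemble]...)
    update_ensemble!(ekp, G_ens)
end

θ_star = vec(mean(get_ϕ_final(prior, ekp), dims = 2))  # ensemble mean ≈ optimal parameter θ*
```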

The emulate step takes these pairs $\{\theta, \mathcal{G}(\theta)\}$ and trains a statistical surrogate model (e.g., a Gaussian process), emulating the forward map $\mathcal{G}$.
*Figure: emulate the map statistically from EKP pairs.*
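A sketch of the emulate step with a Gaussian process, using the package's `Emulators` and `Utilities` modules (exact keyword arguments may vary between releases):

```julia
using CalibrateEmulateSample.Emulators
using CalibrateEmulateSample.Utilities

# Collect the {θ, G(θ)} pairs generated during calibration
input_output_pairs = Utilities.get_training_points(ekp, N_iterations)

# Train a Gaussian process surrogate of the forward map
gauss_proc = GaussianProcess(GPJL())  # GaussianProcesses.jl backend
emulator = Emulator(gauss_proc, input_output_pairs; obs_noise_cov = Γy)
optimize_hyperparameters!(emulator)
```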

The sample step uses this surrogate in place of $\mathcal{G}$ in a sampling method (Markov chain Monte Carlo) to sample the posterior distribution of $\theta$.
*Figure: sample the emulated map with MCMC.*
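A sketch of the sample step using the package's `MarkovChainMonteCarlo` module with a random-walk Metropolis sampler (the step size and chain length here are illustrative):

```julia
using CalibrateEmulateSample.MarkovChainMonteCarlo

# Start the chain from the (unconstrained) mean of the final calibration ensemble
init_params = vec(mean(get_u_final(ekp), dims = 2))

# Random-walk Metropolis-Hastings on the emulated posterior
mcmc = MCMCWrapper(RWMHSampling(), y, prior, emulator; init_params = init_params)
new_step = optimize_stepsize(mcmc; init_stepsize = 0.1, N = 2000)
chain = sample(mcmc, 100_000; stepsize = new_step, discard_initial = 2_000)
posterior = get_posterior(mcmc, chain)
```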

Code Components
CalibrateEmulateSample.jl contains the following modules:
| Module | Purpose |
|---|---|
| CalibrateEmulateSample.jl | A wrapper for the pipeline |
| Emulator.jl | Modular template for the emulators |
| GaussianProcess.jl | A Gaussian process emulator |
| Scalar/VectorRandomFeatureInterface.jl | A Scalar/Vector-output Random Feature emulator |
| MarkovChainMonteCarlo.jl | Modular template for Markov Chain Monte Carlo samplers |
| Utilities.jl | Helper functions |
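For orientation, these are accessed as submodules of the top-level package; a minimal sketch (not an exhaustive list):

```julia
using CalibrateEmulateSample                        # top-level wrapper
using CalibrateEmulateSample.Emulators              # Emulator, GaussianProcess, ...
using CalibrateEmulateSample.MarkovChainMonteCarlo  # MCMCWrapper, RWMHSampling, ...
using CalibrateEmulateSample.Utilities              # helper functions
```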
Authors
CalibrateEmulateSample.jl is being developed by the Climate Modeling Alliance.