Integrating Lorenz 63 with an emulated integrator

How do I run this code?

The full code is found in the examples/Emulator/ directory of the GitHub repository.

In this example, we directly assess the performance of our machine learning emulators. The task is to learn the forward Euler integrator of a Lorenz 63 system. The model parameters are set to their classical values $(\sigma, \rho, \beta) = (10,28,\frac{8}{3})$, for which the system exhibits chaotic behavior. The discrete system is given as:

\[\begin{aligned} x(t_{n+1}) &= x(t_n) + \Delta t(10(y(t_n) - x(t_n)))\\ y(t_{n+1}) &= y(t_n) + \Delta t(x(t_n)(28 - z(t_n)) - y(t_n))\\ z(t_{n+1}) &= z(t_n) + \Delta t(x(t_n)y(t_n) - \frac{8}{3}z(t_n)) \end{aligned}\]

where $t_n = n\Delta t$. The data consists of pairs $\{ (x(t_k),y(t_k),z(t_k)), (x(t_{k+1}),y(t_{k+1}),z(t_{k+1})) + \eta_k\}$ for 600 values of $k$, with each output subjected to independent, additive Gaussian noise $\eta_k\sim N(0,\Gamma_y)$.
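To make the learning target concrete, here is a minimal sketch (not part of the example code) of the discrete update map and of how a single noisy training pair is formed; the noise covariance value anticipates the one used later in the walkthrough.

using Distributions, LinearAlgebra

Δt = 0.01
# One forward-Euler step of Lorenz 63 with (σ, ρ, β) = (10, 28, 8/3)
euler_step(u) = u .+ Δt .* [10 * (u[2] - u[1]),
                            u[1] * (28 - u[3]) - u[2],
                            u[1] * u[2] - (8 / 3) * u[3]]

u_k = [1.0, 0.0, 0.0]  # an example state
Γy = 1e-4 * I(3)       # output noise covariance (as used later in the walkthrough)
pair = (u_k, euler_step(u_k) + rand(MvNormal(zeros(3), Γy)))  # one (input, noisy output) pair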

This example provides several emulator configurations for the user to experiment with. The goal is for the emulator's posterior mean to approximate the discrete update map (that is, the integrator) at any point on the Lorenz attractor, learned from the sparse, noisy data. To validate this, we recursively apply the trained emulator to the state, plotting the evolution of the trajectory and the marginal statistics of the states over short and long timescales. We include a repeats option (n_repeats) to run the randomized training over multiple trials and illustrate the robustness of the marginal statistics by plotting long-time marginal CDFs of the state.

We refer to the scalar-output Gaussian process emulator as "GP", and to the scalar- or vector-output random feature emulator as "RF".

Walkthrough of the code

We first import some standard packages

using Random, Distributions, LinearAlgebra # utilities
using CairoMakie, ColorSchemes # for plots
using JLD2 # for saved data

For the true integrator of the Lorenz system we import

using OrdinaryDiffEq 

For the relevant CES packages needed to define the emulators and kernel structures we import

using CalibrateEmulateSample.Emulators
# Contains `Emulator`, `GaussianProcess`, `ScalarRandomFeatureInterface`, `VectorRandomFeatureInterface`
# `GPJL`, `SeparableKernel`, `NonseparableKernel`, `OneDimFactor`, `LowRankFactor`, `DiagonalFactor`
using CalibrateEmulateSample.DataContainers # Contains `PairedDataContainer`

and if one wants to play with optimizer options for the random feature emulators we import

using CalibrateEmulateSample.EnsembleKalmanProcesses # Contains `DataMisfitController`

We generate the truth data using OrdinaryDiffEq: the time series sol is used for training data, sol_test is used for plotting short time trajectories, and sol_hist for plotting histograms of the state over long times:

# Run L63 from 0 -> tmax
u0 = [1.0; 0.0; 0.0]
tmax = 20
dt = 0.01
tspan = (0.0, tmax)
prob = ODEProblem(lorenz, u0, tspan)
sol = solve(prob, Euler(), dt = dt)

# Run L63 from end for test trajectory data
tmax_test = 100
tspan_test = (0.0, tmax_test)
u0_test = sol.u[end]
prob_test = ODEProblem(lorenz, u0_test, tspan_test)
sol_test = solve(prob_test, Euler(), dt = dt)

# Run L63 from end for histogram matching data
tmax_hist = 1000
tspan_hist = (0.0, tmax_hist)
u0_hist = sol_test.u[end]
prob_hist = ODEProblem(lorenz, u0_hist, tspan_hist)
sol_hist = solve(prob_hist, Euler(), dt = dt)
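The right-hand side lorenz used above, and the random number generator rng used below, are defined in the full example script. A minimal sketch consistent with the system above (the seed here is an arbitrary placeholder) is:

rng = MersenneTwister(1234) # placeholder seed; the full example fixes its own

# Lorenz 63 right-hand side, in the in-place form expected by ODEProblem
function lorenz(du, u, p, t)
    du[1] = 10.0 * (u[2] - u[1])
    du[2] = u[1] * (28.0 - u[3]) - u[2]
    du[3] = u[1] * u[2] - (8.0 / 3.0) * u[3]
    return nothing
end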

We generate the training data from sol within [tburn, tmax]. The user can choose how many training points to take (n_train_pts) and whether these are selected randomly or sequentially (sample_rand). The true outputs are perturbed by additive noise of variance 1e-4, and the input-output pairs are stored in the compatible data format PairedDataContainer

tburn = 1 # NB works better with no spin-up!
burnin = Int(floor(tburn / dt))
n_train_pts = 600 
sample_rand = true
if sample_rand
   ind = Int.(shuffle!(rng, Vector(burnin:(tmax / dt - 1)))[1:n_train_pts])
else
   ind = burnin:(n_train_pts + burnin)
end
n_tp = length(ind)
input = zeros(3, n_tp)
output = zeros(3, n_tp)
Γy = 1e-4 * I(3)
noise = rand(rng, MvNormal(zeros(3), Γy), n_tp)
for i in 1:n_tp
    input[:, i] = sol.u[ind[i]]
    output[:, i] = sol.u[ind[i] + 1] + noise[:, i]
end
iopairs = PairedDataContainer(input, output)
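As a small usage note, the stored arrays can be recovered from the container via the DataContainers accessors get_inputs and get_outputs:

get_inputs(iopairs)  # 3 × n_tp matrix of states at t_k
get_outputs(iopairs) # 3 × n_tp matrix of noisy states at t_{k+1}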

We have several cases the user can play with,

cases = ["GP", "RF-scalar", "RF-scalar-diagin", "RF-svd-nonsep", "RF-nosvd-nonsep", "RF-nosvd-sep"]

Then, looping over the repeats, we first define some common hyperparameter optimization options for the "RF-" cases. Here, the options are used primarily for diagnostics and acceleration (they are not required in general to solve this problem)

rf_optimizer_overrides = Dict(
    "verbose" => true, # output diagnostics from the hyperparameter optimizer
    "scheduler" => DataMisfitController(terminate_at = 1e4), # timestepping method for the optimizer
    "cov_sample_multiplier" => 0.5, # 0.5*default number of samples to estimate covariances in optimizer
    "n_features_opt" => 200, # number of features during hyperparameter optimization
    "n_iteration" => 20, # iterations of the optimizer solver
)

Then we build the machine learning tools. Here we highlight the scalar-output Gaussian process (GP), where we use the default squared-exponential kernel and learn a lengthscale hyperparameter in each input dimension. To handle multiple outputs, we use a decorrelation in the output space, and so we actually train three of these models.
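The decorrelation whitens the outputs with the observational noise covariance, so that each transformed output channel can be emulated by an independent scalar model. A conceptual sketch of such a transform (an illustration only, not the library internals verbatim):

# Whitening via the SVD of the noise covariance Γy (conceptual sketch only)
svdΓ = svd(Matrix(Γy))
whiten(y) = Diagonal(1 ./ sqrt.(svdΓ.S)) * svdΓ.Vt * y
y1_white = whiten(output[:, 1]) # decorrelated output with (approximately) unit noise covariance

The GP itself is then constructed as: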

gppackage = Emulators.GPJL() # use GaussianProcesses.jl
pred_type = Emulators.YType() # predicted variances are for data not latent function
mlt = GaussianProcess(
    gppackage;
    prediction_type = pred_type,
    noise_learn = false, # do not additionally learn a white kernel
)
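The scalar-output random feature cases are built along the same lines. A sketch of the RF-scalar configuration, reusing the optimizer overrides above and assuming the same feature count and kernel rank as the vector case shown below:

# Sketch of the "RF-scalar" emulator: outputs are decorrelated, so each channel
# is scalar and the kernel is separable with a one-dimensional output factor.
nugget = 1e-12
kernel_structure = SeparableKernel(LowRankFactor(3, nugget), OneDimFactor())
n_features = 500
mlt = ScalarRandomFeatureInterface(
    n_features,
    3, # input dimension
    rng = rng, # pin random number generator
    kernel_structure = kernel_structure,
    optimizer_options = rf_optimizer_overrides,
)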

We also highlight the vector random feature emulator with a nonseparable kernel (RF-nosvd-nonsep); this can natively handle multiple outputs without decorrelation of the output space. The kernel here is a rank-3 representation with a small nugget term.

nugget = 1e-12
kernel_structure = NonseparableKernel(LowRankFactor(3, nugget))
n_features = 500 # number of features for prediction
mlt = VectorRandomFeatureInterface(
    n_features,
    3, # input dimension
    3, # output dimension
    rng = rng, # pin random number generator
    kernel_structure = kernel_structure,
    optimizer_options = rf_optimizer_overrides, 
)           

With machine learning tools specified, we build the emulator object

# decorrelate = true for `GP`
# decorrelate = false for `RF-nosvd-nonsep`
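# The full example sets `decorrelate` per case; a sketch of that choice
# (an assumption based on the naming: "nosvd" cases skip the SVD decorrelation):
case = "RF-nosvd-nonsep" # chosen here for illustration; the example loops over `cases`
decorrelate = !occursin("nosvd", case)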
emulator = Emulator(mlt, iopairs; obs_noise_cov = Γy, decorrelate = decorrelate) 
optimize_hyperparameters!(emulator) # some GP packages require this additional call 

Plots

We then use the trained emulator's mean prediction to evolve the state over the short-timescale validation trajectory

xspan_test = 0.0:dt:tmax_test # time grid of the test trajectory (matches sol_test.t)
u_test_tmp = zeros(3, length(xspan_test))
u_test_tmp[:, 1] = sol_test.u[1] # initialize at the final-time solution of the training period

for i in 1:(length(xspan_test) - 1)
    rf_mean, _ = predict(emulator, u_test_tmp[:, i:i], transform_to_real = true) # 3x1 matrix
    u_test_tmp[:, i + 1] = rf_mean
end

The other trajectories are generated similarly. We then produce the following plots. In all figures, the results from evolving the state with the true integrator are shown in orange, and those from the emulated integrators in blue.
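As an illustration of how such a comparison can be drawn with CairoMakie (a minimal sketch rather than the example's plotting script; it reuses the variable names from the walkthrough above):

# Compare the x-component of the true and emulated short-time trajectories
ts = sol_test.t
fig = Figure()
ax = Axis(fig[1, 1], xlabel = "time", ylabel = "x(t)")
lines!(ax, ts, [u[1] for u in sol_test.u], color = :orange, label = "true integrator")
lines!(ax, ts, u_test_tmp[1, :], color = :blue, label = "emulated integrator")
axislegend(ax)
save("l63_trajectory_sketch.png", fig)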

Gaussian Process Emulator (Sci-kit learn: GP)

For one example fit

Random Feature Emulator (RF-nosvd-nonsep)

For one example fit

and here are CDFs over 20 randomized trials of the random feature hyperparameter optimization
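A marginal CDF of this kind can be computed directly from the long trajectories. A minimal sketch for a single trial, assuming an emulated long-time trajectory u_hist_tmp has been built analogously to u_test_tmp above (the name is illustrative):

# Empirical CDF of the x-component over the long "histogram" trajectory
xs_true = sort([u[1] for u in sol_hist.u])
xs_emul = sort(u_hist_tmp[1, :])
cdf_levels(n) = (1:n) ./ n
fig = Figure()
ax = Axis(fig[1, 1], xlabel = "x", ylabel = "empirical CDF")
lines!(ax, xs_true, cdf_levels(length(xs_true)), color = :orange)
lines!(ax, xs_emul, cdf_levels(length(xs_emul)), color = :blue)
save("l63_cdf_sketch.png", fig)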