Model Site-Level Calibration Tutorial Using Observations

In this tutorial we will calibrate the Vcmax25 and g1 parameters using latent heat flux observations from the FLUXNET site (US-MOz).

Overview

The tutorial covers:

Setting up a land surface model for a FLUXNET site (US-MOz)
Obtaining the observation dataset
Implementing Ensemble Kalman Inversion to calibrate Vcmax25 and g1
Analyzing the calibration results

Setup and Imports

Load all the necessary packages for land surface modeling, diagnostics, plotting, and ensemble methods:

using ClimaLand
using ClimaLand.Domains: Column
using ClimaLand.Canopy
using ClimaLand.Simulations
import ClimaLand.FluxnetSimulations as FluxnetSimulations
import ClimaLand.Parameters as LP
import ClimaLand.LandSimVis as LandSimVis
import ClimaDiagnostics
import ClimaUtilities.TimeManager: date
import EnsembleKalmanProcesses as EKP
import EnsembleKalmanProcesses.ParameterDistributions as PD
using CairoMakie
CairoMakie.activate!()
using Statistics
using Logging
import Random
using Dates
using ClimaAnalysis, GeoMakie, Printf, StatsBase

Configuration and Site Setup

Configure the experiment parameters and set up the FLUXNET site (US-MOz) with its specific location, time settings, and atmospheric conditions.

Set random seed for reproducibility and floating point precision

rng_seed = 1234
rng = Random.MersenneTwister(rng_seed)
const FT = Float32

Float32

Initialize land parameters and site configuration.

toml_dict = LP.create_toml_dict(FT)
site_ID = "US-MOz"
site_ID_val = FluxnetSimulations.replace_hyphen(site_ID);

Get site-specific information: location coordinates, time offset, and sensor height.

(; time_offset, lat, long) =
    FluxnetSimulations.get_location(FT, Val(site_ID_val))
(; atmos_h) = FluxnetSimulations.get_fluxtower_height(FT, Val(site_ID_val));

Get simulation start and stop dates in UTC; these must be included in the forcing data range Here we calibrate with only two months of data.

start_date = DateTime(2010, 5, 1)
stop_date = DateTime(2010, 7, 1)
Δt = 450.0; # seconds

Create the computational domain and load the necessary forcing data for the land surface model. ClimaLand includes the domain, forcing, and LAI as part of the model, but here we will need to recreate a model many times (for each parameter value tried), while the domain, forcing, and LAI are held fixed. Because of that, we define them here, once, outside of the function that is called to make the model.

Create a column domain representing a 2-meter deep soil column with 10 vertical layers.

zmin = FT(-2)  # 2m depth
zmax = FT(0)   # surface
domain = Column(; zlim = (zmin, zmax), nelements = 10, longlat = (long, lat));

Load prescribed atmospheric and radiative forcing from FLUXNET data

forcing = FluxnetSimulations.prescribed_forcing_fluxnet(
    site_ID,
    lat,
    long,
    time_offset,
    atmos_h,
    start_date,
    toml_dict,
    FT,
);

Get Leaf Area Index (LAI) data from MODIS satellite observations.

LAI = ClimaLand.Canopy.prescribed_lai_modis(
    domain.space.surface,
    start_date,
    stop_date,
);

Model Setup

Create an integrated land model that couples canopy, snow, soil, and soil CO2 components. This comprehensive model allows us to simulate the full land surface system and its interactions.

function model(Vcmax25, g1)
    Vcmax25 = FT(Vcmax25)
    g1 = FT(g1)

    #md # Set up models; note: we are not using the default soil, which relies on global
    #md # maps of parameters to estimate the parameters at the site.
    #md # Instead we use parameter more tailored to this site.
    prognostic_land_components = (:canopy, :snow, :soil, :soilco2)
    #md # Create soil model
    retention_parameters = (;
        ν = FT(0.55),
        θ_r = FT(0.04),
        K_sat = FT(4e-7),
        hydrology_cm = ClimaLand.Soil.vanGenuchten{FT}(;
            α = FT(0.05),
            n = FT(2.0),
        ),
    )
    composition_parameters =
        (; ν_ss_om = FT(0.1), ν_ss_quartz = FT(0.1), ν_ss_gravel = FT(0.0))
    soil = ClimaLand.Soil.EnergyHydrology{FT}(
        domain,
        forcing,
        toml_dict;
        prognostic_land_components,
        additional_sources = (ClimaLand.RootExtraction{FT}(),),
        runoff = ClimaLand.Soil.Runoff.SurfaceRunoff(),
        retention_parameters,
        composition_parameters,
        S_s = FT(1e-2),
    )
    surface_domain = ClimaLand.Domains.obtain_surface_domain(domain)
    canopy_forcing = (;
        atmos = forcing.atmos,
        radiation = forcing.radiation,
        ground = ClimaLand.PrognosticGroundConditions{FT}(),
    )
    #md # Set up photosynthesis using the Farquhar model
    photosyn_defaults =
        Canopy.clm_photosynthesis_parameters(surface_domain.space.surface)
    photosynthesis = Canopy.FarquharModel{FT}(
        surface_domain,
        toml_dict;
        photosynthesis_parameters = (;
            is_c3 = photosyn_defaults.is_c3,
            Vcmax25,
        ),
    )
    #md # Set up stomatal conductance using the Medlyn model
    conductance =
        Canopy.MedlynConductanceModel{FT}(surface_domain, toml_dict; g1)

    #md # Create canopy model
    canopy = Canopy.CanopyModel{FT}(
        surface_domain,
        canopy_forcing,
        LAI,
        toml_dict;
        photosynthesis,
        conductance,
        prognostic_land_components,
    )

    #md # Create integrated land model
    land_model = LandModel{FT}(
        forcing,
        LAI,
        toml_dict,
        domain,
        Δt;
        prognostic_land_components,
        soil,
        canopy,
    )

    #md # Set initial conditions from FLUXNET data
    set_ic! = FluxnetSimulations.make_set_fluxnet_initial_conditions(
        site_ID,
        start_date,
        time_offset,
        land_model,
    )

    #md # Configure diagnostics to output sensible and latent heat fluxes half-hourly
    #md # Since fluxnet data is recorded every half hour, with the average of the half hour,
    #md # we need to be consistent here.
    output_vars = ["lhf"]
    diagnostics = ClimaLand.default_diagnostics(
        land_model,
        start_date;
        output_writer = ClimaDiagnostics.Writers.DictWriter(),
        output_vars,
        reduction_period = :halfhourly,
    )

    #md # Create and run the simulation
    simulation = Simulations.LandSimulation(
        start_date,
        stop_date,
        Δt,
        land_model;
        set_ic!,
        updateat = Second(Δt),
        user_callbacks = (),
        diagnostics,
    )
    solve!(simulation)
    return simulation
end;

Observation and Helper Functions

Define the observation function G that maps from parameter space to observation space, along with supporting functions for data processing:

This function runs the model and computes diurnal average of latent heat flux

function G(Vcmax25, g1)
    simulation = model(Vcmax25, g1)
    lhf = get_lhf(simulation)
    observation =
        Float64.(
            get_diurnal_average(
                lhf,
                simulation.start_date,
                simulation.start_date + Day(20),
                stop_date,
            ),
        )
    return observation
end;

Helper function: Extract latent heat flux from simulation diagnostics

function get_lhf(simulation)
    return ClimaLand.Diagnostics.diagnostic_as_vectors(
        simulation.diagnostics[1].output_writer,
        "lhf_30m_average",
    )
end;

Helper function: Compute diurnal average of a variable

function get_diurnal_average(var, start_date, spinup_date, stop_date)
    (times, data) = var
    model_dates = if times isa Vector{DateTime}
        times
    else
        date.(times)
    end
    spinup_idx = findfirst(spinup_date .<= model_dates)
    stop_idx = findlast(model_dates .< stop_date)
    model_dates = model_dates[spinup_idx:stop_idx]
    data = data[spinup_idx:stop_idx]

    hour_of_day = Hour.(model_dates)
    mean_by_hour = [mean(data[hour_of_day .== Hour(i)]) for i in 0:23]
    return mean_by_hour
end;

Experiment Setup

We obtain observations from the FLUXNET site. The dataset contains multiple variables, but we will just use latent heat flux in this calibration.

dataset = FluxnetSimulations.get_comparison_data(site_ID, time_offset)
observations = get_diurnal_average(
    (dataset.UTC_datetime, dataset.lhf),
    start_date,
    start_date + Day(20),
    stop_date,
);

Define observation noise covariance for the ensemble Kalman process. A flat covariance matrix is used here for simplicity.

noise_covariance = 0.05 * EKP.I;

Prior Distribution and Calibration Configuration

Set up the prior distribution for the parameter and configure the ensemble Kalman inversion:

Constrained Gaussian prior for Vcmax25 with bounds [0, 1e-3], and g1 with bounds [0, 1000]. The first two arguments of each prior are the mean and standard deviation of the Gaussian function.

priors = [
    PD.constrained_gaussian("Vcmax25", 1e-4, 5e-5, 0, 1e-3),
    PD.constrained_gaussian("g1", 150, 90, 0, 1000),
]
prior = PD.combine_distributions(priors);

Set the ensemble size and number of iterations

ensemble_size = 10
N_iterations = 4;

Ensemble Kalman Inversion

Initialize and run the ensemble Kalman process:

Sample the initial parameter ensemble from the prior distribution

initial_ensemble = EKP.construct_initial_ensemble(rng, prior, ensemble_size)

ensemble_kalman_process = EKP.EnsembleKalmanProcess(
    initial_ensemble,
    observations,
    noise_covariance,
    EKP.Inversion();
    scheduler = EKP.DataMisfitController(
        terminate_at = Inf,
        on_terminate = "continue",
    ),
    rng,
);

┌ Warning: For 2 parameters, the recommended minimum ensemble size (`N_ens`) is 20. Got `N_ens` = 10`.
└ @ EnsembleKalmanProcesses ~/.julia/packages/EnsembleKalmanProcesses/MwCAA/src/EnsembleKalmanProcess.jl:266

Run the ensemble of forward models to iteratively update the parameter ensemble

Logging.with_logger(SimpleLogger(devnull, Logging.Error)) do
    for i in 1:N_iterations
        println("Iteration $i")
        params_i = EKP.get_ϕ_final(prior, ensemble_kalman_process)
        G_ens = hcat([G(params_i[:, j]...) for j in 1:ensemble_size]...)
        EKP.update_ensemble!(ensemble_kalman_process, G_ens)
    end
end;

Iteration 1
Iteration 2
Iteration 3
Iteration 4

Results Analysis and Visualization

Get the mean of the final parameter ensemble:

EKP.get_ϕ_mean_final(prior, ensemble_kalman_process);

Now, let's analyze the calibration results by examining parameter evolution and comparing model outputs across iterations.

Plot the parameter ensemble evolution over iterations to visualize convergence:

dim_size = sum(length.(EKP.batch(prior)))
fig = CairoMakie.Figure(size = ((dim_size + 1) * 500, 500))

for i in 1:dim_size
    EKP.Visualize.plot_ϕ_over_iters(
        fig[1, i],
        ensemble_kalman_process,
        prior,
        i,
    )
end

EKP.Visualize.plot_error_over_iters(
    fig[1, dim_size + 1],
    ensemble_kalman_process,
)
CairoMakie.save("obs_constrained_params_and_error.png", fig);

Compare the model output between the first and last iterations to assess improvement:

fig = CairoMakie.Figure(size = (900, 400))

first_G_ensemble = EKP.get_g(ensemble_kalman_process, 1)
last_iter = EKP.get_N_iterations(ensemble_kalman_process)
last_G_ensemble = EKP.get_g(ensemble_kalman_process, last_iter)
n_ens = EKP.get_N_ens(ensemble_kalman_process)

ax = Axis(
    fig[1, 1];
    title = "G ensemble: first vs last iteration (n = $(n_ens), iters 1 vs $(last_iter))",
    xlabel = "Observation index",
    ylabel = "G",
)

Axis with 0 plots:

Plot model output of first vs last iteration ensemble

for g in eachcol(first_G_ensemble)
    lines!(ax, 1:length(g), g; color = (:red, 0.6), linewidth = 1.5)
end

for g in eachcol(last_G_ensemble)
    lines!(ax, 1:length(g), g; color = (:blue, 0.6), linewidth = 1.5)
end

lines!(
    ax,
    1:length(observations),
    observations;
    color = (:black, 0.6),
    linewidth = 3,
)

axislegend(
    ax,
    [
        LineElement(color = :red, linewidth = 2),
        LineElement(color = :blue, linewidth = 2),
        LineElement(color = :black, linewidth = 4),
    ],
    ["First ensemble", "Last ensemble", "Observations"];
    position = :rb,
    framevisible = false,
)

CairoMakie.resize_to_layout!(fig)
CairoMakie.save("obs_G_first_and_last.png", fig);

Here we can see that the calibration has improved the fit to the data; but this it not the fully story. If other parameters have been set to unrealistic values, the parameters here may be compensating to achieve a lower loss. Moreover, we have only calibrated with two months of data. Production-level calibration runs require careful choice of free parameters and target observations.

This page was generated using Literate.jl.