Using ClimaLand Diagnostics to save simulation output

When running a ClimaLand simulations, you have multiple options on how to write the outputs of that simulation. You may want all variables, or just a selected few. You may want instantaneous values, at the highest temporal and spatial resolution, or you may want to get averages at hourly or monthly time scale, and integrate in space (for example soil moisture from 0 to 1 meter depth). You may want to get more specific reductions, such as 10 days maximums, or compute a new variables as a function of others. You may want to get your outputs in memory in a Julia dictionary, or write them in a NetCDF file.

This is where ClimaLand Diagnostics comes in for users.

In this documentation page, we first explain how to use default diagnostics and what are the defaults, and then explain how to define your own diagnostics for more advanced users.

Default Diagnostics

Diagnostics refer to output saved during the simulation, which may be prognostic or diagnostic variables. Note that this is different from checkpointing the simulation, which has specific requirements. For information about checkpointing and restarting simulations, please see the page titled Restarting Simulations.

The main user-facing function in the ClimaLand.Diagnostics module is default_diagnostics. This function defines what diagnostic variables to compute by default for a specific model, and on what schedule (for example, hourly average).

default_diagnostics takes in the following arguments:

  • model: The ClimaLand model to generate diagnostics for. Currently the following models support diagnostics: CanopyModel, EnergyHydrologyModel, SoilCanopyModel, LandModel, BucketModel
  • start_date: The start date of the simulation.
  • output_writer: An object of type ClimaDiagnostics.AbstractWriter. Specifically this may be a NetCDFWriter, HDF5Writer, or DictWriter, which save output to a NetCDF file, HDF5 file, or dictionary in Julia memory, respectively. For more details about the diagnostics writers, please see the ClimaDiagnostics.jl documentation.
  • output_vars: This argument may be :short to output a default list of variables defined for each model, :long to output all

available variables for the model, or a user-defined list of variable "short names".

  • reduction_period: How often to compute and save the provided temporal reduction of the diagnostics.
  • reduction_type: A temporal reduction to apply to the diagnostics over each provided time period, e.g. :average, :max, :min, :instantaneous (no reduction)
  • dt: Simulation timestep; only required for instantaneous diagnostics.

Example: diagnostics for a CanopyModel

The following code sets up default short diagnostics to be averaged hourly and written in memory as a Julia dictionary:

diag_writer = ClimaDiagnostics.Writers.DictWriter();
diagnostics = ClimaLand.Diagnostics.default_diagnostics(
    canopy,
    start_date;
    output_vars = :short,
    output_writer = diag_writer,
    reduction_period = :hourly,
    reduction_type = :average,
);

To instead output a list of specific diagnostics, you can change the value of output_vars. For example, to output gross primary productivity (GPP) and transpiration you would do the following:

diag_writer = ClimaDiagnostics.Writers.DictWriter();
diagnostics = ClimaLand.Diagnostics.default_diagnostics(
    canopy,
    start_date;
    output_vars = ["gpp", "trans"],
    output_writer = diag_writer,
    reduction_period = :hourly,
    reduction_type = :average,
);

A description of available diagnostics and their short names can be found on the Available diagnostic variables documentation page.

To write the diagnostics to a NetCDF file instead of saving it in memory, the diag_writer should be constructed as a NetCDFWriter and then passed to default_diagnostics as before:

outdir = "my_output_dir"
nc_writer =
    ClimaDiagnostics.Writers.NetCDFWriter(space, outdir; start_date)
Note

The NetCDFWriter currently writes to file for each diagnostic output, which can be quite slow when saving variables at every step. On the other hand, the DictWriter saves output to memory which may be too large in global runs, so DictWriter usually should not be used for global runs. In general, we recommend using DictWriter for column simulations, and NetCDFWriter for global simulations.

Diagnostics output naming and format

Diagnostics are typically named using the format $(short_name)_$(period)_$(reduction). For example, with the NetCDFWriter, hourly-averaged GPP would be saved in an output file titled gpp_1h_average.nc.

The specific output format depends on which output writer is being used; for more details, please see the ClimaDiagnostics documentation.

Adding new diagnostics

To define a new, custom diagnostic, you must follow these steps:

  • specify how to compute the diagnostic
  • manually define the diagnostic via add_diagnostic_variable!
  • add the diagnostic to the list of possible diagnostics for the relevant model(s)

Define how to compute your diagnostic variable from your model state and cache.

These functions are defined in src/diagnostics/land_compute_methods.jl and must be named in the format compute_[standard_name]!. Be sure to write method(s) for each model you want this diagnostic to be available for. For example, let's say you want the bowen ratio (ratio between sensible heat and latent heat) in the Bucket model.

function compute_bowen_ratio!(out, Y, p, t, land_model::BucketModel)
    if isnothing(out)
        return copy(p.bucket.turbulent_fluxes.shf / p.bucket.turbulent_fluxes.lhf)
    else
        out .= p.bucket.turbulent_fluxes.shf / p.bucket.turbulent_fluxes.lhf
    end
end

Or, for convenience, you can use the @diagnostic_compute macro which generates the same function. However, it is better to use that macro only if you are retrieving an existing variable, such as latent heat flux, rather than computing one, like the Bowen ratio above. For example,

@diagnostic_compute "latent_heat_flux" BucketModel p.bucket.turbulent_fluxes.lhf

Add that diagnostic(s) variable to your list of variables

These functions are defined in src/diagnostics/define_diagnostics.jl.

 add_diagnostic_variable!(
    short_name = "bor",
    long_name = "Bowen ratio",
    standard_name = "bowen_ratio",
    units = "",
    comments = "Ratio of sensible to latent heat flux.",
    compute! = (out, Y, p, t) -> compute_bowen_ratio!(out, Y, p, t, land_model),
)

add_diagnostic_variable!(
    short_name = "lhf",
    long_name = "Latent Heat Flux",
    standard_name = "latent_heat_flux",
    units = "W m^-2",
    comments = "Exchange of energy at the land-atmosphere interface due to water evaporation or sublimation.",
    compute! = (out, Y, p, t) ->
    compute_latent_heat_flux!(out, Y, p, t, land_model),
)

Define how often to output your variables, and how to combine them.

We support a range of reduction periods and reduction types, defined in src/diagnostics/default_diagnostics.jl, which should be passed to default_diagnostics as Symbols. Currently, the following periods are supported:

  • :every_dt (requires dt to be provided separately)
  • :halfhourly
  • :hourly
  • :daily
  • :tendaily
  • :monthly

And the following reduction types are supported:

  • :instantaneous
  • :average (requires pre_output_hook! = average_pre_output_hook! when creating the ScheduledDiagnostic)
  • :max
  • :min

You can also define your own output frequencies and reductions. For example, if you want the seasonal maximum of your variables, where season is defined as 90 days, you could add the following get_period function, and then provide this new reduction period to default_diagnostics.

get_period(:seasonal) = Day(90)
default_diagnostics(model, start_date, output_writer; output_vars, reduction_period = :seasonal, reduction_type = :max)

Analyze your simulation output

Once you've run your simulation and created an output folder (e.g., output_dir) with diagnostics, you can use ClimaAnalysis to access and analyze your data. For in depth documentation about ClimaAnalysis, see its documentation.

Here is an example of how to plot a variable:

import ClimaAnalysis

import ClimaAnalysis.Visualize as viz

import CairoMakie # the plotting package used by ClimaAnalysis

simdir = ClimaAnalysis.SimDir(output_dir) # where output_dir is where you saved your diagnostics.

var = get(simdir; "lhf") # assuming lhf, latent_heat_flux used as an example above, is one of your diagnostics variables.

fig = CairoMakie.Figure() # creates an empty figure object

viz.plot!(fig, var) # creates an axis inside fig, and plot your var in it.

CairoMakie.save(fig) # saves the figure in current working directory

API

ClimaLand.Diagnostics.default_diagnosticsFunction
default_diagnostics(model::AbstractModel{FT},
                    start_date;
                    output_writer,
                    output_vars = :short,
                    reduction_period = :monthly,
                    reduction_type = :average,
                    dt = nothing)

Construct a list of ScheduledDiagnostics that outputs the given variables at the specified average period.

The input output_vars can have 3 values:

  • :long - all diagnostics are output
  • :short - a short list of diagnostics is output
  • _::Vector{String} - a user-defined list of diagnostics is output

If a user-defined list is provided for output_vars, it must be a vector of strings that are valid short names of diagnostics for the model. Please see the method get_possible_diagnostics for the list of available diagnostics for each model.

reduction_period specifies the frequency at which to average the diagnostics, and reduction_type specifies the type of reduction to apply. Please see the docstring of get_period for the list of available periods, and the docstring of get_reduction for the list of available reduction types.

This method can be extended for any model that extends get_possible_diagnostics and get_short_diagnostics. Note that EnergyHydrology has a specialized method that handles conservation diagnostics.

source
default_diagnostics(
    land_model::EnergyHydrology{FT},
    start_date;
    output_writer,
    output_vars = :short,
    reduction_period = :monthly,
    reduction_type = :average,
    conservation = false,
    conservation_period = Day(10),
    dt = nothing,
) where {FT}

Define a method specific to the EnergyHydrology model so that we can handle conservation diagnostics specially.

The input output_vars can have 3 values:

  • :long - all diagnostics are output
  • :short - a short list of diagnostics is output
  • _::Vector{String} - a user-defined list of diagnostics is output

If a user-defined list is provided for output_vars, it must be a vector of strings that are valid short names of diagnostics for the model.

reduction_period specifies the frequency at which to average the diagnostics, and reduction_type specifies the type of reduction to apply. Please see the docstring of get_period for the list of available periods, and the docstring of get_reduction for the list of available reduction types.

Conservation diagnostics should not be provided as part of the output_vars argument, but rather included by providing conservation = true. Please see the method get_possible_diagnostics for the list of available diagnostics.

source
ClimaLand.Diagnostics.close_output_writersFunction
close_output_writers(diagnostics)

Close the output writers in the diagnostics, an iterable of ClimaDiagnostics.ScheduledDiagnostic or nothing.

This function should be called at the end of every simulation.

source
ClimaLand.total_energy_per_area!Function
ClimaLand.total_energy_per_area!(
    surface_field,
    model::EnergyHydrology,
    Y,
    p,
    t,
)

A function which updates surface_field in place with the value for the total energy per unit ground area for the EnergyHydrology.

source
ClimaLand.total_energy_per_area!(
    surface_field,
    model::SnowModel,
    Y,
    p,
    t,
)

A function which updates surface_field in place with the value for the total energy per unit ground area for the SnowModel.

This has already accounted for the area fraction of snow in the definition of S.

source
ClimaLand.total_energy_per_area!(
    surface_field,
    model::BucketModel,
    Y,
    p,
    t,
)

A function which updates surface_field in place with the value for the total energy per unit ground area for the BucketModel.

The ground of the bucket model has temperature Y.bucket.T, with volumetric specific heat approximated with the parameter ρc_soil. Additional energy is present due to the latent heat of fusion of frozen water, in the form of snow. We also add in this energy (below). We do not model or account for the sensible energy of snow (ρ_snow T_snow), as this is much smaller.

source
total_energy_per_area!(cache, model::AbstractModel, Y, p, t)

A function which updates cache in place with the total energy per unit ground area for the model, computed from Y, p, and t.

source
total_energy_per_area!(
    surface_field,
    land::AbstractLandModel,
    Y,
    p,
    t,
    sfc_cache,
)

A function which computes the total energy per unit area and updates surface_field in place, for the land model land, by calling the same function for the component models.

The sfc_cache field is available as scratch space.

source
ClimaLand.total_energy_per_area!(
    surface_field,
    model::BigLeafEnergyModel,
    Y,
    p,
    t,

)

A function which updates surface_field in place with the value of the big leaf model's energy.

Note that this assumes that there is at most a single canopy and stem component, and that the area index for them refers to the integrated area index (in height) - not the value per layer.

source
ClimaLand.total_energy_per_area!(
    surface_field,
    model::AbstractCanopyEnergyModel,
    Y,
    p,
    t,

)

A default function which errors for generic energy models for the canopy.

Note that we only support two models for canopy energy. The BigLeafEnergyModel has a special method for this, and the other has the temperature prescribed and does not conserve energy.

source
ClimaLand.total_energy_per_area!(
    surface_field,
    model::CanopyModel,
    Y,
    p,
    t,
)

A function which updates surface_field in place with the value for the total energy per unit ground area for the CanopyModel.

This acts by calling the method for the energy component of the canopy model.

source
ClimaLand.total_liq_water_vol_per_area!Function
ClimaLand.total_liq_water_vol_per_area!(
    surface_field,
    model::RichardsModel,
    Y,
    p,
    t,
)

A function which updates surface_field in place with the value for the total liquid water volume per unit ground area for the RichardsModel.

source
ClimaLand.total_liq_water_vol_per_area!(
    surface_field,
    model::EnergyHydrology,
    Y,
    p,
    t,
)

A function which updates surface_field in place with the value for the total liquid water volume per unit ground area for the EnergyHydrology.

The water in all phases is accounted for by converting ice volume to liquid water volume using the ratio of the density of ice to the density of water.

source
ClimaLand.total_liq_water_vol_per_area!(
    surface_field,
    model::SnowModel,
    Y,
    p,
    t,
)

A function which updates surface_field in place with the value for the total liquid water volume per unit ground area for the SnowModel.

This has already accounted for the area fraction of snow in the definition of S; it also accounts for both liquid and frozen water present in the snow, as the snow water equivalent is already the total liquid water volume present in the snow if all the snow melted, per unit ground area.

source
ClimaLand.total_liq_water_vol_per_area!(
    surface_field,
    model::BucketModel,
    Y,
    p,
    t,
)

A function which updates surface_field in place with the value for the total liquid water volume per unit ground area for the BucketModel, in [m³ m⁻²].

The total water contained in the bucket is the sum of the subsurface water storage W, the snow water equivalent σS and surface water content Ws.

source
total_liq_water_vol_per_area!(cache, model::AbstractModel, Y, p, t)

A function which updates cache in place with the total liquid water volume per unit ground area for the model, computed from Y, p, and t.

While water mass is the fundamentally conserved quantity, soil modelling represents water by an equivalent water volume using the density of water and ice at standard temperature and pressure. Because of that, we report here the total volume of water present in a model (per unit area) that would arise if all the water was in liquid phase. This can be converted to a mass using the density of liquid water.

This includes the water in multiple phases. For example, if ice is present, the water volume is computed using ratio of the density of ice to the density of liquid water.

source
total_liq_water_vol_per_area!(
    surface_field,
    land::AbstractLandModel,
    Y,
    p,
    t,
    sfc_cache,
)

A function which computes the total liquid water volume per unit area and updates surface_field in place, for the land model land, by calling the same function for the component models.

The sfc_cache field is available as scratch space.

source
ClimaLand.total_liq_water_vol_per_area!(
    surface_field,
    model::CanopyModel,
    Y,
    p,
    t,
)

A function which updates surface_field in place with the value for the total liquid water volume per unit ground area for the CanopyModel.

This acts by calling the method for the PlantHydraulics component of the canopy model.

source
ClimaLand.total_liq_water_vol_per_area!(
    surface_field,
    model::PlantHydraulicsModel,
    Y,
    p,
    t,

)

A function which updates surface_field in place with the value of the plant hydraulic models total water volume.

Note that this is general for any number of canopy layers, but it assumes that the LAI and SAI given are per layer. This is distinct from the BigLeaf approach, in which the LAI and SAI refer to the integrated area index with heigh.

source