SpaceVaryingInputs and TimeVaryingInputs

Most models require external inputs to work. Examples of inputs are an analytic function that prescribes the sea-surface temperature in time, or a file that describes the types of plants on the surface of the globe. The SpaceVaryingInputs and TimeVaryingInputs modules provide a unified infrastructure to handle all these cases.

TimeVaryingInputs

This extension is loaded when ClimaCore is loaded. In addition, if NetCDF files are used, NCDatasets has to be loaded too. Finally, a Regridder is needed (which might require importing additional packages).

A TimeVaryingInput is an object that knows how to fill a ClimaCore Field at a given simulation time t. TimeVaryingInputs can be constructed in a variety of ways, from using analytic functions, to NetCDF data. They expose one interface, evaluate!(dest_field, tv, time), which can be used by model developers to update their Fields.

The example at the end of this section shows that a TimeVaryingInput can take different types of inputs and still be used through a single interface (evaluate!). Internally, TimeVaryingInputs handle all the complexity related to reading files (using FileReaders), dealing with parallelism and GPUs, regridding onto the computational domains (using Regridders and DataHandling), and so on.

TimeVaryingInputs support:

  • analytic functions of time;
  • pairs of 1D arrays (e.g., for PointSpaces or constant fields);
  • 2/3D NetCDF files (including composing multiple variables from one or more files into one variable);
  • linear interpolation in time (default), nearest neighbors, and "period filling";
  • boundary conditions and repeating periodic data.

It is possible to pass keyword arguments down to the underlying constructors of the Regridder and of the FileReader with regridder_kwargs and file_reader_kwargs, respectively. Each of these has to be a named tuple or a dictionary that maps Symbols to values.

NetCDF file inputs

2D or 3D NetCDF files can be provided as inputs using TimeVaryingInputs. This could be a single variable provided in a single file, multiple variables provided in a single file, or multiple variables each coming from a unique file. When using multiple variables, a composing function must be provided as well, which will be used to combine the input variables into one data variable that is ultimately stored in the TimeVaryingInput. In this case, the order of variables provided in varnames determines the order of the arguments passed to the composing function.

Note that if a non-identity pre-processing function is provided as part of file_reader_kwargs, it will be applied to each input variable before they are composed. All input variables to be composed together must have the same spatial and temporal dimensions.

Composing multiple input variables is currently only supported with the InterpolationsRegridder, not with TempestRegridder. The regridding is applied after the pre-processing and composing.

Composing multiple input variables in one Input is also possible with a SpaceVaryingInput, and everything mentioned here applies in that case.
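The order described above (pre-process each variable, compose, then regrid) matters whenever the composing function is nonlinear. The following is a minimal plain-Julia sketch of that ordering, not the library's implementation: regrid_to_midpoint is a hypothetical stand-in for a regridder (it only averages two neighboring points), and the data values are made up.

```julia
# Hypothetical stand-ins; none of these definitions come from ClimaUtilities.
preprocess_func = x -> 1000 * x                      # unit conversion, per variable
compose_function = (u, v) -> sqrt(u^2 + v^2)         # nonlinear pointwise composition
regrid_to_midpoint(data) = (data[1] + data[2]) / 2   # toy "regridder"

u = [3.0, -3.0]
v = [4.0, -4.0]

# Documented order: pre-process, then compose, then regrid
composed = compose_function.(preprocess_func.(u), preprocess_func.(v))
documented_order = regrid_to_midpoint(composed)

# A different order (regrid each variable first), shown only for contrast
regrid_first = compose_function(
    regrid_to_midpoint(preprocess_func.(u)),
    regrid_to_midpoint(preprocess_func.(v)),
)

println(documented_order)  # 5000.0
println(regrid_first)      # 0.0
```

With a linear composing function such as a sum, both orders would agree in this toy setting; the difference only shows up for nonlinear compositions.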

Example: NetCDF file input with multiple input variables

Suppose that the input NetCDF file era5_example.nc contains two variables u and v, and we care about their sum u + v but not their individual values. We can provide a pointwise composing function to perform the sum, along with the InterpolationsRegridder to produce the data we want, u + v. The preprocess_func passed in file_reader_kwargs will be applied to u and to v individually, before the composing function is applied. The regridding is applied after the composing function. u and v could also come from separate NetCDF files, but they must still have the same spatial and temporal dimensions.

import Dates
import ClimaUtilities: TimeVaryingInputs

# Define the pointwise composing function we want, a simple sum in this case
compose_function = (x, y) -> x + y
# Define pre-processing function to convert units of input
unit_conversion_func = (data) -> 1000 * data

# target_space is the ClimaCore Space of the simulation (defined elsewhere).
# Everything after the `;` is a keyword argument, including compose_function.
timevaryinginput = TimeVaryingInputs.TimeVaryingInput("era5_example.nc",
                                        ["u", "v"],
                                        target_space;
                                        start_date = Dates.DateTime(2000, 1, 1),
                                        regridder_type = :InterpolationsRegridder,
                                        file_reader_kwargs = (; preprocess_func = unit_conversion_func),
                                        compose_function)

The same arguments (excluding start_date) could be passed to a SpaceVaryingInput to compose multiple input variables with that type.

Example: Data split across multiple NetCDF files

Often, large datasets come chunked, meaning that the data is split across multiple files, with each file containing only a subset of the time interval. TimeVaryingInputs know how to combine data across multiple files as if it were provided in a single file. To use this feature, just pass the list of file paths. While the files are not required to be in order, it is good practice to pass them in ascending order by time.

For example:

timevaryinginput = TimeVaryingInputs.TimeVaryingInput(["era5_1980.nc", "era5_1981.nc"],
                                                       "u",
                                                       target_space,
                                                       start_date = Dates.DateTime(1980, 1, 1),
                                                       regridder_type = :InterpolationsRegridder
                                                       )

This capability is only available for the InterpolationsRegridder.

Read more about this feature in the page about DataHandler.

Extrapolation boundary conditions

TimeVaryingInputs support several boundary conditions for extrapolation. By default, the Throw condition is used, meaning that interpolating onto a point that is outside the range of definition of the data is not allowed. Other boundary conditions are available. With the Flat boundary condition, when interpolating outside of the range of definition, the value at the closest boundary is used instead.

To set these boundary conditions, construct the relevant method passing the argument. For example, to combine NearestNeighbor with Flat:

import ClimaUtilities: TimeVaryingInputs

method = TimeVaryingInputs.NearestNeighbor(TimeVaryingInputs.Flat())

A boundary condition that is often useful is PeriodicCalendar, which repeats the data over and over.

In general PeriodicCalendar takes two inputs: the period and repeat_date. The repeat period is a Dates.DatePeriod (e.g., Dates.Year(1)) that defines the duration of the period that has to be repeated. The repeat_date defines what date range needs to be repeated. For example, if period = Dates.Month(1) and repeat_date = Dates.Date(1993, 11), November 1993 will be repeated.

The two inputs are not required. When they are not provided, ClimaUtilities will assume that the input data constitutes one period and use that. For example, if the data is defined from t0 to t1 (e.g., 1 and 5), interpolating over t > t1 (e.g., 7) is equivalent to interpolating to t* where t* is the modulus of t and the range (3 in this case). In this case, PeriodicCalendar requires the data to be uniformly spaced in time. To enable this boundary condition, pass LinearInterpolation(PeriodicCalendar()) to the TimeVaryingInput (or NearestNeighbor(PeriodicCalendar())).

Note

This PeriodicCalendar is different from what you might be used to, where the identification is t1 = t0. Here, we identify t1 + dt = t0. This is so that we can use it to repeat calendar data.

LinearPeriodFillingInterpolation

Often, data is not available at the frequency we would like it to be. For example, we might have hourly data for a given quantity but only on the 15th of the month. Performing linear interpolation on data with this type of gap is typically not accurate. Consider the example of a quantity with a diurnal cycle but measured only one day a month. If we were to blindly perform linear interpolation, we would find that the diurnal cycle is completely removed for every day of the month but the 15th. This is because we would interpolate between the last point of the sampled day of one month and the first point of the sampled day of the following month.

LinearPeriodFillingInterpolation is an interpolation method that solves this problem by preserving periodic structures. This is accomplished by performing linear interpolation across corresponding periods (in the case of the day, across corresponding hours of different days). For more information, please refer to the docstring.
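The idea of interpolating across corresponding hours of different days can be sketched in plain Julia. This is a simplified illustration under stated assumptions, not the library's implementation: the data are made up, period_filling and naive_linear are hypothetical helpers, and only whole hours are handled.

```julia
# Made-up data: 24 hourly samples on two days that are 30 days apart.
day_a, day_b = 15, 45
cycle_a = [10 + 5 * sinpi(h / 12) for h in 0:23]  # diurnal cycle on day_a
cycle_b = [20 + 5 * sinpi(h / 12) for h in 0:23]  # same cycle, higher mean, on day_b

# Interpolate between the SAME hour of the two sampled days.
function period_filling(day, hour)
    w = (day - day_a) / (day_b - day_a)
    return (1 - w) * cycle_a[hour + 1] + w * cycle_b[hour + 1]
end

# Naive alternative: a straight line from the last sample of day_a
# to the first sample of day_b.
function naive_linear(day, hour)
    t = 24 * day + hour
    t_a, t_b = 24 * day_a + 23, 24 * day_b
    w = (t - t_a) / (t_b - t_a)
    return (1 - w) * cycle_a[end] + w * cycle_b[1]
end

day = 30  # halfway between the two sampled days
filled = [period_filling(day, h) for h in 0:23]
naive = [naive_linear(day, h) for h in 0:23]

println(maximum(filled) - minimum(filled))  # diurnal range is preserved
println(maximum(naive) - minimum(naive))    # diurnal range is almost gone
```

In this toy setup the period-filling result keeps the full diurnal amplitude on day 30, while the naive straight line varies by less than one unit over the same day.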

Example

Let target_space be the computational domain (a ClimaCore Space) and cesm_albedo.nc a NetCDF file containing albedo data as a function of time in a variable named alb.

import ClimaUtilities: TimeVaryingInputs
import ClimaCore
import NCDatasets
import ClimaCoreTempestRemap
import Dates
# Loading ClimaCore, NCDatasets, ClimaCoreTempestRemap loads the extensions we need

function evolve_model(albedo_tv, albedo_field, t, dt)
    new_t = t + dt
    # First, we update the albedo to the new time
    TimeVaryingInputs.evaluate!(albedo_field, albedo_tv, new_t)
    # Now we can do all the operations we want with albedo_field
    # rhs = ...
end

# Let us prepare an empty Field that will contain the albedo
albedo_field = ClimaCore.Fields.zeros(target_space)

# If the albedo is an analytic function of time
albedo_tv_an = TimeVaryingInputs.TimeVaryingInput((t) -> 0.5)

# If the albedo comes from data

# start_date is the calendar date at the beginning of our simulation
start_date = Dates.DateTime(2000, 1, 1)
albedo_tv = TimeVaryingInputs.TimeVaryingInput("cesm_albedo.nc", "alb", target_space;
                                               start_date, regridder_kwargs = (; regrid_dir = "/tmp"))
# When using data from files, the data is automatically interpolated to the correct
# time

# In either case, we can always call evolve_model(albedo_tv, albedo_field, t, dt), so
# model developers do not have to worry about anything :)

As seen in this example, Inputs can take keyword arguments and pass them down to other constructors. This is often used to preprocess files that are being read (most commonly to change units). For example, if we want to multiply the albedo by a factor of 100, we would redefine albedo_tv as

albedo_tv = TimeVaryingInputs.TimeVaryingInput("cesm_albedo.nc", "alb", target_space;
                                               start_date, regridder_kwargs = (; regrid_dir = "/tmp"),
                                               file_reader_kwargs = (; preprocess_func = (x) -> 100x))
Note

In this example we used the TempestRegridder. This is not the best choice in most cases because the TempestRegridder is slower and does not work well with MPI and GPUs (ClimaUtilities implements workarounds for this, so the code would still work). InterpolationsRegridder should be preferred unless conservation is strictly required: while TempestRegridder is guaranteed to conserve various properties, InterpolationsRegridder is not.

SpaceVaryingInputs

This extension is loaded when ClimaCore is loaded. In addition, if NetCDF files are used, NCDatasets has to be loaded too. Finally, a Regridder is needed (which might require importing additional packages).

SpaceVaryingInputs uses the same building blocks as TimeVaryingInput (chiefly the DataHandling module) to construct a Field from different sources.

SpaceVaryingInputs support:

  • analytic functions of coordinates;
  • pairs of 1D arrays (for columns);
  • 2/3D NetCDF files (including composing multiple variables from one or more files into one variable).

In some ways, a SpaceVaryingInput can be thought of as an alternative constructor for a ClimaCore Field.

It is possible to pass keyword arguments down to the underlying constructors of the Regridder and of the FileReader with regridder_kwargs and file_reader_kwargs, respectively. Each of these has to be a named tuple or a dictionary that maps Symbols to values.

SpaceVaryingInputs support reading individual input variables from NetCDF files, as well as composing multiple input variables into one SpaceVaryingInput. See the TimeVaryingInput "NetCDF file inputs" section for more information about this feature.

Example

Let target_space be the ClimaCore Space on which we want the Field to be defined, and cesm_albedo.nc a NetCDF file containing albedo data in a variable named alb.

import ClimaUtilities: SpaceVaryingInputs
import ClimaCore
import NCDatasets
import ClimaCoreTempestRemap
# Loading ClimaCore, NCDatasets, ClimaCoreTempestRemap loads the extensions we need

# Albedo as an analytic function of lat and lon
albedo_latlon_fun = (coord) -> 0.5 * coord.long * coord.lat

albedo = SpaceVaryingInputs.SpaceVaryingInput(albedo_latlon_fun, target_space)

albedo_from_file = SpaceVaryingInputs.SpaceVaryingInput("cesm_albedo.nc", "alb", target_space, regridder_kwargs = (; regrid_dir = "/tmp"))

API

ClimaUtilities.SpaceVaryingInputs.SpaceVaryingInput - Function
SpaceVaryingInput(data_function::Function, space::ClimaCore.Spaces.AbstractSpace)

Returns the parameter field to be used in the model; appropriate when a parameter is defined using a function of the coordinates of the space.

Pass the "data" as a function data_function which takes coordinates as arguments, along with the ClimaCore space of the model simulation.

This returns a scalar field. Note that data_function is broadcast over the coordinate field. Inside your function, the coordinates must be unpacked (e.g., coords.lat, coords.long) to use their values directly.
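As a rough illustration of what broadcasting over the coordinate field means, here is a plain-Julia sketch in which an ordinary array of named tuples stands in for the ClimaCore coordinate field. The coordinate values are made up, and the field names lat and long are assumed to match those of the coordinate points.

```julia
# Stand-in for a coordinate field: a plain 2x2 array of coordinate points
coords = [(; lat, long) for lat in (-45.0, 45.0), long in (0.0, 90.0)]

# data_function receives ONE coordinate point and unpacks the fields it needs
data_function = coord -> 0.5 * coord.long * coord.lat

# Broadcasting applies the function to every point of the stand-in field
albedo = data_function.(coords)

println(size(albedo))   # (2, 2)
println(albedo[2, 2])   # 2025.0
```

The function itself never sees the whole field; it is written for a single point and the broadcast takes care of the rest.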

function SpaceVaryingInput(
    data_z::AbstractArray,
    data_values::AbstractArray,
    space::S,
) where {S <: ClimaCore.Spaces.CenterFiniteDifferenceSpace}

Given a set of depths data_z and the observed values data_values at those depths, create an interpolated field of values at each value of z in the model grid - defined implicitly by space.

Returns a ClimaCore.Fields.Field of scalars.

SpaceVaryingInputs.SpaceVaryingInput(
    data_z::AbstractArray,
    data_values::NamedTuple,
    space::S,
    dest_type::Type{DT},
) where {
    S <: ClimaCore.Spaces.CenterFiniteDifferenceSpace,
    DT,
}

Returns a field of parameter structs to be used in the model; appropriate when the parameter struct values vary in depth. The dest_type argument is the struct type; we assume that your struct has a constructor which accepts the values of its arguments by keyword.

  • data_z is where the measured values were obtained,
  • data_values is a NamedTuple with keys equal to the argument names of the struct, and with values equal to an array of measured values,
  • space defines the model grid.

As an example, we can create a field of vanGenuchten structs as follows. This struct requires two parameters, α and n. Let's assume that we have measurements of these as a function of depth at the locations given by data_z, called data_α and data_n. Then we can write vG_field = SpaceVaryingInput(data_z, (;α = data_α, n = data_n), space, vanGenuchten{Float32}). Under the hood, at each point in the model grid, we will create vanGenuchten{Float32}(;α = interp_α, n = interp_n), where interp indicates the interpolated value at the model depth.
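The per-point construction described above can be sketched in plain Julia. This is only an illustration under stated assumptions: a Vector stands in for the ClimaCore Field, the vanGenuchten struct is defined here for the example, linear_interp is a hypothetical helper, and linear interpolation is assumed (the text only says the values are interpolated). All numbers are made up.

```julia
# Hypothetical parameter struct with a keyword constructor
Base.@kwdef struct vanGenuchten{FT}
    α::FT
    n::FT
end

# Hypothetical helper: linearly interpolate data_values (given at data_z) to depth z
function linear_interp(z, data_z, data_values)
    i = clamp(searchsortedlast(data_z, z), 1, length(data_z) - 1)
    w = (z - data_z[i]) / (data_z[i + 1] - data_z[i])
    return (1 - w) * data_values[i] + w * data_values[i + 1]
end

data_z = [0.0, 1.0, 2.0]                                    # made-up measurement depths
data_values = (; α = [1.0, 3.0, 5.0], n = [2.0, 2.0, 4.0])  # made-up measurements
model_z = [0.5, 1.5]                                        # made-up model depths

# At each model depth, interpolate every entry and call the keyword constructor
vG_column = map(model_z) do z
    interp = map(v -> linear_interp(z, data_z, v), data_values)
    vanGenuchten{Float32}(; interp...)
end

println((vG_column[1].α, vG_column[1].n))
println((vG_column[2].α, vG_column[2].n))
```

Mapping over the NamedTuple keeps its keys, so the interpolated values can be splatted directly into the keyword constructor.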

Returns a ClimaCore.Fields.Field of type DT.

SpaceVaryingInput(data_handler::DataHandler)
SpaceVaryingInput(file_paths::Union{AbstractString, AbstractArray{String}},
                  varnames::Union{AbstractString, AbstractArray{String}},
                  target_space::Spaces.AbstractSpace;
                  regridder_type::Symbol,
                  regridder_kwargs = (),
                  file_reader_kwargs = ())

Returns the parameter field to be used in the model; appropriate when a parameter is defined on the surface of the Earth.

Returns a ClimaCore.Fields.Field of scalars; analogous to the 1D case which also returns a ClimaCore.Fields.Field of scalars.

ClimaUtilities.TimeVaryingInputs.AbstractInterpolationMethod - Type
AbstractInterpolationMethod

Defines how to perform interpolation.

Not all the TimeVaryingInputs support all the interpolation methods (e.g., no interpolation methods are supported when the given function is analytic).

AbstractInterpolationMethods have to implement an extrapolation_bc field.

ClimaUtilities.TimeVaryingInputs.NearestNeighbor - Type
NearestNeighbor(extrapolation_bc::AbstractInterpolationBoundaryMethod)

Return the value corresponding to the point closest to the input time.

extrapolation_bc specifies how to deal with out of boundary values. The default value is Throw.

ClimaUtilities.TimeVaryingInputs.LinearInterpolation - Type
LinearInterpolation(extrapolation_bc::AbstractInterpolationBoundaryMethod)

Perform linear interpolation between the two neighboring points.

extrapolation_bc specifies how to deal with out of boundary values. The default value is Throw.

ClimaUtilities.TimeVaryingInputs.PeriodicCalendar - Type
PeriodicCalendar

Repeat data periodically.

PeriodicCalendar has two modes of operation:

First, when provided with a period (described by a DatePeriod, e.g., Dates.Month(1) or Dates.Year(1)), assume that the provided data is repeated over that calendar period. A date can be passed too, indicating what data to use. Only simple periods (e.g., Dates.Month(1)) are supported. When a period is provided, a repeat_date is required too. This identifies the period of time that is repeated. For example, if period = Dates.Month(1) and repeat_date = Dates.Date(1993, 11), November 1993 is repeated (if available in the input data).

Note

Passing a period is not supported by all the interpolators (e.g., when reading from 1D files).

Second, if no period is provided, when interpolating outside of range, restart from the beginning.

For example, if the data is defined from t0 = 0 to t1 = 10, extrapolating at t=13 is equivalent to interpolating at t=2. In practice, we identify t1 + dt to be t0 again. This is different from what you might be used to for periodic boundary conditions, where the identification is t1 = t0.

This second mode of operation requires data to be uniformly sampled in time.

If the data is defined on a calendar year, this second mode of operation is equivalent to using the first mode with period = Dates.Year (same with other periods).
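The wrap-around of the second mode can be sketched with a small helper. This is a hedged illustration, not the library's code: wrap_time is a hypothetical function, and the sampling interval dt = 1 is an assumption (it is the value for which the t = 13 example above maps to t = 2).

```julia
# Hypothetical helper, not part of ClimaUtilities: map `t` back into the data
# range, identifying t1 + dt with t0.
wrap_time(t, t0, t1, dt) = t0 + mod(t - t0, t1 + dt - t0)

t0, t1, dt = 0, 10, 1   # dt = 1 is an assumption; the text does not state it

println(wrap_time(13, t0, t1, dt))  # 2
println(wrap_time(11, t0, t1, dt))  # 0, since t1 + dt is identified with t0
println(wrap_time(7, t0, t1, dt))   # 7, times inside the range are unchanged
```

With the more common identification t1 = t0, the period would be t1 - t0 = 10 and t = 13 would map to 3 instead.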

ClimaUtilities.TimeVaryingInputs.Flat - Type
Flat

When interpolating outside of range, use the boundary value.

For example, if the data is defined from t0 = 0 to t1 = 10, extrapolating at t=13 returns the value at t1 = 10, and extrapolating at t=-3 returns the value at t0 = 0.
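A minimal plain-Julia sketch of this behavior, combined with linear interpolation, is shown below. The helper interp_flat and the sample data are hypothetical and are not taken from ClimaUtilities.

```julia
# Made-up samples on t = 0, 1, ..., 10
times = collect(0.0:1.0:10.0)
samples = times .^ 2

# Hypothetical helper: linear interpolation with a Flat-style boundary condition
function interp_flat(t, times, samples)
    t = clamp(t, first(times), last(times))   # Flat: snap to the nearest boundary
    i = clamp(searchsortedlast(times, t), 1, length(times) - 1)
    w = (t - times[i]) / (times[i + 1] - times[i])
    return (1 - w) * samples[i] + w * samples[i + 1]
end

println(interp_flat(13.0, times, samples))  # 100.0, the value at t1 = 10
println(interp_flat(-3.0, times, samples))  # 0.0, the value at t0 = 0
println(interp_flat(2.5, times, samples))   # 6.5, ordinary linear interpolation
```

Clamping the time before interpolating is one simple way to obtain the Flat behavior; inside the range of definition the result is unaffected.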

ClimaUtilities.TimeVaryingInputs.evaluate! - Function
evaluate!(dest, input, time, args...; kwargs...)

Evaluate the input at the given time, writing the output in-place to dest.

Depending on the details of input, this function might do I/O and communication.

Extra arguments

args and kwargs are used only when the input is a non-interpolating function, e.g., an analytic one. In that case, args and kwargs are passed down to the function itself.

Base.in - Function
in(time, itp::InterpolatingTimeVaryingInput23D)

Check if the given time is in the range of definition for itp.

Base.close - Method
close(time_varying_input::TimeVaryingInputs.AbstractTimeVaryingInput)

Close files associated with the time_varying_input.
