SpaceVaryingInputs and TimeVaryingInputs
Most models require external inputs to work. Examples of inputs are an analytic function that prescribes the sea-surface temperature in time, or a file that describes the types of plants on the surface of the globe. The SpaceVaryingInputs and TimeVaryingInputs modules provide a unified infrastructure to handle all these cases.
TimeVaryingInputs
This extension is loaded when ClimaCore is loaded. In addition, if NetCDF files are used, NCDatasets has to be loaded too. Finally, a Regridder is needed (which might require importing additional packages).
A TimeVaryingInput is an object that knows how to fill a ClimaCore Field at a given simulation time t. TimeVaryingInputs can be constructed in a variety of ways, from analytic functions to NetCDF data. They expose a single interface, evaluate!(dest_field, tv, time), which model developers can use to update their Fields.
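The sketch below illustrates the idea of a single interface in plain Julia. It does not use ClimaUtilities or ClimaCore. `ToyInput`, `ToyAnalyticInput`, `ToySampledInput`, and `toy_evaluate!` are hypothetical stand-ins, and a plain `Vector` plays the role of a ClimaCore Field.

```julia
# Hypothetical stand-ins, not the ClimaUtilities types
abstract type ToyInput end

# An input defined by an analytic function of time
struct ToyAnalyticInput{F} <: ToyInput
    func::F
end

# An input defined by samples (times, values), linearly interpolated in time
struct ToySampledInput{T} <: ToyInput
    times::Vector{T}
    values::Vector{T}
end

# Same call for every kind of input: fill `dest` in place with the value at `time`
function toy_evaluate!(dest::AbstractVector, input::ToyAnalyticInput, time)
    dest .= input.func(time)
    return dest
end

function toy_evaluate!(dest::AbstractVector, input::ToySampledInput, time)
    t, v = input.times, input.values
    first(t) <= time <= last(t) || error("time $time is outside of the data range")
    i = min(searchsortedlast(t, time), length(t) - 1)  # left neighbor
    w = (time - t[i]) / (t[i + 1] - t[i])
    dest .= (1 - w) * v[i] + w * v[i + 1]
    return dest
end

field = zeros(3)  # plays the role of a ClimaCore Field
toy_evaluate!(field, ToyAnalyticInput(t -> 0.5), 10.0)
toy_evaluate!(field, ToySampledInput([0.0, 10.0], [1.0, 3.0]), 5.0)  # field is now [2.0, 2.0, 2.0]
```

The model code calls the same function no matter where the data comes from. Only the dispatch on the input type changes.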
The Example below shows that a TimeVaryingInput can take different types of inputs and still be used through a single interface (evaluate!). In all of this, TimeVaryingInputs internally handle the complexity of reading files (using FileReaders), dealing with parallelism and GPUs, regridding onto the computational domain (using Regridders and DataHandling), and so on.
TimeVaryingInputs support:
- analytic functions of time;
- pairs of 1D arrays (e.g., for PointSpaces or constant fields);
- 2/3D NetCDF files (including composing multiple variables from one or more files into one variable);
- linear interpolation in time (default), nearest neighbors, and "period filling";
- boundary conditions and repeating periodic data.
It is possible to pass keyword arguments down to the underlying constructors with regridder_kwargs (for the Regridder) and file_reader_kwargs (for the FileReader). These have to be a named tuple or a dictionary that maps Symbols to values.
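In Julia, a named tuple and a `Dict` with `Symbol` keys can both be splatted into keyword arguments in the same way. The sketch below shows this with a hypothetical constructor `toy_regridder`, which is not part of ClimaUtilities. `regrid_dir` is the keyword used in the examples on this page.

```julia
# Hypothetical constructor standing in for an underlying Regridder constructor
toy_regridder(; regrid_dir = "default", extra = nothing) = (regrid_dir, extra)

# A named tuple and a Dict{Symbol} carrying the same keyword arguments
kwargs_nt = (; regrid_dir = "/tmp")
kwargs_dict = Dict(:regrid_dir => "/tmp")

# Splatting either one after the `;` forwards its entries as keyword arguments
toy_regridder(; kwargs_nt...)    # ("/tmp", nothing)
toy_regridder(; kwargs_dict...)  # ("/tmp", nothing)
```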
NetCDF file inputs
2D or 3D NetCDF files can be provided as inputs to TimeVaryingInputs. The input can be a single variable in a single file, multiple variables in a single file, or multiple variables each coming from a separate file. When using multiple variables, a composing function must also be provided. This function combines the input variables into the single data variable that is ultimately stored in the TimeVaryingInput. The order of the variables in varnames determines the order of the arguments passed to the composing function.
Note that if a non-identity pre-processing function is provided as part of file_reader_kwargs, it will be applied to each input variable before they are composed. All input variables to be composed together must have the same spatial and temporal dimensions.
Composing multiple input variables is currently only supported with the InterpolationsRegridder, not with the TempestRegridder. The regridding is applied after the pre-processing and composing.
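The order of operations described above can be mimicked on plain arrays: pre-process each variable, then compose, then regrid. This is a sketch, not ClimaUtilities code. `toy_read_composed` is hypothetical, the arrays stand in for variables read from NetCDF files, and regridding is omitted.

```julia
# Hypothetical helper: mimics the pre-process -> compose order on plain arrays
function toy_read_composed(raw_vars::Vector, preprocess_func, compose_function)
    # All variables must share the same dimensions
    all(size(v) == size(first(raw_vars)) for v in raw_vars) ||
        error("input variables must have the same dimensions")
    # 1. Pre-process each variable independently
    processed = map(preprocess_func, raw_vars)
    # 2. Compose pointwise; argument order follows the order of the variables
    # 3. (Regridding would be applied to the composed result after this step)
    return compose_function.(processed...)
end

u = [1.0 2.0; 3.0 4.0]
v = [0.5 0.5; 0.5 0.5]
toy_read_composed([u, v], x -> 1000 * x, (x, y) -> x + y)  # [1500.0 2500.0; 3500.0 4500.0]
toy_read_composed([u, v], x -> 1000 * x, (x, y) -> x - y)  # order matters: [500.0 1500.0; 2500.0 3500.0]
```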
Composing multiple input variables in one Input is also possible with a SpaceVaryingInput, and everything mentioned here applies in that case as well.
Example: NetCDF file input with multiple input variables
Suppose that the input NetCDF file era5_example.nc contains two variables u and v, and we care about their sum u + v but not their individual values. We can provide a pointwise composing function that performs the sum, along with the InterpolationsRegridder, to produce the data we want, u + v. The preprocess_func passed in file_reader_kwargs will be applied to u and to v individually, before the composing function is applied. The regridding is applied after the composing function. u and v could also come from separate NetCDF files, but they must still have the same spatial and temporal dimensions.
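import ClimaUtilities: TimeVaryingInputs
import ClimaCore
import NCDatasets
import Interpolations  # assumed to be required by the InterpolationsRegridder extension
import Dates
# target_space is a ClimaCore Space defined elsewhere
# Define the pointwise composing function we want, a simple sum in this case
compose_function = (x, y) -> x + y
# Define pre-processing function to convert units of input
unit_conversion_func = (data) -> 1000 * data
# Everything after the `;` is a keyword argument (including the shorthand compose_function)
timevaryinginput = TimeVaryingInputs.TimeVaryingInput("era5_example.nc",
    ["u", "v"],
    target_space;
    start_date = Dates.DateTime(2000, 1, 1),
    regridder_type = :InterpolationsRegridder,
    file_reader_kwargs = (; preprocess_func = unit_conversion_func),
    compose_function)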
The same arguments (excluding start_date) could be passed to a SpaceVaryingInput to compose multiple input variables with that type.
Example: Data split across multiple NetCDF files
Large datasets often come chunked: the data is split across multiple files, and each file contains only a subset of the time interval. TimeVaryingInputs can combine data across multiple files as if it were provided in a single file. To use this feature, pass the list of file paths. The files do not have to be in order, but it is good practice to pass them in ascending order by time.
For example:
timevaryinginput = TimeVaryingInputs.TimeVaryingInput(["era5_1980.nc", "era5_1981.nc"],
"u",
target_space,
start_date = Dates.DateTime(1980, 1, 1),
regridder_type = :InterpolationsRegridder
)
This capability is only available for the InterpolationsRegridder.
Read more about this feature in the page about DataHandler.
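Conceptually, combining chunked files means concatenating their time axes and sorting the result, so the order in which the paths are given does not change the outcome. The sketch below shows this on in-memory data. `toy_merge_chunks` is hypothetical, the named tuples stand in for files, and this is not how DataHandler is implemented.

```julia
import Dates

# Hypothetical stand-ins for two yearly files, each with its own time axis
chunk_1981 = (times = [Dates.DateTime(1981, 1, 1), Dates.DateTime(1981, 7, 1)], values = [3.0, 4.0])
chunk_1980 = (times = [Dates.DateTime(1980, 1, 1), Dates.DateTime(1980, 7, 1)], values = [1.0, 2.0])

# Concatenate all chunks and sort by time, so the input order does not matter
function toy_merge_chunks(chunks)
    all_times = reduce(vcat, [c.times for c in chunks])
    all_vals = reduce(vcat, [c.values for c in chunks])
    allunique(all_times) || error("duplicate times across chunks")
    perm = sortperm(all_times)
    return (times = all_times[perm], values = all_vals[perm])
end

merged = toy_merge_chunks([chunk_1981, chunk_1980])  # passed out of order on purpose
# merged.values == [1.0, 2.0, 3.0, 4.0]
```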
Extrapolation boundary conditions
TimeVaryingInputs support several boundary conditions for extrapolation. By default, the Throw condition is used: interpolating onto a point outside the range of definition of the data is not allowed and raises an error. Other boundary conditions are available. With the Flat boundary condition, interpolating outside of the range of definition returns the value at the closest boundary instead.
To set a boundary condition, construct the interpolation method and pass the boundary condition as its argument. For example, to combine NearestNeighbor with Flat:
import ClimaUtilities: TimeVaryingInputs
method = TimeVaryingInputs.NearestNeighbor(TimeVaryingInputs.Flat())
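A small nearest-neighbor lookup in plain Julia shows the difference between Throw and Flat. `ToyThrow`, `ToyFlat`, and `toy_nearest` are hypothetical names that only mimic the behavior described above. They are not the ClimaUtilities types.

```julia
# Hypothetical boundary conditions mimicking Throw and Flat
struct ToyThrow end
struct ToyFlat end

# Nearest-neighbor lookup in sorted `times`, with a boundary condition for out-of-range `t`
function toy_nearest(times, vals, t, bc)
    if t < first(times) || t > last(times)
        bc isa ToyThrow && error("t = $t is outside of [$(first(times)), $(last(times))]")
        # ToyFlat: use the value at the closest boundary
        return t < first(times) ? first(vals) : last(vals)
    end
    i = argmin(abs.(times .- t))  # index of the closest sample
    return vals[i]
end

sample_times = [0.0, 5.0, 10.0]
sample_vals = [1.0, 2.0, 3.0]
toy_nearest(sample_times, sample_vals, 4.0, ToyThrow())  # 2.0 (closest sample is t = 5)
toy_nearest(sample_times, sample_vals, 13.0, ToyFlat())  # 3.0 (value at t1 = 10)
toy_nearest(sample_times, sample_vals, -3.0, ToyFlat())  # 1.0 (value at t0 = 0)
# toy_nearest(sample_times, sample_vals, 13.0, ToyThrow()) raises an error
```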
A boundary condition that is often useful is PeriodicCalendar, which repeats the data over and over.
In general, PeriodicCalendar takes two inputs: the period and the repeat_date. The period is a Dates.DatePeriod (e.g., Dates.Year(1)) that defines the duration of the period to be repeated. The repeat_date defines which date range is repeated. For example, if period = Dates.Month(1) and repeat_date = Dates.Date(1993, 11), November 1993 will be repeated.
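One way to picture this mode is to map every simulation date to the same offset inside the repeated period. The function below is a hypothetical sketch of that mapping using only the Dates standard library. It is not the ClimaUtilities implementation; for example, it does not handle offsets that do not exist in the repeated month.

```julia
import Dates

# Hypothetical: map `date` to the corresponding instant inside the repeated period
function toy_repeat_date(date::Dates.DateTime, period::Dates.DatePeriod, repeat_date::Dates.Date)
    # Offset of `date` from the start of its own period (e.g., its month)
    offset = date - floor(date, period)
    # Apply the same offset to the start of the repeated period
    return Dates.DateTime(floor(repeat_date, period)) + offset
end

# With period = Month(1) and repeat_date = November 1993, March 15 at 06:00 maps to
toy_repeat_date(Dates.DateTime(2000, 3, 15, 6), Dates.Month(1), Dates.Date(1993, 11))
# 1993-11-15T06:00:00
```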
The two inputs are optional. When they are not provided, ClimaUtilities assumes that the input data constitutes one period and repeats it. For example, suppose the data is defined from t0 to t1 (e.g., 1 and 5) with uniform spacing dt (e.g., 1). Interpolating at t > t1 (e.g., 7) is then equivalent to interpolating at t* = t0 + mod(t - t0, t1 - t0 + dt), which is 2 in this case. In this mode, PeriodicCalendar requires the data to be uniformly spaced in time. To enable this boundary condition, pass LinearInterpolation(PeriodicCalendar()) (or NearestNeighbor(PeriodicCalendar())) to the TimeVaryingInput.
This PeriodicCalendar is different from what you might be used to, where the identification is t1 = t0. Here, we identify t1 + dt = t0, so that it can be used to repeat calendar data.
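The identification t1 + dt = t0 can be written as t* = t0 + mod(t - t0, t1 - t0 + dt), where dt is the uniform spacing of the data. The sketch below applies this mapping and then interpolates linearly, wrapping from the last sample back to the first. `toy_periodic_linear` is a hypothetical helper, not part of ClimaUtilities.

```julia
# Hypothetical helper: periodic linear interpolation with t1 + dt identified with t0
function toy_periodic_linear(times::AbstractVector, vals::AbstractVector, t)
    dt = times[2] - times[1]                  # uniform spacing is assumed
    t0, t1 = first(times), last(times)
    tstar = t0 + mod(t - t0, t1 - t0 + dt)    # map t into [t0, t1 + dt)
    i = searchsortedlast(times, tstar)        # left neighbor
    j = i == length(times) ? 1 : i + 1        # past t1, the right neighbor is t0 again
    w = (tstar - times[i]) / dt
    return (1 - w) * vals[i] + w * vals[j]
end

sample_times = collect(0.0:1.0:10.0)
sample_vals = 2 .* sample_times
toy_periodic_linear(sample_times, sample_vals, 13.0)  # 4.0, the value at t = 2
toy_periodic_linear(sample_times, sample_vals, 10.5)  # 10.0, halfway between the values at t1 = 10 and t0 = 0
```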
LinearPeriodFillingInterpolation
Data is often not available at the frequency we would like. For example, we might have hourly data for a given quantity, but only on the 15th of each month. Linear interpolation across gaps of this type is typically not accurate. Consider a quantity with a diurnal cycle that is measured on only one day per month. Blind linear interpolation would remove the diurnal cycle completely on every day of the month except the 15th. The reason is that we would interpolate between the last point of the sampled day in one month and the first point of the sampled day in the following month.
LinearPeriodFillingInterpolation is an interpolation method that solves this problem by preserving periodic structures. It performs linear interpolation across corresponding periods (in the case of the day, across corresponding hours of different days). For more information, please refer to the docstring.
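The idea of interpolating across corresponding hours can be sketched in plain Julia. `toy_period_filling` is a hypothetical helper that assumes 24 hourly samples on each of two sampled days and ignores minutes. It is not the ClimaUtilities implementation.

```julia
import Dates

# Hypothetical: hourly samples (24 values) available only on two days, day1 < day2
function toy_period_filling(day1::Dates.Date, vals1, day2::Dates.Date, vals2, t::Dates.DateTime)
    h = Dates.hour(t)  # position within the daily period
    # Fraction of the way from day1 to day2, in whole days
    w = Dates.value(Dates.Date(t) - day1) / Dates.value(day2 - day1)
    # Interpolate between the same hour of the two sampled days
    return (1 - w) * vals1[h + 1] + w * vals2[h + 1]
end

jan15 = [sin(2π * h / 24) for h in 0:23]  # diurnal cycle sampled on Jan 15
feb15 = 2 .* jan15                        # stronger cycle sampled on Feb 15
# On Jan 30 the diurnal cycle is preserved: a peak at 06:00 and a trough at 18:00
toy_period_filling(Dates.Date(2000, 1, 15), jan15, Dates.Date(2000, 2, 15), feb15, Dates.DateTime(2000, 1, 30, 6))
toy_period_filling(Dates.Date(2000, 1, 15), jan15, Dates.Date(2000, 2, 15), feb15, Dates.DateTime(2000, 1, 30, 18))
```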
Example
Let target_space be the computational domain (a ClimaCore Space) and cesm_albedo.nc a NetCDF file containing albedo data as a function of time in a variable named alb.
import ClimaUtilities: TimeVaryingInputs
import ClimaCore
import NCDatasets
import ClimaCoreTempestRemap
import Dates
# Loading ClimaCore, NCDatasets, ClimaCoreTempestRemap loads the extensions we need
function evolve_model(albedo_tv, albedo_field, t, dt)
    new_t = t + dt
    # First, we update the albedo to the new time
    TimeVaryingInputs.evaluate!(albedo_field, albedo_tv, new_t)
    # Now we can do all the operations we want with albedo_field
    # rhs = ...
end
# Let us prepare an empty Field that will contain the albedo
albedo_field = ClimaCore.Fields.zeros(target_space)
# If the albedo is an analytic function of time
albedo_tv_an = TimeVaryingInputs.TimeVaryingInput((t) -> 0.5)
# If the albedo comes from data
# start_date is the calendar date at the beginning of our simulation
start_date = Dates.DateTime(2000, 1, 1)
albedo_tv = TimeVaryingInputs.TimeVaryingInput("cesm_albedo.nc", "alb", target_space;
    start_date, regridder_kwargs = (; regrid_dir = "/tmp"))
# When using data from files, the data is automatically interpolated to the correct
# time
# In either case, we can always call evolve_model(albedo_tv, albedo_field, t, dt), so
# model developers do not have to worry about anything :)
As seen in this example, Inputs can take keyword arguments and pass them down to other constructors. This is often used to preprocess files that are being read (most commonly to change units). For example, to multiply the albedo by a factor of 100, we would change albedo_tv to
albedo_tv = TimeVaryingInputs.TimeVaryingInput("cesm_albedo.nc", "alb", target_space;
    start_date, regridder_kwargs = (; regrid_dir = "/tmp"),
    file_reader_kwargs = (; preprocess_func = (x) -> 100x))
In this example we used the TempestRegridder. This is not the best choice in most cases. The TempestRegridder is slower and not fully compatible with MPI and GPUs (ClimaUtilities implements workarounds for this, so the code still works). The InterpolationsRegridder should be preferred unless there is a strict requirement of conservation: the TempestRegridder is guaranteed to conserve various properties, while the InterpolationsRegridder is not.
SpaceVaryingInputs
This extension is loaded when ClimaCore is loaded. In addition, if NetCDF files are used, NCDatasets has to be loaded too. Finally, a Regridder is needed (which might require importing additional packages).
SpaceVaryingInputs use the same building blocks as TimeVaryingInputs (chiefly the DataHandling module) to construct a Field from different sources.
SpaceVaryingInputs support:
- analytic functions of coordinates;
- pairs of 1D arrays (for columns);
- 2/3D NetCDF files (including composing multiple variables from one or more files into one variable).
In some ways, a SpaceVaryingInput can be thought of as an alternative constructor for a ClimaCore Field.
It is possible to pass keyword arguments down to the underlying constructors with regridder_kwargs (for the Regridder) and file_reader_kwargs (for the FileReader). These have to be a named tuple or a dictionary that maps Symbols to values.
SpaceVaryingInputs support reading individual input variables from NetCDF files, as well as composing multiple input variables into one SpaceVaryingInput. See the TimeVaryingInput "NetCDF file inputs" section for more information about this feature.
Example
Let target_space be a ClimaCore Space on which we want the Field to be defined, and cesm_albedo.nc a NetCDF file containing albedo data in a variable named alb.
import ClimaUtilities: SpaceVaryingInputs
import ClimaCore
import NCDatasets
import ClimaCoreTempestRemap
# Loading ClimaCore, NCDatasets, ClimaCoreTempestRemap loads the extensions we need
# Albedo as an analytic function of lat and lon
albedo_latlon_fun = (coord) -> 0.5 * coord.long * coord.lat
albedo = SpaceVaryingInputs.SpaceVaryingInput(albedo_latlon_fun, target_space)
albedo_from_file = SpaceVaryingInputs.SpaceVaryingInput("cesm_albedo.nc", "alb", target_space, regridder_kwargs = (; regrid_dir = "/tmp"))
API
ClimaUtilities.SpaceVaryingInputs.SpaceVaryingInput (Function)
SpaceVaryingInput(data_function::Function, space::ClimaCore.Spaces.AbstractSpace)
Returns the parameter field to be used in the model; appropriate when a parameter is defined using a function of the coordinates of the space.
Pass the data as a function data_function, which takes coordinates as arguments, together with the ClimaCore space of the model simulation.
This returns a scalar field. Note that data_function is broadcast over the coordinate field. Inside your function, the coordinates must be unpacked (e.g., coords.lat, coords.long) to use their values directly.
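The broadcasting can be pictured with plain Julia named tuples standing in for the coordinate points of a ClimaCore field. The `lat`/`long` names follow the example earlier on this page. The vector of named tuples is a hypothetical stand-in, not a ClimaCore coordinate field.

```julia
# Hypothetical stand-in for a coordinate field: one (lat, long) point per node
coords = [(lat = 10.0, long = 20.0), (lat = -30.0, long = 40.0)]

# The data function receives one point at a time and unpacks the coordinates itself
albedo_latlon_fun = (coord) -> 0.5 * coord.long * coord.lat

# Broadcasting applies the function pointwise, producing one scalar per point
albedo = albedo_latlon_fun.(coords)  # [100.0, -600.0]
```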
function SpaceVaryingInput(
data_z::AbstractArray,
data_values::AbstractArray,
space::S,
) where {S <: ClimaCore.Spaces.CenterFiniteDifferenceSpace}
Given a set of depths data_z and the observed values data_values at those depths, create an interpolated field of values at each value of z in the model grid, defined implicitly by space.
Returns a ClimaCore.Fields.Field of scalars.
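As a rough picture of what happens at each model level, the sketch below linearly interpolates measured values onto model depths in plain Julia. The docstring does not specify the interpolation scheme, so linear interpolation is an assumption made for illustration. The name `toy_interp_depth` is hypothetical.

```julia
# Hypothetical: linear interpolation of (data_z, data_values) onto model depths z_model.
# data_z must be sorted in increasing order; points outside the data range are
# extrapolated linearly from the end segments.
function toy_interp_depth(data_z::AbstractVector, data_values::AbstractVector, z_model::AbstractVector)
    return map(z_model) do z
        i = clamp(searchsortedlast(data_z, z), 1, length(data_z) - 1)  # left neighbor
        w = (z - data_z[i]) / (data_z[i + 1] - data_z[i])
        (1 - w) * data_values[i] + w * data_values[i + 1]
    end
end

data_z = [-2.0, -1.0, 0.0]        # depths of the measurements (m)
data_values = [30.0, 20.0, 10.0]  # measured values at those depths
toy_interp_depth(data_z, data_values, [-1.5, -0.25])  # [25.0, 12.5]
```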
SpaceVaryingInputs.SpaceVaryingInput(
data_z::AbstractArray,
data_values::NamedTuple,
space::S,
dest_type::Type{DT},
) where {
S <: ClimaCore.Spaces.CenterFiniteDifferenceSpace,
DT,
}
Returns a field of parameter structs to be used in the model; appropriate when the parameter struct values vary in depth. The dest_type argument is the struct type; we assume that your struct has a constructor that accepts its arguments as keyword arguments. data_z contains the depths at which the values were measured, data_values is a NamedTuple whose keys are the argument names of the struct and whose values are arrays of measured values, and space defines the model grid.
As an example, we can create a field of vanGenuchten structs as follows. This struct requires two parameters, α and n. Let's assume that we have measurements of these as a function of depth at the locations given by data_z, called data_α and data_n. Then we can write vG_field = SpaceVaryingInput(data_z, (;α = data_α, n = data_n), space, vanGenuchten{Float32}). Under the hood, at each point in the model grid, we will create vanGenuchten{Float32}(;α = interp_α, n = interp_n), where interp indicates the interpolated value at the model depth.
Returns a ClimaCore.Fields.Field of type DT.
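The construction described above can be sketched on plain arrays. `ToyVanGenuchten` is a hypothetical stand-in for the vanGenuchten struct, defined with `Base.@kwdef` so that it has a keyword constructor. Linear interpolation is assumed. This is not the ClimaUtilities implementation.

```julia
# Hypothetical parameter struct with a keyword constructor
Base.@kwdef struct ToyVanGenuchten{FT}
    α::FT
    n::FT
end

# Hypothetical helper: linear interpolation of one measured profile at depth z
function toy_interp(data_z, vals, z)
    i = clamp(searchsortedlast(data_z, z), 1, length(data_z) - 1)
    w = (z - data_z[i]) / (data_z[i + 1] - data_z[i])
    return (1 - w) * vals[i] + w * vals[i + 1]
end

data_z = [-1.0, 0.0]
data_values = (; α = [1.0, 3.0], n = [2.0, 1.5])
z_model = [-0.5]

# At each model depth, interpolate every entry of data_values and pass them by keyword
fields = map(z_model) do z
    interp = map(vals -> toy_interp(data_z, vals, z), data_values)  # NamedTuple (α = ..., n = ...)
    ToyVanGenuchten(; interp...)
end
# fields[1] has α == 2.0 and n == 1.75
```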
SpaceVaryingInput(data_handler::DataHandler)
SpaceVaryingInput(file_paths::Union{AbstractString, AbstractArray{String}},
varnames::Union{AbstractString, AbstractArray{String}},
target_space::Spaces.AbstractSpace;
regridder_type::Symbol,
regridder_kwargs = (),
file_reader_kwargs = ())
Returns the parameter field to be used in the model; appropriate when a parameter is defined on the surface of the Earth.
Returns a ClimaCore.Fields.Field of scalars; analogous to the 1D case which also returns a ClimaCore.Fields.Field of scalars.
ClimaUtilities.TimeVaryingInputs.AbstractInterpolationMethod (Type)
AbstractInterpolationMethod
Defines how to perform interpolation.
Not all TimeVaryingInputs support all interpolation methods (e.g., no interpolation methods are supported when the given function is analytic).
AbstractInterpolationMethods have to implement an extrapolation_bc field.
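The convention that every method carries an extrapolation_bc field can be mirrored with a small type hierarchy. The NearestNeighbor and LinearInterpolation entries in this API list Throw as the default, and the sketch reproduces that. All names below are hypothetical stand-ins chosen for this sketch, not the actual ClimaUtilities definitions.

```julia
# Hypothetical mirror of the abstract types described here
abstract type ToyBoundaryMethod end
abstract type ToyInterpolationMethod end

struct ToyThrowBC <: ToyBoundaryMethod end
struct ToyFlatBC <: ToyBoundaryMethod end

# Each concrete method stores its boundary condition in an `extrapolation_bc` field
struct ToyLinear{BC <: ToyBoundaryMethod} <: ToyInterpolationMethod
    extrapolation_bc::BC
end
ToyLinear() = ToyLinear(ToyThrowBC())  # default boundary condition

struct ToyNearest{BC <: ToyBoundaryMethod} <: ToyInterpolationMethod
    extrapolation_bc::BC
end
ToyNearest() = ToyNearest(ToyThrowBC())

# Generic accessor, valid for any method that follows the field convention
toy_extrapolation_bc(m::ToyInterpolationMethod) = m.extrapolation_bc

toy_extrapolation_bc(ToyLinear())              # ToyThrowBC()
toy_extrapolation_bc(ToyNearest(ToyFlatBC()))  # ToyFlatBC()
```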
ClimaUtilities.TimeVaryingInputs.AbstractInterpolationBoundaryMethod (Type)
AbstractInterpolationBoundaryMethod
Defines how to handle values outside of the data boundary.
Not all AbstractInterpolationMethods support all AbstractInterpolationBoundaryMethods.
ClimaUtilities.TimeVaryingInputs.NearestNeighbor (Type)
NearestNeighbor(extrapolation_bc::AbstractInterpolationBoundaryMethod)
Return the value corresponding to the point closest to the input time.
extrapolation_bc specifies how to deal with out-of-boundary values. The default value is Throw.
ClimaUtilities.TimeVaryingInputs.LinearInterpolation (Type)
LinearInterpolation(extrapolation_bc::AbstractInterpolationBoundaryMethod)
Perform linear interpolation between the two neighboring points.
extrapolation_bc specifies how to deal with out-of-boundary values. The default value is Throw.
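For a point inside the data range, the two methods differ as follows. This plain-Julia sketch uses hypothetical helper names (`toy_nearest_value`, `toy_linear_value`) and only illustrates the two definitions above.

```julia
# Hypothetical helpers; t must lie within [first(times), last(times)]
function toy_nearest_value(times, vals, t)
    return vals[argmin(abs.(times .- t))]  # value of the closest sample
end

function toy_linear_value(times, vals, t)
    i = clamp(searchsortedlast(times, t), 1, length(times) - 1)  # left neighbor
    w = (t - times[i]) / (times[i + 1] - times[i])
    return (1 - w) * vals[i] + w * vals[i + 1]
end

sample_times = [0.0, 1.0, 2.0]
sample_vals = [0.0, 10.0, 20.0]
toy_nearest_value(sample_times, sample_vals, 1.4)  # 10.0
toy_linear_value(sample_times, sample_vals, 1.4)   # approximately 14.0
```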
ClimaUtilities.TimeVaryingInputs.Throw (Type)
Throw
Throw an error when interpolating outside of range.
ClimaUtilities.TimeVaryingInputs.PeriodicCalendar (Type)
PeriodicCalendar
Repeat data periodically.
PeriodicCalendar has two modes of operation.
First, when provided with a period (described by a DatePeriod, e.g., Dates.Month(1) or Dates.Year(1)), assume that the provided data is repeated over that calendar period. Only simple periods (e.g., Dates.Month(1)) are supported. When a period is provided, a repeat_date is required too; it indicates which data to use, that is, the period of time that is repeated. For example, if period = Dates.Month(1) and repeat_date = Dates.Date(1993, 11), November 1993 is repeated (if available in the input data).
Passing a period is not supported by all the interpolators (e.g., when reading from 1D files).
Second, if no period is provided, interpolating outside of the range restarts from the beginning. For example, if the data is defined from t0 = 0 to t1 = 10 with samples spaced by dt = 1, extrapolating at t=13 is equivalent to interpolating at t=2. In practice, we identify t1 + dt with t0. This is different from what you might be used to for periodic boundary conditions, where the identification is t1 = t0.
This second mode of operation requires data to be uniformly sampled in time.
If the data is defined on a calendar year, this second mode of operation is equivalent to using the first mode with period = Dates.Year(1) (and similarly for other periods).
ClimaUtilities.TimeVaryingInputs.Flat (Type)
Flat
When interpolating outside of range, use the boundary value.
For example, if the data is defined from t0 = 0 to t1 = 10, extrapolating at t=13 returns the value at t1 = 10. When interpolating at t=-3, the value at t0 = 0 is used.
ClimaUtilities.TimeVaryingInputs.evaluate! (Function)
evaluate!(dest, input, time, args...; kwargs...)
Evaluate the input at the given time, writing the output in-place to dest.
Depending on the details of input, this function might do I/O and communication.
Extra arguments
args and kwargs are used only when the input is a non-interpolating function, e.g., an analytic one. In that case, args and kwargs are passed down to the function itself.
ClimaUtilities.TimeVaryingInputs.extrapolation_bc (Function)
extrapolation_bc(aim::AbstractInterpolationMethod)
Return the interpolation boundary conditions associated with aim.
Base.in (Function)
in(time, itp::InterpolatingTimeVaryingInput23D)
Check if the given time is in the range of definition for itp.
Base.close (Method)
close(time_varying_input::TimeVaryingInputs.AbstractTimeVaryingInput)
Close files associated with the time_varying_input.