OutputVars

OutputVars are the heart of ClimaAnalysis. This page is under construction, in the meantime, consult OutputVar.

OutputVars can be directly generated from most NetCDF files. Just pass the path to the constructor:

import ClimaAnalysis: OutputVar

myfile = OutputVar("my_netcdf_file.nc")

ClimaAnalysis will try to find a variable in this file. If multiple are available, ClimaAnalysis picks the latest in alphabetical order. If you want to specify one, pass it to the constructor:

import ClimaAnalysis: OutputVar

myfile = OutputVar("my_netcdf_file.nc", "myvar")

Physical units

OutputVars can contain information about their physical units. For OutputVars read from NetCDF files, this is obtained from the units attribute (and stored in the attributes["units"]).

When possible, ClimaAnalysis uses Unitful to handle units. This enables automatic unit conversion for OutputVars.

Consider the following example:

import ClimaAnalysis
values = 0:100.0 |> collect
data = copy(values)
attribs = Dict("long_name" => "speed", "units" => "m/s")
dim_attribs = Dict{String, Any}()
var = ClimaAnalysis.OutputVar(attribs, Dict("distance" => values), dim_attribs, data)

var_cms = ClimaAnalysis.convert_units(var, "cm/s")

In this example, we set upvar, an OutputVar with units of meters per second. Then, we called ClimaAnalysis.convert_units to convert the units to centimeters per second.

Sometimes, this automatic unit conversion is not possible (e.g., when you want to transform between incompatible units). In this case, you can pass a function that specify how to apply this transformation. For example, in the previous case, we can assume that we are talking about water and transform units into a mass flux:

new_var = ClimaAnalysis.convert_units(var, "kg m/s", conversion_function = (x) -> 1000x)

!!! note If you find some unparseable units, please open an issue. We can fix them!

If units do not exist, or you want to change the name of the units, then one can use the set_units function.

new_var = ClimaAnalysis.set_units(var, "kg m s^-1")

For converting the units of a dimension, you can use ClimaAnalysis.convert_dim_units. As of now, automatic conversion is not supported which means you need to supply the conversion function. See the example below.

new_var = ClimaAnalysis.convert_dim_units(
        var,
        "lat",
        "rads",
        conversion_function = x -> x * π / 180.0,
    )

Similarly, to set the units of a dimension, you can use the dim_set_units! function.

new_var = ClimaAnalysis.set_dim_units!(var, "lon", "degrees_east")
Override existing units

If units already exist, this will override the units for data or the dimension in var.

Interpolations and extrapolations

Interpolating a OutputVar onto coordinates can be done by doing the following:

var((0.0, 0.0)) # var is a two-dimensional OutputVar

A multilinear interpolation is used to determine the value at the coordinate (0, 0).

Interpolate on dates

If any of the dimensions contains Dates.DateTime elements, interpolation is not possible. Interpolations.jl does not support interpolating on dates.

Extrapolating is supported only on the longitude and latitude dimensions. For the longitude and latitude dimensions, a periodic boundary condition and a flat boundary condition are added, respectively, when the dimension array is equispaced and spans the entire range. For all other cases, extrapolating beyond the domain of the dimension will throw an error.

Preprocess dates and seconds

When loading a NetCDF file, dates in the time dimension are automatically converted to seconds and a start date is added to the attributes of the OutputVar. This is done because ClimaAnalysis does not support interpolating on dates which mean functions that rely on the interpolats, such as resampled_as, will not work otherwise.

Two additional parameters are provided to help preprocess dates which are new_start_date and shift_by. If new_start_date is provided, then dates in the time dimension will automatically be converted with reference to the new_start_date rather than the first date found in the NetCDF file. The parameter new_start_date can be any string parseable by the Dates module or a Dates.DateTime object. If additional preprocessing is needed, then one can provide a function that takes in and returns a Date.DateTime object. This function is applied to each date before converting each dates to seconds with reference with the start date.

# Shift the dates to first day of month, convert to seconds, and adjust seconds to
# match the date 1/1/2010
obs_var = ClimaAnalysis.OutputVar(
                "pr.nc",
                "precip",
                new_start_date = "2010-01-01T00:00:00", # or Dates.DateTime(2010, 1, 1)
                shift_by = Dates.firstdayofmonth,
            )

Additionally, the function shift_to_start_of_previous_month(var::OutputVar) is provided to help with preprocessing. This function shifts the times in the time dimension to the start of the previous month. After applying this function, the start date in the attributes corresponds to the first element in the time array.

sim_var = shift_to_start_of_previous_month(sim_var)

This function is helpful in ensuring consistency in dates between simulation and observational data. One example of this is when adjusting monthly averaged data. For instance, suppose that data on 2010-02-01 in sim_var corresponds to the monthly average for January. This function shifts the times so that 2010-01-01 will correspond to the monthly average for January.

Integration

OutputVars can be integrated with respect to longitude, latitude, or both using integrate_lon(var), integrate_lat(var), or integrate_lonlat(var) respectively. The bounds of integration are determined by the range of the dimensions longitude and latitude in var. The unit of both longitude and latitude should be degree.

If the points are equispaced, it is assumed that each point correspond to the midpoint of a cell which results in rectangular integration using the midpoint rule. Otherwise, the integration being done is rectangular integration using the left endpoints for integrating longitude and latitude.

See the example of integrating over a sphere where the data is all ones to find the surface area of a sphere.

julia> lon = collect(range(-179.5, 179.5, 360));

julia> lat = collect(range(-89.5, 89.5, 180));

julia> data = ones(length(lon), length(lat));

julia> dims = OrderedDict(["lon" => lon, "lat" => lat]);

julia> dim_attribs = OrderedDict([
           "lon" => Dict("units" => "degrees_east"),
           "lat" => Dict("units" => "degrees_north"),
       ]);

julia> attribs = Dict("long_name" => "f");

julia> var = ClimaAnalysis.OutputVar(attribs, dims, dim_attribs, data);

julia> integrated_var = integrate_lonlat(var);

julia> integrated_var.dims # no dimensions since longitude and latitude are integrated out
OrderedDict{String, Vector{Float64}}()

julia> integrated_var.data # approximately 4π (the surface area of a sphere)
0-dimensional Array{Float64, 0}:
12.566530113084296

julia> long_name(integrated_var) # updated long name to reflect the data being integrated
"f integrated over lon (-179.5 to 179.5degrees_east) and integrated over lat (-89.5 to 89.5degrees_north)"

Split by season

OutputVars can be split by seasons using split_by_season(var) provided that a start date can be found in var.attributes["start_date"] and time is a dimension in the OutputVar. The unit of time is expected to be second. The function split_by_season(var) returns a vector of four OutputVars with each OutputVar corresponding to a season. The months of the seasons are March to May, June to August, September to November, and December to February. The order of the vector is MAM, JJA, SON, and DJF. If there are no dates found for a season, then the OutputVar for that season will be an empty OutputVar.

julia> attribs = Dict("start_date" => "2024-1-1");

julia> time = [0.0, 5_184_000.0, 13_132_800.0]; # correspond to dates 2024-1-1, 2024-3-1, 2024-6-1

julia> dims = OrderedDict(["time" => time]);

julia> dim_attribs = OrderedDict(["time" => Dict("units" => "s")]); # unit is second

julia> data = [1.0, 2.0, 3.0];

julia> var = ClimaAnalysis.OutputVar(attribs, dims, dim_attribs, data);

julia> MAM, JJA, SON, DJF = ClimaAnalysis.split_by_season(var);

julia> ClimaAnalysis.isempty(SON) # empty OutputVar because no dates between September to November
true

julia> [MAM.dims["time"], JJA.dims["time"], DJF.dims["time"]]
3-element Vector{Vector{Float64}}:
 [5.184e6]
 [1.31328e7]
 [0.0]

julia> [MAM.data, JJA.data, DJF.data]
3-element Vector{Vector{Float64}}:
 [2.0]
 [3.0]
 [1.0]

Bias and squared error

Bias and squared error can be computed from simulation data and observational data in OutputVars using bias(sim, obs) and squared_error(sim, obs). The function bias(sim, obs) returns a OutputVar whose data is the bias (sim.data - obs.data) and computes the global bias of data in sim and obs over longitude and latitude. The result is stored in var.attributes["global_bias"]. The function squared_error(sim, obs) returns a OutputVar whose data is the squared error ((sim.data - obs.data)^2) and computes the global mean squared error (MSE) and the global root mean squared error (RMSE) of data in sim and obs over longitude and latitude. The result is stored in var.attributes["global_mse"] and var.attributes["global_rmse"]. Resampling is automatically done by resampling obs on sim. If you are only interested in computing global bias, MSE, or RMSE, you can use global_bias(sim, obs), global_mse(sim, obs), or global_rmse(sim, obs).

As of now, these functions are implemented for OutputVars with only the dimensions longitude and latitude. Furthermore, units must be supplied for data and dimensions in sim and obs and the units for longitude and latitude should be degrees.

Consider the following example, where we compute the bias and RMSE between our simulation and some observations stored in ta_1d_average.nc.

julia> obs_var = OutputVar("ta_1d_average.nc"); # load in observational data

julia> sim_var = get(simdir("simulation_output"), "ta"); # load in simulation data

julia> ClimaAnalysis.short_name(sim_var)
"ta"

julia> bias_var = ClimaAnalysis.bias(sim_var, obs_var); # bias_var is a OutputVar that can be plotted

julia> ClimaAnalysis.global_bias(sim, obs)
2.0

julia> ClimaAnalysis.units(bias_var)
"K"

julia> se_var = ClimaAnalysis.squared_error(sim_var, obs_var); # can also be plotted

julia> ClimaAnalysis.global_mse(sim, obs)
4.0

julia> ClimaAnalysis.global_rmse(sim, obs)
2.0

julia> ClimaAnalysis.units(se_var)
"K^2"

3D OutputVars

For three-dimensional variables that have dimensions longitude, latitude, and z, pressure, or time, the functions mentioned before will not work. To compute the bias or squared error, one can use ClimaAnalysis.slice to slice across the z, pressure, or time dimension to get a 2D variable defined on longitude and latitude. Then, any of the functions mentioned earlier will work. See an example of this below, where the bias and global MSE are computed between two OutputVars, where the time dimension is sliced at one day from the start date.

# Load in 3D temperature variable defined over longitude, latitude, and time
julia> obs_var = OutputVar("ta_1d_average.nc"); # load in observational data

# Load in 3D temperature variable defined over longitude, latitude, and time
julia> sim_var = get(simdir("simulation_output"), "ta"); # load in simulation data

# Slice to get variables defined over longitude and latitude
julia> obs_var = ClimaAnalysis.slice(obs_var, time =  86400)

julia> sim_var = ClimaAnalysis.slice(sim_var, time =  86400)

julia> ClimaAnalysis.bias(sim_var, obs_var);

julia> ClimaAnalysis.global_mse(sim, obs)
4.0

For 3D variables defined over longitude, latitude, and pressure, one can find the global RMSE in pressure space using ClimaAnalysis.global_rmse_pfull. See an example of this below, where global RMSE is computed between 3D variables in pressure space.

# Load in 3D temperature variable defined over longitude, latitude, and pressure
julia> obs_var = OutputVar("era5_pfull_ta_data.nc"); # load in observational data

# Load in 3D temperature variable defined over longitude, latitude, and z
julia> sim_var = get(simdir("simulation_output"), "ta"); # load in simulation data

# Load in 3D pressure variable defined over longitude, latitude, and z
julia> pressure_3D_var = get(simdir("simulation_output"), "pfull"); # load in simulation data

# This function will automatically converts to pressure coordinates for `sim_var` and
# `obs_var` if the keywords `sim_pressure` and `obs_pressure` are supplied respectively
julia> ClimaAnalysis.global_rmse_pfull(sim_var, obs_var, sim_pressure = pressure_3D_var)
3.4

Masking

Bias and squared error can be computed only over the land or ocean through the mask parameter. As of now, the mask parameter takes in apply_oceanmask or apply_oceanmask. See the example below of this usage.

# Do not consider the ocean when computing the bias
ClimaAnalysis.bias(sim_var, obs_var, mask = apply_oceanmask)
ClimaAnalysis.global_bias(sim_var, obs_var, mask = apply_oceanmask)

# Do not consider the land when computing the squared error
ClimaAnalysis.squared_error(sim_var, obs_var, mask = apply_landmask)
ClimaAnalysis.global_mse(sim_var, obs_var, mask = apply_landmask)
ClimaAnalysis.global_rmse(sim_var, obs_var, mask = apply_landmask)

In other cases, you may want to generate a masking function using a OutputVar. For instance, you are comparing against observational data over land and you can't use an ocean mask since not all of the observational data is defined over the land. The function make_lonlat_mask allows you to generate a masking function. If the data is already zeros and ones, then you can use make_lonlat_mask(var). Otherwise, you can specify set_to_val which takes in an element of var.data and return a boolean. If set_to_val returns true, then the value will be true_val in the mask and if set_to_val returns false, then the value will be false_val in the mask. See the example below of this usage.

# Any points that are NaNs should be zero in the mask
mask_fn = ClimaAnalysis.make_lonlat_mask(
    var;
    set_to_val = isnan,
    true_val = 0.0, # default is NaN
    false_val = 1.0,
)

# Apply mask to another OutputVar
another_masked_var = mask_fn(another_var)

# Compute squared error and global MSE with custom masking function
ClimaAnalysis.squared_error(sim_var, obs_var, mask = mask_fn)
ClimaAnalysis.global_mse(sim_var, obs_var, mask = mask_fn)