RMSEVariables

RMSEVariables contain all the information needed to process and compare root mean squared errors (RMSEs) between different models and categories (e.g., seasons) for a single variable of interest.

ClimaAnalysis provides several constructors for making an RMSEVariable. Every constructor requires a short name and a vector of model names. If units are not provided, each model has no unit, which marks its unit as missing. The examples below show constructors that additionally take a vector of category names, a matrix of RMSEs (one row per model, one column per category), and either a dictionary mapping model names to units or a single string naming the unit shared by all models.

import ClimaAnalysis

rmse_var = ClimaAnalysis.RMSEVariable("ta", ["ACCESS-CM2", "ACCESS-ESM1-5"])
rmse_var = ClimaAnalysis.RMSEVariable(
    "ta",
    ["ACCESS-CM2", "ACCESS-ESM1-5"],
    Dict("ACCESS-CM2" => "K", "ACCESS-ESM1-5" => "K"),
)
rmse_var = ClimaAnalysis.RMSEVariable(
    "ta",
    ["ACCESS-CM2", "ACCESS-ESM1-5"],
    ["DJF", "MAM", "JJA", "SON", "ANN"],
    Dict("ACCESS-CM2" => "K", "ACCESS-ESM1-5" => "K"),
)
rmse_var = ClimaAnalysis.RMSEVariable(
    "ta",
    ["ACCESS-CM2", "ACCESS-ESM1-5"],
    ["DJF", "MAM", "JJA", "SON", "ANN"],
    ones(2, 5),
    Dict("ACCESS-CM2" => "K", "ACCESS-ESM1-5" => "K"),
)
# Convenience functions if models all share the same unit
rmse_var = ClimaAnalysis.RMSEVariable(
    "ta",
    ["ACCESS-CM2", "ACCESS-ESM1-5"],
    "K",
)
rmse_var = ClimaAnalysis.RMSEVariable(
    "ta",
    ["ACCESS-CM2", "ACCESS-ESM1-5"],
    ["DJF", "MAM", "JJA", "SON", "ANN"],
    "K",
)
rmse_var = ClimaAnalysis.RMSEVariable(
    "ta",
    ["ACCESS-CM2", "ACCESS-ESM1-5"],
    ["DJF", "MAM", "JJA", "SON", "ANN"],
    ones(2, 5),
    "K",
)

The RMSEVariable can be inspected using model_names, category_names, and rmse_units which provide the model names, the category names, and the units respectively.

julia> ClimaAnalysis.model_names(rmse_var)
2-element Vector{String}:
 "ACCESS-CM2"
 "ACCESS-ESM1-5"

julia> ClimaAnalysis.category_names(rmse_var)
5-element Vector{String}:
 "DJF"
 "MAM"
 "JJA"
 "SON"
 "ANN"

julia> ClimaAnalysis.rmse_units(rmse_var)
Dict{String, String} with 2 entries:
  "ACCESS-ESM1-5" => "K"
  "ACCESS-CM2"    => "K"

Reading RMSEs from CSV file

Typically, the root mean squared errors (RMSEs) of different models across different categories are stored in a separate file and need to be loaded. ClimaAnalysis can load this information from a CSV file and store it in an RMSEVariable. The CSV file must begin with a header row: its first entry is "model_name" (or any other text, since the function ignores it) and the remaining entries are the category names. Each row after the header starts with a model name, followed by that model's RMSE for each category. Entries are separated by commas.

See the example below using read_rmses, where data is loaded from test_csv.csv and a short name of ta is provided. Units can be passed either as a dictionary mapping model names to units or as a single string if the units are the same for all the models.

rmse_var = ClimaAnalysis.read_rmses("./data/test_csv.csv", "ta")
rmse_var = ClimaAnalysis.read_rmses(
    "./data/test_csv.csv",
    "ta",
    units = Dict("ACCESS-CM2" => "K", "ACCESS-ESM1-5" => "K"), # passing units as a dictionary
)
rmse_var = ClimaAnalysis.read_rmses(
    "./data/test_csv.csv",
    "ta",
    units = "K", # passing units as a string
)
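To make the CSV layout described in this section concrete, here is a minimal sketch in plain Julia that writes a file in that layout and parses it by hand. It does not use ClimaAnalysis, it is not how read_rmses is implemented, and the values and the temporary path are illustrative assumptions.

```julia
# Illustrative file contents: a header row, then one row per model.
# The first header entry ("model_name") is ignored; the rest are categories.
csv_text = """
model_name,DJF,MAM,JJA,SON,ANN
ACCESS-CM2,11.941,10.178,13.279,10.443,8.71
ACCESS-ESM1-5,15.752,12.477,15.955,12.972,11.2
"""

# Write to a temporary location (hypothetical path, for illustration only).
csv_path = joinpath(mktempdir(), "test_csv.csv")
write(csv_path, csv_text)

# Parse the file by hand to show how the header and rows are interpreted.
lines = readlines(csv_path)
header = split(lines[1], ",")
category_names = String.(header[2:end])

model_names = String[]
rmses = Matrix{Float64}(undef, length(lines) - 1, length(category_names))
for (i, line) in enumerate(lines[2:end])
    fields = split(line, ",")
    push!(model_names, String(fields[1]))          # first entry: model name
    rmses[i, :] = parse.(Float64, fields[2:end])   # remaining entries: RMSEs
end
```

A file written this way should be suitable as the first argument to ClimaAnalysis.read_rmses, for example ClimaAnalysis.read_rmses(csv_path, "ta", units = "K").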

Indexing

After loading the data, one may want to inspect, change, or manipulate it. This is possible with the indexing functionality that RMSEVariable provides. Indexing into an RMSEVariable is similar to, but not the same as, indexing into an array. Indexing by integer or string is supported, but linear indexing (e.g., rmse_var[1]) is not.

julia> rmse_var[:, :]
2×5 Matrix{Float64}:
 11.941  10.178  13.279  10.443    8.71
 15.752  12.477  15.955  12.972  NaN

julia> rmse_var["ACCESS-CM2"]
5-element Vector{Float64}:
 11.941
 10.178
 13.279
 10.443
  8.71

julia> rmse_var[:, "MAM"]
2-element Vector{Float64}:
 10.178
 12.477

julia> rmse_var["ACCESS-CM2", ["ANN", "DJF", "MAM"]]
3-element Vector{Float64}:
  8.71
 11.941
 10.178

julia> rmse_var[2,5] = 11.2;

julia> rmse_var[:, :]
2×5 Matrix{Float64}:
 11.941  10.178  13.279  10.443   8.71
 15.752  12.477  15.955  12.972  11.2
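One way to picture string indexing is as a lookup from names to integer positions in the underlying matrix. The sketch below illustrates that idea in plain Julia. It is not ClimaAnalysis's implementation, and name_to_index is a hypothetical helper introduced only for this example.

```julia
model_names = ["ACCESS-CM2", "ACCESS-ESM1-5"]
category_names = ["DJF", "MAM", "JJA", "SON", "ANN"]
rmses = [
    11.941 10.178 13.279 10.443 8.71
    15.752 12.477 15.955 12.972 NaN
]

# Hypothetical helper: translate a name into its integer position.
function name_to_index(names, name)
    idx = findfirst(==(name), names)
    idx === nothing && throw(KeyError(name))
    return idx
end

# Counterpart of rmse_var["ACCESS-CM2", ["ANN", "DJF", "MAM"]]
row = name_to_index(model_names, "ACCESS-CM2")
cols = [name_to_index(category_names, c) for c in ["ANN", "DJF", "MAM"]]
selected = rmses[row, cols]

# Counterpart of rmse_var[:, "MAM"]
mam = rmses[:, name_to_index(category_names, "MAM")]
```

Here selected holds the "ANN", "DJF", and "MAM" values for ACCESS-CM2 in the requested order, and mam holds one value per model.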

Adding categories, models, and units

The CSV file may not contain all the models you want to analyze, or you may want to consider another category without manually editing the CSV file. ClimaAnalysis provides add_category, add_model, and add_unit! for adding categories, models, and units respectively. Multiple models or categories can be passed to these functions (e.g., add_model(rmse_var, "model1", "model2")). To add multiple units, pass a dictionary mapping model names to units. See the example below using this functionality.

rmse_var2 = ClimaAnalysis.add_category(rmse_var, "Jan") # can take in more than one category
rmse_var = ClimaAnalysis.add_model(rmse_var, "CliMA") # can take in more than one model name
ClimaAnalysis.add_unit!(rmse_var, "CliMA", "K")
ClimaAnalysis.add_unit!(rmse_var, Dict("CliMA" => "K")) # for adding multiple units

Summary statistics

ClimaAnalysis provides several functions to compute summary statistics. As of now, ClimaAnalysis provides methods for finding the best single model, the worst single model, and the median model.

The functions find_best_single_model and find_worst_single_model default to the category "ANN" (corresponding to the annual mean), but any category can be selected with the keyword argument category_name. Both functions return the selected model's RMSEs together with the model's name. The function median returns only the median model's RMSEs.

Any NaN that appears in the data is ignored when computing the summary statistics.

See the example below using this functionality.

julia> ClimaAnalysis.find_best_single_model(rmse_var, category_name = "DJF")
([11.941, 10.178, 13.279, 10.443, 8.71], "ACCESS-CM2")

julia> ClimaAnalysis.find_worst_single_model(rmse_var, category_name = "DJF")
([15.752, 12.477, 15.955, 12.972, 11.2], "ACCESS-ESM1-5")

julia> ClimaAnalysis.median(rmse_var)
5-element Vector{Float64}:
 13.8465
 11.3275
 14.617
 11.7075
  9.955
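The NaN handling described above can be sketched in plain Julia with the Statistics standard library. This is an illustration rather than ClimaAnalysis's implementation. It assumes that the model added earlier ("CliMA") has NaN for every category and that "best" means the lowest RMSE in the chosen category, which is consistent with the output shown above.

```julia
using Statistics

model_names = ["ACCESS-CM2", "ACCESS-ESM1-5", "CliMA"]
category_names = ["DJF", "MAM", "JJA", "SON", "ANN"]

# Assumption: the third row (the newly added model) has no data yet.
rmses = [
    11.941 10.178 13.279 10.443 8.71
    15.752 12.477 15.955 12.972 11.2
    NaN    NaN    NaN    NaN    NaN
]

# Median over models for each category, skipping NaN entries.
nanmedian(values) = median(filter(!isnan, values))
medians = [nanmedian(col) for col in eachcol(rmses)]

# Best and worst model for one category, again skipping NaN entries.
djf = rmses[:, findfirst(==("DJF"), category_names)]
valid = findall(!isnan, djf)
best = model_names[valid[argmin(djf[valid])]]
worst = model_names[valid[argmax(djf[valid])]]
```

Note that nanmedian in this sketch throws an error if a category contains only NaN values, so a more defensive version would need to handle that case explicitly.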