ClimaAnalysis

ClimaAnalysis, your one-stop-shop for all your CliMA analysis needs.

Quick start

SimDir

Suppose you ran a ClimaAtmos simulation, and the output is saved in the folder simulation_output. The first step in using ClimaAnalysis is to instantiate a SimDir:

import ClimaAnalysis

simdir = ClimaAnalysis.SimDir("simulation_output")

ClimaAnalysis.SimDir scans the simulation_output, finds all the output files, and organizes them. Consider the following directories:

global_diagnostics/
├── output_0001
│   ├── ta_3.0h_average.nc
│   ├── ta_3.0h_max.nc
│   ├── ta_3.0h_min.nc
│   ├── ta_4.0h_max.nc
│   ├── ts_1.0h_max.nc
│   ├── ua_6.0h_average.nc
│   └── va_2.0h_average.nc
├── output_0002
│   ├── ua_6.0h_average.nc
│   └── va_2.0h_average.nc
└── output_0003
    ├── ts_1.0h_max.nc
    ├── ua_6.0h_average.nc
    └── va_2.0h_average.nc

As of version 0.1.0, ClimaAnalysis uses file names to identify files and variables. In this, ClimaAnalysis assumes that the default names for outputs are used in ClimaAtmos (i.e., <short_name>_<reduction_time>_<reduction_type>.nc, as in ta_1h_max.nc, or <short_name>_1d_inst.nc).

Once you have a SimDir, you can inspect the output. For example, to find what variables are available:

julia> println(summary(simdir))
Output directory: simulation_output
Variables:
- va
    average (2.0h)
- ua
    average (6.0h)
- orog
    inst (1.0d)
- ta
    average (3.0h)
    max (4.0h, 3.0h)
    min (3.0h)
- ts
    max (1.0h)

Now, you can access any given variable

ts_max = get(simdir; short_name = "ts", reduction = "max", period = "3.0h")

ts_max is a OutputVar, a type that contains the variable as well as some metadata. When there is only one combination short_name/reduction/period, the function get can be used with get(simdir, short_name) (e.g., get(simdir, "orog") in the previous example).

If there are more files with the same combination of short name, reduction, and period, then the function get automatically stitch the .nc files together along the time dimension.

# Stitch `ua_6.0h_average.nc` in output_0001, output_0002, and output_0003
ua_average = get(simdir; short_name = "ua", reduction = "average", period = "6.0h")

# Stitch `ts_1.0h_max.nc` in output_0001 and output_0003
ts_average = get(simdir; short_name = "ts", reduction = "max", period = "1.0h")

The order of files in the directory tree traversed top-down determines the order of the files when stitching them. Stitching the datasets is not possible when the start dates are not the same, the names of the time dimension are not the same across the datasets, the time dimension does not exist, or the times are not in sequential order.

Let us learn about OutputVars

OutputVar

OutputVars contain the raw data (in .data), the attributes read from the file, and the information regarding the dimension over which the variable is defined.

julia> ts_max.dims
OrderedCollections.OrderedDict{String, Vector{Float32}} with 4 entries:
  "time" => [10800.0, 21600.0, 32400.0, 43200.0]
  "lon"  => [-180.0, -177.989, -175.978, -173.966, -171.955, -169.944, -167.933, -165.922…
  "lat"  => [-80.0, -77.9747, -75.9494, -73.924, -71.8987, -69.8734, -67.8481, -65.8228, …
  "z"    => [0.0, 5000.0, 10000.0, 15000.0, 20000.0, 25000.0, 30000.0, 35000.0, 40000.0, …

Here we have the dimensions and their values. The dimensions are ordered as in the file, so that the first index of .data is time, and so on.

We can find the attributes of the dimensions in .attributes:

julia> ts_max.dim_attributes["lon"]
  "lon"  => Dict("units"=>"degrees_east")

Some of the attributes are exposed with function calls. For example

julia> long_name(ts_max)
  Surface Temperature, max within 1.0 Hour(s)

These function use the attributes in the NetCDF files. When not available, empty strings are returned.

Given an OutputVar, we can perform manipulations. For instance, we can take the average over latitudes:

ts_max_lat_averaged = ClimaAnalysis.average_lat(ts_max)

Now,

ts_max_lat_averaged.dims =
OrderedCollections.OrderedDict{String, Vector{Float32}} with 3 entries:
  "time" => [10800.0, 21600.0, 32400.0, 43200.0]
  "lon"  => [-180.0, -177.989, -175.978, -173.966, -171.955, -169.944, -167.933, -165.922…
  "z"    => [0.0, 5000.0, 10000.0, 15000.0, 20000.0, 25000.0, 30000.0, 35000.0, 40000.0, …

We can also take a time/altitude slice, for example, the plane with altitude closest to 8000 meters.

ts_max_lat_averaged_sliced = ClimaAnalysis.slice(ts_max_lat_averaged, 8_000)

Alternatively, you can also call ClimaAnalysis.slice(ts_max_lat_averaged_sliced, time = 8_000). Now,

ts_max_lat_averaged_sliced.dims =
OrderedCollections.OrderedDict{String, Vector{Float32}} with 2 entries:
  "time" => [10800.0, 21600.0, 32400.0, 43200.0]
  "lon"  => [-180.0, -177.989, -175.978, -173.966, -171.955, -169.944, -167.933, -165.922…

You can get the dimensions from standard names, for example, to find the times, simply run

times(ts_max_lat_averaged_sliced) =
4-element Vector{Float32}:
 10800.0
 21600.0
 32400.0
 43200.0

OutputVars can be evaluated on arbitrary points. For instance

julia> ts_max([12000., 23., 45., 1200.])

will return the value of the maximum temperature at time 12000, longitude 23, latitude 45, and altitude 1200. This can be used to interpolate OutputVars onto new grids.

Mathematical operations

OutputVars support the usual mathematical operations. For instance, if ts_max is an OutputVar, 2 * ts_max will be an OutputVar with doubled values.

For binary operations (e.g., +, -, *, /), ClimaAnalysis will check if the operation is well defined (i.e., the two variables are defined on the physical space). Binary operations do remove some attribute information.

Visualize

We can directly visualize OutputVars.

If Makie is available, ClimaAnalysis can be used for plotting. Importing Makie and ClimaAnalysis in the same session automatically loads the necessary ClimaAnalysis plotting modules.

If we want to make a heatmap for ts_max at time of 100 s at altitude z of 30000 meters:

import CairoMakie
import ClimaAnalysis.Visualize as viz

fig = CairoMakie.Figure(size = (400, 600))

viz.plot!(
  fig,
  ts_max,
  time = 100.0,
  z = 30_000.0
)

CairoMakie.save("ts_max.png", fig)

If we want to have a line plot, we can simply add another argument (e.g., lat = 30), to slice through that value.

If you want to customize some of the properties, you can pass the more_kwargs to the plot! function. more_kwargs is a dictionary that can contain additional arguments to the Axis (:axis), plot (:plot), and Colorbar (:cb) functions. more_kwargs is a Dictionary that maps the symbols :axis, :plot, and :cb to their additional arguments. For instance, to choose the alpha value of the plot, the label of the colorbar, and the subtitle, you can do the following:

viz.plot!(
    fig,
    ts_max,
    time = 100.0,
    z = 30_000.0,
    more_kwargs = Dict(
        :plot => Dict(:alpha => 0.5),
        :cb => Dict(:label => "My label"),
        :axis => Dict(:subtitle => "My subtitle"),
    ),
)

Note the Symbol in plot, cb, and axis!. :plot, :cb, and :axis have to be a mapping of Symbols and values. ClimaAnalysis has a convenience function kwargs to more easily pass down the keyword arguments avoiding this step. With that, the above example becomes

import ClimaAnalysis.Utils : kwargs as ca_kwargs
viz.plot!(
  fig,
  ts_max,
  time = 100.0,
  z = 30_000.0,
  more_kwargs = Dict(
      :plot => ca_kwargs(alpha = 0.5),
      :cb => ca_kwargs(label = "My label"),
      :axis => ca_kwargs(subtitle = "My subtitle"),
  ),
)

With Utils.kwargs, you can just pass the arguments as you would pass them to the constructor.

If you need more control over the placement of plots, you can pass Makie.GridLayout objects to the plotting functions. For example,

using CairoMakie

fig = Figure()
layout = fig[1, 2] = GridLayout()

viz.plot!(
  layout,
  ts_max,
  time = 100.0,
  z = 30_000.0,
  more_kwargs = Dict(
      :plot => ca_kwargs(alpha = 0.5),
      :cb => ca_kwargs(label = "My label"),
      :axis => ca_kwargs(subtitle = "My subtitle"),
  ),
)

When you pass a GridLayout, the optional argument p_loc refers to the placement within the layout. When you pass a Figure, it refers to the placement within the figure.

If you have GeoMakie and are working on a variable defined on a long-lat grid, you can directly plot on a projected global surface. For that, load GeoMakie and use the heatmap2D_on_globe! function.