OutputPathGenerator

The OutputPathGenerator module provides tools for preparing the directory structure for your simulation output. This helps you organize your simulation results efficiently and avoid overwriting existing data.

The module's main function is generate_output_path. The function takes three arguments:

  • output_path: The base directory path for your simulation output.
  • style (Optional): The desired style for output management (defaults to ActiveLinkStyle).
  • context (Optional): the ClimaComms.context. This is required in MPI runs to ensure that all the MPI processes agree on the folder structure.

The function processes the output_path based on the chosen style and returns the final path where you should write your simulation output.

You should use generate_output_path at the beginning of your simulation and use the return value as the base directory where you save all the output your code produces.

Available Styles

The module currently offers two different styles for handling the output directory:

RemovePreexistingStyle (Destructive)

This style directly uses the provided output_path as the final output directory. Important: If a directory already exists at the specified path, it will be removed completely (including any subfolders and files) without confirmation. Use this style cautiously!
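The destructive behavior described above can be illustrated with a minimal sketch that uses only Julia's standard library. The helper name `reset_output_dir` is hypothetical and this is not the library's implementation; it only mimics the "remove, then start from an empty folder" semantics, and it assumes that an empty directory should exist at the path afterwards.

```julia
# Hypothetical helper, NOT the library's implementation: it mimics the
# destructive behavior described above using only Julia's standard library.
function reset_output_dir(output_path::AbstractString)
    # `force = true` makes `rm` a no-op when the path does not exist;
    # `recursive = true` deletes subfolders and files as well.
    rm(output_path; recursive = true, force = true)
    # Assumption: the caller expects an empty directory at `output_path`.
    mkpath(output_path)
    return output_path
end

# Try it in a throwaway directory so that nothing important is deleted.
base = joinpath(mktempdir(), "data")
mkpath(joinpath(base, "old_run"))
write(joinpath(base, "old_run", "result.txt"), "stale")

reset_output_dir(base)
isempty(readdir(base))  # true: the old contents are gone
```

Running the sketch against a temporary directory, as above, is a safe way to get a feel for how much data this style can discard.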

ActiveLinkStyle (Non-Destructive)

This style provides a more convenient and non-destructive approach. It manages a sequence of subfolders within the base directory specified by output_path. It also creates a symbolic link named output_active that points to the current active subfolder. This allows you to easily access the latest simulation results.

Example

Let's assume your output_path is set to data.

  • If data doesn't exist, the module creates it and returns data/output_0000. In doing this, a link data/output_active to data/output_0000 is created so that you can always access your data in data/output_active.
  • If data exists and contains an output_active link pointing to data/output_0005, output_active is updated to point to a new subfolder called data/output_0006.
  • If data exists with or without an output_active link, the module checks for existing subfolders named data/output_XXXX (with XXXX a number). If none are found, it creates data/output_0000 and a link data/output_active pointing to it.
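The folder bookkeeping in the cases above can be sketched with the standard library alone. The helper `next_output_dir` below is hypothetical and is not how the module is implemented: it assumes the next counter is one more than the largest existing output_XXXX folder, it uses a relative link target, and it assumes a system where unprivileged users can create symbolic links.

```julia
# Hypothetical helper, NOT the library's implementation.
function next_output_dir(base::AbstractString)
    mkpath(base)
    # Collect the numbers of the existing `output_XXXX` subfolders.
    counters = Int[]
    for name in readdir(base)
        m = match(r"^output_(\d{4})$", name)
        if m !== nothing && isdir(joinpath(base, name))
            push!(counters, parse(Int, m.captures[1]))
        end
    end
    # Assumption: the next counter is the largest existing one plus one.
    next = isempty(counters) ? 0 : maximum(counters) + 1
    new_name = "output_" * lpad(next, 4, '0')
    new_dir = joinpath(base, new_name)
    mkpath(new_dir)
    # Repoint `output_active` to the new folder (relative link target).
    link = joinpath(base, "output_active")
    islink(link) && rm(link)
    symlink(new_name, link)
    return new_dir
end

base = joinpath(mktempdir(), "data")
first_dir = next_output_dir(base)           # .../data/output_0000
second_dir = next_output_dir(base)          # .../data/output_0001
readlink(joinpath(base, "output_active"))   # "output_0001"
```

Because the link target is relative in this sketch, moving the whole data folder keeps output_active valid; whether the module itself uses relative targets is not stated above.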

A note for Windows users

Windows does not always allow the creation of symbolic links by unprivileged users, so some details about links might be slightly different depending on your system. If you are using Windows, please see the docstring of ActiveLinkStyle to learn more about possible differences.

Restarting from previous outputs with ActiveLinkStyle

ClimaUtilities provides a helper function, detect_restart_file, that can help you automatically locate the most recent restart file generated by your simulation.

This function is useful if you want to restart a simulation from a previous checkpoint. It works with the ActiveLinkStyle described above.

To find the most recent restart file, just call

restart_file = detect_restart_file(output_path; style)

where style has to be an ActiveLinkStyle object and output_path is the base directory where your simulation output is stored (the same one you passed to generate_output_path). Passing the style is optional; if no style is passed, ActiveLinkStyle() is used.

The function will scan the subfolders within output_path (e.g., output_path/output_0001, output_path/output_0002, etc.) and return the path to the most recent restart file it finds. If no restart file is found, it will return nothing.
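To make the search concrete, here is a minimal standard-library sketch of a scan over output_XXXX subfolders. The helper `find_latest_restart` is hypothetical and is not the library's implementation: it assumes that every numbered subfolder is searched and that "most recent" can be approximated by the file modification time.

```julia
# Hypothetical helper, NOT the library's implementation.
function find_latest_restart(base::AbstractString;
                             restart_file_rx = r"day\d+\.\w+\.hdf5")
    isdir(base) || return nothing
    candidates = String[]
    for name in readdir(base)
        dir = joinpath(base, name)
        # Only look inside `output_XXXX` folders; this skips `output_active`.
        (occursin(r"^output_\d{4}$", name) && isdir(dir)) || continue
        for file in readdir(dir)
            occursin(restart_file_rx, file) && push!(candidates, joinpath(dir, file))
        end
    end
    isempty(candidates) && return nothing
    # Assumption: the modification time stands in for "most recent".
    return last(sort(candidates; by = mtime))
end

base = mktempdir()
before = find_latest_restart(base)   # nothing: no output folders yet

run_dir = mkpath(joinpath(base, "output_0000"))
touch(joinpath(run_dir, "day10.0.hdf5"))
touch(joinpath(run_dir, "notes.txt"))  # ignored: does not match the regex
after = find_latest_restart(base)      # path to day10.0.hdf5
```

Returning nothing rather than throwing lets the caller decide whether to start a fresh simulation when no checkpoint is available.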

You can customize the search for restart files by passing optional arguments to detect_restart_file:

  • restart_file_rx: A regular expression to match the names of your restart files. The default is r"day\d+\.\w+\.hdf5".
  • sort_func: A function to sort the restart files and select the most recent one. The default is ClimaUtilities.sort_by_creation_time, which sorts files based on their creation time. You can provide a custom function if you want to sort by a different criterion (e.g., the simulation time stored within the HDF5 file).
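As an illustration of a custom sorting criterion, the sketch below orders restart files by the simulation time encoded in their names instead of by timestamps. It rests on two assumptions that the text above does not confirm: that sort_func receives a vector of file paths and returns them ordered from oldest to newest, and that both fields of the file name are integers (a day number and a number of seconds). The function name `sort_by_simulation_time` is hypothetical; check the docstring of the default sorting function for the actual contract before relying on it.

```julia
# Assumed contract (not confirmed by the text above): `sort_func` receives a
# vector of restart file paths and returns them ordered from oldest to newest.
function sort_by_simulation_time(files)
    # Convert "dayD.S.hdf5" into a total number of simulated seconds.
    function simulation_seconds(path)
        m = match(r"day(\d+)\.(\d+)\.hdf5$", basename(path))
        m === nothing && error("unexpected restart file name: $path")
        days = parse(Int, m.captures[1])
        seconds = parse(Int, m.captures[2])
        return 86_400 * days + seconds
    end
    return sort(files; by = simulation_seconds)
end

files = ["day10.0.hdf5", "day2.43200.hdf5", "day9.500.hdf5"]
sorted = sort_by_simulation_time(files)
# ["day2.43200.hdf5", "day9.500.hdf5", "day10.0.hdf5"]
```

Note that a plain lexicographic sort would place day10 before day2, which is why the day and second fields are parsed as integers here. Under the assumed contract, such a function could be passed as the sort_func keyword argument of detect_restart_file.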

API

ClimaUtilities.OutputPathGenerator.generate_output_path - Function
generate_output_path(output_path,
                     context = nothing,
                     style::OutputPathGeneratorStyle = ActiveLinkStyle())

Process the output_path and return a string with the path where to write the output.

The context is a ClimaComms context and is required for MPI runs.

How the output should be structured (in terms of directory tree) is determined by the style.

Styles

  • RemovePreexistingStyle: the output_path provided is the actual output path. If a directory already exists there, remove it without asking for confirmation.

  • ActiveLinkStyle: the output_path returned is a new folder of the form output_path/output_1234, where the number is incremented every time this function is called. ActiveLinkStyle also creates a link output_path/output_active that ensures that the most recent output is always accessible at the output_path/output_active path. This style is non-destructive.

(Note, "styles" have nothing to do with Julia traits.)

generate_output_path(::RemovePreexistingStyle, output_path, context = nothing)

Documentation for this function is in the RemovePreexistingStyle struct.

generate_output_path(::ActiveLinkStyle, output_path, context = nothing)

Documentation for this function is in the ActiveLinkStyle struct.

ClimaUtilities.OutputPathGenerator.ActiveLinkStyle - Type
ActiveLinkStyle

This style generates a unique output path within a base directory specified by output_path. It ensures the base directory exists and creates it if necessary. Additionally, it manages a sequence of subfolders and a symbolic link named "output_active" for convenient access to the active output location.

This style is designed to:

  • be non-destructive,
  • provide a deterministic and fixed path for the latest available data,
  • and have nearly zero runtime overhead.

generate_output_path returns the path to the newly created folder with the next available increment (of the form output_1234), and ensures that a valid output_active link points to that folder.

Examples:

Let us assume that output_path = dormouse.

  • dormouse does not exist in the current working directory: ActiveLinkStyle will create it and return dormouse/output_0000. In the process, a symlink dormouse/output_active is also created. This symlink points to dormouse/output_0000.
  • dormouse exists and contains an output_active link that points to dormouse/output_0005: ActiveLinkStyle will create a new directory dormouse/output_0006, return this path, and change the output_active link to point to this directory.
  • dormouse exists and does not contain an output_active link: ActiveLinkStyle will check if any dormouse/output_XXXX exists. If not, it creates dormouse/output_0000 and a link dormouse/output_active that points to this directory.

A note for Windows users

Windows does not always allow the creation of symbolic links by unprivileged users. This depends on the version of Windows, but also on some of its settings. When the creation of symbolic links is not possible, OutputPathGenerator will create NTFS junction points instead. Junction points are similar to symbolic links, with the main difference that they have to refer to directories and they have to be absolute paths. As a result, on systems that do not allow unprivileged users to create symbolic links, moving the base output folder breaks the output_active link.

ClimaUtilities.OutputPathGenerator.detect_restart_file - Function
detect_restart_file(base_output_dir;
                    restart_file_rx = r"day\d+\.\w+\.hdf5",
                    sort_func = sort_by_creation_time,
                    style = ActiveLinkStyle()
                    )

Detects and returns the path to the most recent restart file within the directory structure specified by base_output_dir.

Returns nothing if no suitable restart file is found.

This function searches for restart files within the directory structure organized according to the provided ActiveLinkStyle. It identifies potential output directories based on the style and then looks for files matching the restart_file_rx regular expression within these directories.

By default, the function assumes restart files have names like "dayDDDD.SSSSS.hdf5", where DDDD represents the day number and SSSSS represents the number of seconds.

If multiple restart files are found, the function uses the sort_func to determine the most recent one. The default sorting function, sort_by_creation_time, sorts files based on their creation timestamps, returning the file with the latest creation time. Users can provide custom sorting functions to prioritize files based on other criteria, such as the simulation time stored within the HDF5 file.

Return Value:

  • If a suitable restart file is found, the function returns its full path as a string.
  • If no output directory matching the ActiveLinkStyle or no restart file matching the restart_file_rx is found, the function returns nothing. This indicates that automatic restart is not possible.

Arguments:

  • style: An ActiveLinkStyle object defining the structure of output directories. Defaults to ActiveLinkStyle().
  • base_output_dir: The base directory where the output directory structure is located.
  • restart_file_rx: A regular expression used to identify restart files within output directories. Defaults to r"day\d+\.\w+\.hdf5".
  • sort_func: A function used to sort restart files and select the most recent one. Defaults to sort_by_creation_time.