OutputPathGenerator
The OutputPathGenerator module provides tools for preparing the directory structure for your simulation output. It helps you organize your simulation results and avoid overwriting existing data.
The main function of the module is generate_output_path, which takes three arguments:
- output_path: The base directory path for your simulation output.
- style (optional): The desired style for output management (defaults to ActiveLinkStyle).
- context (optional): The ClimaComms.context. This is required in MPI runs to ensure that all the MPI processes agree on the folder structure.
The function processes output_path according to the chosen style and returns the final path where you should write your simulation output.
Call generate_output_path at the beginning of your simulation, and use its return value as the base directory for all the output your code produces.
Available Styles
The module currently offers two different styles for handling the output directory:
RemovePreexistingStyle (Destructive)
This style uses the provided output_path directly as the final output directory. Important: if a directory already exists at the specified path, it is removed completely (including any subfolders and files) without confirmation. Use this style cautiously!
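In plain Julia, the behavior described above amounts to a recursive delete followed by recreating the directory. The sketch below is a hypothetical helper: remove_preexisting_path is not part of ClimaUtilities. It ignores the MPI context and assumes that the directory should exist, and be empty, once the call returns.

```julia
# Hypothetical helper mirroring RemovePreexistingStyle; not the library's code.
function remove_preexisting_path(output_path::AbstractString)
    # Destructive: deletes output_path and everything below it, without asking.
    # force = true makes this a no-op when the path does not exist yet.
    rm(output_path; force = true, recursive = true)
    # Assumption: the caller expects an empty directory to write into.
    mkpath(output_path)
    return output_path
end
```

If you call it twice on the same path, whatever the first run wrote there is discarded.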
ActiveLinkStyle (Non-Destructive)
This style offers a more convenient, non-destructive approach. It manages a sequence of subfolders within the base directory specified by output_path, and it creates a symbolic link named output_active that points to the currently active subfolder. This lets you easily access the latest simulation results.
Example
Let's assume your output_path is set to data.
- If data doesn't exist, the module creates it and returns data/output_0000. In doing so, it also creates a link data/output_active pointing to data/output_0000, so you can always access your data in data/output_active.
- If data exists and contains an output_active link pointing to data/output_0005, the module creates a new subfolder data/output_0006, returns it, and updates output_active to point to it.
- If data exists but does not contain an output_active link, the module checks for existing subfolders named data/output_XXXX (with XXXX a number). If none are found, it creates data/output_0000 and a link data/output_active pointing to it.
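The decision logic in the example above can be sketched with Base Julia's file-system functions. active_link_output_path is a hypothetical helper, not ClimaUtilities' implementation, and it makes three simplifying choices. It ignores the MPI context. It assumes that symbolic links can be created, which requires Julia 1.6 or later for the dir_target keyword. When there is no output_active link but numbered folders already exist, it continues after the highest number, a case that the example does not spell out.

```julia
# Hypothetical sketch of the ActiveLinkStyle logic described above,
# written with Base Julia only; not ClimaUtilities' implementation.
function active_link_output_path(output_path::AbstractString)
    mkpath(output_path)                          # create the base dir if missing
    active = joinpath(output_path, "output_active")
    numbered = r"^output_(\d{4})$"

    if islink(active)
        # Continue from the folder the link currently points to
        m = match(numbered, basename(readlink(active)))
        n = m === nothing ? 0 : parse(Int, m.captures[1]) + 1
    else
        # Assumption: without a link, continue after the highest existing number
        nums = Int[]
        for name in readdir(output_path)
            m = match(numbered, name)
            m === nothing || push!(nums, parse(Int, m.captures[1]))
        end
        n = isempty(nums) ? 0 : maximum(nums) + 1
    end

    new_dir = joinpath(output_path, "output_" * lpad(n, 4, '0'))
    mkdir(new_dir)
    # Replace the link with one pointing to the new folder (relative target)
    islink(active) && rm(active)
    symlink(basename(new_dir), active; dir_target = true)
    return new_dir
end
```

Because the link target is stored as a relative name (output_0000 rather than an absolute path), the link keeps working if the whole base folder is moved, on systems where real symbolic links are available.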
A note for Windows users
Windows does not always allow unprivileged users to create symbolic links, so some details about links may differ depending on your system. If you are using Windows, please see the docstring of ActiveLinkStyle to learn more about possible differences.
Restarting from previous outputs with ActiveLinkStyle
ClimaUtilities provides a helper function, detect_restart_file, that can automatically locate the most recent restart file generated by your simulation.
This function is useful if you want to restart a simulation from a previous checkpoint. It works with the ActiveLinkStyle described above.
To find the most recent restart file, just call
restart_file = detect_restart_file(output_path; style)
where style must be an ActiveLinkStyle object and output_path is the base directory where your simulation output is stored (the same one you passed to generate_output_path). Passing style is optional: if no style is passed, ActiveLinkStyle() is used.
The function scans the subfolders within output_path (e.g., output_path/output_0001, output_path/output_0002, etc.) and returns the path to the most recent restart file it finds. If no restart file is found, it returns nothing.
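The scan described above can be sketched in Base Julia. find_latest_restart is a hypothetical stand-in, not ClimaUtilities' implementation. It assumes that restart files live directly inside folders named output_XXXX, and that sort_func receives a vector of paths and returns them sorted so that the most recent file comes last. Its default ordering uses stat(file).ctime, which on Unix is the inode change time and only approximates a creation time.

```julia
# Hypothetical stand-in for detect_restart_file; not ClimaUtilities' implementation.
# Assumption: sort_func returns the paths sorted with the most recent file last.
by_ctime(files) = sort(files; by = f -> stat(f).ctime)

function find_latest_restart(base_output_dir::AbstractString;
                             restart_file_rx::Regex = r"day\d+\.\w+\.hdf5",
                             sort_func = by_ctime)
    isdir(base_output_dir) || return nothing
    candidates = String[]
    for dir in readdir(base_output_dir; join = true)
        # Only numbered output folders; this also skips the output_active link
        (occursin(r"^output_\d{4}$", basename(dir)) && isdir(dir)) || continue
        for file in readdir(dir; join = true)
            if isfile(file) && occursin(restart_file_rx, basename(file))
                push!(candidates, file)
            end
        end
    end
    isempty(candidates) && return nothing   # automatic restart not possible
    return last(sort_func(candidates))
end
```

The sketch returns nothing both when the base directory is missing and when no file matches. This mirrors the two "not found" cases listed in the API section.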
You can customize the search for restart files by passing optional arguments to detect_restart_file:
- restart_file_rx: A regular expression to match the names of your restart files. The default is r"day\d+\.\w+\.hdf5".
- sort_func: A function to sort the restart files and select the most recent one. The default is ClimaUtilities.sort_by_creation_time, which sorts files based on their creation time. You can provide a custom function if you want to sort by a different criterion (e.g., the simulation time stored within the HDF5 file).
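Before passing a pattern as restart_file_rx, you can check which file names it accepts using Julia's occursin. The file names and the custom_rx pattern below are made-up examples. The sketch assumes that a name is selected when the pattern occurs anywhere in it, which may differ from how detect_restart_file applies the pattern internally.

```julia
# Default pattern from the documentation above
default_rx = r"day\d+\.\w+\.hdf5"

candidate_names = ["day0012.43200.hdf5",  # day 12, 43200 s: matches
                   "day12.hdf5",          # no middle field: no match
                   "restart_0012.hdf5",   # different prefix: no match
                   "day1.0.hdf5.bak"]     # pattern is not anchored: still matches

for name in candidate_names
    println(rpad(name, 22), occursin(default_rx, name))
end

# A hypothetical alternative naming scheme, anchored with ^ and $ so that
# names such as "restart_0012.hdf5.bak" are rejected
custom_rx = r"^restart_\d+\.hdf5$"
```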
API
ClimaUtilities.OutputPathGenerator.generate_output_path (Function)
generate_output_path(output_path,
                     context = nothing,
                     style::OutputPathGeneratorStyle = ActiveLinkStyle())
Process output_path and return a string with the path where the output should be written.
The context is a ClimaComms context and is required for MPI runs.
The style determines how the output is structured (in terms of the directory tree).
Styles
- RemovePreexistingStyle: the output_path provided is the actual output path. If a directory already exists there, it is removed without asking for confirmation.
- ActiveLinkStyle: the output_path returned is a new folder of the form output_path/output_1234, where the number is incremented every time this function is called. ActiveLinkStyle also creates a link output_path/output_active, which ensures that the most recent output is always accessible at the output_path/output_active path. This style is non-destructive.
(Note, "styles" have nothing to do with Julia traits.)
generate_output_path(::RemovePreexistingStyle, output_path, context = nothing)
Documentation for this function is in the RemovePreexistingStyle struct.
generate_output_path(::ActiveLinkStyle, output_path, context = nothing)
Documentation for this function is in the ActiveLinkStyle struct.
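The two methods above are selected by Julia's multiple dispatch on the type of the style argument. The generic entry point forwards the call to the method specialized for that style. The following sketch reproduces this pattern with hypothetical names (SketchStyle, SketchRemoveStyle, SketchLinkStyle, sketch_output_path). It omits the context argument and does no file-system work, so it is not the library's code.

```julia
# Hypothetical names; they only mirror the dispatch pattern of the API above.
abstract type SketchStyle end
struct SketchRemoveStyle <: SketchStyle end
struct SketchLinkStyle <: SketchStyle end

# Generic entry point: the style defaults to the link-based one, and the call
# is forwarded to the method specialized on the style's type.
sketch_output_path(output_path; style::SketchStyle = SketchLinkStyle()) =
    sketch_output_path(style, output_path)

# Style-specific methods (they only compute a path here).
sketch_output_path(::SketchRemoveStyle, output_path) = output_path
sketch_output_path(::SketchLinkStyle, output_path) = joinpath(output_path, "output_0000")
```

With this design, adding a new style only requires a new struct and one new method. The generic entry point does not need to change.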
ClimaUtilities.OutputPathGenerator.RemovePreexistingStyle (Type)
RemovePreexistingStyle
With this option, the output directory is directly specified. If the directory already exists, remove it. No confirmation is asked, so use at your own risk.
ClimaUtilities.OutputPathGenerator.ActiveLinkStyle (Type)
ActiveLinkStyle
This style generates a unique output path within a base directory specified by output_path. It ensures that the base directory exists, creating it if necessary. Additionally, it manages a sequence of subfolders and a symbolic link named "output_active" for convenient access to the active output location.
This style is designed to:
- be non-destructive,
- provide a deterministic and fixed path for the latest available data,
- and have nearly zero runtime overhead.
generate_output_path returns the path to a newly created folder with the next available increment (of the form output_1234), and ensures that a valid output_active link points to that folder.
Examples:
Let us assume that output_path = dormouse.
- dormouse does not exist in the current working directory: ActiveLinkStyle will create it and return dormouse/output_0000. In the process, it also creates a symlink dormouse/output_active, which points to dormouse/output_0000.
- dormouse exists and contains an output_active link that points to dormouse/output_0005: ActiveLinkStyle will create a new directory dormouse/output_0006, return this path, and change the output_active link to point to this directory.
- dormouse exists and does not contain an output_active link: ActiveLinkStyle will check whether any dormouse/output_XXXX exists. If not, it creates dormouse/output_0000 and a link dormouse/output_active that points to this directory.
A note for Windows users
Windows does not always allow unprivileged users to create symbolic links. This depends on the version of Windows and on some of its settings. When symbolic links cannot be created, OutputPathGenerator creates NTFS junction points instead. Junction points are similar to symbolic links, except that they must refer to directories and must use absolute paths. As a result, on systems that do not allow unprivileged users to create symbolic links, moving the base output folder breaks the output_active link.
ClimaUtilities.OutputPathGenerator.detect_restart_file (Function)
detect_restart_file(base_output_dir;
                    restart_file_rx = r"day\d+\.\w+\.hdf5",
                    sort_func = sort_by_creation_time,
                    style = ActiveLinkStyle()
                    )
Detects and returns the path to the most recent restart file within the directory structure specified by base_output_dir.
Returns nothing if no suitable restart file is found.
This function searches for restart files within the directory structure organized according to the provided ActiveLinkStyle. It identifies potential output directories based on the style and then looks for files matching the restart_file_rx regular expression within these directories.
By default, the function assumes restart files have names like "dayDDDD.SSSSS.hdf5", where DDDD represents the day number and SSSSS represents the number of seconds.
If multiple restart files are found, the function uses sort_func to determine the most recent one. The default sorting function, sort_by_creation_time, sorts files based on their creation timestamps and returns the file with the latest creation time. Users can provide custom sorting functions to prioritize files based on other criteria, such as the simulation time stored within the HDF5 file.
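A custom sort_func can order files by simulation time. The sketch below is hypothetical: it does not read the time stored inside the HDF5 file, which would require an HDF5 reader such as HDF5.jl. Instead, it parses the default dayDDDD.SSSSS.hdf5 naming scheme and converts each name to total seconds (86400 × day + seconds). It also assumes that detect_restart_file calls sort_func on a vector of file paths and takes the last element as the most recent one.

```julia
# Hypothetical custom sort_func based on the default file-name scheme.
# Assumption: sort_func receives a vector of paths; the last element is used.
function simulation_seconds(path::AbstractString)
    m = match(r"day(\d+)\.(\d+)\.hdf5$", basename(path))
    m === nothing && return typemin(Int)   # unrecognized names sort first
    day, seconds = parse.(Int, m.captures)
    return 86_400 * day + seconds
end

sort_by_simulation_time(files) = sort(files; by = simulation_seconds)

files = ["output_0001/day0001.00500.hdf5",
         "output_0000/day0002.00000.hdf5",
         "output_0001/day0001.00040.hdf5"]
last(sort_by_simulation_time(files))  # "output_0000/day0002.00000.hdf5"
```

In this usage example, the file with the latest simulation time sits in an older output folder. A sort based on creation time could therefore pick a different file.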
Return Value:
- If a suitable restart file is found, the function returns its full path as a string.
- If no output directory matching the ActiveLinkStyle or no restart file matching restart_file_rx is found, the function returns nothing. This indicates that automatic restart is not possible.
Arguments:
- output_dir_style: An ActiveLinkStyle object defining the structure of output directories.
- base_output_dir: The base directory where the output directory structure is located.
- restart_file_rx: A regular expression used to identify restart files within output directories. Defaults to r"day\d+\.\w+\.hdf5".
- sort_func: A function used to sort restart files and select the most recent one. Defaults to sort_by_creation_time.