ablator.analysis package

Submodules

ablator.analysis.main module

class ablator.analysis.main.Analysis(results: DataFrame | Results, categorical_attributes: list[str] | None = None, numerical_attributes: list[str] | None = None, optim_metrics: dict[str, ablator.config.proto.Optim] | None = None, save_dir: str | None = None, cache: bool = False)

Bases: object

Stores and processes the attributes, metrics, and other data needed to plot experiment results.

Parameters:

results: pd.DataFrame | Results: The result dataframe.
categorical_attributes: list[str] | None: The list of all the categorical hyperparameter names, by default None.
numerical_attributes: list[str] | None: The list of all the numerical hyperparameter names, by default None.
optim_metrics: dict[str, Optim] | None: A dictionary mapping metric names to optimization directions, by default None.
save_dir: str | None: The directory to save analysis results to, by default None.
cache: bool: Whether to cache results, by default False.

Raises:

FileNotFoundError: if the provided save_dir for saving plots does not exist.
ValueError: if cache is True but no save_dir is provided.
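
The two error conditions above can be pictured with a small stand-alone sketch. The helper `check_save_dir` is hypothetical and only mirrors the documented behavior; it is not the library's implementation, and the error messages are made up for illustration.

```python
from __future__ import annotations

from pathlib import Path


def check_save_dir(save_dir: str | None, cache: bool) -> Path | None:
    """Hypothetical helper mirroring the documented Raises rules."""
    if save_dir is None:
        if cache:
            # Caching needs a directory to write to.
            raise ValueError("cache=True requires a save_dir")
        return None
    path = Path(save_dir)
    if not path.exists():
        # The directory is expected to exist already; it is not created here.
        raise FileNotFoundError(f"save_dir does not exist: {path}")
    return path
```

Under these assumptions, `check_save_dir(None, cache=True)` raises ValueError, and a path that does not exist on disk raises FileNotFoundError.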

Attributes:

optim_metrics: dict[str, Optim]: A dictionary mapping metric names to optimization directions.
save_dir: Path | None: The directory to save analysis results to.
cache: Memory | None: A joblib memory cache for saving results.
categorical_attributes: list[str]: The list of all the categorical hyperparameter names.
numerical_attributes: list[str]: The list of all the numerical hyperparameter names.
experiment_attributes: list[str]: The list of all the hyperparameter names.
results: pd.DataFrame: The dataframe extracted from the results file based on the given metric names and hyperparameter names.

property metric_names: list[str]

Returns:

list[str]: The names of all the metrics that will be plotted against the hyperparameters.

Examples

>>> # Create an Analysis object
>>> plots = Analysis(
...     ...,
...     optim_metrics={"val_loss": Optim.min, "train_loss": Optim.min},
... )
>>> plots.metric_names
['val_loss', 'train_loss']
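
The relationship between optim_metrics and metric_names in the example can be sketched without the library. This assumes the metric names are simply the dictionary keys in insertion order, which matches the output shown above but is not confirmed by the source. The `Optim` class here is a stand-in enum, not the one from ablator.

```python
from enum import Enum


class Optim(Enum):
    """Stand-in for ablator's Optim; the member values are assumptions."""

    min = "min"
    max = "max"


optim_metrics = {"val_loss": Optim.min, "train_loss": Optim.min}

# Assumption: metric names are the dictionary keys, in insertion order.
metric_names = list(optim_metrics)
print(metric_names)  # ['val_loss', 'train_loss']
```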

ablator.analysis.results module

class ablator.analysis.results.Results(config: type[ablator.config.mp.ParallelConfig] | ParallelConfig, experiment_dir: str | Path, cache: bool = False, use_ray: bool = False)

Bases: object

Class for processing experiment results. Use this class to read the results in an experiment output directory. It can be combined with PlotAnalysis to show the correlation between hyperparameters and metrics. Refer to the Interpreting Results tutorial for more details on plotting and interpreting experiment results.

Parameters:

config: type[ParallelConfig] | ParallelConfig: The configuration class used.
experiment_dir: str | Path: The path to the experiment directory.
cache: bool: Whether to cache the results, by default False.
use_ray: bool: Whether to use ray for parallel processing, by default False.

Raises:

FileNotFoundError: If the experiment directory does not exist.
ValueError: If RunConfig is provided instead of ParallelConfig.

Examples

Suppose you have an experiment output directory stored at <path to experiment output defined in config experiment_dir>. You can read the results from the directory as follows:

>>> directory_path = Path('<path to experiment output defined in config experiment_dir>')
>>> results = Results(config=ParallelConfig, experiment_dir=directory_path, use_ray=True)
>>> df = results.read_results(config_type=ParallelConfig, experiment_dir=directory_path)

Pass df to PlotAnalysis to create an analysis object that plots the correlation between the hyperparameters and the metrics and saves the plots to an output directory. For example, the following template generates plots for each of the numerical and categorical hyperparameters and saves them to the ./plots directory. Here, “Validation Accuracy” is the display name of the main metric.

>>> analysis = PlotAnalysis(
...     df,
...     save_dir="./plots",
...     cache=True,
...     optim_metrics={"val_accuracy": Optim.max},
...     numerical_attributes=<numerical name remap keys names>,
...     categorical_attributes=<categorical name remap keys names>,
... )
>>> analysis.make_figures(
...     metric_name_remap={
...         "val_accuracy": "Validation Accuracy",
...     },
...     attribute_name_remap=attribute_name_remap,
... )
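
The template leaves the attribute lists and `attribute_name_remap` undefined. One way to fill them, sketched below, is to keep two remap dictionaries whose keys are hyperparameter names and whose values are display labels. The hyperparameter names and the dotted naming scheme are hypothetical; substitute the names used in your own search space.

```python
# Hypothetical hyperparameter names mapped to display labels.
numerical_name_remap = {
    "train_config.optimizer_config.arguments.lr": "Learning Rate",
    "train_config.batch_size": "Batch Size",
}
categorical_name_remap = {
    "train_config.optimizer_config.name": "Optimizer",
}

# The remap keys double as the attribute lists passed to the constructor.
numerical_attributes = list(numerical_name_remap)
categorical_attributes = list(categorical_name_remap)

# One combined mapping for relabelling attributes in the figures.
attribute_name_remap = {**categorical_name_remap, **numerical_name_remap}
```

Keeping the labels and the attribute lists in the same dictionaries means a hyperparameter cannot appear in one without the other.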

Attributes:

experiment_dir: Path: The path to the experiment directory.
config: type[ParallelConfig]: The configuration class used.
metric_map: dict[str, Optim]: A dictionary mapping metric names to their optimization direction.
data: pd.DataFrame: The processed results of the experiment. Refer to read_results for more details.
config_attrs: list[str]: The list of all the optimizable hyperparameter names.
search_space: dict[str, ty.Any]: The full search space of the experiment.
numerical_attributes: list[str]: The list of all the numerical hyperparameter names.
categorical_attributes: list[str]: The list of all the categorical hyperparameter names.

property metric_names: list[str]

Get the list of all optimization metric names.

Returns:

list[str]: The names of the optimization metrics.

Examples

>>> results.metric_names
["val_loss", "train_loss", "val_acc", "train_acc"]

classmethod read_results(config_type: type[ablator.config.main.ConfigBase], experiment_dir: Path | str, num_cpus: float | None = None) → DataFrame

Read all experiment results from the experiment directory (with ray if specified when initializing Results).

Parameters:

config_type: type[ConfigBase]: The configuration class.
experiment_dir: Path | str: The experiment directory.
num_cpus: float | None: Number of CPUs to use for ray processing, by default None.

Returns:

pd.DataFrame: A data frame of all the results from all experiments in experiment_dir.

Raises:

RuntimeError: If no results are present in the experiment directory.

Examples

>>> results.read_results(config_type=ParallelConfig, experiment_dir="/tmp/results/experiment_8925_9991/")
train_loss  val_loss     best_iteration  best_loss    current_epoch  current_iteration  epochs
3658738                  0               inf          1              100                5
277102967   0.277085876  100             0.277085876  2              200                5
277154112   0.27619998   200             0.27619998   3              300                5
276529543   0.286987235  200             0.27619998   4              400                5
279828385   0.274052692  400             0.274052692  5              500                5
91869608                 0               inf          1              100                5
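
As a rough picture of what reading all results from an experiment directory involves, the sketch below walks the directory and collects result files, raising RuntimeError when none are found. It assumes each trial writes a file named results.json somewhere beneath experiment_dir (the file name is taken from the read_result example in this module). The helper `collect_result_files` is hypothetical, runs serially without ray, and is not the library's implementation.

```python
from __future__ import annotations

from pathlib import Path


def collect_result_files(experiment_dir: Path | str) -> list[Path]:
    """Hypothetical helper: find every results.json under experiment_dir."""
    root = Path(experiment_dir)
    # Search all nested trial directories; sort for a stable order.
    files = sorted(root.rglob("results.json"))
    if not files:
        # Mirrors the documented RuntimeError for an empty experiment.
        raise RuntimeError(f"No results found in {root}")
    return files
```

Each returned path could then be handed to a per-file reader and the resulting frames concatenated.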

ablator.analysis.results.read_result(config_type: type[ablator.config.main.ConfigBase], json_path: Path) → DataFrame | None

Read the results of an experiment and return them as a pandas DataFrame.

The function reads the data from a JSON file, processes each row, and appends experiment attributes from a YAML configuration file. The resulting DataFrame is indexed and returned.

Parameters:

config_typetype[ConfigBase]: The type of configuration class that is used to load the experiment configuration from a YAML file.
json_pathPath: The path to the JSON file containing the results of the experiment.

Returns:

pd.DataFrame | None: A pandas DataFrame containing the processed experiment results. Returns None if there was an error in reading the json_path results.

Examples

Suppose the result JSON file /tmp/myexperiment/results.json contains:

>>> print(Path("/tmp/myexperiment/results.json").read_text())
[{
"train_loss": 10.35,
"val_loss": NaN,
"current_epoch": 1
},
{
"train_loss": 3.89,
"val_loss": 7.04,
"current_epoch": 2
}]

And the corresponding configuration object run_config is created as:

>>> config = {
...     "model_config": {},
...     "train_config": {
...         'dataset': 'Fashion-mnist',
...         'batch_size': 32,
...         'epochs': 20,
...         'optimizer_config': {
...             'name': 'adam',
...             'arguments': {
...                 'betas': (0.9, 0.999), 'weight_decay': 0.0, 'lr': 0.001
...             }
...         },
...         'scheduler_config': None
...     },
...     "experiment_dir": '/tmp/experiments',
...     "random_seed": 42,
...     # ... other configs
...     "optim_metrics": None,
...     "optim_metric_name": None
... }
>>> run_config = RunConfig(**config)

The function read_result then returns a pandas DataFrame like the following:

>>> read_result(run_config, Path("/tmp/myexperiment/results.json"))
                    experiment_dir      keep_n_checkpoints  ...  train_loss  val_loss  current_epoch
trial_uid     step
experiments_  0     C:/tmp/experiments  3                   ...  10.35       NaN       1
              1     C:/tmp/experiments  3                   ...  3.89        7.04      2
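
The steps described for read_result (load the JSON rows, attach the experiment attributes to each row, index by trial and step) can be sketched with the standard library alone. The helper `rows_with_attributes` is hypothetical: it takes the attributes as a plain dictionary rather than loading them from a YAML file, and it returns a dictionary keyed by (trial_uid, step) rather than a DataFrame. The trial identifier `trial_0` is also made up for illustration.

```python
import json


def rows_with_attributes(json_text: str, attrs: dict, trial_uid: str) -> dict:
    """Hypothetical helper: merge experiment attributes into each result row."""
    rows = json.loads(json_text)  # Python's json module accepts the NaN literal
    indexed = {}
    for step, row in enumerate(rows):
        # Experiment attributes first, then the per-step metrics.
        indexed[(trial_uid, step)] = {**attrs, **row}
    return indexed


raw = (
    '[{"train_loss": 10.35, "val_loss": NaN, "current_epoch": 1},'
    ' {"train_loss": 3.89, "val_loss": 7.04, "current_epoch": 2}]'
)
attrs = {"experiment_dir": "/tmp/experiments", "keep_n_checkpoints": 3}
table = rows_with_attributes(raw, attrs, trial_uid="trial_0")
```

Each key of `table` plays the role of one (trial_uid, step) index entry in the output above.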
