ablator.analysis package

Subpackages

Submodules

ablator.analysis.main module

class ablator.analysis.main.Analysis(results: DataFrame | Results, categorical_attributes: list[str] | None = None, numerical_attributes: list[str] | None = None, optim_metrics: dict[str, ablator.config.proto.Optim] | None = None, save_dir: str | None = None, cache: bool = False)

Bases: object

Stores and processes the attributes, metrics, and other data needed to plot experiment results.

Parameters:
results: pd.DataFrame | Results

The result dataframe.

categorical_attributes: list[str] | None

The list of all the categorical hyperparameter names, by default None.

numerical_attributes: list[str] | None

The list of all the numerical hyperparameter names, by default None.

optim_metrics: dict[str, Optim] | None

A dictionary mapping metric names to optimization directions, by default None.

save_dir: str | None

The directory to save analysis results to, by default None.

cache: bool

Whether to cache results, by default False.

Raises:
FileNotFoundError

If the provided save_dir for saving plots does not exist.

ValueError

If cache is True but no save_dir is provided.
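
The two error conditions above can be pictured with a small stand-alone check. This is only a sketch using the standard library, not ablator's code: `check_analysis_args` is a hypothetical helper, and the order in which the checks run is an assumption.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def check_analysis_args(save_dir: str | None, cache: bool) -> Path | None:
    """Hypothetical helper mirroring the documented Raises section."""
    if save_dir is None:
        if cache:
            # Documented: caching needs a directory to write to.
            raise ValueError("cache=True requires a save_dir")
        return None
    path = Path(save_dir)
    if not path.exists():
        # Documented: the save directory must already exist.
        raise FileNotFoundError(f"save_dir does not exist: {path}")
    return path


with tempfile.TemporaryDirectory() as tmp:
    # An existing directory combined with cache=True is accepted.
    resolved = check_analysis_args(tmp, cache=True)
    # A missing directory is rejected regardless of the cache flag.
    try:
        check_analysis_args(str(Path(tmp) / "no_such_dir"), cache=False)
        missing_error = None
    except FileNotFoundError as err:
        missing_error = err
```

Validating both arguments before any plotting work starts means a misconfigured call fails immediately rather than after the results have been processed.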

Attributes:
optim_metrics: dict[str, Optim]

A dictionary mapping metric names to optimization directions.

save_dir: Path | None

The directory to save analysis results to.

cache: Memory | None

A joblib memory cache for saving results.

categorical_attributes: list[str]

The list of all the categorical hyperparameter names.

numerical_attributes: list[str]

The list of all the numerical hyperparameter names.

experiment_attributes: list[str]

The list of all the hyperparameter names.

results: pd.DataFrame

The dataframe extracted from the results file based on given metrics names and hyperparameter names.

property metric_names: list[str]
Returns:
list[str]

List of all the metrics that will be plotted against the hyperparameters.

Examples

>>> # Create an Analysis object (other arguments elided).
>>> plots = Analysis(
...     ...,
...     optim_metrics={"val_loss": Optim.min, "train_loss": Optim.min},
... )
>>> plots.metric_names
['val_loss', 'train_loss']
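
The relationship between optim_metrics and metric_names in the example above can be reproduced without ablator installed. The sketch below assumes that metric_names is simply the keys of optim_metrics in insertion order, which matches the example output but is not confirmed by this page. The `Optim` class here is a stand-in enum, and its member values are assumptions.

```python
from enum import Enum


class Optim(Enum):
    """Stand-in for ablator's Optim; the member values are assumptions."""

    min = "min"
    max = "max"


# Same mapping as in the example above.
optim_metrics = {"val_loss": Optim.min, "train_loss": Optim.min}

# Assumption: the metric names are the dictionary keys, in insertion order.
metric_names = list(optim_metrics)
print(metric_names)
```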

ablator.analysis.results module

class ablator.analysis.results.Results(config: type[ablator.config.mp.ParallelConfig] | ParallelConfig, experiment_dir: str | Path, cache: bool = False, use_ray: bool = False)

Bases: object

Class for processing experiment results. Use it to read the results stored in an experiment output directory. It can be combined with PlotAnalysis to show the correlation between hyperparameters and metrics. Refer to the Interpreting Results tutorial for more details on plotting and interpreting experiment results.

Parameters:
config: type[ParallelConfig] | ParallelConfig

The configuration class used.

experiment_dir: str | Path

The path to the experiment directory.

cache: bool

Whether to cache the results, by default False.

use_ray: bool

Whether to use ray for parallel processing, by default False.

Raises:
FileNotFoundError

If the experiment directory does not exist.

ValueError

If RunConfig is provided instead of ParallelConfig.

Examples

Suppose you have an experiment output directory stored at <path to experiment output defined in config experiment_dir>. You can read the results from the directory as follows:

>>> from pathlib import Path
>>> from ablator.analysis.results import Results
>>> from ablator.config.mp import ParallelConfig
>>> directory_path = Path('<path to experiment output defined in config experiment_dir>')
>>> results = Results(config=ParallelConfig, experiment_dir=directory_path, use_ray=True)
>>> df = results.read_results(config_type=ParallelConfig, experiment_dir=directory_path)

Pass df to PlotAnalysis to create an analysis object that plots the correlation between the hyperparameters and the metrics and saves the plots to an output directory. For example, the following template generates plots for each of the numerical and categorical hyperparameters and saves them to the ./plots directory. Here “Validation Accuracy” is the display name of the main metric, val_accuracy.

>>> analysis = PlotAnalysis(
...     df,
...     save_dir="./plots",
...     cache=True,
...     optim_metrics={"val_accuracy": Optim.max},
...     numerical_attributes=<numerical name remap keys names>,
...     categorical_attributes=<categorical name remap keys names>,
... )
>>> analysis.make_figures(
...     metric_name_remap={
...         "val_accuracy": "Validation Accuracy",
...     },
...     attribute_name_remap=attribute_name_remap
... )
Attributes:
experiment_dir: Path

The path to the experiment directory.

config: type[ParallelConfig]

The configuration class used.

metric_map: dict[str, Optim]

A dictionary mapping metric names to their optimization direction.

data: pd.DataFrame

The processed results of the experiment. Refer to read_results for more details.

config_attrs: list[str]

The list of all the optimizable hyperparameter names.

search_space: dict[str, ty.Any]

The full search space of the experiment.

numerical_attributes: list[str]

The list of all the numerical hyperparameter names.

categorical_attributes: list[str]

The list of all the categorical hyperparameter names.

property metric_names: list[str]

Get the list of all optimization metric names.

Returns:
list[str]

List of the optimization metric names.

Examples

>>> results.metric_names
["val_loss", "train_loss", "val_acc", "train_acc"]

classmethod read_results(config_type: type[ablator.config.main.ConfigBase], experiment_dir: Path | str, num_cpus: float | None = None) -> DataFrame

Read all experiment results from the experiment directory (with ray if specified when initializing Results).

Parameters:
config_type: type[ConfigBase]

The configuration class.

experiment_dir: Path | str

The experiment directory.

num_cpus: float | None

Number of CPUs to use for ray processing, by default None.

Returns:
pd.DataFrame

A data frame of all the results from all experiments in experiment_dir.

Raises:
RuntimeError

If no results are present in the experiment directory.
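
The "no results" failure mode can be illustrated with a standard-library sketch. It assumes a layout with one subdirectory per trial, each holding a results.json file. `collect_result_files` is a hypothetical helper, not part of ablator, and the real method additionally parses each file and may distribute the work with ray.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def collect_result_files(experiment_dir: Path | str) -> list[Path]:
    """Hypothetical helper: find every results.json below experiment_dir."""
    files = sorted(Path(experiment_dir).rglob("results.json"))
    if not files:
        # Mirrors the documented RuntimeError for an empty experiment.
        raise RuntimeError(f"No results found in {experiment_dir}")
    return files


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Assumed layout: one directory per trial.
    for trial in ("trial_a", "trial_b"):
        (root / trial).mkdir()
        (root / trial / "results.json").write_text(json.dumps([{"train_loss": 1.0}]))
    found = [p.parent.name for p in collect_result_files(root)]

    # A directory without any results.json triggers the error.
    empty = root / "empty"
    empty.mkdir()
    try:
        collect_result_files(empty)
        raised = False
    except RuntimeError:
        raised = True
```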

Examples

>>> results.read_results(config_type=ParallelConfig, experiment_dir="/tmp/results/experiment_8925_9991/")
train_loss     val_loss       best_iteration   best_loss      current_epoch   current_iteration   epochs
13.3658738                    0                inf            1               100                 5
2.277102967    0.277085876    100              0.277085876    2               200                 5
2.277154112    0.27619998     200              0.27619998     3               300                 5
2.276529543    0.286987235    200              0.27619998     4               400                 5
2.279828385    0.274052692    400              0.274052692    5               500                 5
11.91869608                   0                inf            1               100                 5

ablator.analysis.results.read_result(config_type: type[ablator.config.main.ConfigBase], json_path: Path) -> DataFrame | None

Read the results of an experiment and return them as a pandas DataFrame.

The function reads the data from a JSON file, processes each row, and appends experiment attributes from a YAML configuration file. The resulting DataFrame is indexed and returned.

Parameters:
config_type: type[ConfigBase]

The type of configuration class that is used to load the experiment configuration from a YAML file.

json_path: Path

The path to the JSON file containing the results of the experiment.

Returns:
pd.DataFrame | None

A pandas DataFrame containing the processed experiment results. Returns None if there was an error in reading the json_path results.

Examples

Suppose the result JSON file /tmp/myexperiment/results.json contains:

>>> print(Path("/tmp/myexperiment/results.json").read_text())
[{
"train_loss": 10.35,
"val_loss": NaN,
"current_epoch": 1
},
{
"train_loss": 3.89,
"val_loss": 7.04,
"current_epoch": 2
}]

And the corresponding configuration object run_config is created as:

>>> config = {
...     "model_config": {},
...     "train_config": {
...         'dataset': 'Fashion-mnist',
...         'batch_size': 32,
...         'epochs': 20,
...         'optimizer_config': {
...             'name': 'adam',
...             'arguments': {
...                 'betas': (0.9, 0.999), 'weight_decay': 0.0, 'lr': 0.001
...             }
...         },
...         'scheduler_config': None
...     },
...     "experiment_dir": '/tmp/experiments',
...     "random_seed": 42,
...     # ... other configs
...     "optim_metrics": None,
...     "optim_metric_name": None
... }
>>> run_config = RunConfig(**config)

read_result then returns a pandas DataFrame like the following:

>>> read_result(run_config, Path("/tmp/myexperiment/results.json"))
                    experiment_dir      keep_n_checkpoints  ...  train_loss  val_loss  current_epoch
trial_uid     step
experiments_  0     C:/tmp/experiments  3                   ...  10.35       NaN       1
              1     C:/tmp/experiments  3                   ...  3.89        7.04      2
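
The row-building step described above (read the JSON records, attach the experiment attributes to each row, return None on a read error) can be sketched with the standard library alone. `read_rows` is a hypothetical stand-in that returns plain dictionaries instead of an indexed DataFrame, and the attribute dictionary passed to it is assumed to have been loaded from the YAML configuration already.

```python
from __future__ import annotations

import json
import math
import tempfile
from pathlib import Path


def read_rows(json_path: Path, attrs: dict) -> list[dict] | None:
    """Hypothetical stand-in: one dict per logged step, or None on failure."""
    try:
        # Python's json module accepts the non-standard NaN literal by default.
        records = json.loads(json_path.read_text())
    except (OSError, json.JSONDecodeError):
        return None
    # Attach the step index and the config attributes to every record.
    return [{"step": i, **attrs, **rec} for i, rec in enumerate(records)]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "results.json"
    path.write_text(
        '[{"train_loss": 10.35, "val_loss": NaN, "current_epoch": 1},'
        ' {"train_loss": 3.89, "val_loss": 7.04, "current_epoch": 2}]'
    )
    rows = read_rows(path, {"experiment_dir": "/tmp/experiments"})
    first_val_is_nan = math.isnan(rows[0]["val_loss"])

    # A missing file yields None instead of raising.
    bad = read_rows(Path(tmp) / "missing.json", {})
```

Returning None for an unreadable file lets a caller that aggregates many trials skip the broken ones instead of aborting the whole read.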

Module contents