Configurations for parallel model experiments

One of Ablator's main features is the ability to train and optimize multiple models in parallel for ablation studies and hyperparameter optimization. The main components of this feature are SearchSpace and ParallelConfig.

class ablator.config.hpo.SearchSpace(*args: Any, **kwargs: Any)

Bases: ConfigBase

Search space configuration, required by ParallelConfig, that defines the search space for a single hyperparameter. Its constructor takes keyword arguments that correspond to the parameters listed in the Parameters section.

Parameters:

value_range: Optional[Tuple[str, str]]: Value range of the parameter.
categorical_values: Optional[List[str]]: Categorical values for the parameter.
subspaces: Optional[List[Self]]: A list of search spaces.
sub_configuration: Optional[SubConfiguration]: Subconfiguration for a SearchSpace.
value_type: FieldType: Value type of the parameter’s values (continuous or discrete), by default FieldType.continuous.
n_bins: Optional[int]: Total bins for grid sampling, optional.
log: bool: To log, by default False.

Examples

In Ablator, a search space is defined for parallel ablation studies. For example, suppose we want to run an ablation study on the model’s hidden size and activation function.

Given the following model configuration:

>>> from ablator import ModelConfig, configclass
>>> @configclass
... class CustomModelConfig(ModelConfig):
...     hidden_size: int
...     activation: str
>>> my_model_config = CustomModelConfig(hidden_size=100, activation="relu")

The search space, which will be passed to ParallelConfig as a dictionary (notice how the key is expressed as model_config.<model-hyperparameter>), should look like this:

>>> from ablator.config.hpo import SearchSpace
>>> search_space = {
...     "model_config.hidden_size": SearchSpace(value_range=[32, 64], value_type="int"),
...     "model_config.activation": SearchSpace(categorical_values=["relu", "elu", "leakyRelu"]),
... }
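
The dotted keys above address nested configuration fields. As a rough illustration of that convention only, the sketch below applies such keys to a plain nested dictionary. The helper `set_by_path` and the dictionary layout are hypothetical and use only the standard library; this is not how Ablator itself applies sampled values.

```python
from copy import deepcopy
from typing import Any, Dict


def set_by_path(config: Dict[str, Any], dotted_key: str, value: Any) -> Dict[str, Any]:
    """Return a copy of ``config`` with the entry at ``dotted_key`` replaced.

    Hypothetical helper for illustration; not part of Ablator.
    """
    updated = deepcopy(config)
    *parents, leaf = dotted_key.split(".")
    node = updated
    for name in parents:
        node = node[name]  # raises KeyError if an intermediate key is missing
    if leaf not in node:
        raise KeyError(dotted_key)  # refuse to create fields that do not exist
    node[leaf] = value
    return updated


# A plain-dict stand-in for a run configuration (assumed layout).
base = {"model_config": {"hidden_size": 100, "activation": "relu"}}

# Apply one sampled value per search-space key.
trial = set_by_path(base, "model_config.hidden_size", 48)
trial = set_by_path(trial, "model_config.activation", "elu")
```

Returning a copy rather than mutating `base` keeps every trial's configuration independent of the others.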

Attributes:

value_range: Optional[Tuple[str, str]]: Value range of the parameter.
categorical_values: Optional[List[str]]: Categorical values for the parameter.
subspaces: Optional[List[Self]]: A list of search spaces.
sub_configuration: Optional[SubConfiguration]: Subconfiguration for a SearchSpace.
value_type: FieldType = FieldType.continuous: Value type of the parameter’s values (continuous or discrete).
n_bins: Optional[int]: Total bins for grid sampling.
log: bool: To log, by default False.
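
To make the roles of these attributes more concrete, here is a minimal standard-library sketch of drawing one value from a space described by `value_range`, `categorical_values`, and `value_type`, plus an evenly spaced grid as one possible reading of `n_bins`. The functions `sample_value` and `grid_points` are hypothetical, and the uniform sampling and the grid layout are assumptions; Ablator's actual sampler (see `search_algo` in ParallelConfig) may behave differently.

```python
import random
from typing import List, Optional, Sequence, Union

Number = Union[int, float]


def grid_points(low: float, high: float, n_bins: int) -> List[float]:
    """Evenly spaced points from low to high (assumed meaning of n_bins)."""
    if n_bins < 2:
        return [float(low)]
    step = (high - low) / (n_bins - 1)
    return [float(low + i * step) for i in range(n_bins)]


def sample_value(
    rng: random.Random,
    value_range: Optional[Sequence[Union[str, Number]]] = None,
    categorical_values: Optional[Sequence[str]] = None,
    value_type: str = "float",
) -> Union[str, Number]:
    """Draw one value; exactly one of the two space definitions must be given."""
    if (value_range is None) == (categorical_values is None):
        raise ValueError("provide exactly one of value_range or categorical_values")
    if categorical_values is not None:
        return rng.choice(list(categorical_values))
    # Bounds may arrive as strings or numbers, so normalize them first.
    low, high = (float(v) for v in value_range)
    if value_type == "int":
        return rng.randint(int(low), int(high))  # both bounds inclusive
    return rng.uniform(low, high)


rng = random.Random(0)  # seeded for reproducibility
hidden_size = sample_value(rng, value_range=[32, 64], value_type="int")
activation = sample_value(rng, categorical_values=["relu", "elu", "leakyRelu"])
grid = grid_points(32, 64, 5)
```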

class ablator.config.mp.ParallelConfig(*args: Any, debug: bool = False, **kwargs: Any)

Bases: RunConfig

Parallel training configuration, extending RunConfig, that defines the settings of a parallel experiment (number of trials to run, number of concurrent trials, search space for hyperparameter search, etc.).

ParallelConfig encapsulates every configuration (model config, optimizer-scheduler config, train config, and the search space) needed to run a parallel experiment. The complete configuration is then passed to ParallelTrainer, which launches the experiment.

Examples

Several steps are needed before defining a parallel run config. Let’s go through them one by one:

Define training config:

>>> from ablator import OptimizerConfig, SchedulerConfig, TrainConfig
>>> my_optimizer_config = OptimizerConfig("sgd", {"lr": 0.5, "weight_decay": 0.5})
>>> my_scheduler_config = SchedulerConfig("step", arguments={"step_size": 1, "gamma": 0.99})
>>> train_config = TrainConfig(
...     dataset="[Dataset Name]",
...     batch_size=32,
...     epochs=10,
...     optimizer_config=my_optimizer_config,
...     scheduler_config=my_scheduler_config,
... )

Define the model config. Here we want to run HPO on the activation function and the model’s hidden size:

>>> @configclass
... class CustomModelConfig(ModelConfig):
...     hidden_size: int
...     activation: str
>>> model_config = CustomModelConfig(hidden_size=100, activation="relu")

Define search space:

>>> search_space = {
...     "train_config.optimizer_config.arguments.lr": SearchSpace(
...         value_range=[0.001, 0.01], value_type="float"
...     ),
...     "model_config.hidden_size": SearchSpace(value_range=[32, 64], value_type="int"),
...     "model_config.activation": SearchSpace(
...         categorical_values=["relu", "elu", "leakyRelu"]
...     ),
... }

Lastly, we will define the run config from the previous config components (remember to redefine the parallel config to update the model config type to be CustomModelConfig):

>>> @configclass
... class CustomParallelConfig(ParallelConfig):
...     model_config: CustomModelConfig
>>> parallel_config = CustomParallelConfig(
...     train_config=train_config,
...     model_config=model_config,
...     metrics_n_batches=800,
...     experiment_dir="/tmp/experiments/",
...     device="cuda",
...     amp=True,
...     random_seed=42,
...     total_trials=20,
...     concurrent_trials=20,
...     search_space=search_space,
...     optim_metrics={"val_loss": "min"},
...     optim_metric_name="val_loss",
...     gpu_mb_per_experiment=1024,
... )

Attributes:

total_trials: Optional[int]: total number of trials.
concurrent_trials: int: number of trials to run concurrently.
search_space: Dict[SearchSpace]: search space for hyperparameter search, e.g. {"train_config.optimizer_config.arguments.lr": SearchSpace(value_range=[0, 10], value_type="int")}.
gpu_mb_per_experiment: int: CUDA memory requirement per experimental trial, in MB. For example, a value of 100 is equivalent to 100 MB.
search_algo: SearchAlgo = SearchAlgo.tpe: type of search algorithm.
ignore_invalid_params: bool = False: whether to ignore invalid parameters when sampling or raise an error.
remote_config: Optional[RemoteConfig] = None: remote storage configuration.
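
When choosing total_trials, concurrent_trials, and gpu_mb_per_experiment together, a quick back-of-the-envelope check can help. The sketch below is plain arithmetic with hypothetical helper names. It assumes trials run in equal-sized batches and that each trial uses exactly its declared memory, which is a simplification and not a description of how Ablator schedules trials.

```python
import math


def waves_needed(total_trials: int, concurrent_trials: int) -> int:
    """Number of sequential batches needed if trials run in equal-sized waves."""
    if total_trials < 0 or concurrent_trials < 1:
        raise ValueError("total_trials must be >= 0 and concurrent_trials >= 1")
    return math.ceil(total_trials / concurrent_trials)


def peak_gpu_mb(concurrent_trials: int, gpu_mb_per_experiment: int) -> int:
    """Declared GPU memory in use when all concurrent trials are running."""
    return concurrent_trials * gpu_mb_per_experiment


def max_concurrent_on_gpu(free_gpu_mb: int, gpu_mb_per_experiment: int) -> int:
    """How many trials fit in the given free memory under the declared budget."""
    return free_gpu_mb // gpu_mb_per_experiment


# Values taken from the ParallelConfig example above.
waves = waves_needed(total_trials=20, concurrent_trials=20)
peak_mb = peak_gpu_mb(concurrent_trials=20, gpu_mb_per_experiment=1024)

# Assumed hardware figure for illustration: 16384 MB of free GPU memory.
fit = max_concurrent_on_gpu(free_gpu_mb=16384, gpu_mb_per_experiment=1024)
```

Under these assumptions, the example configuration declares 20480 MB in total across its 20 concurrent trials, which is more than the 16 trials that would fit in the assumed 16384 MB of a single GPU.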