Configurations for parallel model experiments

One of Ablator's main features is the ability to train multiple models in parallel for hyperparameter optimization (HPO). The two main components of this feature are SearchSpace and ParallelConfig.

class ablator.config.hpo.SearchSpace(*args, **kwargs)

Bases: ConfigBase

A SearchSpace defines the range or set of values to search for a single hyperparameter. It is required by ParallelConfig.

Examples

In Ablator, a search space is defined for HPO trials that run in parallel. For example, suppose we want to optimize the model’s hidden size and activation function.

Given the following model configuration:

>>> @configclass
... class CustomModelConfig(ModelConfig):
...     hidden_size: int
...     activation: str
...
>>> my_model_config = CustomModelConfig(hidden_size=100, activation="relu")

The search space, which will be passed to ParallelConfig as a dictionary (notice how the key is expressed as model_config.<model-hyperparameter>), should look like this:

>>> from ablator.config.hpo import SearchSpace
>>> search_space = {
...     "model_config.hidden_size": SearchSpace(value_range=[32, 64], value_type="int"),
...     "model_config.activation": SearchSpace(categorical_values=["relu", "elu", "leakyRelu"]),
... }
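
The dotted keys above name a path through nested configuration objects. As a rough illustration of that idea only, the following sketch resolves such a path on plain dataclasses. `ModelCfg`, `RunCfg`, and `set_by_path` are hypothetical names introduced here, not part of Ablator, and the sketch makes no claim about how Ablator resolves keys internally.

```python
from dataclasses import dataclass, field


@dataclass
class ModelCfg:  # hypothetical stand-in for CustomModelConfig
    hidden_size: int = 100
    activation: str = "relu"


@dataclass
class RunCfg:  # hypothetical stand-in for the run configuration
    model_config: ModelCfg = field(default_factory=ModelCfg)


def set_by_path(root, dotted_key, value):
    """Follow `dotted_key` one attribute at a time and set the last one."""
    *parents, leaf = dotted_key.split(".")
    target = root
    for name in parents:
        target = getattr(target, name)
    if not hasattr(target, leaf):
        # Reject keys that do not name an existing hyperparameter.
        raise AttributeError(f"unknown hyperparameter: {dotted_key}")
    setattr(target, leaf, value)


run_cfg = RunCfg()
set_by_path(run_cfg, "model_config.hidden_size", 64)
set_by_path(run_cfg, "model_config.activation", "elu")
print(run_cfg)
```

The sketch handles attribute access only. A key that passes through a dictionary-like field would need an extra lookup step.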

class ablator.config.mp.ParallelConfig(*args, **kwargs)

Bases: RunConfig

Parallel training configuration, extending RunConfig, that defines the settings of a parallel experiment (total number of trials, number of concurrent trials, search space for the hyperparameter search, etc.).

ParallelConfig encapsulates every configuration (model config, optimizer and scheduler config, train config, and the search space) needed to run a parallel experiment. This single configuration object is then passed to ParallelTrainer, which launches the experiment.

Examples

Defining a parallel run config takes several steps. Let’s go through them one by one.

Define the training config:

>>> my_optimizer_config = OptimizerConfig("sgd", {"lr": 0.5, "weight_decay": 0.5})
>>> my_scheduler_config = SchedulerConfig("step", arguments={"step_size": 1, "gamma": 0.99})
>>> train_config = TrainConfig(
...     dataset="[Dataset Name]",
...     batch_size=32,
...     epochs=10,
...     optimizer_config=my_optimizer_config,
...     scheduler_config=my_scheduler_config,
...     rand_weights_init=True,
... )

Define the model config. We want to run HPO on the activation function and the model’s hidden size:

>>> @configclass
... class CustomModelConfig(ModelConfig):
...     hidden_size: int
...     activation: str
...
>>> model_config = CustomModelConfig(hidden_size=100, activation="relu")

Define search space:

>>> search_space = {
...     "train_config.optimizer_config.arguments.lr": SearchSpace(value_range=[0.001, 0.01], value_type="float"),
...     "model_config.hidden_size": SearchSpace(value_range=[32, 64], value_type="int"),
...     "model_config.activation": SearchSpace(categorical_values=["relu", "elu", "leakyRelu"]),
... }
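
To make concrete what one trial drawn from such a search space might look like, here is a minimal sketch that samples one value per hyperparameter. It uses plain dictionaries in place of SearchSpace and uniform random sampling in place of a real search algorithm, so it is an illustration under those assumptions and not Ablator's sampler. `sample_trial` is a hypothetical helper, and treating the integer range as inclusive on both ends is also an assumption.

```python
import random

# Plain-dict stand-in for the search space above (not Ablator's SearchSpace).
search_space = {
    "train_config.optimizer_config.arguments.lr": {
        "value_range": [0.001, 0.01],
        "value_type": "float",
    },
    "model_config.hidden_size": {"value_range": [32, 64], "value_type": "int"},
    "model_config.activation": {"categorical_values": ["relu", "elu", "leakyRelu"]},
}


def sample_trial(space, rng):
    """Draw one value per hyperparameter, uniformly at random."""
    trial = {}
    for key, spec in space.items():
        if "categorical_values" in spec:
            trial[key] = rng.choice(spec["categorical_values"])
        else:
            low, high = spec["value_range"]
            if spec["value_type"] == "int":
                # Assumption: both ends of the range are allowed values.
                trial[key] = rng.randint(low, high)
            else:
                trial[key] = rng.uniform(low, high)
    return trial


rng = random.Random(42)  # seeded so repeated runs draw the same trial
trial = sample_trial(search_space, rng)
print(trial)
```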

Lastly, define the run config from the previous components. Remember to subclass ParallelConfig so that the type of model_config is CustomModelConfig:

>>> from ablator.config.mp import ParallelConfig
>>> @configclass
... class CustomParallelConfig(ParallelConfig):
...     model_config: CustomModelConfig
...
>>> parallel_config = CustomParallelConfig(
...     train_config=train_config,
...     model_config=model_config,
...     metrics_n_batches=800,
...     experiment_dir="/tmp/experiments/",
...     device="cuda",
...     amp=True,
...     random_seed=42,
...     total_trials=20,
...     concurrent_trials=20,
...     search_space=search_space,
...     optim_metrics={"val_loss": "min"},
...     gpu_mb_per_experiment=1024,
...     cpus_per_experiment=1,
... )
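
The relationship between total_trials and concurrent_trials can be pictured with a small standard-library sketch: a pool that runs a fixed number of trials overall while capping how many run at once. This only illustrates the two settings. It uses a thread pool as a stand-in and says nothing about how ParallelTrainer schedules work. `run_trial` and `state` are hypothetical names, and the "training" is a short sleep.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

total_trials = 20       # how many trials to run overall
concurrent_trials = 4   # how many may run at the same time

lock = threading.Lock()
state = {"running": 0, "peak": 0}  # tracks concurrency for illustration


def run_trial(trial_id):
    """Placeholder trial: records concurrency, then pretends to train."""
    with lock:
        state["running"] += 1
        state["peak"] = max(state["peak"], state["running"])
    time.sleep(0.01)  # stand-in for real training work
    with lock:
        state["running"] -= 1
    return trial_id


# The pool size caps the number of trials in flight at any moment.
with ThreadPoolExecutor(max_workers=concurrent_trials) as pool:
    finished = list(pool.map(run_trial, range(total_trials)))

print(len(finished), "trials finished; peak concurrency:", state["peak"])
```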

Attributes:

total_trials: Optional[int]: total number of trials.
concurrent_trials: int: number of trials to run concurrently.
search_space: Dict[SearchSpace]: search space for hyperparameter search, e.g. {"train_config.optimizer_config.arguments.lr": SearchSpace(value_range=[0, 10], value_type="int")}
optim_metrics: Optional[Dict[Optim]]: metrics to optimize, e.g. {"val_loss": "min"}
gpu_mb_per_experiment: int: CUDA memory requirement per experimental trial, in MB. For example, a value of 100 means 100 MB.
search_algo: SearchAlgo = SearchAlgo.tpe: type of search algorithm.
ignore_invalid_params: bool = False: whether to ignore invalid parameters when sampling or raise an error.
remote_config: Optional[RemoteConfig] = None: remote storage configuration.
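
As an illustration of what an optim_metrics entry such as {"val_loss": "min"} expresses, the sketch below picks the best of several finished trials according to a metric and a direction. The trial results are made-up placeholder values, `best_trial` is a hypothetical helper, and accepting "max" as the opposite direction is an assumption. None of this is Ablator's own selection logic.

```python
optim_metrics = {"val_loss": "min"}

# Placeholder results for illustration only, not real measurements.
trial_results = [
    {"trial": 0, "val_loss": 0.91},
    {"trial": 1, "val_loss": 0.47},
    {"trial": 2, "val_loss": 0.63},
]


def best_trial(results, metric, direction):
    """Return the result with the lowest ("min") or highest ("max") metric."""
    if direction not in ("min", "max"):
        raise ValueError(f"unsupported direction: {direction!r}")
    pick = min if direction == "min" else max
    return pick(results, key=lambda result: result[metric])


for metric, direction in optim_metrics.items():
    winner = best_trial(trial_results, metric, direction)
    print(metric, direction, "-> trial", winner["trial"])
```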