Configurations for parallel model experiments
One of Ablator's main features is the ability to train multiple models in
parallel for hyperparameter optimization (HPO). The two main components of
this feature are SearchSpace and ParallelConfig.
- class ablator.config.hpo.SearchSpace(*args, **kwargs)
Bases: ConfigBase
Search space configuration, required in ParallelConfig, is used to define the search space for a single hyperparameter.
Examples
In Ablator, a search space defines the hyperparameter values that an HPO run explores in parallel. For example, suppose we want to run hyperparameter optimization on the model’s hidden size and activation function.
Given the following model configuration:
>>> @configclass
... class CustomModelConfig(ModelConfig):
...     hidden_size: int
...     activation: str
>>> my_model_config = CustomModelConfig(hidden_size=100, activation="relu")
The search space is passed to ParallelConfig as a dictionary. Note that each key is expressed as model_config.<model-hyperparameter>. For this model, it should look like this:
>>> search_space = {
...     "model_config.hidden_size": SearchSpace(value_range=[32, 64], value_type="int"),
...     "model_config.activation": SearchSpace(categorical_values=["relu", "elu", "leakyRelu"]),
... }
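Each SearchSpace entry above describes one dimension: either a numeric range (value_range plus value_type) or a set of categorical_values. The sketch below draws one trial from such a dictionary to show what the entries mean. It is hypothetical: ToySearchSpace and sample_point are illustrative stand-ins, not Ablator's API. The sketch samples uniformly at random, whereas Ablator's default search algorithm is TPE. It also assumes integer ranges are inclusive of both bounds.

```python
import random
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ToySearchSpace:
    """Hypothetical stand-in for ablator's SearchSpace (illustration only)."""
    value_range: Optional[List[float]] = None
    value_type: str = "float"
    categorical_values: Optional[List[Any]] = None


def sample_point(space: Dict[str, ToySearchSpace], rng: random.Random) -> Dict[str, Any]:
    """Draw one configuration uniformly at random (not TPE)."""
    point: Dict[str, Any] = {}
    for key, dim in space.items():
        if dim.categorical_values is not None:
            # Categorical dimension: pick one of the listed values.
            point[key] = rng.choice(dim.categorical_values)
        elif dim.value_type == "int":
            low, high = dim.value_range
            # Assumption: integer bounds are inclusive on both ends.
            point[key] = rng.randint(int(low), int(high))
        else:
            low, high = dim.value_range
            # Continuous dimension: uniform float between the bounds.
            point[key] = rng.uniform(low, high)
    return point


search_space = {
    "model_config.hidden_size": ToySearchSpace(value_range=[32, 64], value_type="int"),
    "model_config.activation": ToySearchSpace(categorical_values=["relu", "elu", "leakyRelu"]),
}
trial = sample_point(search_space, random.Random(0))
print(trial)
```

A seeded random.Random instance is passed in so that the sampled trials are reproducible, in the same spirit as the random_seed setting on the run config.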
- class ablator.config.mp.ParallelConfig(*args, **kwargs)
Bases: RunConfig
Parallel training configuration. It extends RunConfig and defines the settings of a parallel experiment (total number of trials, number of concurrent trials, search space for the hyperparameter search, etc.). ParallelConfig encapsulates every configuration needed to run a parallel experiment: the model config, the optimizer and scheduler configs, the train config, and the search space. This entire configuration is then passed to ParallelTrainer, which launches the experiment.
Examples
Several steps come before defining a parallel run config. Let’s go through them one by one:
Define the training config:
>>> my_optim_config = OptimizerConfig("sgd", {"lr": 0.5, "weight_decay": 0.5})
>>> my_scheduler_config = SchedulerConfig("step", arguments={"step_size": 1, "gamma": 0.99})
>>> train_config = TrainConfig(
...     dataset="[Dataset Name]",
...     batch_size=32,
...     epochs=10,
...     optimizer_config=my_optim_config,
...     scheduler_config=my_scheduler_config,
...     rand_weights_init=True,
... )
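To see what the scheduler arguments in this training config imply, the learning-rate schedule can be worked out by hand. This sketch assumes that the "step" scheduler follows PyTorch's StepLR rule (multiply the rate by gamma every step_size steps) and that it steps once per epoch; neither assumption is stated on this page. The step_lr helper is hypothetical.

```python
def step_lr(base_lr: float, epoch: int, step_size: int, gamma: float) -> float:
    """Learning rate in effect during `epoch` under step decay (StepLR-style)."""
    # The rate is multiplied by `gamma` once every `step_size` epochs.
    return base_lr * gamma ** (epoch // step_size)


# Values taken from the example: lr=0.5, step_size=1, gamma=0.99, epochs=10.
schedule = [step_lr(0.5, epoch, step_size=1, gamma=0.99) for epoch in range(10)]
for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch}: lr={lr:.4f}")
```

Under these assumptions, the rate used in the tenth epoch is 0.5 * 0.99**9, which is still about 0.457. With only 10 epochs, gamma=0.99 therefore decays the rate very gently.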
Define the model config. We want to run HPO on the activation function and the model’s hidden size:
>>> @configclass
... class CustomModelConfig(ModelConfig):
...     hidden_size: int
...     activation: str
>>> model_config = CustomModelConfig(hidden_size=100, activation="relu")
Define the search space:
>>> search_space = {
...     "train_config.optimizer_config.arguments.lr": SearchSpace(value_range=[0.001, 0.01], value_type="float"),
...     "model_config.hidden_size": SearchSpace(value_range=[32, 64], value_type="int"),
...     "model_config.activation": SearchSpace(categorical_values=["relu", "elu", "leakyRelu"]),
... }
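The keys of this search space are dotted paths into the nested run configuration: "train_config.optimizer_config.arguments.lr" reaches into train_config, then optimizer_config, then its arguments dictionary. The sketch below shows how sampled values could be written back along such paths. It is hypothetical: it uses plain nested dictionaries instead of Ablator's config objects, and apply_overrides is not part of Ablator's API.

```python
import copy
from typing import Any, Dict


def apply_overrides(config: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `config` with each dotted-path key set to its new value."""
    updated = copy.deepcopy(config)  # leave the base config untouched
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = updated
        for part in parents:
            node = node[part]  # KeyError signals a path that does not exist
        if leaf not in node:
            raise KeyError(f"unknown hyperparameter path: {dotted_key}")
        node[leaf] = value
    return updated


# Plain-dict mirror of the example configs (hypothetical representation).
base = {
    "train_config": {"optimizer_config": {"name": "sgd", "arguments": {"lr": 0.5, "weight_decay": 0.5}}},
    "model_config": {"hidden_size": 100, "activation": "relu"},
}
# One hypothetical sampled trial.
sampled = {
    "train_config.optimizer_config.arguments.lr": 0.005,
    "model_config.hidden_size": 48,
    "model_config.activation": "elu",
}
trial_config = apply_overrides(base, sampled)
print(trial_config)
```

Rejecting unknown leaf keys catches typos in search-space paths early instead of silently adding a field that nothing reads.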
Lastly, we define the run config from the previous config components (remember to subclass ParallelConfig so that its model config type is updated to CustomModelConfig):
>>> @configclass
... class CustomParallelConfig(ParallelConfig):
...     model_config: CustomModelConfig
>>> parallel_config = CustomParallelConfig(
...     train_config=train_config,
...     model_config=model_config,
...     metrics_n_batches=800,
...     experiment_dir="/tmp/experiments/",
...     device="cuda",
...     amp=True,
...     random_seed=42,
...     total_trials=20,
...     concurrent_trials=20,
...     search_space=search_space,
...     optim_metrics={"val_loss": "min"},
...     gpu_mb_per_experiment=1024,
...     cpus_per_experiment=1,
... )
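The resource settings in this config (concurrent_trials, gpu_mb_per_experiment, cpus_per_experiment) interact with the hardware actually available. The back-of-the-envelope estimate below shows why 20 concurrent trials may not all run at once. It is a hypothetical sketch, not Ablator's scheduling logic: it assumes a single GPU, a made-up amount of free memory, and trials of equal duration.

```python
from typing import Tuple


def estimate_parallelism(
    concurrent_trials: int,
    total_trials: int,
    gpu_free_mb: int,
    gpu_mb_per_experiment: int,
    cpu_count: int,
    cpus_per_experiment: int,
) -> Tuple[int, int]:
    """Return (trials running at once, number of sequential waves)."""
    fits_gpu = gpu_free_mb // gpu_mb_per_experiment  # trials that fit in GPU memory
    fits_cpu = cpu_count // cpus_per_experiment      # trials that fit on the CPUs
    running = min(concurrent_trials, fits_gpu, fits_cpu)
    if running == 0:
        raise ValueError("not enough resources for even one trial")
    waves = -(-total_trials // running)  # ceiling division
    return running, waves


# Hypothetical machine: one GPU with 8000 MB free and 8 CPU cores.
print(estimate_parallelism(20, 20, 8000, 1024, 8, 1))
```

Under these assumptions, only 7 trials fit in GPU memory (8000 // 1024 = 7), so the 20 trials would run in roughly 3 waves even though concurrent_trials is 20.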
- Attributes:
- total_trials: Optional[int]
total number of trials.
- concurrent_trials: int
number of trials to run concurrently.
- search_space: Dict[SearchSpace]
search space for the hyperparameter search, e.g.
{"train_config.optimizer_config.arguments.lr": SearchSpace(value_range=[0.001, 0.01], value_type="float")}
- optim_metrics: Optional[Dict[Optim]]
metrics to optimize, e.g.
{"val_loss": "min"}
- gpu_mb_per_experiment: int
CUDA memory required per trial, in MB (e.g. a value of 100 means 100 MB).
- search_algo: SearchAlgo = SearchAlgo.tpe
type of search algorithm.
- ignore_invalid_params: bool = False
whether to ignore invalid parameters when sampling or raise an error.
- remote_config: Optional[RemoteConfig] = None
remote storage configuration.
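The optim_metrics attribute pairs each metric name with a direction, "min" or "max". As a hypothetical illustration of what that direction means, the sketch below picks the best finished trial for a single metric. best_trial is not Ablator's API, and the trial results are made-up numbers.

```python
from typing import Any, Dict, List


def best_trial(results: List[Dict[str, Any]], optim_metrics: Dict[str, str]) -> Dict[str, Any]:
    """Pick the trial with the best value of a single optimized metric."""
    if len(optim_metrics) != 1:
        raise ValueError("this sketch handles exactly one metric")
    metric, direction = next(iter(optim_metrics.items()))
    if direction == "min":
        return min(results, key=lambda r: r[metric])  # lower is better
    if direction == "max":
        return max(results, key=lambda r: r[metric])  # higher is better
    raise ValueError(f"direction must be 'min' or 'max', got {direction!r}")


results = [  # made-up numbers for illustration
    {"trial_id": 0, "val_loss": 0.42},
    {"trial_id": 1, "val_loss": 0.37},
    {"trial_id": 2, "val_loss": 0.51},
]
print(best_trial(results, {"val_loss": "min"}))
```

With {"val_loss": "min"}, the trial with the lowest validation loss (trial 1 here) is preferred.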