Configurations for single model experiments
- class ablator.config.proto.RunConfig(*args, **kwargs)
Bases: ConfigBase
The base configuration that defines the settings of an experiment (experiment main directory, number of checkpoints to maintain, hardware device to use, etc.). Use it to configure an experiment that runs a single prototype model.
RunConfig encapsulates every configuration (model config, optimizer-scheduler config, train config) needed for a prototype experiment. This entire umbrella of configurations is then passed to ProtoTrainer, which launches the prototype experiment.
Examples
Defining a run config takes several steps; let’s go through them one by one:
Define the model config. Here we use the default one with no custom hyperparameters. (You may want to define a custom model config when running HPO over your model’s hyperparameters in parallel experiments with `ParallelTrainer`, which requires `ParallelConfig` instead of `RunConfig`.)
>>> model_config = ModelConfig()
Define the training config:
>>> my_optimizer_config = OptimizerConfig("sgd", {"lr": 0.5, "weight_decay": 0.5})
>>> my_scheduler_config = SchedulerConfig("step", arguments={"step_size": 1, "gamma": 0.99})
>>> train_config = TrainConfig(
...     dataset="[Dataset Name]",
...     batch_size=32,
...     epochs=10,
...     optimizer_config=my_optimizer_config,
...     scheduler_config=my_scheduler_config,
...     rand_weights_init=True
... )
Lastly, create the run config, which takes the train config and the model config as parameters:
>>> run_config = RunConfig(
...     train_config=train_config,
...     model_config=model_config,
...     metrics_n_batches=800,
...     experiment_dir="/tmp/experiments",
...     device="cpu",
...     amp=False,
...     random_seed=42
... )
- Attributes:
- experiment_dir: Optional[str] = None
location to store experiment artifacts.
- random_seed: Optional[int] = None
random seed.
- train_config: TrainConfig
training configuration (see TrainConfig for more details).
- model_config: ModelConfig
model configuration (see ModelConfig for more details).
- keep_n_checkpoints: int = 3
number of latest checkpoints to keep.
- tensorboard: bool = True
whether to use the TensorBoard logger.
- amp: bool = True
whether to use automatic mixed precision when running on GPU.
- device: str = “cuda” or “cpu”
device to run on.
- verbose: Literal[“console”, “progress”, “silent”] = “console”
verbosity level.
- eval_subsample: float = 1
fraction of the dataset to use for evaluation.
- metrics_n_batches: int = 32
maximum number of batches stored for each tag (train, eval, test) for evaluation.
- metrics_mb_limit: int = 100
maximum number of megabytes stored for each tag (train, eval, test) for evaluation.
- early_stopping_iter: Optional[int] = None
The maximum allowed difference between the current iteration and the last iteration with the best metric before applying early stopping. Early stopping will be triggered if the difference (current_itr - best_itr) exceeds early_stopping_iter. If set to None, early stopping will not be applied.
- eval_epoch: float = 1
The epoch interval between two evaluations.
- log_epoch: float = 1
The epoch interval between two logging steps.
- init_chkpt: Optional[str] = None
path to a checkpoint to initialize the model with.
- warm_up_epochs: float = 0
number of epochs marked as warm-up epochs.
- divergence_factor: float = 100
if cur_loss > best_loss * divergence_factor, the model is considered to have diverged.
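The keep_n_checkpoints attribute keeps only the most recent checkpoints. Below is a minimal sketch of such a retention policy in plain Python, not ablator's implementation; the CheckpointRotator class and the checkpoint file names are hypothetical, and the sketch tracks names in memory instead of deleting files on disk.

```python
from collections import deque


class CheckpointRotator:
    """Hypothetical helper (not part of ablator): keep only the newest checkpoints."""

    def __init__(self, keep_n_checkpoints: int = 3):
        self.keep_n_checkpoints = keep_n_checkpoints
        self.kept: deque[str] = deque()
        self.deleted: list[str] = []

    def save(self, name: str) -> None:
        # Record the new checkpoint, then evict the oldest ones beyond the limit.
        self.kept.append(name)
        while len(self.kept) > self.keep_n_checkpoints:
            # A real trainer would remove the file here; we only record its name.
            self.deleted.append(self.kept.popleft())


rotator = CheckpointRotator(keep_n_checkpoints=3)
for step in range(5):
    rotator.save(f"checkpoint_{step}.pt")

print(list(rotator.kept))  # ['checkpoint_2.pt', 'checkpoint_3.pt', 'checkpoint_4.pt']
print(rotator.deleted)     # ['checkpoint_0.pt', 'checkpoint_1.pt']
```

With the default of 3, saving five checkpoints leaves only the last three.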
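The metrics_n_batches and metrics_mb_limit attributes cap how much evaluation data is kept for each tag (train, eval, test). The sketch below shows one plausible way to enforce both caps at once. It assumes the oldest batches are dropped first, which this page does not specify. The BoundedBatchBuffer class and the size accounting through array.array are illustrative choices, not ablator's code.

```python
from array import array
from collections import deque


class BoundedBatchBuffer:
    """Hypothetical per-tag buffer (not part of ablator) honoring a batch cap and a size cap."""

    def __init__(self, metrics_n_batches: int = 32, metrics_mb_limit: float = 100):
        self.max_batches = metrics_n_batches
        self.max_bytes = metrics_mb_limit * 1024 * 1024
        self.batches: deque[array] = deque()
        self.nbytes = 0

    def append(self, batch: array) -> None:
        self.batches.append(batch)
        self.nbytes += batch.itemsize * len(batch)
        # Assumed policy: evict the oldest batches until both limits hold again.
        while len(self.batches) > self.max_batches or self.nbytes > self.max_bytes:
            oldest = self.batches.popleft()
            self.nbytes -= oldest.itemsize * len(oldest)


# One buffer per tag, mirroring the train/eval/test split.
buffers = {tag: BoundedBatchBuffer(metrics_n_batches=4) for tag in ("train", "eval", "test")}
for i in range(10):
    # Each fake batch holds 1000 double-precision values.
    buffers["eval"].append(array("d", [float(i)] * 1000))

print(len(buffers["eval"].batches))  # 4
```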
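The early_stopping_iter rule reduces to a single comparison: stop once the gap between the current iteration and the iteration with the best metric exceeds the limit, and never stop when the limit is None. Below is a minimal sketch of that documented condition, not ablator's internal code. The function name should_stop_early is hypothetical.

```python
from typing import Optional


def should_stop_early(current_itr: int, best_itr: int, early_stopping_iter: Optional[int]) -> bool:
    """Hypothetical helper (not part of ablator): apply the documented early-stopping rule."""
    if early_stopping_iter is None:
        # None disables early stopping.
        return False
    # "Exceeds" means strictly greater than the allowed gap.
    return (current_itr - best_itr) > early_stopping_iter


# Best metric last seen at iteration 1000, with early_stopping_iter = 500.
print(should_stop_early(1500, 1000, 500))  # False: the gap equals the limit
print(should_stop_early(1501, 1000, 500))  # True: the gap exceeds the limit
print(should_stop_early(9999, 1000, None))  # False: early stopping disabled
```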
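Because eval_epoch and log_epoch are floats, you can evaluate or log several times per epoch, for example every half epoch. Below is a minimal sketch of how such an interval could map to training iterations. It assumes the interval is multiplied by the number of batches per epoch and rounded to the nearest whole iteration. That mapping, and the helper name epoch_interval_to_iters, are our own assumptions, not ablator's documented behavior.

```python
def epoch_interval_to_iters(interval_epochs: float, batches_per_epoch: int) -> int:
    """Hypothetical helper (not part of ablator): epoch interval -> iteration interval."""
    # Assumed mapping: round to the nearest iteration, never below one.
    return max(1, round(interval_epochs * batches_per_epoch))


batches_per_epoch = 250
eval_every = epoch_interval_to_iters(0.5, batches_per_epoch)  # eval_epoch = 0.5
log_every = epoch_interval_to_iters(1, batches_per_epoch)     # log_epoch = 1

# Iterations (1-based) in the first two epochs at which each event would fire.
total_iters = 2 * batches_per_epoch
eval_iters = [itr for itr in range(1, total_iters + 1) if itr % eval_every == 0]
log_iters = [itr for itr in range(1, total_iters + 1) if itr % log_every == 0]
print(eval_iters)  # [125, 250, 375, 500]
print(log_iters)   # [250, 500]
```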
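The divergence_factor check makes most sense when read as cur_loss > best_loss * divergence_factor, meaning the current loss has grown to more than divergence_factor times the best loss seen so far. Below is a minimal sketch under that reading. The function has_diverged is hypothetical and not part of ablator.

```python
def has_diverged(cur_loss: float, best_loss: float, divergence_factor: float = 100) -> bool:
    """Hypothetical helper (not part of ablator): flag a loss that blew up past the best loss."""
    # Assumed reading of the criterion: cur_loss > best_loss * divergence_factor.
    return cur_loss > best_loss * divergence_factor


print(has_diverged(cur_loss=20.0, best_loss=0.5))  # False: 20.0 <= 0.5 * 100
print(has_diverged(cur_loss=60.0, best_loss=0.5))  # True: 60.0 > 50.0
```

With the default factor of 100, training is flagged only after the loss grows two orders of magnitude beyond its best value, so ordinary noise in the loss does not trigger it.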