Configurations for single model experiments

class ablator.config.proto.RunConfig(*args: Any, debug: bool = False, **kwargs: Any)

Bases: ConfigBase

The base run configuration that defines the settings of an experiment (experiment main directory, number of checkpoints to maintain, hardware device to use, etc.). You can use this to configure the experiment of a single prototype model.

RunConfig encapsulates every configuration (model config, optimizer-scheduler config, train config) needed for an experiment. The complete set of configurations is then passed to ProtoTrainer, which launches the prototype experiment.

Examples

Defining a run config takes several steps. Let's go through them one by one:

  • Define training config:

>>> from ablator.config.proto import ModelConfig, RunConfig, TrainConfig
>>> from ablator.modules.optimizer import OptimizerConfig
>>> from ablator.modules.scheduler import SchedulerConfig
>>> my_optimizer_config = OptimizerConfig("sgd", {"lr": 0.5, "weight_decay": 0.5})
>>> my_scheduler_config = SchedulerConfig("step", arguments={"step_size": 1, "gamma": 0.99})
>>> train_config = TrainConfig(
...     dataset="[Dataset Name]",
...     batch_size=32,
...     epochs=10,
...     optimizer_config=my_optimizer_config,
...     scheduler_config=my_scheduler_config,
...     rand_weights_init=True,
... )
  • Define the model config. Here we use the default one with no custom hyperparameters. (To run HPO on your model's hyperparameters, you would customize the model config and run parallel experiments with `ParallelTrainer`, which requires `ParallelConfig` instead of `RunConfig`.)

>>> model_config = ModelConfig()
  • Lastly, create the run config, which takes the train config and model config as parameters:

>>> run_config = RunConfig(
...     train_config=train_config,
...     model_config=model_config,
...     metrics_n_batches=800,
...     experiment_dir="/tmp/experiments",
...     device="cpu",
...     amp=False,
...     random_seed=42,
... )
Attributes:
experiment_dir: Stateless[Optional[str]]

Location to store experiment artifacts, by default None.

random_seed: Optional[int]

Random seed, by default None.

train_config: TrainConfig

Training configuration.

model_config: ModelConfig

Model configuration.

keep_n_checkpoints: Stateless[int]

Number of latest checkpoints to keep, by default 3.
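
The retention idea behind keep_n_checkpoints can be illustrated with a small, self-contained sketch. This is not ablator's implementation: the helper `prune_checkpoints` and the checkpoint file names are hypothetical, and the sketch assumes checkpoints are ordered from oldest to newest.

```python
from __future__ import annotations


def prune_checkpoints(
    checkpoints: list[str], keep_n: int = 3
) -> tuple[list[str], list[str]]:
    """Split checkpoints (ordered oldest to newest) into (kept, removed).

    Hypothetical helper: only the `keep_n` most recent entries are kept.
    """
    if keep_n <= 0:
        return [], list(checkpoints)
    return checkpoints[-keep_n:], checkpoints[:-keep_n]


# Hypothetical checkpoint names, ordered oldest to newest.
saved = [f"ckpt_{i:04d}.pt" for i in range(1, 6)]
kept, removed = prune_checkpoints(saved, keep_n=3)
print(kept)     # ['ckpt_0003.pt', 'ckpt_0004.pt', 'ckpt_0005.pt']
print(removed)  # ['ckpt_0001.pt', 'ckpt_0002.pt']
```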

tensorboard: Stateless[bool]

Whether to use tensorboardLogger, by default True.

amp: Stateless[bool]

Whether to use automatic mixed precision when running on a GPU, by default True.

device: Stateless[str]

Device to run on, by default "cuda".

verbose: Stateless[Literal["console", "progress", "silent"]]

Verbosity level, by default "console".

eval_subsample: Stateless[float]

Fraction of the dataset to use for evaluation, by default 1.

metrics_n_batches: Stateless[int]

Maximum number of batches stored for each tag (train, eval, test) for evaluation, by default 32.
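
A per-tag cap on stored batches can be pictured as a bounded buffer. The sketch below is only an illustration under assumptions: the class `BoundedBatchStore` is hypothetical, it assumes the oldest batch is dropped once the cap is reached (this page does not state the eviction policy), and it does not model the separate megabyte limit.

```python
from collections import deque


class BoundedBatchStore:
    """Hypothetical store keeping at most `n_batches` entries per tag."""

    def __init__(self, n_batches=32, tags=("train", "eval", "test")):
        # A deque with maxlen silently discards the oldest item when full.
        self._store = {tag: deque(maxlen=n_batches) for tag in tags}

    def append(self, tag, batch_values):
        self._store[tag].append(batch_values)

    def get(self, tag):
        return list(self._store[tag])


store = BoundedBatchStore(n_batches=3)
for batch_index in range(5):
    store.append("train", batch_index)

print(store.get("train"))  # [2, 3, 4]
print(store.get("eval"))   # []
```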

metrics_mb_limit: Stateless[int]

Maximum number of megabytes stored for each tag (train, eval, test) for evaluation, by default 10_000 (10 GB).

early_stopping_iter: Stateless[Optional[int]]

The maximum allowed difference between the current iteration and the last iteration with the best metric before applying early stopping. Early stopping will be triggered if the difference (current_itr - best_itr) exceeds early_stopping_iter. If set to None, early stopping will not be applied. By default None.
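
The early-stopping rule described above can be written as a short predicate. This is a minimal sketch of the stated condition, not ablator's code; the function name `should_stop_early` is hypothetical.

```python
from typing import Optional


def should_stop_early(
    current_itr: int, best_itr: int, early_stopping_iter: Optional[int]
) -> bool:
    """Return True when (current_itr - best_itr) exceeds early_stopping_iter."""
    if early_stopping_iter is None:
        # None disables early stopping entirely.
        return False
    return (current_itr - best_itr) > early_stopping_iter


print(should_stop_early(150, 100, 50))    # False: difference equals the limit
print(should_stop_early(151, 100, 50))    # True: difference exceeds the limit
print(should_stop_early(10_000, 0, None)) # False: early stopping disabled
```

Note that the comparison is strict, so training continues while the difference is exactly equal to early_stopping_iter.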

eval_epoch: Stateless[float]

The epoch interval between two evaluations, by default 1.

log_epoch: Stateless[float]

The epoch interval between two logging events, by default 1.
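
Because eval_epoch and log_epoch are floats, an interval can be a fraction of an epoch. The sketch below shows one way such an interval could map to training iterations. It is an assumption for illustration only: the helpers `interval_in_iterations` and `is_interval_step`, the `epoch_len` parameter (iterations per epoch), and the rounding rule are all hypothetical and may differ from what ablator does internally.

```python
def interval_in_iterations(epoch_interval: float, epoch_len: int) -> int:
    """Convert an interval in epochs to a whole number of iterations (at least 1)."""
    return max(1, round(epoch_interval * epoch_len))


def is_interval_step(
    current_iteration: int, epoch_interval: float, epoch_len: int
) -> bool:
    """True when the current iteration falls on an interval boundary."""
    step = interval_in_iterations(epoch_interval, epoch_len)
    return current_iteration > 0 and current_iteration % step == 0


# With 100 iterations per epoch, an interval of 0.5 epochs fires twice per epoch.
eval_steps = [i for i in range(1, 201) if is_interval_step(i, 0.5, 100)]
print(eval_steps)  # [50, 100, 150, 200]
```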

init_chkpt: Stateless[Optional[str]]

Path to a checkpoint to initialize the model with, by default None.

warm_up_epochs: Stateless[float]

Number of epochs marked as warm up epochs, by default 1.

divergence_factor: Stateless[Optional[float]]

If cur_loss / best_metric > divergence_factor, the model is considered to have diverged, by default 10.
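
The divergence rule reads most naturally as a ratio test between the current loss and the best value seen so far. The sketch below follows that reading; it is an interpretation rather than ablator's confirmed logic. The function name `has_diverged`, the treatment of None as "check disabled", and the guard for a non-positive best value are all assumptions.

```python
from typing import Optional


def has_diverged(
    cur_loss: float, best_metric: float, divergence_factor: Optional[float] = 10.0
) -> bool:
    """Ratio test: True when cur_loss is more than divergence_factor times best_metric."""
    if divergence_factor is None:
        # Assumption: None turns the divergence check off.
        return False
    if best_metric <= 0:
        # Assumption: the ratio is not meaningful for a non-positive best value.
        return False
    return cur_loss / best_metric > divergence_factor


print(has_diverged(5.0, 0.4))        # True: the loss is about 12.5x the best value
print(has_diverged(3.0, 0.4))        # False: the loss is about 7.5x the best value
print(has_diverged(5.0, 0.4, None))  # False: check disabled
```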

optim_metrics: Stateless[Optional[Dict[Optim]]]

The optimization metrics to use for meta-training procedures, such as model saving and learning-rate scheduling.

optim_metric_name: Stateless[Optional[str]]

The name of the metric to be optimized.