Configuration basics#

Ablator's configuration system covers every aspect of training a machine learning model: the model architecture, the training settings, and the running environment.

Think of a configuration as the blueprint of an experiment. Ablator uses it to create and set up the experiment with all the necessary settings in place.

Ablator also lets you build a hierarchical configuration dynamically through composition. You can override settings in two ways: via yaml configuration files and command-line inputs, or directly with Python objects and classes. See the section "Alternatives to constructing configuration objects" later in this tutorial to learn how to use both approaches.
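To make the idea of overriding a hierarchical configuration concrete, here is a minimal sketch in plain Python. It is not Ablator's implementation: the helper deep_update and the sample dictionaries are hypothetical and only illustrate how a nested override can replace individual values while leaving the rest of the hierarchy intact.

```python
from copy import deepcopy


def deep_update(base: dict, overrides: dict) -> dict:
    """Return a copy of `base` with `overrides` merged in, recursing into nested dicts."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are nested sections: merge them key by key.
            merged[key] = deep_update(merged[key], value)
        else:
            # Leaf value (or a new key): the override wins.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical base settings, e.g. defaults defined in code.
base_config = {
    "device": "cuda",
    "train_config": {"batch_size": 128, "epochs": 2},
}
# Hypothetical overrides, e.g. parsed from a yaml file or command-line inputs.
overrides = {"device": "cpu", "train_config": {"epochs": 10}}

merged_config = deep_update(base_config, overrides)
print(merged_config)
```

Only device and train_config.epochs change in the merged result; batch_size keeps its base value, and base_config itself is left unmodified.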

In this tutorial, we will explain all configuration-related concepts in Ablator. We will also demonstrate all necessary steps to configure an experiment in Ablator.

Configuration categories#

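In Ablator, configurations are organized into the following categories:

  • Model configuration

  • Training configuration

  • Optimizer and scheduler configurations

  • Running configurations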

These configuration classes will be used together to configure an experiment.

Model Configuration#

This configuration class is used to define hyperparameters specific to the model of interest. Later, you can use this configuration to construct the model.

After defining a model config class for your model, two steps are required:

  • Pass the model config to the main model's constructor, so that you can construct the model using the attributes defined in the config.

  • Create a custom running configuration class and change the type of its model_config attribute to the newly created model config class.

Note

The forward method of the main model must return both the outputs and the loss. The outputs must be a dictionary, for example {"y_pred": <model prediction>, "y_true": <true labels>}. The loss value is treated as an auxiliary metric that is recorded for later analysis.

Model configuration is required when creating the run configuration (RunConfig.model_config and ParallelConfig.model_config).

In addition, you can create a search space over these parameters and then use ablator to run Hyperparameter optimization (HPO). A sample use case for this is when you want to test different values for model size, number of layers, activation functions, etc. You can do this by creating a custom model configuration class from ModelConfig that has these hyperparameters as its attributes and create a search space via SearchSpace class. Refer to the Hyperparameter Optimization tutorial for more details.

Training Configuration#

This configuration class defines the training settings (e.g., batch size, number of epochs, the optimizer to use, etc.). Two important attributes to mention are optimizer_config and scheduler_config. As the names suggest, they configure the optimizer and the scheduler used in the training process.

Training configuration is required when creating the run configuration (RunConfig.train_config or ParallelConfig.train_config).

| Parameter | Usage |
| --- | --- |
| dataset (str) | Dataset name. May be used in custom dataset loader functions. |
| batch_size (int) | Batch size. |
| epochs (int) | Number of epochs to train. |
| optimizer_config (OptimizerConfig) | Optimizer configuration (see OptimizerConfig for more details). |
| scheduler_config (Optional[SchedulerConfig]) | Scheduler configuration (see SchedulerConfig for more details). |
| rand_weights_init (bool, defaults to True) | Whether to initialize model weights randomly. |

Optimizer and Scheduler Configurations#

By default, Ablator takes care of creating the optimizer (and optionally the scheduler) for training models. Thus, you also need to configure them.

OptimizerConfig is used to configure the optimizer for the training process. Currently, we support SGD optimizer, Adam optimizer, and AdamW optimizer.

SchedulerConfig, on the other hand, can be used to configure the learning rate scheduler for the training process. Currently, we support StepLR scheduler, OneCycleLR scheduler, and ReduceLROnPlateau scheduler.

Both of these config classes have similar arguments:

| Parameter | Usage |
| --- | --- |
| name (str) | The type of the scheduler or optimizer. This can be any of ['sgd', 'adam', 'adamw'] for optimizers and any of ['none', 'step', 'cycle', 'plateau'] for schedulers. |
| arguments (OptimizerArgs) | The arguments for the scheduler or optimizer, specific to a certain type of scheduler or optimizer. For an optimizer, they must include an item for the learning rate, e.g. {"lr": 0.1}. |

The table below shows how arguments can be defined for each type of optimizer:

| Optimizer type | Arguments |
| --- | --- |
| sgd | weight_decay (defaults to 0.0): Weight decay rate. momentum (defaults to 0.0): Momentum factor. |
| adam | betas (defaults to (0.5, 0.9)): Coefficients for computing running averages of gradient and its square. weight_decay (defaults to 0.0): Weight decay rate. |
| adamw | betas (defaults to (0.9, 0.999)): Coefficients for computing running averages of gradient and its square. eps (defaults to 1e-8): Term added to the denominator to improve numerical stability. weight_decay (defaults to 0.0): Weight decay rate. |
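As a rough illustration of how the per-optimizer defaults combine with user-supplied arguments, the following plain-Python sketch merges the two and enforces the required learning rate. The names OPTIMIZER_DEFAULTS and resolve_optimizer_args are hypothetical and are not part of Ablator; the default values are copied from the optimizer arguments listed above.

```python
# Defaults copied from the optimizer arguments listed in this tutorial.
OPTIMIZER_DEFAULTS = {
    "sgd": {"weight_decay": 0.0, "momentum": 0.0},
    "adam": {"betas": (0.5, 0.9), "weight_decay": 0.0},
    "adamw": {"betas": (0.9, 0.999), "eps": 1e-8, "weight_decay": 0.0},
}


def resolve_optimizer_args(name: str, arguments: dict) -> dict:
    """Merge user arguments over the defaults of the chosen optimizer type."""
    if name not in OPTIMIZER_DEFAULTS:
        raise ValueError(f"Unknown optimizer type: {name!r}")
    if "lr" not in arguments:
        # The learning rate has no default and must always be provided.
        raise ValueError("Optimizer arguments must include 'lr'")
    return {**OPTIMIZER_DEFAULTS[name], **arguments}


sgd_args = resolve_optimizer_args("sgd", {"lr": 0.1, "momentum": 0.9})
print(sgd_args)
```

Here momentum overrides its default while weight_decay falls back to 0.0; omitting lr raises an error, mirroring the requirement that a learning rate always be given.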

The table below shows how arguments can be defined for each type of scheduler:

| Scheduler type | Arguments |
| --- | --- |
| cycle | max_lr: Upper learning rate boundary in the cycle. total_steps: The total number of steps to run the scheduler in a cycle. step_when (defaults to "train"): The step type at which scheduler.step() should be invoked: 'train', 'val', or 'epoch'. |
| plateau | patience (defaults to 10): Number of epochs with no improvement after which the learning rate will be reduced. min_lr (defaults to 1e-5): A lower bound on the learning rate. mode (defaults to "min"): One of 'min', 'max', or 'auto', which defines the direction of optimization, so as to adjust the learning rate accordingly, i.e. when a certain metric ceases improving. factor (defaults to 0.0): Factor by which the learning rate will be reduced: new_lr = lr * factor. threshold (defaults to 1e-4): Threshold for measuring the new optimum, to only focus on significant changes. verbose (defaults to False): If True, prints a message to stdout for each update. step_when (defaults to "val"): The step type at which the scheduler should be invoked: 'train', 'val', or 'epoch'. |
| step | step_size (defaults to 1): Period of learning rate decay. gamma (defaults to 0.99): Multiplicative factor of learning rate decay. step_when (defaults to "epoch"): The step type at which the scheduler should be invoked: 'train', 'val', or 'epoch'. |
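To see what the step scheduler arguments mean in practice, the sketch below computes the learning rate after a given number of scheduler steps using the standard StepLR rule, where the rate is multiplied by gamma once every step_size steps. The function step_lr is a hypothetical helper written for this illustration, and the initial learning rate of 0.1 is an assumed value.

```python
def step_lr(initial_lr: float, n_steps: int, step_size: int = 1, gamma: float = 0.99) -> float:
    """Learning rate after `n_steps` scheduler steps under step decay."""
    # The rate decays by a factor of `gamma` every `step_size` steps.
    return initial_lr * gamma ** (n_steps // step_size)


# With the defaults (step_size=1, gamma=0.99), the rate shrinks by 1% per step.
for n in (0, 1, 10, 100):
    print(n, step_lr(0.1, n))
```

With step_when set to "epoch", each scheduler step corresponds to one epoch, so the default settings reduce the learning rate by 1% per epoch.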

Running Configurations#

Running configurations define the environment of an experiment (experiment main directory, number of checkpoints to maintain, hardware device to use, etc.). There are 2 types of running configurations:

  • RunConfig for prototype experiments

  • ParallelConfig for ablation experiments

RunConfig for prototype experiments#

The table below summarizes the parameters:

| Parameter | Usage |
| --- | --- |
| experiment_dir (Stateless[Optional[str]], defaults to None) | Location to store experiment artifacts. |
| random_seed (Optional[int], defaults to None) | Random seed. |
| train_config (TrainConfig) | Training configuration (see TrainConfig for more details). |
| model_config (ModelConfig) | Model configuration (see ModelConfig for more details). |
| keep_n_checkpoints (Stateless[int], defaults to 3) | Number of latest checkpoints to keep. |
| tensorboard (Stateless[bool], defaults to True) | Whether to use tensorboardLogger. |
| amp (Stateless[bool], defaults to True) | Whether to use automatic mixed precision when running on GPU. |
| device (Stateless[str], defaults to "cuda") | Device to run on. |
| verbose (Stateless[Literal["console", "progress", "silent"]], defaults to "console") | Verbosity level. |
| eval_subsample (Stateless[float], defaults to 1) | Fraction of the dataset to use for evaluation. |
| metrics_n_batches (Stateless[int], defaults to 32) | Maximum number of batches stored in every tag (train, eval, test) for evaluation. |
| metrics_mb_limit (Stateless[int], defaults to 100) | Maximum number of megabytes stored in every tag (train, eval, test) for evaluation. |
| early_stopping_iter (Stateless[Optional[int]], defaults to None) | The maximum allowed difference between the current iteration and the last iteration with the best metric before applying early stopping. Early stopping is triggered if the difference (current_itr - best_itr) exceeds early_stopping_iter. If set to None, early stopping is not applied. |
| eval_epoch (Stateless[float], defaults to 1) | The epoch interval between two evaluations. |
| log_epoch (Stateless[float], defaults to 1) | The epoch interval between two logging events. |
| init_chkpt (Stateless[Optional[str]], defaults to None) | Path to a checkpoint to initialize the model with. |
| warm_up_epochs (Stateless[float], defaults to 1) | Number of epochs marked as warm-up epochs. |
| divergence_factor (Stateless[Optional[float]], defaults to 100) | If cur_loss > best_loss > divergence_factor, the model is considered to have diverged. |
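The early_stopping_iter rule described above can be sketched as a small predicate. This is an illustration of the stated rule only, not Ablator's internal code; the function name should_stop_early is hypothetical.

```python
from typing import Optional


def should_stop_early(
    current_itr: int, best_itr: int, early_stopping_iter: Optional[int]
) -> bool:
    """Return True when the gap since the best iteration exceeds the allowed limit."""
    if early_stopping_iter is None:
        # None disables early stopping entirely.
        return False
    return (current_itr - best_itr) > early_stopping_iter


print(should_stop_early(current_itr=12, best_itr=5, early_stopping_iter=5))
print(should_stop_early(current_itr=10, best_itr=5, early_stopping_iter=5))
print(should_stop_early(current_itr=1000, best_itr=0, early_stopping_iter=None))
```

Note that the comparison is strict: a gap exactly equal to early_stopping_iter does not yet trigger early stopping.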

ParallelConfig for ablation experiments#

ParallelConfig is a subclass of RunConfig. Therefore, it has all attributes RunConfig has. Additionally, it introduces other attributes to configure the parallel training process with horizontal scaling of a single experiment:

| Parameter | Usage |
| --- | --- |
| total_trials (Optional[int]) | Total number of trials. |
| concurrent_trials (int) | Number of trials to run concurrently. |
| search_space (Dict[SearchSpace]) | Search space for hyperparameter search, e.g. {"train_config.optimizer_config.arguments.lr": SearchSpace(value_range=[0, 10], value_type="int"),} |
| optim_metrics (Optional[Dict[Optim]]) | Metrics to optimize, e.g. {"val_loss": "min"} |
| gpu_mb_per_experiment (int) | CUDA memory requirement per experimental trial in MB, e.g. a value of 100 is equivalent to 100MB. |
| search_algo (SearchAlgo, defaults to SearchAlgo.tpe) | Type of search algorithm. |
| ignore_invalid_params (bool, defaults to False) | Whether to ignore invalid parameters when sampling or raise an error. |
| remote_config (Optional[RemoteConfig], defaults to None) | Remote storage configuration. |

search_space is used to define a set of continuous or categorical/discrete values for a certain hyperparameter that you want to optimize. Refer to Search Space basics to learn more about how to use it.
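Search space keys such as "train_config.optimizer_config.arguments.lr" address attributes nested inside the configuration hierarchy. The plain-Python sketch below shows one way such dotted keys can be resolved against a nested dictionary. It is only an illustration of the addressing scheme: the helper set_by_path and the sampled trial values are hypothetical, and Ablator's own sampling logic is not shown.

```python
from copy import deepcopy


def set_by_path(config: dict, dotted_key: str, value) -> None:
    """Set a nested value addressed by a dotted key, e.g. 'a.b.c'."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        node = node[name]  # Raises KeyError if a section is missing.
    if leaf not in node:
        raise KeyError(f"Unknown configuration attribute: {dotted_key}")
    node[leaf] = value


base = {
    "train_config": {
        "batch_size": 128,
        "optimizer_config": {"name": "sgd", "arguments": {"lr": 0.1}},
    }
}
# Hypothetical values sampled for one trial.
trial_values = {
    "train_config.optimizer_config.arguments.lr": 0.01,
    "train_config.batch_size": 64,
}

trial_config = deepcopy(base)
for key, value in trial_values.items():
    set_by_path(trial_config, key, value)
print(trial_config)
```

Each trial receives its own copy of the base configuration, so sampled values for one trial never leak into another.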

Configure your experiments#

Now let’s combine everything to configure your experiment!

In most cases, we will first configure our model (or not configure it at all if you’re not running ablation study on the model architecture).

In this example, we create a configuration class MyModelConfig for a simple 1-layer neural network model with the following hyperparameters: input size (to be inferred); hidden layer dimension, activation function, and dropout rate (all of which are stateful). This configuration then will be used to construct the neural network model MyCustomModel:

from ablator import RunConfig, ModelConfig, Stateless, Derived, configclass

import torch.nn as nn
import torch

@configclass
class MyModelConfig(ModelConfig):
    inp_size: Derived[int]
    hidden_dim: int
    activation: str
    dropout: float

# Construct the model using the configuration
class MyCustomModel(nn.Module):
    def __init__(self, config: MyModelConfig) -> None:
        super().__init__()
        self.linear = nn.Linear(config.inp_size, config.hidden_dim)
        self.dropout = nn.Dropout(config.dropout)
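        if config.activation == "relu":
            self.activate = nn.ReLU()
        elif config.activation == "elu":
            self.activate = nn.ELU()
        else:
            # Fail early instead of leaving self.activate undefined
            raise ValueError(f"Unsupported activation: {config.activation}")
        self.criterion = nn.CrossEntropyLoss()

    def forward(self, x: torch.Tensor, labels: torch.Tensor):
        out = self.linear(x)
        out = self.dropout(out)
        out = self.activate(out)

        # The loss is computed from the model outputs and the true labels
        loss = self.criterion(out, labels)

        # Return the outputs as a dictionary, followed by the loss
        return {"preds": out, "labels": labels}, loss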

my_model_config = MyModelConfig(hidden_dim=100, activation="relu", dropout=0.3)

Notice how the forward method returns a dictionary containing the model's predictions and labels, followed by the loss value.

We next create a training configuration, which requires an optimizer config and an optional scheduler config. Here we create optimizer_config, which configures an SGD optimizer, and scheduler_config, which configures a OneCycleLR scheduler, and then use them in train_config:

from ablator import OptimizerConfig, SchedulerConfig
from ablator import TrainConfig

optimizer_config = OptimizerConfig(name="sgd", arguments={"lr": 0.1})
scheduler_config = SchedulerConfig(name="cycle", arguments={"max_lr": 0.5, "total_steps": 50})

train_config = TrainConfig(
    dataset="test",
    batch_size=128,
    epochs=2,
    optimizer_config=optimizer_config,
    scheduler_config=scheduler_config,
)
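When choosing total_steps for the cycle scheduler, it can help to estimate how many training steps a run will take. The sketch below assumes that the scheduler is stepped once per training batch (the default step_when of "train") and that a final partial batch still counts as a step. The helper total_train_steps and the dataset size of 3,200 samples are assumptions made for illustration; the tutorial does not state the size of the "test" dataset.

```python
import math


def total_train_steps(dataset_size: int, batch_size: int, epochs: int) -> int:
    """Number of training batches processed over the whole run."""
    steps_per_epoch = math.ceil(dataset_size / batch_size)
    return steps_per_epoch * epochs


# Assumed dataset size; batch_size and epochs match the train_config above.
print(total_train_steps(dataset_size=3200, batch_size=128, epochs=2))
```

Under these assumptions, 3,200 samples with a batch size of 128 give 25 steps per epoch, so two epochs correspond to the total_steps value of 50 used in the example.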

The last step is to create a run_config object. This object combines train_config and my_model_config, along with runtime settings like verbosity and device. However, we also need to redefine the run config class to update its model_config attribute from ModelConfig (by default) to MyModelConfig:

@configclass
class CustomRunConfig(RunConfig):
    model_config: MyModelConfig

run_config = CustomRunConfig(
    train_config=train_config,
    model_config=my_model_config,
    verbose="silent",
    device="cpu",
)

That's it, we have finished configuring our experiment! With this, we are halfway to launching an ablation experiment. Refer to the Prototyping models and Hyperparameter Optimization tutorials for the next steps after configuration to launch the experiment.

Note

All configuration classes must inherit from ConfigBase and be decorated with the @configclass decorator. You can see this in the MyModelConfig and CustomRunConfig classes in the example above.

Ablator custom data types#

An important part of Ablator's configuration system is its set of custom data types, which are used to annotate configuration attributes. The framework defines three special types: Stateless, Stateful, and Derived. These custom Python annotations declare how each configuration attribute relates to the experiment state (which can be Complete, Running, Pending, Pruned, etc. Read more about experiment state in our paper).

  • Stateless attributes are those to which the experiment state is agnostic: they have no impact on the experiment state and can take different values between trials or experiments. For example, device and verbose are declared Stateless in RunConfig, since the same experiment can run on different hardware or with a different verbosity level. Note that a variable declared as Stateless must be assigned an initial value before launching the experiment.

  • Stateful attributes, in contrast to Stateless ones, must have the same value between different experiments. For example, a binary classification model should always have an output size of 2. Stateful attributes are declared with a plain data type (no annotation needed) and must be assigned values before launching the experiment.

  • Derived attributes are Stateful but undecided at the start of the experiment. Their values are determined by internal experiment processes that can depend on other experimental attributes, e.g. a model input size that depends on the dataset.

Note

  • These annotations exist because Ablator supports stateful experiment design, which requires the configuration to be unambiguous at initialization. Annotating every attribute ensures that it is.

  • For more information about our stateful experiment design, see our paper: ABLATOR: Robust Horizontal-Scaling of Machine Learning Ablation Experiments
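One way to build intuition for these annotations is to think of an experiment identifier that is computed only from the attributes that affect the experiment state. The sketch below uses standard-library dataclasses to illustrate that idea. It is a conceptual illustration under stated assumptions, not Ablator's implementation: the helper stateless, the class SketchRunConfig, and the function state_id are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field, fields


def stateless(default):
    """Mark a dataclass field as stateless (hypothetical helper)."""
    return field(default=default, metadata={"stateless": True})


@dataclass
class SketchRunConfig:
    # Plain fields play the role of stateful attributes.
    batch_size: int
    hidden_dim: int
    # Stateless fields do not affect the experiment identity.
    device: str = stateless("cuda")
    verbose: str = stateless("console")


def state_id(config) -> str:
    """Hash only the stateful fields of a config object."""
    stateful = {
        f.name: getattr(config, f.name)
        for f in fields(config)
        if not f.metadata.get("stateless", False)
    }
    payload = json.dumps(stateful, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:8]


on_gpu = SketchRunConfig(batch_size=128, hidden_dim=100, device="cuda")
on_cpu = SketchRunConfig(batch_size=128, hidden_dim=100, device="cpu")
wider = SketchRunConfig(batch_size=128, hidden_dim=200, device="cuda")

print(state_id(on_gpu) == state_id(on_cpu))  # Same stateful values
print(state_id(on_gpu) == state_id(wider))   # Different hidden_dim
```

In this sketch, changing only the device leaves the identifier unchanged, while changing hidden_dim produces a different one.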

Alternatives to constructing configuration objects#

There are three methods to configure an experiment: named arguments, file-based, or dictionary-based. All previous code snippets are examples of the named-arguments method. Now let's look at how the file-based and dictionary-based methods work.

File-based#

File-based configuration lets you define an experiment in a simple configuration file. Use the <Run config class>.load(path/to/yaml/file) method to create a configuration object with the values provided in that file.

To write these config files, simply follow yaml syntax. Make sure that the attributes match with those in the config classes (either default config classes from ablator, or custom ones like MyModelConfig). The following example shows what a config yaml file looks like. We will name it config.yaml:

experiment_dir: "/tmp/dir"
train_config:
  dataset: test
  batch_size: 128
  epochs: 2
  optimizer_config:
    name: sgd
    arguments:
      lr: 0.1
  scheduler_config:
    name: cycle
    arguments:
      max_lr: 0.5
      total_steps: 50
model_config:
  inp_size: 50
  hidden_dim: 100
  activation: "relu"
  dropout: 0.15
verbose: "silent"
device: "cpu"

Now in your code, load these values to create the config object:

config = CustomRunConfig.load("path/to/yaml/file")

Note that since we created a custom running configuration class CustomRunConfig that is tied to the custom model config in the previous sections, we used CustomRunConfig.load("path/to/yaml/file") to load configuration from file. Otherwise, if you’re not creating any subclasses, simply run RunConfig.load("path") or ParallelConfig.load("path").

Dictionary-based#

The dictionary-based method is similar to the file-based one, except that the configuration is defined in a dictionary instead of a yaml file. The dictionary is then unpacked as keyword arguments when initializing the running configuration:

configuration = {
    "experiment_dir": "/tmp/dir",
    "train_config": {
        "dataset": "test",
        "batch_size": 128,
        "epochs": 2,
        "optimizer_config":{
            "name": "sgd",
            "arguments": {
                "lr": 0.1
            }
        },
        "scheduler_config":{
            "name": "cycle",
            "arguments":{
                "max_lr": 0.5,
                "total_steps": 50
            }
        }
    },
    "model_config": {
        "inp_size": 50,
        "hidden_dim": 100,
        "activation": "relu",
        "dropout": 0.15
    },
    "verbose": "silent",
    "device": "cpu"
}

config = CustomRunConfig(
    **configuration
)
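To illustrate how nested dictionary keys can map onto nested configuration classes, here is a minimal sketch built on standard-library dataclasses. It does not use Ablator and is not how Ablator is implemented: the classes SketchOptimizerConfig and SketchTrainConfig and the helper from_dict are hypothetical stand-ins.

```python
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class SketchOptimizerConfig:
    name: str
    arguments: dict


@dataclass
class SketchTrainConfig:
    batch_size: int
    epochs: int
    optimizer_config: SketchOptimizerConfig


def from_dict(cls, values: dict):
    """Build a (possibly nested) dataclass instance from a nested dictionary."""
    kwargs = {}
    for f in fields(cls):
        value = values[f.name]  # Raises KeyError if an attribute is missing.
        if is_dataclass(f.type) and isinstance(value, dict):
            # A nested section becomes a nested config object.
            value = from_dict(f.type, value)
        kwargs[f.name] = value
    return cls(**kwargs)


train_dict = {
    "batch_size": 128,
    "epochs": 2,
    "optimizer_config": {"name": "sgd", "arguments": {"lr": 0.1}},
}
sketch_train_config = from_dict(SketchTrainConfig, train_dict)
print(sketch_train_config)
```

The dictionary keys must match the attribute names of the target classes, which is the same requirement that applies to the yaml file and the dictionary shown above.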

Conclusion#

Now that you know how to configure experiments, you can start creating your own prototype. In the next chapter, we will learn how to write a prototype model, define necessary configurations, and launch the experiment.