Configuration basics#

Ablator embraces a versatile configuration system that configures every facet of the training process of machine learning models, covering not only the model architecture but also the training environment.

Think of this configuration system as a blueprint for crafting experiments. By leveraging this system, Ablator orchestrates the creation and setup of experiments, seamlessly integrating the necessary configurations. Furthermore, Ablator offers the flexibility to dynamically construct a hierarchical configuration through composition.

You can override settings using Python named arguments, yaml configuration files, or dictionaries. In this tutorial, we explain all configuration-related concepts in Ablator and walk through the steps needed to configure an experiment with the named-arguments method. For the latter two approaches, see the section "Alternatives to constructing configuration objects" near the end of this tutorial.

Configuration categories#

In Ablator, configurations are divided into the following categories:

Model configuration (or model config).
Training configuration (or training config/ train config).
Optimizer and Scheduler configuration (or optimizer config and scheduler config).
Running configurations (or run configuration/ running config/ run config), either for training a single prototype model or training multiple models in parallel.

These configuration classes will be used together to configure an experiment.

Model Configuration#

Model configuration is required when creating the run configuration (RunConfig.model_config and ParallelConfig.model_config). This configuration class defines the hyperparameters specific to the model of interest. By default, it does not have any attributes. Instead, we typically inherit from it and add a custom attribute for each hyperparameter specific to our model. Later, you will use this configuration to construct the model.

Two steps are required after defining a model config class for your model:

Pass the model config to the main model’s constructor so you can construct the model using the attributes that are defined in the config.
Create a custom running configuration class and update the type of its model_config attribute to the newly created model config class.

Note

Ablator requires the model’s forward function to return two objects: one dictionary of model’s batched output (e.g. labels, predictions, logits, probabilities, etc.), and the other is the loss value. Notice that these values must be tensors. You also have the choice to return None for either of the values, depending on the use case.

When writing the forward method of the main model, return these two outputs in the following order:

A dictionary of the model’s batched output (e.g. labels, predictions, logits, probabilities, etc.), e.g. {"y_pred": <model prediction>, "y_true": <true labels>}.
The loss value. The loss value is treated as an auxiliary metric that is recorded for later analysis (e.g. tracking loss with Tensorboard).

In addition, you can create a search space over these parameters for your ablation experiment. A sample use case for this is when you want to test different values for model size, number of layers, activation functions, etc. You can do this by creating a search space via SearchSpace class for the hyperparameters that you have defined in the model config. Refer to the Ablation experiment tutorial for more details.

Training Configuration#

This configuration class defines the training settings (e.g., batch size, number of epochs, the optimizer to use, etc.). Two important attributes to mention are optimizer_config and scheduler_config. As the names suggest, they configure the optimizer and scheduler to be used in the training process.

Parameter	Usage
dataset (``str``)	dataset name. May be used in custom dataset loader functions.
batch_size (``int``)	batch size.
epochs (``int``)	number of epochs to train.
optimizer_config (``OptimizerConfig``)	optimizer configuration.
scheduler_config (``Optional[SchedulerConfig]``)	scheduler configuration.

Training configuration is required when creating the run configuration (RunConfig.train_config or ParallelConfig.train_config).

Optimizer and Scheduler Configurations#

By default, Ablator takes care of creating the optimizer (and optionally the scheduler) for training models. Thus, you also need to configure them so Ablator knows which optimizer and scheduler to create.

OptimizerConfig is used to configure the optimizer for the training process. Currently, we support SGD optimizer, Adam optimizer, and AdamW optimizer.

SchedulerConfig, on the other hand, can be used to configure the learning rate scheduler for the training process. Currently, we support StepLR scheduler, OneCycleLR scheduler, and ReduceLROnPlateau scheduler.

Both of these config classes have similar arguments:

Parameter	Usage
name (``str``)	The type of the optimizer or scheduler: one of `['sgd', 'adam', 'adamw']` for optimizers and one of `['none', 'step', 'cycle', 'plateau']` for schedulers.
arguments (``OptimizerArgs``)	The arguments for the optimizer or scheduler, specific to the chosen type of optimizer or scheduler. For optimizers, this MUST include an item for the learning rate, e.g. `{"lr": 0.1}`.

The table below shows the arguments that can be defined for each type of optimizer:

Optimizer type	Arguments
sgd	`weight_decay` (defaults to 0.0): Weight decay rate. `momentum` (defaults to 0.0): Momentum factor.
adam	`betas` (defaults to (0.9, 0.999)): Coefficients for computing running averages of gradient and its square. `weight_decay` (defaults to 0.0): Weight decay rate.
adamw	`betas` (defaults to (0.9, 0.999)): Coefficients for computing running averages of gradient and its square. `eps` (defaults to 1e-8): Term added to the denominator to improve numerical stability. `weight_decay` (defaults to 0.01): Weight decay rate.
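To make the table concrete, the sketch below mimics how user-supplied arguments could be combined with the defaults listed above. It is a standalone illustration in plain Python and not Ablator's implementation; `OPTIMIZER_DEFAULTS` and `resolve_optimizer_args` are hypothetical names introduced only for this example.

```python
# Defaults copied from the table above. The names below are illustrative
# assumptions and are not part of the ablator package.
OPTIMIZER_DEFAULTS = {
    "sgd": {"weight_decay": 0.0, "momentum": 0.0},
    "adam": {"betas": (0.9, 0.999), "weight_decay": 0.0},
    "adamw": {"betas": (0.9, 0.999), "eps": 1e-8, "weight_decay": 0.01},
}


def resolve_optimizer_args(name: str, arguments: dict) -> dict:
    """Merge user arguments over the table defaults and require a learning rate."""
    if name not in OPTIMIZER_DEFAULTS:
        raise ValueError(f"Unknown optimizer type: {name!r}")
    if "lr" not in arguments:
        raise ValueError("Optimizer arguments must include 'lr'")
    # User-provided values take precedence over the defaults.
    return {**OPTIMIZER_DEFAULTS[name], **arguments}


resolved = resolve_optimizer_args("sgd", {"lr": 0.1})
print(resolved)
```

Under these assumptions, only `lr` has to be written out explicitly; every other argument falls back to its default unless you override it.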

The table below shows the arguments that can be defined for each type of scheduler:

Scheduler type	Arguments
cycle	`max_lr`: Upper learning rate boundaries in the cycle. `total_steps`: The total number of steps to run the scheduler in a cycle. `step_when` (defaults to `"train"`): The step type at which `scheduler.step()` should be invoked: `'train'`, `'val'`, or `'epoch'`.
plateau	`patience` (defaults to 10): Number of epochs with no improvement after which the learning rate will be reduced. `min_lr` (defaults to 1e-5): A lower bound on the learning rate. `mode` (defaults to `"min"`): One of `'min'`, `'max'`, or `'auto'`, which defines the direction of optimization, so as to adjust the learning rate accordingly, i.e., when a certain metric ceases improving. `factor` (defaults to 0.0): Factor by which the learning rate will be reduced: `new_lr = lr * factor`. `threshold` (defaults to 1e-4): Threshold for measuring the new optimum, to only focus on significant changes. `verbose` (defaults to False): If `True`, prints a message to `stdout` for each update. `step_when` (defaults to `"val"`): The step type at which the scheduler should be invoked: `'train'`, `'val'`, or `'epoch'`.
step	`step_size` (defaults to 1): Period of learning rate decay. `gamma` (defaults to 0.99): Multiplicative factor of learning rate decay. `step_when` (defaults to `"epoch"`): The step type at which the scheduler should be invoked: `'train'`, `'val'`, or `'epoch'`.
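The arithmetic behind the `step` and `plateau` arguments can be sketched in a few lines of plain Python. This is only an illustration of the formulas, assuming the usual step-decay rule (multiply by `gamma` once every `step_size` scheduler steps) and the `new_lr = lr * factor` rule from the table. The helper names and the `factor=0.5` value are assumptions chosen for the example.

```python
def step_lr(base_lr: float, n_steps: int, step_size: int = 1, gamma: float = 0.99) -> float:
    """Learning rate after `n_steps` scheduler steps under step decay."""
    # The rate is multiplied by gamma once per completed period of step_size.
    return base_lr * gamma ** (n_steps // step_size)


def plateau_lr(lr: float, factor: float, min_lr: float = 1e-5) -> float:
    """One plateau reduction: new_lr = lr * factor, bounded below by min_lr."""
    return max(lr * factor, min_lr)


lr_after_two_steps = step_lr(0.1, n_steps=2)  # 0.1 * 0.99 ** 2
reduced_lr = plateau_lr(0.1, factor=0.5)      # halve the rate once
floored_lr = plateau_lr(1e-5, factor=0.5)     # cannot drop below min_lr
print(lr_after_two_steps, reduced_lr, floored_lr)
```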

Running Configurations#

Running configurations define the environment of an experiment (experiment main directory, number of checkpoints to maintain, hardware device to use, etc.). There are 2 types of running configurations:

RunConfig for prototype experiments
ParallelConfig for ablation experiments

`RunConfig` for prototype experiments#

The table below summarizes the parameters:

Parameter	Usage
experiment_dir (``Stateless[Optional[str]]``, defaults to None)	location to store experiment artifacts.
random_seed (``Optional[int]``, defaults to None)	random seed.
train_config (``TrainConfig``)	training configuration. (check `TrainConfig` for more details)
model_config (``ModelConfig``)	model configuration. (check `ModelConfig` for more details)
keep_n_checkpoints (``Stateless[int]``, defaults to 3)	number of latest checkpoints to keep.
tensorboard (``Stateless[bool]``, defaults to True)	whether to use tensorboardLogger.
amp (``Stateless[bool]``, defaults to True)	whether to use automatic mixed precision when running on gpu.
device (``Stateless[str]``, defaults to “cuda”)	device to run on.
verbose (``Stateless[Literal["console", "progress", "silent"]]``, defaults to "console")	verbosity level.
eval_subsample (``Stateless[float]``, defaults to 1)	fraction of the dataset to use for evaluation.
metrics_n_batches (``Stateless[int]``, defaults to 32)	max number of batches stored in every tag (train, eval, test) for evaluation.
metrics_mb_limit (``Stateless[int]``, defaults to 10_000)	max number of megabytes stored in every tag (train, eval, test) for evaluation.
early_stopping_iter (``Stateless[Optional[int]]``, defaults to None)	The maximum allowed difference between the current iteration and the last iteration with the best metric before applying early stopping. Early stopping will be triggered if the difference `(current_itr-best_itr)` exceeds `early_stopping_iter`. If set to `None`, early stopping will not be applied.
eval_epoch (``Stateless[float]``, defaults to 1)	The epoch interval between two evaluations.
log_epoch (``Stateless[float]``, defaults to 1)	The epoch interval between two logging events.
init_chkpt (``Stateless[Optional[str]]``, defaults to None)	path to a checkpoint to initialize the model with.
warm_up_epochs (``Stateless[float]``, defaults to 1)	number of epochs marked as warm up epochs.
divergence_factor (``Stateless[Optional[float]]``, defaults to 10)	if `cur_loss > best_loss > divergence_factor`, the model is considered to have diverged.
optim_metrics (``Stateless[Optional[Dict[Optim]]]``)	The optimization metric to use for meta-training procedures, such as for model saving and lr scheduling, e.g. `{"val_loss": "min"}`.
optim_metric_name (``Stateless[Optional[str]]``)	The name of the metric to be optimized.
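The early-stopping rule described for `early_stopping_iter` can be written as a small predicate. The function below is a hypothetical helper that restates the rule from the table in plain Python. It is not taken from Ablator's code base.

```python
from typing import Optional


def should_stop_early(
    current_itr: int, best_itr: int, early_stopping_iter: Optional[int]
) -> bool:
    """Return True when the gap since the best iteration exceeds the limit."""
    if early_stopping_iter is None:
        # None disables early stopping entirely.
        return False
    return (current_itr - best_itr) > early_stopping_iter


# The best metric was seen at iteration 100 and the allowed gap is 40.
print(should_stop_early(current_itr=140, best_itr=100, early_stopping_iter=40))
print(should_stop_early(current_itr=141, best_itr=100, early_stopping_iter=40))
```

Note that the comparison is strict: a gap exactly equal to `early_stopping_iter` does not yet trigger a stop in this sketch, following the word "exceeds" in the table.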

`ParallelConfig` for ablation experiments#

ParallelConfig is a subclass of RunConfig. Therefore, it has all attributes RunConfig has. Additionally, it introduces other attributes to configure the parallel experiment:

Parameters	Usage
total_trials: (``Optional[int]``)	total number of trials.
concurrent_trials: (``Stateless[Optional[int]]``)	number of trials to run concurrently.
search_space: (``Dict[SearchSpace]``)	search space for hyperparameter search, e.g. `{"train_config.optimizer_config.arguments.lr": SearchSpace(value_range=[0, 10], value_type="int"),}`
gpu_mb_per_experiment: (``Stateless[Optional[int]]``)	CUDA memory requirement per experimental trial in MB. e.g. a value of 100 is equivalent to 100MB
search_algo: (``Stateless[SearchAlgo]``, defaults to SearchAlgo.random)	type of search algorithm, SearchAlgo.random for ablation studies, SearchAlgo.tpe for HPO.
ignore_invalid_params: (``Stateless[bool]``, defaults to False)	whether to ignore invalid parameters when sampling or raise an error.
remote_config: (``Stateless[Optional[RemoteConfig]]``, defaults to None)	remote storage configuration.

search_space defines a set of continuous or categorical/discrete values for a given hyperparameter. Each key is the dotted path of the attribute within the configuration hierarchy. Refer to Search Space basics to learn more about how to use it.
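The dotted keys used in search_space (for example `train_config.optimizer_config.arguments.lr`) address a value inside the nested configuration. The sketch below shows that addressing idea with plain dictionaries and a random pick from a categorical list. `set_by_path`, `base_config`, and the candidate values are assumptions made for illustration, and this is not how Ablator samples trials internally.

```python
import copy
import random


def set_by_path(config: dict, dotted_key: str, value) -> None:
    """Assign `value` at the nested location named by a dotted key."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        node = node[key]  # walk down one level of the hierarchy
    node[leaf] = value


base_config = {
    "train_config": {
        "optimizer_config": {"name": "sgd", "arguments": {"lr": 0.1}},
    }
}
# Hypothetical categorical search space: dotted key -> candidate values.
candidates_by_key = {"train_config.optimizer_config.arguments.lr": [0.01, 0.05, 0.1]}

rng = random.Random(0)  # seeded so repeated runs pick the same candidate
trial_config = copy.deepcopy(base_config)  # keep the base config untouched
for dotted_key, candidates in candidates_by_key.items():
    set_by_path(trial_config, dotted_key, rng.choice(candidates))

print(trial_config["train_config"]["optimizer_config"]["arguments"]["lr"])
```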

Configure your experiments#

Now let’s combine everything to configure your experiment!

Note

For the predefined config classes in Ablator, the tables above summarize the attributes that you can set when creating config objects. You can also inspect these attributes in the Configuration module documentation, specifically in the attribute sections of each class.


try:
    import ablator
except ImportError:
    # Install ablator, then kill the kernel so that the runtime restarts
    # and the freshly installed package is picked up on the next run.
    !pip install ablator
    print("Stopping RUNTIME! Please run again")
    import os

    os.kill(os.getpid(), 9)

In most cases, and as a good practice, we configure the model first (you can skip this step if you are not running an ablation study on the model architecture).

In this example, we create a configuration class MyModelConfig for a simple 1-layer neural network model with the following hyperparameters: input size (to be inferred), hidden layer dimension, activation function, and dropout rate. All of these are stateful, a concept discussed later in this tutorial. This configuration is then used to construct the neural network model MyCustomModel:


from ablator import RunConfig, ModelConfig, Derived, configclass

import torch.nn as nn
import torch

@configclass
class MyModelConfig(ModelConfig):
    inp_size: Derived[int]
    hidden_dim: int
    activation: str
    dropout: float

# Construct the model using the configuration
class MyCustomModel(nn.Module):
    def __init__(self, config: MyModelConfig) -> None:
        super().__init__()
        self.linear = nn.Linear(config.inp_size, config.hidden_dim)
        self.dropout = nn.Dropout(config.dropout)
        if config.activation == "relu":
            self.activate = nn.ReLU()
        elif config.activation == "elu":
            self.activate = nn.ELU()
        else:
            # Fail early instead of hitting an AttributeError in forward()
            raise ValueError(f"Unsupported activation: {config.activation}")
        self.criterion = nn.CrossEntropyLoss()

    def forward(self, x: torch.Tensor, labels: torch.Tensor):
        out = self.linear(x)
        out = self.dropout(out)
        out = self.activate(out)

        loss = self.criterion(out, labels)

        # Return the batched outputs as a dictionary, followed by the loss
        return {"preds": out, "labels": labels}, loss

my_model_config = MyModelConfig(hidden_dim=100, activation="relu", dropout=0.3)

Notice how we’re returning a dictionary for model’s predictions and labels, and loss value in the forward method.

Next, we create a training configuration, which requires an optimizer config and an optional scheduler config. Here we create optimizer_config for an SGD optimizer and scheduler_config for a OneCycleLR scheduler, and then use them in train_config:


from ablator import OptimizerConfig, SchedulerConfig
from ablator import TrainConfig

optimizer_config = OptimizerConfig(name="sgd", arguments={"lr": 0.1})
scheduler_config = SchedulerConfig(name="cycle", arguments={"max_lr": 0.5, "total_steps": 50})

train_config = TrainConfig(
    dataset="test",
    batch_size=128,
    epochs=2,
    optimizer_config=optimizer_config,
    scheduler_config=scheduler_config,
)

The last step is to create a run_config object. This object combines train_config and my_model_config, along with runtime settings like verbosity and device. However, we first need to redefine the run config class to update its model_config attribute from ModelConfig (by default) to MyModelConfig:


@configclass
class CustomRunConfig(RunConfig):
    model_config: MyModelConfig

run_config = CustomRunConfig(
    train_config=train_config,
    model_config=my_model_config,
    verbose="silent",
    device="cpu",
)
run_config

Output:

CustomRunConfig(model_config={'inp_size': None, 'hidden_dim': 100, 'activation': 'relu', 'dropout': 0.3}, experiment_dir=None, random_seed=None, train_config={'dataset': 'test', 'batch_size': 128, 'epochs': 2, 'optimizer_config': {'name': 'sgd', 'arguments': {'weight_decay': 0.0, 'momentum': 0.0, 'lr': 0.1}}, 'scheduler_config': {'name': 'cycle', 'arguments': {'max_lr': 0.5, 'total_steps': 50, 'step_when': 'train'}}}, keep_n_checkpoints=3, tensorboard=True, amp=True, device='cpu', verbose='silent', eval_subsample=1.0, metrics_n_batches=32, metrics_mb_limit=10000, early_stopping_iter=None, eval_epoch=1.0, log_epoch=1.0, init_chkpt=None, warm_up_epochs=1.0, divergence_factor=10.0, optim_metrics=None, optim_metric_name=None)

That’s it, we have finished configuring our experiment! With this, we are half-way to launching an ablation experiment. Refer to Prototyping models and Ablation experiment tutorials for the next steps after configuration to launch the experiment.

Note

All configuration classes (including custom ones that you may create) must inherit from ConfigBase and be decorated with the @configclass decorator. You can see this in the MyModelConfig and CustomRunConfig classes in the example above.

Ablator custom data types for stateful experiment design#

One key feature of Ablator is the ability to run stateful experiments. To support this, we created three special types: Stateless, Stateful, and Derived. These custom annotations declare how each configuration attribute relates to the experiment state (which can be Complete, Running, Pending, Pruned, etc.; read more about experiment state in our paper). Stateless, for instance, marks attributes to which the experiment state is agnostic, i.e., attributes that have no impact on the experiment state.

Stateless attributes can take different values between trials or experiments. For example, learning rate should be stateless, as we can train models with different learning rates. Note that if you’re declaring a variable to be Stateless, it must be assigned an initial value before launching the experiment.
Stateful attributes, in contrast to Stateless ones, must have the same value across experiments. For example, a binary classification model should always have an output size of 2. Stateful attributes are declared with a plain data type (no annotation needed) and must be assigned values before launching the experiment.
Derived attributes are Stateful but undecided at the start of the experiment. Their values are determined by internal experiment processes that can depend on other experimental attributes, e.g., a model input size that depends on the dataset.

When annotating attributes, you wrap these keywords around the data type. For example, inp_size: Derived[int] means that inp_size is a derived attribute of type int, and similarly for Stateless. Stateful needs no keyword: any attribute annotated with neither Derived nor Stateless is considered Stateful.
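One way to picture the distinction is to compare two configurations while ignoring the Stateless attributes: any remaining difference is a change to the stateful part of the experiment. The sketch below does this with plain dictionaries. `stateful_mismatches` is a hypothetical helper and the sample values are made up; only the Stateless markings of `device` and `verbose` come from the RunConfig table earlier in this tutorial.

```python
def stateful_mismatches(config_a: dict, config_b: dict, stateless_keys: set) -> dict:
    """Return {key: (value_a, value_b)} for differing keys that are not stateless."""
    keys = (config_a.keys() | config_b.keys()) - stateless_keys
    return {
        key: (config_a.get(key), config_b.get(key))
        for key in sorted(keys)
        if config_a.get(key) != config_b.get(key)
    }


# `device` and `verbose` are marked Stateless in the RunConfig table,
# while `random_seed` carries no annotation and is therefore Stateful.
STATELESS_KEYS = {"device", "verbose"}

first_run = {"random_seed": 42, "device": "cuda", "verbose": "console"}
same_state = {"random_seed": 42, "device": "cpu", "verbose": "silent"}
new_state = {"random_seed": 7, "device": "cpu", "verbose": "silent"}

print(stateful_mismatches(first_run, same_state, STATELESS_KEYS))
print(stateful_mismatches(first_run, new_state, STATELESS_KEYS))
```

In this sketch, switching the device or verbosity leaves the comparison empty, whereas changing the random seed is reported as a stateful difference.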

We also defined other structural data types such as List, Dict, Enum, etc. You can find more information about all custom data types (including the three above types) in the Data types configuration module documentation.

Note

The reason for creating these annotations is that for stateful experiment design, the configuration should be unambiguous at the initialization state. And the use of these annotations assures the unambiguity of the configuration.
If you are interested in learning more about stateful experiment design, see our paper: ABLATOR: Robust Horizontal-Scaling of Machine Learning Ablation Experiments

Alternatives to constructing configuration objects#

There are three methods to configure an experiment: named arguments, file-based, and dictionary-based. All previous code snippets are examples of the named-arguments method. Now let’s look at how the file-based and dictionary-based methods work.

File-based#

File-based configuration lets you keep settings in simple configuration files. Use the <ConfigClass>.load(path/to/yaml/file) method to create a configuration object from the values provided in the config file.

To write these config files, simply follow yaml syntax. Make sure that the attributes and their hierarchy match with those in the config classes (for both default config classes from ablator, or custom ones like MyModelConfig). The following example shows what a config yaml file looks like. We will name it config.yaml:

experiment_dir: "/tmp/dir"
train_config:
  dataset: test
  batch_size: 128
  epochs: 2
  optimizer_config:
    name: sgd
    arguments:
      lr: 0.1
  scheduler_config:
    name: cycle
    arguments:
      max_lr: 0.5
      total_steps: 50
model_config:
  inp_size: 50
  hidden_dim: 100
  activation: "relu"
  dropout: 0.15
verbose: "silent"
device: "cpu"

Now in your code, load these values to create the config object:

config = CustomRunConfig.load("path/to/yaml/file")

Note that since we created a custom running configuration class CustomRunConfig that is tied to the custom model config in the previous sections, we used CustomRunConfig.load("path/to/yaml/file") to load configuration from file. Otherwise, if you’re not creating any subclasses, simply run RunConfig.load("path") or ParallelConfig.load("path").

Dictionary-based#

This alternative is similar to the file-based method, but the configuration is defined in a dictionary instead of a yaml file. The dictionary is then passed (as keyword arguments) to the configuration class at initialization:

configuration = {
    "experiment_dir": "/tmp/dir",
    "train_config": {
        "dataset": "test",
        "batch_size": 128,
        "epochs": 2,
        "optimizer_config":{
            "name": "sgd",
            "arguments": {
                "lr": 0.1
            }
        },
        "scheduler_config":{
            "name": "cycle",
            "arguments":{
                "max_lr": 0.5,
                "total_steps": 50
            }
        }
    },
    "model_config": {
        "inp_size": 50,
        "hidden_dim": 100,
        "activation": "relu",
        "dropout": 0.15
    },
    "verbose": "silent",
    "device": "cpu"
}

config = CustomRunConfig(
    **configuration
)
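Because the configuration is an ordinary nested dictionary, a common pattern is to keep one base dictionary and apply a small set of overrides before unpacking it. The sketch below shows one way to do that with a recursive merge. `deep_update` is a hypothetical helper written for this example and is not part of Ablator.

```python
import copy


def deep_update(base: dict, overrides: dict) -> dict:
    """Return a copy of `base` with `overrides` merged in recursively."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections instead of replacing them wholesale.
            merged[key] = deep_update(merged[key], value)
        else:
            merged[key] = value
    return merged


base = {
    "train_config": {
        "batch_size": 128,
        "epochs": 2,
        "optimizer_config": {"name": "sgd", "arguments": {"lr": 0.1}},
    },
    "device": "cpu",
}
overrides = {
    "train_config": {"epochs": 10, "optimizer_config": {"arguments": {"lr": 0.01}}}
}

merged = deep_update(base, overrides)
print(merged["train_config"]["epochs"], merged["train_config"]["batch_size"])
```

A merged dictionary that contains all required attributes could then be unpacked in the same way as above, e.g. `CustomRunConfig(**merged)`.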

Conclusion#

Now that you’ve learned how to configure experiments, you can start creating your own prototype. In the next chapter, we will learn how to write a prototype model, define necessary configurations and model interfaces and launch the experiment.
