Configuration basics#
Ablator embraces a versatile configuration system that configures every facet of the training process of machine learning models, covering not only the model architecture but also the training environment.
Think of this configuration system as a blueprint for crafting experiments. By leveraging this system, Ablator orchestrates the creation and setup of experiments, seamlessly integrating the necessary configurations. Furthermore, Ablator offers the flexibility to dynamically construct a hierarchical configuration through composition.
You can override settings with Python named arguments, YAML configuration files, or dictionaries. This tutorial explains the configuration-related concepts in Ablator and walks through every step needed to configure an experiment with the named-arguments method. The latter two approaches are covered in the Alternatives to constructing configuration objects section of this tutorial.
Configuration categories#
In Ablator, configurations are divided into the following categories:
Model configuration (or model config).
Training configuration (or training config/ train config).
Optimizer and Scheduler configuration (or optimizer config and scheduler config).
Running configurations (or run configuration/ running config/ run config), either for training a single prototype model or training multiple models in parallel.
These configuration classes will be used together to configure an experiment.
Model Configuration#
Model configuration is required when creating the run configuration (RunConfig.model_config and ParallelConfig.model_config). This configuration class defines the hyperparameters specific to the model of interest. By default, it has no attributes; instead, you typically inherit from it and add a custom attribute for each hyperparameter of your model. Later, you will use this configuration to construct the model.
Two steps are required after defining a model config class for your model:
Pass the model config to the main model’s constructor so you can construct the model using the attributes defined in the config.
Create a custom running configuration class and update the type of its model_config attribute to the newly created model config class.
Note
Ablator requires the model’s forward method to return two outputs, in the following order:
A dictionary of the model’s batched output (e.g. labels, predictions, logits, probabilities), e.g. {"y_pred": <model prediction>, "y_true": <true labels>}.
The loss value. Here the loss value is treated as an auxiliary metric that is recorded for later analysis (e.g. tracking loss with Tensorboard).
These values must be tensors. You may also return None for either of them, depending on the use case.
In addition, you can create a search space over these parameters for your ablation experiment. A sample use case for this is when you want to test different values for model size, number of layers, activation functions, etc. You can do this by creating a search space via SearchSpace class for the hyperparameters that you have defined in the model config. Refer to the Ablation experiment tutorial for more details.
Training Configuration#
This configuration class defines the training settings (e.g., batch size, number of epochs, the optimizer to use, etc.). Two important attributes to mention are optimizer_config and scheduler_config. As the names suggest, they configure the optimizer and scheduler used in the training process.
| Parameter | Usage |
|---|---|
| dataset (``str``) | dataset name. May be used in custom dataset loader functions. |
| batch_size (``int``) | batch size. |
| epochs (``int``) | number of epochs to train. |
| optimizer_config (``OptimizerConfig``) | optimizer configuration. |
| scheduler_config (``Optional[SchedulerConfig]``) | scheduler configuration. |
Training configuration is required when creating the run configuration (RunConfig.train_config or ParallelConfig.train_config).
Optimizer and Scheduler Configurations#
By default, Ablator takes care of creating the optimizer (and optionally the scheduler) for training models. You therefore need to configure them so that Ablator knows which optimizer and scheduler to create.
OptimizerConfig is used to configure the optimizer for the training process. Currently, we support SGD optimizer, Adam optimizer, and AdamW optimizer.
SchedulerConfig, on the other hand, can be used to configure the learning rate scheduler for the training process. Currently, we support StepLR scheduler, OneCycleLR scheduler, and ReduceLROnPlateau scheduler.
Both of these config classes have similar arguments:
| Parameter | Usage |
|---|---|
| name (``str``) | The type of the optimizer or scheduler; this can be any of the types listed in the tables below. |
| arguments (``OptimizerArgs``) | The arguments for the optimizer or scheduler, specific to its type. For an optimizer, this MUST include an item for the learning rate (``lr``). |
The table below shows the arguments that can be defined for each type of optimizer:
| Optimizer type | Arguments |
|---|---|
| sgd | |
| adam | |
| adamw | |
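One way to picture how name and arguments work together is a lookup from the optimizer type to a set of default arguments, with the user-supplied values merged on top. The sketch below is plain Python and is not Ablator's implementation: OPTIMIZER_DEFAULTS and resolve_optimizer_args are hypothetical names, and only the sgd defaults are included because those are the ones visible in the printed CustomRunConfig later in this tutorial.

```python
from typing import Any, Dict

# Defaults for "sgd" as they appear in the printed run config later in this
# tutorial; entries for other optimizer types are deliberately left out.
OPTIMIZER_DEFAULTS: Dict[str, Dict[str, Any]] = {
    "sgd": {"weight_decay": 0.0, "momentum": 0.0},
}


def resolve_optimizer_args(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Merge user arguments over the defaults registered for `name`."""
    if name not in OPTIMIZER_DEFAULTS:
        raise ValueError(f"unknown optimizer type: {name!r}")
    if "lr" not in arguments:
        raise ValueError("optimizer arguments must include a learning rate ('lr')")
    # User-supplied values take precedence over the defaults.
    return {**OPTIMIZER_DEFAULTS[name], **arguments}


resolved = resolve_optimizer_args("sgd", {"lr": 0.1})
print(resolved)  # {'weight_decay': 0.0, 'momentum': 0.0, 'lr': 0.1}
```

Keeping the defaults per type in one place means a missing learning rate or an unknown type is reported before any training starts.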
The table below shows the arguments that can be defined for each type of scheduler:
| Scheduler type | Arguments |
|---|---|
| cycle | |
| plateau | |
| step | |
Running Configurations#
Running configurations define the environment of an experiment (experiment main directory, number of checkpoints to maintain, hardware device to use, etc.). There are 2 types of running configurations:
RunConfig for prototype experiments
ParallelConfig for ablation experiments
RunConfig for prototype experiments#
The table below summarizes the parameters:
| Parameter | Usage |
|---|---|
| experiment_dir (``Stateless[Optional[str]]``, defaults to None) | location to store experiment artifacts. |
| random_seed (``Optional[int]``, defaults to None) | random seed. |
| train_config (``TrainConfig``) | training configuration (see the Training Configuration section). |
| model_config (``ModelConfig``) | model configuration (see the Model Configuration section). |
| keep_n_checkpoints (``Stateless[int]``, defaults to 3) | number of latest checkpoints to keep. |
| tensorboard (``Stateless[bool]``, defaults to True) | whether to use tensorboardLogger. |
| amp (``Stateless[bool]``, defaults to True) | whether to use automatic mixed precision when running on gpu. |
| device (``Stateless[str]``, defaults to “cuda”) | device to run on. |
| verbose (``Stateless[Literal[“console”, “progress”, “silent”]]``, defaults to “console”) | verbosity level. |
| eval_subsample (``Stateless[float]``, defaults to 1) | fraction of the dataset to use for evaluation. |
| metrics_n_batches (``Stateless[int]``, defaults to 32) | max number of batches stored in every tag (train, eval, test) for evaluation. |
| metrics_mb_limit (``Stateless[int]``, defaults to 10_000) | max number of megabytes stored in every tag (train, eval, test) for evaluation. |
| early_stopping_iter (``Stateless[Optional[int]]``, defaults to None) | The maximum allowed difference between the current iteration and the last iteration with the best metric. Early stopping is triggered once the difference exceeds this value. |
| eval_epoch (``Stateless[float]``, defaults to 1) | The epoch interval between two evaluations. |
| log_epoch (``Stateless[float]``, defaults to 1) | The epoch interval between two logging steps. |
| init_chkpt (``Stateless[Optional[str]]``, defaults to None) | path to a checkpoint to initialize the model with. |
| warm_up_epochs (``Stateless[float]``, defaults to 1) | number of epochs marked as warm up epochs. |
| divergence_factor (``Stateless[Optional[float]]``, defaults to 10) | if |
| optim_metrics (``Stateless[Optional[Dict[Optim]]]``) | The optimization metric to use for meta-training procedures, such as for model saving and lr scheduling. |
| optim_metric_name (``Stateless[Optional[str]]``) | The name of the metric to be optimized. |
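To make the early_stopping_iter description more concrete, here is a minimal sketch of the check it implies. It is plain Python, not Ablator's code: should_stop_early is a hypothetical helper, and both the strict ">" comparison and the treatment of None as "early stopping disabled" are assumptions.

```python
from typing import Optional


def should_stop_early(
    current_iter: int, best_iter: int, early_stopping_iter: Optional[int]
) -> bool:
    """Return True once the best metric is older than the allowed gap."""
    if early_stopping_iter is None:
        # Assumption: the default of None disables early stopping.
        return False
    # Assumption: a strict ">" comparison against the configured maximum.
    return (current_iter - best_iter) > early_stopping_iter


# Best metric was seen at iteration 10, with an allowed gap of 5 iterations.
print(should_stop_early(current_iter=12, best_iter=10, early_stopping_iter=5))  # False
print(should_stop_early(current_iter=16, best_iter=10, early_stopping_iter=5))  # True
```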
ParallelConfig for ablation experiments#
ParallelConfig is a subclass of RunConfig. Therefore, it has all attributes RunConfig has. Additionally, it introduces other attributes to configure the parallel experiment:
| Parameters | Usage |
|---|---|
| total_trials: (``Optional[int]``) | total number of trials. |
| concurrent_trials: (``Stateless[Optional[int]]``) | number of trials to run concurrently. |
| search_space: (``Dict[SearchSpace]``) | search space for hyperparameter search. |
| gpu_mb_per_experiment: (``Stateless[Optional[int]]``) | CUDA memory requirement per experimental trial in MB, e.g. a value of 100 is equivalent to 100MB. |
| search_algo: (``Stateless[SearchAlgo]``, defaults to SearchAlgo.random) | type of search algorithm: SearchAlgo.random for ablation studies, SearchAlgo.tpe for HPO. |
| ignore_invalid_params: (``Stateless[bool]``, defaults to False) | whether to ignore invalid parameters when sampling or raise an error. |
| remote_config: (``Stateless[Optional[RemoteConfig]]``, defaults to None) | remote storage configuration. |
search_space defines a set of continuous or categorical/discrete values for a given hyperparameter. Refer to Search Space basics to learn more about how to use it.
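The difference between a continuous and a categorical search space can be illustrated with a small sampling sketch. It uses only the Python standard library and plain dictionaries as a stand-in; it does not use Ablator's SearchSpace class, and the dotted key names, the "value_range" and "categorical_values" fields, and sample_trial are illustrative assumptions.

```python
import random
from typing import Any, Dict

# Hypothetical stand-in for a search space: each entry is either a continuous
# range or a list of categorical values.
search_space: Dict[str, Dict[str, Any]] = {
    "train_config.optimizer_config.arguments.lr": {"value_range": (0.001, 0.1)},
    "model_config.activation": {"categorical_values": ["relu", "elu"]},
}


def sample_trial(space: Dict[str, Dict[str, Any]], rng: random.Random) -> Dict[str, Any]:
    """Draw one value per hyperparameter to form a single trial."""
    trial: Dict[str, Any] = {}
    for key, spec in space.items():
        if "value_range" in spec:
            low, high = spec["value_range"]
            trial[key] = rng.uniform(low, high)  # continuous: any value in the range
        else:
            trial[key] = rng.choice(spec["categorical_values"])  # pick one option
    return trial


rng = random.Random(0)  # seeded so repeated runs draw the same trials
trials = [sample_trial(search_space, rng) for _ in range(3)]
for trial in trials:
    print(trial)
```

Each printed dictionary corresponds to one trial: a learning rate drawn from the range and an activation picked from the listed options.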
Configure your experiments#
Now let’s combine everything to configure your experiment!
Note
For the predefined config classes in Ablator, the tables above summarize the attributes that you can set when creating config objects. You can also inspect them in the Configuration module documentation, specifically in the attribute sections of each class.
[ ]:
try:
    import ablator
except ImportError:
    !pip install ablator
    # Restart the runtime automatically (only when ablator was missing and
    # had to be installed) so that the changes are applied
    print("Stopping RUNTIME! Please run again")
    import os
    os.kill(os.getpid(), 9)
In most cases, and as a good practice, we first configure our model (you can skip this step entirely if you are not running an ablation study on the model architecture).
In this example, we create a configuration class MyModelConfig for a simple 1-layer neural network model with the following hyperparameters: input size (to be inferred), hidden layer dimension, activation function, and dropout rate (the last three are stateful, as discussed below). This configuration is then used to construct the neural network model MyCustomModel:
[ ]:
from ablator import RunConfig, ModelConfig, Derived, configclass
import torch.nn as nn
import torch


@configclass
class MyModelConfig(ModelConfig):
    inp_size: Derived[int]  # inferred later, e.g. from the dataset
    hidden_dim: int
    activation: str
    dropout: float


# Construct the model using the configuration
class MyCustomModel(nn.Module):
    def __init__(self, config: MyModelConfig) -> None:
        super().__init__()
        self.linear = nn.Linear(config.inp_size, config.hidden_dim)
        self.dropout = nn.Dropout(config.dropout)
        if config.activation == "relu":
            self.activate = nn.ReLU()
        elif config.activation == "elu":
            self.activate = nn.ELU()
        else:
            # Fail early instead of leaving self.activate undefined
            raise ValueError(f"Unsupported activation: {config.activation}")
        self.criterion = nn.CrossEntropyLoss()

    def forward(self, x: torch.Tensor, labels: torch.Tensor):
        out = self.linear(x)
        out = self.dropout(out)
        out = self.activate(out)
        loss = self.criterion(out, labels)
        # Return the dictionary of batched outputs first, then the loss
        return {"preds": out, "labels": labels}, loss


my_model_config = MyModelConfig(hidden_dim=100, activation="relu", dropout=0.3)
Notice how we’re returning a dictionary for model’s predictions and labels, and loss value in the forward method.
We next create a training configuration, which requires an optimizer config and an optional scheduler config. Here we create optimizer_config for an SGD optimizer and scheduler_config for a OneCycleLR scheduler, and then use them in train_config:
[2]:
from ablator import OptimizerConfig, SchedulerConfig
from ablator import TrainConfig

optimizer_config = OptimizerConfig(name="sgd", arguments={"lr": 0.1})
scheduler_config = SchedulerConfig(name="cycle", arguments={"max_lr": 0.5, "total_steps": 50})

train_config = TrainConfig(
    dataset="test",
    batch_size=128,
    epochs=2,
    optimizer_config=optimizer_config,
    scheduler_config=scheduler_config,
)
The last step is to create a run_config object. This object combines train_config and my_model_config, along with runtime settings like verbosity and device. However, we first need to redefine the run config class to update its model_config attribute from ModelConfig (by default) to MyModelConfig:
[4]:
@configclass
class CustomRunConfig(RunConfig):
    model_config: MyModelConfig


run_config = CustomRunConfig(
    train_config=train_config,
    model_config=my_model_config,
    verbose="silent",
    device="cpu",
)
run_config
[4]:
CustomRunConfig(model_config={'inp_size': None, 'hidden_dim': 100, 'activation': 'relu', 'dropout': 0.3}, experiment_dir=None, random_seed=None, train_config={'dataset': 'test', 'batch_size': 128, 'epochs': 2, 'optimizer_config': {'name': 'sgd', 'arguments': {'weight_decay': 0.0, 'momentum': 0.0, 'lr': 0.1}}, 'scheduler_config': {'name': 'cycle', 'arguments': {'max_lr': 0.5, 'total_steps': 50, 'step_when': 'train'}}}, keep_n_checkpoints=3, tensorboard=True, amp=True, device='cpu', verbose='silent', eval_subsample=1.0, metrics_n_batches=32, metrics_mb_limit=10000, early_stopping_iter=None, eval_epoch=1.0, log_epoch=1.0, init_chkpt=None, warm_up_epochs=1.0, divergence_factor=10.0, optim_metrics=None, optim_metric_name=None)
That’s it, we have finished configuring our experiment! With this, we are half-way to launching an ablation experiment. Refer to Prototyping models and Ablation experiment tutorials for the next steps after configuration to launch the experiment.
Note
All configuration classes (including custom ones that you may create) must inherit from ConfigBase and be decorated with the @configclass decorator. You can see this in the MyModelConfig and CustomRunConfig classes in the example above.
Ablator custom data types for stateful experiment design#
One key feature of Ablator is the ability to run stateful experiments. To do this, we created three special types: Stateless, Stateful, and Derived. These are custom annotations that declare whether a configuration attribute has an impact on the experiment state (which can be Complete, Running, Pending, Pruned, etc.; read more about experiment state in our paper).
Stateless attributes can take different values between trials or experiments. For example, learning rate should be stateless, as we can train models with different learning rates. Note that if you’re declaring a variable to be Stateless, it must be assigned an initial value before launching the experiment.
Stateful attributes, in contrast to Stateless ones, must have the same value between different experiments. For example, a binary classification model should always have an output size of 2. Stateful variables are declared with a plain data type (no annotation needed) and must be assigned values before launching the experiment.
Derived attributes are Stateful and are undecided at the start of the experiment. Their values are determined by internal experiment processes that can depend on other experimental attributes, e.g. a model input size that depends on the dataset.
Later, when annotating attributes, you wrap these keywords around the data type. For example, inp_size: Derived[int] means that inp_size is a derived attribute of type int, and similarly for Stateless. Stateful needs no wrapper: any attribute annotated with neither Derived nor Stateless is considered Stateful.
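The idea of wrapping a data type in a marker can be sketched with the standard library's typing.Annotated. This is only an illustration of the annotation pattern under stated assumptions, not how Ablator implements its types: the Stateless and Derived aliases below are hypothetical stand-ins, and classify_fields is a made-up helper.

```python
from typing import Annotated, Dict, TypeVar, get_args, get_origin, get_type_hints

T = TypeVar("T")

# Hypothetical markers built on typing.Annotated; Ablator ships its own types.
Stateless = Annotated[T, "stateless"]
Derived = Annotated[T, "derived"]


class ExampleConfig:
    inp_size: Derived[int]   # decided during the experiment, e.g. from the dataset
    hidden_dim: int          # no wrapper, so it counts as stateful
    verbose: Stateless[str]  # has no impact on the experiment state


def classify_fields(cls: type) -> Dict[str, str]:
    """Map each annotated attribute to 'stateless', 'derived' or 'stateful'."""
    kinds: Dict[str, str] = {}
    for name, hint in get_type_hints(cls, include_extras=True).items():
        if get_origin(hint) is Annotated:
            kinds[name] = get_args(hint)[1]  # the marker string attached above
        else:
            kinds[name] = "stateful"  # plain types default to stateful
    return kinds


kinds = classify_fields(ExampleConfig)
print(kinds)  # {'inp_size': 'derived', 'hidden_dim': 'stateful', 'verbose': 'stateless'}
```

Because the category lives in the annotation itself, a single pass over the class is enough to tell which attributes define the experiment state.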
We also defined other structural data types such as List, Dict, Enum, etc. You can find more information about all custom data types (including the three above types) in the Data types configuration module documentation.
Note
The reason for creating these annotations is that for stateful experiment design, the configuration should be unambiguous at the initialization state. And the use of these annotations assures the unambiguity of the configuration.
If you are interested to learn more about stateful experiment design, see our paper: ABLATOR: Robust Horizontal-Scaling of Machine Learning Ablation Experiments
Alternatives to constructing configuration objects#
There are three methods to configure an experiment: named arguments, file-based, and dictionary-based. All previous code snippets are examples of the named-arguments method. Now let’s look at how the file-based and dictionary-based methods work.
File-based#
File-based configuration lets you keep your settings in simple configuration files. Use the <ConfigClass>.load("path/to/yaml/file") method to create a configuration from the values provided in the config file.
To write these config files, simply follow YAML syntax. Make sure that the attributes and their hierarchy match those in the config classes (both the default config classes from Ablator and custom ones like MyModelConfig). The following example shows what a config YAML file looks like. We will name it config.yaml:
experiment_dir: "/tmp/dir"
train_config:
  dataset: test
  batch_size: 128
  epochs: 2
  optimizer_config:
    name: sgd
    arguments:
      lr: 0.1
  scheduler_config:
    name: cycle
    arguments:
      max_lr: 0.5
      total_steps: 50
model_config:
  inp_size: 50
  hidden_dim: 100
  activation: "relu"
  dropout: 0.15
verbose: "silent"
device: "cpu"
Now in your code, load these values to create the config object:
config = CustomRunConfig.load("path/to/yaml/file")
Note that since we created a custom running configuration class CustomRunConfig that is tied to the custom model config in the previous sections, we used CustomRunConfig.load("path/to/yaml/file") to load configuration from file. Otherwise, if you’re not creating any subclasses, simply run RunConfig.load("path") or ParallelConfig.load("path").
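Since the keys and their nesting in the file must match the config classes, it can help to check a parsed file for misspelled or misplaced keys before loading it. The sketch below is a standalone illustration in plain Python: EXPECTED_KEYS is a hand-written outline of the hierarchy used in this tutorial, find_unknown_keys is a hypothetical helper, and neither is part of Ablator.

```python
from typing import Any, Dict, List

# Hypothetical, hand-written outline of the expected hierarchy. A value of None
# marks a leaf whose content is not checked.
EXPECTED_KEYS: Dict[str, Any] = {
    "experiment_dir": None,
    "train_config": {
        "dataset": None,
        "batch_size": None,
        "epochs": None,
        "optimizer_config": {"name": None, "arguments": None},
        "scheduler_config": {"name": None, "arguments": None},
    },
    "model_config": {
        "inp_size": None,
        "hidden_dim": None,
        "activation": None,
        "dropout": None,
    },
    "verbose": None,
    "device": None,
}


def find_unknown_keys(
    config: Dict[str, Any], expected: Dict[str, Any], prefix: str = ""
) -> List[str]:
    """Return dotted paths of keys that do not appear in the expected outline."""
    unknown: List[str] = []
    for key, value in config.items():
        path = f"{prefix}{key}"
        if key not in expected:
            unknown.append(path)
        elif isinstance(value, dict) and isinstance(expected[key], dict):
            # Descend only where both sides describe a nested section.
            unknown.extend(find_unknown_keys(value, expected[key], f"{path}."))
    return unknown


# A mapping as it might look after parsing a YAML file with a misspelled key.
parsed = {
    "train_config": {"batch_size": 128, "optimiser_config": {"name": "sgd"}},
    "device": "cpu",
}
unknown = find_unknown_keys(parsed, EXPECTED_KEYS)
print(unknown)  # ['train_config.optimiser_config']
```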
Dictionary based#
The dictionary-based method is similar to the file-based one, but the configuration is defined in a dictionary instead of a YAML file. The dictionary is then passed (as keyword arguments) to the configuration class at initialization:
configuration = {
    "experiment_dir": "/tmp/dir",
    "train_config": {
        "dataset": "test",
        "batch_size": 128,
        "epochs": 2,
        "optimizer_config": {
            "name": "sgd",
            "arguments": {
                "lr": 0.1
            }
        },
        "scheduler_config": {
            "name": "cycle",
            "arguments": {
                "max_lr": 0.5,
                "total_steps": 50
            }
        }
    },
    "model_config": {
        "inp_size": 50,
        "hidden_dim": 100,
        "activation": "relu",
        "dropout": 0.15
    },
    "verbose": "silent",
    "device": "cpu"
}
config = CustomRunConfig(
    **configuration
)
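One convenience of the dictionary-based method is that variants of a configuration can be derived with ordinary Python before the config object is created. The following sketch shows one possible way to do that with dotted keys; with_overrides is a hypothetical helper written for this example and is not part of Ablator.

```python
import copy
from typing import Any, Dict


def with_overrides(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a deep copy of `base` with dotted-key overrides applied."""
    result = copy.deepcopy(base)  # leave the original dictionary untouched
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            node = node[part]  # raises KeyError if the path does not exist
        node[leaf] = value
    return result


base = {
    "train_config": {
        "batch_size": 128,
        "optimizer_config": {"name": "sgd", "arguments": {"lr": 0.1}},
    },
    "device": "cpu",
}
variant = with_overrides(
    base,
    {"train_config.optimizer_config.arguments.lr": 0.01, "device": "cuda"},
)
print(variant["train_config"]["optimizer_config"]["arguments"]["lr"])  # 0.01
print(base["train_config"]["optimizer_config"]["arguments"]["lr"])  # 0.1
```

The resulting dictionary can then be unpacked into the config class in the same way as configuration in the example above.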
Conclusion#
Now that you’ve learned how to configure experiments, you can start creating your own prototype. In the next chapter, we will learn how to write a prototype model, define necessary configurations and model interfaces and launch the experiment.