{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Configuration basics" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Ablator embraces a versatile configuration system that configures every facet of training machine learning models, covering not only the model architecture but also the training environment.\n", "\n", "Think of this configuration system as a blueprint for crafting experiments. By leveraging this system, Ablator orchestrates the creation and setup of experiments, seamlessly integrating the necessary configurations. \n", "\n", "Furthermore, Ablator offers the flexibility to dynamically construct a hierarchical configuration through composition. You have the choice to override settings either via `yaml` configuration files and command-line inputs, or to directly use Python objects and classes. Explore [these illustrative examples](./GettingStarted-more-demos.ipynb) or delve into [this section](#alternatives-to-constructing-configuration-objects) of this tutorial to gain insights into implementing these two approaches.\n", "\n", "In this tutorial, we will explain all configuration-related concepts in Ablator. We will also demonstrate all necessary steps to configure an experiment in Ablator." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Configuration categories\n", "\n", "In ablator, configurations are organized in different categories, these include:\n", "\n", "- [Model configuration](#model-configuration).\n", "\n", "- [Training configuration](#training-configuration).\n", "\n", "- [Optimizer and Scheduler configuration](#optimizer-and-scheduler-configurations).\n", "\n", "- [Running configurations](#running-configurations) (either for training a single prototype model or training multiple models in parallel).\n", "\n", "These configuration classes will be used together to configure an experiment." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Model Configuration" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "This configuration class is used to define [hyperparameters](https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)) specific to the model of interest. Later, you can use this configuration to construct the model.\n", "\n", "There are 2 steps that are required after defining a model config class for your model:\n", "\n", "- Pass the model config to the main model's constructor so you can construct the model using the attributes that's defined in the config.\n", "\n", "- Create a custom [running configuration class](#running-configurations) and update `model_config` class type to the newly created model config class.\n", "\n", "
\n", "\n", "Note\n", "\n", "It is required to return the outputs (which must be in a dictionary format. Example: `{\"y_pred\": , \"y_true\": }`) and loss in the forward method of the main model. Here the loss value will be considered an auxiliary metric that will be recorded for later analysis.\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Model configuration is required when creating the run configuration (`RunConfig.model_config` and `ParallelConfig.model_config`).\n", "\n", "In addition, you can create a search space over these parameters and then use ablator to run Hyperparameter optimization (HPO). A sample use case for this is when you want to test different values for model size, number of layers, activation functions, etc. You can do this by creating a custom model configuration class from `ModelConfig` that has these hyperparameters as its attributes and create a search space via `SearchSpace` class. Refer to the [Hyperparameter Optimization](./HPO-tutorial.ipynb) tutorial for more details." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Training Configuration" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This configuration class defines the training setting (e.g., batch size, number of epochs, the optimizer to use, etc.). Two important attributes to metion are `optimizer_config` and `scheduler_config`. As the names suggest, they configure the optimizer and scheduler to be used in the training process.\n", "\n", "Training configuration is required when creating the run configuration (`RunConfig.train_config` or `ParallelConfig.train_config`)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "| Parameter | Usage |\n", "|-------------------|-----------------------------------------------------------------------|\n", "| dataset _(str)_ | dataset name. maybe used in custom dataset loader functions. |\n", "| batch_size _(int)_ | batch size. |\n", "| epochs _(int)_ | number of epochs to train. |\n", "| optimizer_config _(OptimizerConfig)_ | optimizer configuration. (check ``OptimizerConfig`` for more details) |\n", "| scheduler_config _(Optional[SchedulerConfig])_ | scheduler configuration. (check ``SchedulerConfig`` for more details) |\n", "| rand_weights_init _(bool, defaults to True)_ | whether to initialize model weights randomly. |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Optimizer and Scheduler Configurations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default, Abaltor takes care of creating the optimizer (and optionally the scheduler) for training models. Thus, you also need to configure them.\n", "\n", "`OptimizerConfig` is used to configure the optimizer for the training process. Currently, we support `SGD` optimizer, `Adam` optimizer, and `AdamW` optimizer.\n", "\n", "`SchedulerConfig`, on the other hand, can be used to configure the learning rate scheduler for the training process. Currently, we support `StepLR` scheduler, `OneCycleLR` scheduler, and `ReduceLROnPlateau` scheduler." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Both of these config classes have similar arguments:\n", "\n", "| Parameter | Usage |\n", "|-----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n", "| name _(str)_ | The type of the scheduler or optimizer, this can be any in ``['sgd', 'adam', 'adamw']`` for optimizers and
in ``['none', 'step', 'cycle', 'plateau']`` for schedulers. |\n", "| arguments _(OptimizerArgs)_ | The arguments for the scheduler or optimizer, specific to a certain type of scheduler or scheduler. For opimizer, must include an item for learning rate, e.g. `{\"lr\": 0.1}` |\n", "\n", "The table below shows how `arguments` can be defined for each type of optimzer:\n", "\n", "| Optimizer type | Arguments |\n", "|----------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n", "| sgd | `weight_decay` _(defaults to 0.0)_: Weight decay rate
`momentum` _(defaults to 0.0)_: Momentum factor
|\n", "| adam | `betas` _(defaults to (0.5, 0.9))_: Coefficients for computing running averages of gradient and its square.
`weight_decay` _(defaults to 0.0)_: Weight decay rate.
|\n", "| adamw | `betas` _(defaults to (0.9, 0.999))_: Coefficients for computing running averages of gradient and its square.
`eps` _(defaults to 1e-8)_: Term added to the denominator to improve numerical stability.
`weight_decay` _(defaults to 0.0)_: Weight decay rate.
|\n", "\n", "The table below shows how `arguments` can be defined for each type of scheduler:\n", "\n", "\n", "| Scheduler type | Arguments |\n", "|----------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n", "| cycle | `max_lr` : Upper learning rate boundaries in the cycle.
`total_steps` : The total number of steps to run the scheduler in a cycle.
`step_when` _(defaults to `\"train\"`)_: The step type at which the `scheduler.step()` should be invoked: ``'train'``, ``'val'``, or ``'epoch'``. |\n", "| plataeu | `patience` _(defaults to 10)_: Number of epochs with no improvement after which learning rate will be reduced.
`min_lr` _(defaults to 1e-5)_: A lower bound on the learning rate.
`mode` _(defaults to \"min\")_: One of ``'min'``, ``'max'``, or ``'auto'``, which defines the direction of optimization, so as to adjust the learning rate
accordingly, i.e when a certain metric ceases improving.
`factor` _(defaults to 0.0)_: Factor by which the learning rate will be reduced: ``new_lr = lr * factor``.
`threshold` _(defaults to 1e-4)_: Threshold for measuring the new optimum, to only focus on significant changes.
`verbose` _(defaults to False)_: If ``True``, prints a message to ``stdout`` for each update.
`step_when` _(defaults to \"val\")_: The step type at which the scheduler should be invoked: ``'train'``, ``'val'``, or ``'epoch'``.
|\n", "| step | `step_size` _(defaults to 1)_: Period of learning rate decay.
`gamma` _(defaults to 0.99)_: Multiplicative factor of learning rate decay.99.
`step_when` _(defaults to \"epoch\")_: The step type at which the scheduler should be invoked: ``'train'``, ``'val'``, or ``'epoch'``.
|" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Running Configurations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Running configurations define the environment of an experiment (experiment main directory, number of checkpoints to maintain, hardware device to use, etc.). There are 2 types of running configurations:\n", "\n", "- `RunConfig` for prototype experiments\n", "- `ParallelConfig` for ablation experiments" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### `RunConfig` for prototype experiments" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The table below summarizes the parameters:\n", "\n", "| Parameter | Usage |\n", "|---------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n", "| experiment_dir _(Stateless[Optional[str]], defaults to None)_ | location to store experiment artifacts. |\n", "| random_seed _(Optional[int], defaults to None)_ | random seed. |\n", "| train_config _(TrainConfig)_ | training configuration. (check ``TrainConfig`` for more details) |\n", "| model_config _(ModelConfig)_ | model configuration. (check ``ModelConfig`` for more details) |\n", "| keep_n_checkpoints _(Stateless[int], defaults to 3)_ | number of latest checkpoints to keep. |\n", "| tensorboard _(Stateless[bool], defaults to True)_ | whether to use tensorboardLogger. |\n", "| amp _(Stateless[bool], defaults to True)_ | whether to use automatic mixed precision when running on gpu. |\n", "| device _(Stateless[str], defaults to \"cuda\")_ | device to run on. |\n", "| verbose _(Stateless[Literal[\"console\", \"progress\", \"silent\"]], defaults to \"console\")_ | verbosity level. |\n", "| eval_subsample _(Stateless[float], defaults to 1)_ | fraction of the dataset to use for evaluation. |\n", "| metrics_n_batches _(Stateless[int], defaults to 32)_ | max number of batches stored in every tag(train, eval, test) for evaluation. |\n", "| metrics_mb_limit _(Stateless[int], defaults to 100)_ | max number of megabytes stored in every tag(train, eval, test) for evaluation. |\n", "| early_stopping_iter _(Stateless[Optional[int]], defaults to None)_ | The maximum allowed difference between the current iteration and the last
iteration with the best metric before applying early stopping. Early stopping
will be triggered if the difference ``(current_itr-best_itr)`` exceeds ``early_stopping_iter``.
If set to ``None``, early stopping will not be applied. |\n", "| eval_epoch _(Stateless[float], defaults to 1)_ | The epoch interval between two evaluations. |\n", "| log_epoch _(Stateless[float], defaults to 1)_ | The epoch interval between two logging. |\n", "| init_chkpt _(Stateless[Optional[str]], defaults to None)_ | path to a checkpoint to initialize the model with. |\n", "| warm_up_epochs _(Stateless[float], defaults to 1)_ | number of epochs marked as warm up epochs. |\n", "| divergence_factor _(Stateless[Optional[float]], defaults to 100)_ | if ``cur_loss > best_loss > divergence_factor``, the model is considered
to have diverged. |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### `ParallelConfig` for ablation experiments" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`ParallelConfig` is a subclass of `RunConfig`. Therefore, it has all attributes RunConfig has. Additionally, it introduces other attributes to configure the parallel training process with horizontal scaling of a single experiment:\n", "\n", "| Parameters | Usage |\n", "|----------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------|\n", "| total_trials: _(Optional[int])_ | total number of trials. |\n", "| concurrent_trials: _(int)_ | number of trials to run concurrently. |\n", "| search_space: _(Dict[SearchSpace])_ | search space for hyperparameter search, eg. ``{\"train_config.optimizer_config.arguments.lr\": SearchSpace(value_range=[0, 10], value_type=\"int\"),}`` |\n", "| optim_metrics: _(Optional[Dict[Optim]])_ | metrics to optimize, eg. ``{\"val_loss\": \"min\"}`` |\n", "| gpu_mb_per_experiment: _(int)_ | CUDA memory requirement per experimental trial in MB. e.g. a value of 100 is equivalent to 100MB |\n", "| search_algo: _(SearchAlgo, defaults to SearchAlgo.tpe)_ | type of search algorithm. |\n", "| ignore_invalid_params: _(bool, defaults to False)_ | whether to ignore invalid parameters when sampling or raise an error. |\n", "| remote_config: _(Optional[RemoteConfig], defaults to None)_ | remote storage configuration. |\n", "\n", "`search_space` is used to define a set of continuous or categorical/discrete values for a certain hyperparameter that you want to optimize. Refer to [Search Space basics](./Search-space-tutorial.ipynb) to learn more about how to use it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Configure your experiments" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Now let's combine everything to configure your experiment!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In most cases, we will first configure our model (or not configure it at all if you're not running ablation study on the model architecture).\n", "\n", "In this example, we create a configuration class `MyModelConfig` for a simple 1-layer neural network model with the following hyperparameters: input size (to be inferred); hidden layer dimension, activation function, and dropout rate (all of which are stateful). This configuration then will be used to construct the neural network model `MyCustomModel`:\n", "\n", "```python\n", "from ablator import RunConfig, ModelConfig, Stateless, Derived, configclass\n", "\n", "import torch.nn as nn\n", "import torch\n", "\n", "@configclass\n", "class MyModelConfig(ModelConfig):\n", " inp_size: Derived[int]\n", " hidden_dim: int\n", " activation: str\n", " dropout: float\n", "\n", "# Construct the model using the configuration\n", "class MyCustomModel(nn.Module):\n", " def __init__(self, config: MyModelConfig) -> None:\n", " super().__init__()\n", " self.linear = nn.Linear(config.inp_size, config.hidden_dim)\n", " self.dropout = nn.Dropout(config.dropout)\n", " if config.activation == \"relu\":\n", " self.activate = nn.ReLU()\n", " elif config.activation == \"elu\":\n", " self.activate = nn.ELU()\n", " self.criterion = nn.CrossEntropyLoss()\n", "\n", " def forward(self, x: torch.Tensor, labels: None):\n", " out = self.linear(x)\n", " out = self.dropout(out)\n", " out = self.activate(out)\n", " \n", " loss = self.criterion(out, labels)\n", "\n", " return {\"preds\": out, \"labels\": labels}, loss\n", "\n", "my_model_config = MyModelConfig(hidden_dim=100, activation=\"relu\", dropout=0.3)\n", "\n", "```\n", "\n", "Notice how we're returning a dictionary for model's predictions and labels and loss value in the `forward` method." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "We next create a training configuration, which requires an optimizer config and an optional scheduler config. Here we create `optimizer_config` that configs an `SGD` optimizer, and `scheduler_config` that configs a `OneCycleLR` scheduler, and then use them in `train_config`:\n", "\n", "```python\n", "from ablator import OptimizerConfig, SchedulerConfig\n", "from ablator import TrainConfig\n", "\n", "optimizer_config = OptimizerConfig(name=\"sgd\", arguments={\"lr\": 0.1})\n", "scheduler_config = SchedulerConfig(name=\"cycle\", arguments={\"max_lr\": 0.5, \"total_steps\": 50})\n", "\n", "train_config = TrainConfig(\n", " dataset=\"test\",\n", " batch_size=128,\n", " epochs=2,\n", " optimizer_config=optimizer_config,\n", " scheduler_config=scheduler_config,\n", ")\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The last step is to create a `run_config` object. This object combines `train_config` and `my_model_config`, along with runtime settings like verbosity and device. However, we also need to redefine the run config class to update its `model_config` attribute from `ModelConfig` (by default) to `MyModelConfig`:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```python\n", "@configclass\n", "class CustomRunConfig(RunConfig):\n", " model_config: MyModelConfig\n", "\n", "run_config = CustomRunConfig(\n", " train_config=train_config,\n", " model_config=my_model_config,\n", " verbose=\"silent\",\n", " device=\"cpu\",\n", ")\n", "```" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "That's it, we have finished configuring our experiment! With this, we are half-way to launching an ablation experiment. Refer to [Prototyping models](./Prototyping-models.ipynb) and [Hyperparameter Optimization](./HPO-tutorial.ipynb) tutorials for the next steps after configuration to launch the experiment.\n", "\n", "
\n", "\n", "Note\n", "\n", "All configuration classes must inherits from `ConfigBase` and decorated with `@configclass` decorator, you can see this in `MyModelConfig` and `CustomRunConfig` classes in the example above.\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Ablator custom data types" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An important part of Ablator's configuration system is the incorporation of custom data types, which are used to define data type for configuration attributes. The framework created three special types: **Stateless**, **Stateful**, and **Derived**. These are custom Python annotations to define configuration attributes to which the experiment state is agnostic, aka does not have any impact on the experiment state (which can be Complete, Running, Pending, Pruned, etc. Read more about experiment state from our [paper](https://iordanis.me/data/ablator.pdf)).\n", "\n", "- **Stateless** attributes can take different values between trials or experiments. For example, learning rate should be stateless, as we can train models with different learning rates. Note that if you're declaring a variable to be Stateless, it must be assigned an initial value before launching the experiment.\n", "\n", "- **Stateful** attributes, opposite to **Stateless**, must have the same value between different experiments. For example, a binary classification model should always have output size of 2. Stateful variables, defined as a primitive datatype (no annotating needed), must be assigned with values before launching the experiment.\n", "\n", "- **Derived** attributes are **Stateful** and are un-decided at the start of the experiment. Their values are determined by internal experiment processes that can depend on other experimental attributes, e.g model input size that depends on the dataset.\n", "\n", "\n", "
\n", "\n", "Note\n", "\n", "- The reason for creating these annotations is that Ablator supports stateful experiment design, so the configuration should be unambiguous at the initialization state. And the use of these annotations assures the unambiguity of the configuration.\n", "- For more information about our stateful experiment design, see our paper: [ABLATOR: Robust Horizontal-Scaling of Machine Learning Ablation Experiments](https://iordanis.me/data/ablator.pdf)\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Alternatives to constructing configuration objects" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "There are three methods to configure an experiment: named arguments, file-based, or dictionary-based. All previous code snippets are examples of the named-arguments method. Now let's look at how file based method and dictionary based method work." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### File-based" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "File based configuration is a way for you to create simple configuration files. You can use `.load(path/to/yaml/file)` method to create configuration with values provided in the config file.\n", "\n", "To write these config files, simply follow `yaml` syntax. Make sure that the attributes match with those in the config classes (either default config classes from ablator, or custom ones like `MyModelConfig`). The following example shows what a config yaml file looks like. We will name it `config.yaml`:\n", "\n", "```yaml\n", "experiment_dir: \"/tmp/dir\"\n", "train_config:\n", " dataset: test\n", " batch_size: 128\n", " epochs: 2\n", " optimizer_config:\n", " name: sgd\n", " arguments:\n", " lr: 0.1\n", " scheduler_config:\n", " name: cycle\n", " arguments:\n", " max_lr: 0.5\n", " total_steps: 50\n", "model_config:\n", " inp_size: 50\n", " hidden_dim: 100\n", " activation: \"relu\"\n", " dropout: 0.15\n", "verbose: \"silent\"\n", "device: \"cpu\"\n", "```\n", "\n", "Now in your code, load these values to create the config object:\n", "\n", "```python\n", "config = CustomRunConfig.load(\"path/to/yaml/file\")\n", "```\n", "\n", "Note that since we created a custom running configuration class `CustomRunConfig` that is tied to the custom model config in the previous sections, we used `CustomRunConfig.load(\"path/to/yaml/file\")` to load configuration from file. Otherwise, if you're not creating any subclasses, simply run `RunConfig.load(\"path\")` or `ParallelConfig.load(\"path\")`." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Dictionary based\n", "\n", "Another alternative is similar to the file-based method, but it's defining configurations in a dictionary instead of a yaml file, and then the dictionary will be passed (as keyword arguments) to the running configuration at initialization" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "```python\n", "configuration = {\n", " \"experiment_dir\": \"/tmp/dir\",\n", " \"train_config\": {\n", " \"dataset\": \"test\",\n", " \"batch_size\": 128,\n", " \"epochs\": 2,\n", " \"optimizer_config\":{\n", " \"name\": \"sgd\",\n", " \"arguments\": {\n", " \"lr\": 0.1\n", " }\n", " },\n", " \"scheduler_config\":{\n", " \"name\": \"cycle\",\n", " \"arguments\":{\n", " \"max_lr\": 0.5,\n", " \"total_steps\": 50\n", " }\n", " }\n", " },\n", " \"model_config\": {\n", " \"inp_size\": 50,\n", " \"hidden_dim\": 100,\n", " \"activation\": \"relu\",\n", " \"dropout\": 0.15\n", " },\n", " \"verbose\": \"silent\",\n", " \"device\": \"cpu\"\n", "}\n", "\n", "config = CustomRunConfig(\n", " **configuration\n", ")\n", "```" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Now that you know how to configure experiments, you can start creating your own prototype. In the next chapter, we will learn how to write a prototype model, define necessary configurations, and launch the experiment. " ] } ], "metadata": { "kernelspec": { "display_name": "venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }