{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "FxRTm4zWpwoo" }, "source": [ "# Hyperparameter Optimization" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "uHitmULK0W1v" }, "source": [ "After your prototype has been verified and runs smoothly with `ProtoTrainer`, you can scale it to an ablation study, perform parallel hyperparameter optimization, and analyze the results with `Ablator`.\n", "\n", "In this chapter, we will learn how to set up and launch a parallel ablation experiment with Ablator.\n", "\n", "As with a prototype experiment, running an ablation experiment in Ablator takes three main steps:\n", "\n", "- Configure the experiment.\n", "\n", "- Create a model wrapper that defines the boilerplate code for training and evaluating models.\n", "\n", "- Create the trainer and launch the experiment.\n", "\n", "Recall from the [Introduction tutorial](./Configuration-Basics.ipynb) that Ablator combines Optuna for hyperparameter optimization (HPO) with a Ray back-end for parallelizing the trials. An extra step is therefore to start a Ray cluster before launching the experiment." 
] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "iw2ie4ik0fiP" }, "source": [ "Let us first import all necessary dependencies:" ] }, { "cell_type": "markdown", "metadata": { "id": "bPChwlLe_BUq" }, "source": [ "```python\n", "from ablator import ModelConfig, OptimizerConfig, TrainConfig, RunConfig, ParallelConfig\n", "from ablator import ModelWrapper, ParallelTrainer, configclass\n", "from ablator.config.hpo import SearchSpace\n", "\n", "import torch\n", "import torch.nn as nn\n", "import torch.optim as optim\n", "from torch.utils.data import DataLoader, Dataset\n", "import torchvision\n", "import torchvision.transforms as transforms\n", "\n", "import os\n", "import shutil\n", "from sklearn.metrics import f1_score, accuracy_score\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Launch the parallel experiment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Configure the experiment" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "2u5Ovi0-YrtJ" }, "source": [ "We will follow the same steps as in the tutorial on [Prototyping models](./Prototyping-models.ipynb) to configure the experiment. In summary:\n", "\n", "- **Model Configuration**: defines hyperparameters for the number of filters and the activation function.\n", "\n", "- **Optimizer Configuration**: Adam with a learning rate of 0.001 (`lr = 0.001`).\n", "\n", "- **Train Configuration**: `batch_size = 32`, `epochs = 10`, and random weight initialization enabled (`rand_weights_init = True`).\n", "\n", "- **Running Configuration**: GPU as the hardware and a fixed random seed for the experiment. HPO runs for `total_trials = 20` trials, with up to `concurrent_trials = 5` trials running in parallel. We also define an HPO search space over the model and optimizer hyperparameters and use the validation loss as the metric to optimize; specifically, we want to minimize it (`{\"val_loss\": \"min\"}`)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Configure the model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Model configuration\n", "\n", "For the model configuration, we define the following hyperparameters:\n", "\n", "- `num_filter1`, `num_filter2` (integer): number of filters in the convolutional layers.\n", "\n", "- `activation` (string): activation function to use after each convolutional layer." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "JUcqbxNe_3P-" }, "source": [ "```python\n", "@configclass\n", "class CustomModelConfig(ModelConfig):\n", "    num_filter1: int\n", "    num_filter2: int\n", "    activation: str\n", "\n", "model_config = CustomModelConfig(\n", "    num_filter1=32,\n", "    num_filter2=64,\n", "    activation=\"relu\"\n", ")\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because the hyperparameters are declared with primitive data types (i.e., they are Stateful), we must provide concrete values when initializing the `model_config` object." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "PKPFcebsYvJY" }, "source": [ "##### Creating PyTorch CNN Model\n", "\n", "We define a custom CNN model `FashionCNN` with the following architecture:\n", "\n", "- The first convolutional layer: takes a single input channel and applies `num_filter1` filters to it, followed by an activation function and a max pooling layer.\n", "\n", "- The second convolutional layer: takes `num_filter1` channels and applies `num_filter2` filters to them, again followed by an activation function and a max pooling layer.\n", "\n", "- The third convolutional layer: applies `num_filter2` filters followed by an activation function, with no pooling.\n", "\n", "- A flattening layer followed by a fully connected layer: flattens the feature maps into a vector and maps it to a 10-dimensional output, one score per class.\n", "\n", "`FashionCNN` is then included in `MyModel` as a sub-module. 
`MyModel`'s forward function runs the forward pass, computes the loss when labels are provided, and returns the predicted labels, the true labels, and the loss during model training and evaluation." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "U3TMcBbNAFGV" }, "source": [ "```python\n", "# Define the model\n", "class FashionCNN(nn.Module):\n", "    def __init__(self, config: CustomModelConfig):\n", "        super(FashionCNN, self).__init__()\n", "\n", "        # Map activation names from the config to PyTorch modules\n", "        activation_list = {\"relu\": nn.ReLU(), \"elu\": nn.ELU(), \"leakyRelu\": nn.LeakyReLU()}\n", "\n", "        num_filter1 = config.num_filter1\n", "        num_filter2 = config.num_filter2\n", "        activation = activation_list[config.activation]\n", "\n", "        self.conv1 = nn.Conv2d(1, num_filter1, kernel_size=3, stride=1, padding=1)\n", "        self.act1 = activation\n", "        self.maxpool1 = nn.MaxPool2d(kernel_size=2, stride=2)\n", "\n", "        self.conv2 = nn.Conv2d(num_filter1, num_filter2, kernel_size=3, stride=1, padding=1)\n", "        self.act2 = activation\n", "        self.maxpool2 = nn.MaxPool2d(kernel_size=2, stride=2)\n", "\n", "        self.conv3 = nn.Conv2d(num_filter2, num_filter2, kernel_size=3, stride=1, padding=1)\n", "        self.act3 = activation\n", "\n", "        self.flatten = nn.Flatten()\n", "        # Two 2x2 poolings reduce the 28x28 input to 7x7 feature maps\n", "        self.fc1 = nn.Linear(num_filter2 * 7 * 7, 10)\n", "\n", "    def forward(self, x):\n", "        x = self.conv1(x)\n", "        x = self.act1(x)\n", "        x = self.maxpool1(x)\n", "        x = self.conv2(x)\n", "        x = self.act2(x)\n", "        x = self.maxpool2(x)\n", "        x = self.conv3(x)\n", "        x = self.act3(x)\n", "        x = self.flatten(x)\n", "        x = self.fc1(x)\n", "\n", "        return x\n", "\n", "class MyModel(nn.Module):\n", "    def __init__(self, config: CustomModelConfig) -> None:\n", "        super().__init__()\n", "\n", "        self.model = FashionCNN(config)\n", "        self.loss = nn.CrossEntropyLoss()\n", "\n", "    def forward(self, x, labels=None):\n", "        out = self.model(x)\n", "        loss = None\n", "\n", "        # The loss is only computed when labels are provided\n", "        if labels is not None:\n", "            loss = self.loss(out, labels)\n", "            labels = labels.reshape(-1, 1)\n", "\n", "        # Convert the class scores to predicted class indices\n", "        out = out.argmax(dim=-1)\n", 
"        out = out.reshape(-1, 1)\n", "\n", "        return {\"y_pred\": out, \"y_true\": labels}, loss\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Configure the training process" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```python\n", "optimizer_config = OptimizerConfig(\n", "    name=\"adam\",\n", "    arguments={\"lr\": 0.001}\n", ")\n", "\n", "train_config = TrainConfig(\n", "    dataset=\"Fashion-mnist\",\n", "    batch_size=32,\n", "    epochs=10,\n", "    optimizer_config=optimizer_config,\n", "    scheduler_config=None,\n", "    rand_weights_init=True\n", ")\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Configure the running configuration\n", "\n", "To run an ablation study, we need to specify a search space for the hyperparameters of interest. This search space is then passed to the running configuration." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "r847W0MDY4Ai" }, "source": [ "##### Search Space\n", "\n", "For this tutorial, we define a `search_space` object covering four hyperparameters:\n", "\n", "- Number of filters in the first and second convolutional layers: integers between 32 and 64, and between 64 and 128, respectively.\n", "\n", "- The activation function to use: one of `relu`, `elu`, or `leakyRelu`.\n", "\n", "- Learning rate: a float between 1e-3 and 1e-2." 
] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "AJfHDNVUBNBB" }, "source": [ "```python\n", "search_space = {\n", "    \"model_config.num_filter1\": SearchSpace(value_range=[32, 64], value_type='int'),\n", "    \"model_config.num_filter2\": SearchSpace(value_range=[64, 128], value_type='int'),\n", "    \"train_config.optimizer_config.arguments.lr\": SearchSpace(\n", "        value_range=[0.001, 0.01],\n", "        value_type='float'\n", "    ),\n", "    \"model_config.activation\": SearchSpace(categorical_values=[\"relu\", \"elu\", \"leakyRelu\"]),\n", "}\n", "```" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "LWBSub7uZM6W" }, "source": [ "#### Parallel Configuration\n", "\n", "As the last step in configuring the experiment, we pass `search_space`, `train_config`, and `model_config` to `ParallelConfig`, along with the remaining run settings:" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "aAxNLP2QZNyI" }, "source": [ "```python\n", "@configclass\n", "class CustomParallelConfig(ParallelConfig):\n", "    model_config: CustomModelConfig\n", "\n", "parallel_config = CustomParallelConfig(\n", "    train_config=train_config,\n", "    model_config=model_config,\n", "    metrics_n_batches=800,\n", "    experiment_dir=\"/tmp/experiments/\",\n", "    device=\"cuda\",\n", "    amp=True,\n", "    random_seed=42,\n", "    total_trials=20,\n", "    concurrent_trials=5,\n", "    search_space=search_space,\n", "    optim_metrics={\"val_loss\": \"min\"},\n", "    gpu_mb_per_experiment=1024,\n", ")\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "