{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "FxRTm4zWpwoo" }, "source": [ "# Hyperparameter Optimization" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "uHitmULK0W1v" }, "source": [ "* In this chapter, we are going to explore how to perform hyperparameter optimization on a CNN model using Ablator.\n", "\n", "Why do HPO with Ablator?\n", "\n", "* Ablator combines the Ray back-end with Optuna for hyperparameter optimization (HPO), eliminating the need for boilerplate code in fault-tolerant strategies, training, and result analysis." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "iw2ie4ik0fiP" }, "source": [ "#### Importing libraries\n", "- Import the **Configs**, **ModelWrapper**, and **ParallelTrainer** from ablator.\n", "- Import **SearchSpace** from ablator.main.configs." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "bPChwlLe_BUq" }, "source": [ "```python \n", "from ablator import ModelConfig, OptimizerConfig, TrainConfig, RunConfig, ParallelConfig\n", "from ablator import ModelWrapper, ParallelTrainer, configclass\n", "from ablator.main.configs import SearchSpace\n", "\n", "import torch\n", "import torch.nn as nn\n", "import torch.optim as optim\n", "from torch.utils.data import DataLoader, Dataset\n", "import torchvision\n", "import torchvision.transforms as transforms\n", "\n", "import os\n", "import shutil\n", "from sklearn.metrics import f1_score, accuracy_score\n", "```" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "2u5Ovi0-YrtJ" }, "source": [ "#### Configurations\n", "\n", "Defining Configs:\n", "\n", "- **Optimizer Config**: adam (lr = 0.001).\n", "- **Train Config**: batch_size = 32, epochs = 10, random weights initialization is set as true.\n", "- **Model Config**: The ````CustomModelConfig```` defines two parameters for the number of filters and an activation function." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "JUcqbxNe_3P-" }, "source": [ "```python\n", "@configclass\n", "class CustomModelConfig(ModelConfig):\n", " num_filter1: int\n", " num_filter2: int\n", " activation: str\n", "\n", "\n", "model_config = CustomModelConfig(num_filter1 =32, num_filter2 = 64, activation = \"relu\")\n", "\n", "optimizer_config = OptimizerConfig(\n", " name=\"adam\",\n", " arguments={\"lr\": 0.001}\n", ")\n", "\n", "train_config = TrainConfig(\n", " dataset=\"Fashion-mnist\",\n", " batch_size=32,\n", " epochs=10,\n", " optimizer_config=optimizer_config,\n", " scheduler_config=None,\n", " rand_weights_init = True\n", ")\n", "```" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "PKPFcebsYvJY" }, "source": [ "#### Defining a CNN Model\n", "\n", "This is a custom CNN model with the following architecture:\n", "\n", "* The first convolutional layer: It takes a single channel and applies ````num_filters1```` filters to it. Then, it applies an activation function and a max pooling layer.\n", "* The second convolutional layer: It takes num_filters1 channels and applies ````num_filters2```` filters to them. It also utilizes an activation function and a pooling layer.\n", "* The third convolutional layer: This is an additional layer that applies ````num_filters2```` filters.\n", "* A flattening layer: It converts the convolutional layers into a linear format and subsequently produces a 10-dimensional output for labeling.\n", "\n", "Furthermore, the class MyModel extends the PyTorch model to incorporate the ````CrossEntropyLoss```` as well." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "U3TMcBbNAFGV" }, "source": [ "```python\n", "# Define the model\n", "class FashionCNN(nn.Module):\n", " def __init__(self, config: CustomModelConfig):\n", " super(FashionCNN, self).__init__()\n", "\n", " activation_list = {\"relu\": nn.ReLU(), \"elu\": nn.ELU(), \"leakyRelu\": nn.LeakyReLU()}\n", "\n", " num_filter1 = config.num_filter1\n", " num_filter2 = config.num_filter2\n", " activation = activation_list[config.activation]\n", "\n", " self.conv1 = nn.Conv2d(1, num_filter1, kernel_size=3, stride=1, padding=1)\n", " self.act1 = activation\n", " self.maxpool1 = nn.MaxPool2d(kernel_size=2, stride=2)\n", "\n", " self.conv2 = nn.Conv2d(num_filter1, num_filter2, kernel_size=3, stride=1, padding=1)\n", " self.act2 = activation\n", " self.maxpool2 = nn.MaxPool2d(kernel_size=2, stride=2)\n", "\n", " self.conv3 = nn.Conv2d(num_filter2, num_filter2, kernel_size=3, stride=1, padding=1)\n", " self.act3 = activation\n", "\n", " self.flatten = nn.Flatten()\n", " self.fc1 = nn.Linear(num_filter2 * 7 * 7, 10)\n", " \n", "\n", " def forward(self, x):\n", " x = self.conv1(x)\n", " x = self.act1(x)\n", " x = self.maxpool1(x)\n", " x = self.conv2(x)\n", " x = self.act2(x)\n", " x = self.maxpool2(x)\n", " x = self.conv3(x)\n", " x = self.act3(x)\n", " x = self.flatten(x)\n", " x = self.fc1(x)\n", " \n", " return x\n", "\n", "class MyModel(nn.Module):\n", " def __init__(self, config: CustomModelConfig) -> None:\n", " super().__init__()\n", "\n", " self.model = FashionCNN(config)\n", " self.loss = nn.CrossEntropyLoss()\n", "\n", " def forward(self, x, labels=None):\n", " out = self.model(x)\n", " loss = None\n", "\n", " if labels is not None:\n", " loss = self.loss(out, labels)\n", "\n", " out = out.argmax(dim=-1)\n", "\n", " return {\"y_pred\": out, \"y_true\": labels}, loss\n", "```" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "r847W0MDY4Ai" }, "source": [ "#### Search Space\n", "\n", "For this tutorial, we have defined ````search_space```` object for four different hyperparameters.\n", "\n", "This includes:\n", "\n", "* For the number of filters in the first conv. layer.\n", "* Same for the second conv. layer.\n", "* learning rate.\n", "* activation function." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "AJfHDNVUBNBB" }, "source": [ "```python\n", "search_space = {\n", " \"model_config.num_filter1\": SearchSpace(value_range = [32, 64], value_type = 'int'),\n", " \"model_config.num_filter2\": SearchSpace(value_range = [64, 128], value_type = 'int'),\n", " \"train_config.optimizer_config.arguments.lr\": SearchSpace(\n", " value_range = [0.001, 0.01],\n", " value_type = 'float'\n", " ),\n", " \"model_config.activation\": SearchSpace(categorical_values = [\"relu\", \"elu\", \"leakyRelu\"]),\n", "}\n", "```" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "LWBSub7uZM6W" }, "source": [ "#### Parallel Configuration\n", "\n", "We pass a ````search_space```` to the Parallel Config for the hyperparameters we need to explore. " ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "aAxNLP2QZNyI" }, "source": [ "```python\n", "@configclass\n", "class CustomParallelConfig(ParallelConfig):\n", " model_config: CustomModelConfig\n", "\n", "parallel_config = CustomParallelConfig(\n", " train_config=train_config,\n", " model_config=model_config,\n", " metrics_n_batches = 800,\n", " experiment_dir = \"/tmp/experiments/\",\n", " device=\"cuda\",\n", " amp=True,\n", " random_seed = 42,\n", " total_trials = 20,\n", " concurrent_trials = 20,\n", " search_space = search_space,\n", " optim_metrics = {\"val_loss\": \"min\"},\n", " gpu_mb_per_experiment = 1024,\n", " cpus_per_experiment = 1,\n", ")\n", "```" ] }, { "cell_type": "markdown", "metadata": { "id": "DtrEqbVGZCZX" }, "source": [ "#### Importing the dataset\n", "\n", "**Fashion MNIST**\n", "\n", "Image dimensions: 28 pixels x 28 pixels (grayscale)\n", "Shape of the training data tensor: [60000, 1, 28, 28]\n" ] }, { "cell_type": "markdown", "metadata": { "id": "qciJdRxT4v1M" }, "source": [ "```python\n", "transform = transforms.ToTensor()\n", "\n", "train_dataset = torchvision.datasets.FashionMNIST(\n", " root='./data',\n", " train=True,\n", " download=True,\n", " transform=transform\n", ")\n", "\n", "test_dataset = torchvision.datasets.FashionMNIST(\n", " root='./data',\n", " train=False,\n", " download=True,\n", " transform=transform\n", ")\n", "```" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "bIQiuoj2Y6VJ" }, "source": [ "The ````ModelWrapper```` will be the same as discussed in the Prototyping models." ] }, { "cell_type": "markdown", "metadata": { "id": "mE38Vg5HAKPK" }, "source": [ "```python\n", "class MyModelWrapper(ModelWrapper):\n", " def __init__(self, *args, **kwargs):\n", " super().__init__(*args, **kwargs)\n", "\n", " def make_dataloader_train(self, run_config: CustomParallelConfig):\n", " return torch.utils.data.DataLoader(\n", " train_dataset,\n", " batch_size=32,\n", " shuffle=True\n", " )\n", "\n", " def make_dataloader_val(self, run_config: CustomParallelConfig):\n", " return torch.utils.data.DataLoader(\n", " test_dataset,\n", " batch_size=32,\n", " shuffle=False\n", " )\n", "\n", " def evaluation_functions(self):\n", " return {\n", " \"accuracy\": lambda y_true, y_pred: accuracy_score(y_true.flatten(), y_pred.flatten()),\n", " }\n", "```" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "ab0w-fMQZJJM" }, "source": [ "#### Creating Ray Cluster\n", "\n", "Ablator utilizes Ray for achieving parallel processing of different trials.\n", "\n", "* To initiate the Ray cluster, run the command ````ray start --head```` in a terminal. This will start the Ray head node on your local machine.\n", "\n", "* To utilize Ray for parallelization, it is necessary to connect to the Ray cluster. The Ray cluster comprises multiple Ray worker nodes capable of executing tasks in parallel.\n", "\n", "* To connect to an existing Ray cluster, use the command ````ray.init(address=\"auto\")````." ] }, { "cell_type": "markdown", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 95 }, "id": "-0F_O9-UTAFG", "outputId": "230105c8-e3d9-4ddc-c251-811d5035f839" }, "source": [ "```python\n", "import ray\n", "ray.init(address = \"auto\")\n", "```" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "aHywLWRfZb5e" }, "source": [ "#### ParallelTrainer.\n", "\n", " It extends the ProtoTrainer class. The parallelTrainer executes multiple trials in parallel. It initializes Optuna trials, which are responsible for tuning the hyperparameters. Each trial is run on a separate worker node within the Ray cluster.\n", "\n", "This class manages the following tasks:\n", "\n", "* Preparing a Ray cluster for running Optuna trials to tune hyperparameters.\n", "* Initializing Optuna trials and adding them to the Optuna storage.\n", "* Syncing artifacts (experiment trials and database files) to remote sites, such as Google Cloud Storage.\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "ZE_wIo66AMdn" }, "source": [ "```python\n", "if not os.path.exists(parallel_config.experiment_dir):\n", " shutil.os.mkdir(parallel_config.experiment_dir)\n", "\n", "shutil.rmtree(parallel_config.experiment_dir)\n", "\n", "wrapper = MyModelWrapper(\n", " model_class=MyModel,\n", ")\n", "\n", "ablator = ParallelTrainer(\n", " wrapper=wrapper,\n", " run_config=parallel_config,\n", ")\n", "ablator.launch(working_directory = os.getcwd(), ray_head_address=\"auto\")\n", "```" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "We can provide ````resume = True```` to the ````launch()```` method to resume training the model from existing checkpoints and existing experiment state. " ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "GPnolrLKZjS2" }, "source": [ "Shutting down the ray cluster using ````ray.shutdown()```` after use." ] }, { "cell_type": "markdown", "metadata": { "id": "pFfsvWyeAsrK" }, "source": [ "```python\n", "ray.shutdown()\n", "```" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "tRK0EQBuZ9vB" }, "source": [ "#### Visualizing results w.r.t experiments\n", "\n", "Since the experiment stores TensorBoard events files for each trial, we can perform a short visualization with TensorBoard. More detailed analysis will be explored in the later tutorials.\n", "\n", "Install ````tensorboard```` and load using ````%load_ext tensorboard```` if using a notebook.\n", "\n", "* Run the command ````%tensorboard --logdir /tmp/experiments/[experiment_dir_name] --port [port]````" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "```python\n", "%load_ext tensorboard\n", "%tensorboard --logdir /tmp/experiments/experiment_5ade_3be2 --port 6008\n", "```" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "![TensorBoard-Output](./Images/tensorboard-output.jpg)\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "mU9xvrfgZ1FJ" }, "source": [ "#### Conclusion\n", "\n", "Finally, after completing all the trials, the metrics obtained from each trial will be stored in the \"experiment_dir\". This directory will contain subdirectories representing each trial, as well as SQLite databases for Optuna and the experiment's state.\n", "\n", "Each trial will have the following components: best_checkpoints, checkpoints, results, training log, configurations, and metadata.\n", "\n", "In the later tutorial, we will learn how to analyze the results from the trained trials." ] } ], "metadata": { "accelerator": "GPU", "colab": { "gpuType": "T4", "provenance": [] }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 0 }