Prototyping Models#

Let’s say you have a novel idea for a model architecture and you want to run an ablation study on it with ablator. Ablator simplifies prototyping, letting you quickly construct and evaluate your idea. Once a prototype runs smoothly, you can switch to a parallel ablation study, which trains multiple trials and runs hyperparameter optimization (HPO) over them, with minimal code changes.

This chapter covers prototyping a model with ablator and training it on the popular Fashion-MNIST dataset.

There are 3 main steps to run a prototype experiment in ablator:

Configure the prototype experiment.
Create a model wrapper that defines the boilerplate code for training and evaluating models.
Create the trainer and launch the experiment.

Let us first import all necessary dependencies:

from ablator import ModelConfig, OptimizerConfig, TrainConfig, RunConfig
from ablator import ModelWrapper, ProtoTrainer, configclass

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset
import torchvision
import torchvision.transforms as transforms

from sklearn.metrics import f1_score, accuracy_score

import shutil

Launch the prototype experiment#

Configure the experiment#

We will follow exactly the same steps as in the previous tutorial on Configuration Basics to configure the experiment.

Here’s a summary of how we will configure it:

Model Configuration: dimensions for the layers of the model.
Optimizer Configuration: adam (lr = 0.001).
Train Configuration: batch_size = 32, epochs = 20, and random weight initialization enabled.
Running Configuration: the device to train on (the code below uses "cuda"; switch to "cpu" if no GPU is available) and a random seed for the experiment.

Configure the model#

Model configuration#

For the model configuration, we define the hyperparameters input_size, hidden_size, and num_classes as integer config attributes.

@configclass
class CustomModelConfig(ModelConfig):
    input_size: int
    hidden_size: int
    num_classes: int

model_config = CustomModelConfig(
    input_size=28 * 28,
    hidden_size=256,
    num_classes=10,
)

Because the hyperparameters are declared with the primitive type int (and are therefore Stateful), we must provide concrete values when initializing the model_config object.

Creating Pytorch Model#

Model architecture (a simple neural network with linear layers):

Linear_1(28*28, 256) -> ReLU -> Linear_2(256, 256) -> ReLU -> Linear_3(256, 10), where ReLU is the activation function.
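As a quick sanity check on this architecture, the number of trainable parameters can be worked out by hand. The sketch below is plain Python arithmetic and calls neither ablator nor PyTorch; the helper name linear_params is made up for illustration. It assumes each linear layer has a weight matrix plus a bias vector, which is the default for nn.Linear.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Parameters of a fully connected layer: weight matrix plus bias vector."""
    return in_features * out_features + out_features

input_size = 1 * 28 * 28   # one grayscale channel, flattened to 784 values
hidden_size = 256
num_classes = 10

layer_sizes = [
    (input_size, hidden_size),   # Linear_1
    (hidden_size, hidden_size),  # Linear_2
    (hidden_size, num_classes),  # Linear_3
]
per_layer = [linear_params(i, o) for i, o in layer_sizes]
total_params = sum(per_layer)

print(per_layer)     # [200960, 65792, 2570]
print(total_params)  # 269322
```

Most of the parameters sit in the first layer, so input_size and hidden_size are the values that matter most if you later vary the model size in an ablation study.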

Note that here we depart from the Configuration Basics tutorial and construct our model as a two-level module:

FashionMNISTModel defines the model architecture (your novel idea); this is where we use the model config attributes to construct the model.
MyModel includes the main model architecture as a sub-module, adds a loss function, performs the forward computation, and returns the predicted labels and the loss during model training and evaluation.

class FashionMNISTModel(nn.Module):
    def __init__(self, config: CustomModelConfig):
        super(FashionMNISTModel, self).__init__()

        input_size = config.input_size
        hidden_size = config.hidden_size
        num_classes = config.num_classes

        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.relu2 = nn.ReLU()
        self.fc3 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        x = self.relu1(x)
        x = self.fc2(x)
        x = self.relu2(x)
        x = self.fc3(x)
        return x

class MyModel(nn.Module):
    def __init__(self, config: CustomModelConfig) -> None:
        super().__init__()

        self.model = FashionMNISTModel(config)
        self.loss = nn.CrossEntropyLoss()

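    def forward(self, x, labels=None):
        # Raw class scores (logits) from the underlying architecture.
        out = self.model(x)
        loss = None

        # Labels are optional, so the loss is only computed when they are given.
        if labels is not None:
            loss = self.loss(out, labels)
            labels = labels.reshape(-1, 1)

        # Convert logits to predicted class indices, shaped as a column.
        out = out.argmax(dim=-1)
        out = out.reshape(-1, 1)

        # The dictionary keys must match the evaluation functions' parameter names.
        return {"y_pred": out, "y_true": labels}, loss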
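To make the shape of "y_pred" concrete, here is a pure-Python illustration of what the argmax followed by reshape(-1, 1) produces for a small batch. It uses nested lists instead of tensors, and predict_column is a hypothetical helper written only for this illustration; it is not how MyModel computes its output.

```python
def predict_column(logits):
    """Pick the highest-scoring class per row and return an N x 1 nested list."""
    return [[max(range(len(row)), key=row.__getitem__)] for row in logits]

batch_logits = [
    [0.1, 2.3, -1.0],  # highest score at index 1
    [1.5, 0.2, 0.9],   # highest score at index 0
]
y_pred = predict_column(batch_logits)
print(y_pred)  # [[1], [0]]
```

Each row of the result holds a single predicted class index, which is the column layout that the forward method returns for both "y_pred" and "y_true".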

Configure the training process#

optimizer_config = OptimizerConfig(
    name="adam",
    arguments={"lr": 0.001}
)

train_config = TrainConfig(
    dataset="Fashion-mnist",
    batch_size=32,
    epochs=20,
    optimizer_config=optimizer_config,
    scheduler_config=None,
    rand_weights_init = True
)

Configure the running configuration#

@configclass
class CustomRunConfig(RunConfig):
    model_config: CustomModelConfig

run_config = CustomRunConfig(
    train_config=train_config,
    model_config=model_config,
    metrics_n_batches = 800,
    experiment_dir = "/tmp/experiments",
    device="cuda",
    amp=False,
    random_seed = 42
)

Note

We recommend that the experiment directory, RunConfig.experiment_dir, be an empty directory.
Make sure to redefine the running configuration class to update its model_config attribute from ModelConfig (by default) to CustomModelConfig before creating the config object.
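One way to follow the first recommendation is to clear the directory before each prototype run. The sketch below uses only the standard library; reset_experiment_dir is a hypothetical helper, not part of ablator, and it is demonstrated on a throwaway temporary directory. Deleting a directory is irreversible, so point it only at paths that hold disposable results.

```python
import shutil
import tempfile
from pathlib import Path

def reset_experiment_dir(path: str) -> Path:
    """Delete `path` if it exists, then recreate it as an empty directory."""
    experiment_dir = Path(path)
    shutil.rmtree(experiment_dir, ignore_errors=True)  # no error if missing
    experiment_dir.mkdir(parents=True, exist_ok=True)
    return experiment_dir

# Demonstration on a temporary directory that contains a stale file.
demo_root = Path(tempfile.mkdtemp())
stale_dir = demo_root / "experiments"
stale_dir.mkdir()
(stale_dir / "old_results.txt").write_text("stale")

clean_dir = reset_experiment_dir(str(stale_dir))
remaining = list(clean_dir.iterdir())
print(remaining)  # []

shutil.rmtree(demo_root)  # clean up the demonstration directory
```

In this tutorial, the equivalent call would target the path given as experiment_dir and would run before the trainer is launched.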

Create the model wrapper#

The ModelWrapper class wraps a PyTorch model and provides a high-level interface for the tasks involved in training it. It defines the boilerplate code for training and evaluating models, which reduces the amount of code you have to write yourself:

It takes care of creating and using data loaders, evaluating models, importing parameters from configuration files into the model, setting up optimizers, schedulers, and checkpoints, logging metrics, handling interruptions, and more.
Its functions can be overridden to support custom use cases (read more about these functions in the Model Wrapper documentation).

An important function of the ModelWrapper is make_dataloader_train, which creates the data loader used for training the model. You must implement make_dataloader_train so that it returns a training data loader before launching the experiment.

Therefore, we will prepare the datasets first. Then, we will write some evaluation functions for our model. Finally, we will create the model wrapper and train the model.

Prepare the dataset#

Fashion-MNIST is a dataset of 70,000 grayscale images of fashion items: 60,000 for training and 10,000 for testing. The images are categorized into ten classes of clothing, footwear, and bags.

Image dimensions: 28 pixels x 28 pixels (grayscale)
Shape of the training data tensor: [60000, 1, 28, 28]

Here we will create two datasets: one for training and one for validation (the Fashion-MNIST test split).

transform = transforms.ToTensor()

train_dataset = torchvision.datasets.FashionMNIST(
    root='./data',
    train=True,
    download=True,
    transform=transform
)

test_dataset = torchvision.datasets.FashionMNIST(
    root='./data',
    train=False,
    download=True,
    transform=transform
)

Defining Custom Evaluation Metrics#

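We define two evaluation functions for this classification problem: accuracy and F1 score. Because the problem is multiclass, the F1 score uses average="weighted", which averages the per-class F1 scores weighted by the number of true instances of each class.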

def my_accuracy(y_true, y_pred):
    return accuracy_score(y_true.flatten(), y_pred.flatten())

def my_f1_score(y_true, y_pred):
    return f1_score(y_true.flatten(), y_pred.flatten(), average='weighted')
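To make the "weighted" average concrete, the following standard-library sketch computes the same quantity by hand: the per-class F1 scores averaged with weights proportional to each class's number of true instances. The function weighted_f1 and the toy labels are illustrative only; in the tutorial itself the computation is done by sklearn's f1_score.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n_true - tp
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is positive here
        # because every class in `support` has at least one true instance.
        f1 = 2 * tp / (2 * tp + fp + fn)
        score += f1 * n_true / total
    return score

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
# Per-class F1: class 0 -> 0.5, class 1 -> 0.8, class 2 -> 2/3
result = weighted_f1(y_true, y_pred)
print(round(result, 4))  # 0.6556
```

Classes that appear only in the predictions have zero true instances and therefore contribute nothing to the weighted average.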

Note

Make sure that parameters to the evaluation function match the model’s forward dictionary output. Since MyModel’s returned dictionary has keys "y_true" and "y_pred", the evaluation function must have parameters "y_true" and "y_pred".
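The name-matching rule can be checked before training with a few lines of introspection. The sketch below is a hypothetical helper, not an ablator API: it compares each metric function's parameter names against the keys of the dictionary that forward returns. The two stand-in metric functions exist only to keep the example self-contained.

```python
import inspect

def unmatched_parameters(metric_fns, output_keys):
    """Map each metric name to its parameters missing from the model output keys."""
    problems = {}
    for name, fn in metric_fns.items():
        params = set(inspect.signature(fn).parameters)
        missing = sorted(params - set(output_keys))
        if missing:
            problems[name] = missing
    return problems

# Stand-in metrics: the first uses the expected names, the second does not.
def good_metric(y_true, y_pred):
    return 0.0

def bad_metric(labels, preds):
    return 0.0

forward_keys = {"y_pred", "y_true"}  # keys of the dictionary MyModel returns
report = unmatched_parameters(
    {"good": good_metric, "bad": bad_metric}, forward_keys
)
print(report)  # {'bad': ['labels', 'preds']}
```

An empty report means every metric parameter has a matching key in the model output.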

Create the Model Wrapper#

We will now create a model wrapper class and override the following functions:

make_dataloader_train and make_dataloader_val: to provide the training dataset and validation dataset as dataloaders (In PyTorch, a DataLoader is a utility class that provides an iterable over a dataset. It is commonly used for handling data loading and batching in machine learning and deep learning tasks).
evaluation_functions: to provide the evaluation functions that will evaluate the model on the datasets. In this function, you must return a dictionary of callables, where the keys are the names of the evaluation metrics and the values are the functions that compute the metrics.

class MyModelWrapper(ModelWrapper):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def make_dataloader_train(self, run_config: CustomRunConfig):
        return torch.utils.data.DataLoader(
            train_dataset,
            batch_size=32,
            shuffle=True
        )

    def make_dataloader_val(self, run_config: CustomRunConfig):
        return torch.utils.data.DataLoader(
            test_dataset,
            batch_size=32,
            shuffle=False
        )

    def evaluation_functions(self):
        return {
            "accuracy": my_accuracy,
            "f1": my_f1_score
        }

Now create the model wrapper object, passing the model class as its argument:

wrapper = MyModelWrapper(
    model_class=MyModel,
)

Create the trainer and launch the experiment#

For a prototype experiment, we will use the prototype trainer ProtoTrainer to launch the experiment.

Initialize the trainer, providing it with the model wrapper and the running configuration. After that, calling the launch() method will start the training process.

ablator = ProtoTrainer(
    wrapper=wrapper,
    run_config=run_config,
)
metrics = ablator.launch()

Experiment results#

The ProtoTrainer.launch() method returns a dictionary that stores the metrics of the experiment.

A later chapter covers interpreting the results in more detail.

max_key_length = max(len(str(k)) for k in metrics.keys())

for k, v in metrics.items():
    print(f"{k:{max_key_length}} : {v}")

val_loss          : 0.5586626408626636
val_accuracy      : 0.8687149999999999
val_f1            : 0.8684085851245271
train_loss        : 0.2816645764191945
train_accuracy    : 0.8915705128205127
train_f1          : 0.891141942313593
best_iteration    : 3750
best_loss         : 0.4098668480262208
current_epoch     : 20
current_iteration : 37500
epochs            : 20
learning_rate     : 0.001
total_steps       : 37500
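The iteration counts in this output can be cross-checked with simple arithmetic. The sketch assumes one iteration per batch and that a final partial batch would still count as an iteration; both are assumptions about how steps are counted, chosen because they reproduce the numbers printed above.

```python
import math

train_samples = 60_000  # size of the Fashion-MNIST training split
batch_size = 32
epochs = 20

steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
best_iteration = 3750  # taken from the printed metrics
best_epoch = best_iteration / steps_per_epoch

print(steps_per_epoch)  # 1875
print(total_steps)      # 37500
print(best_epoch)       # 2.0
```

Under these assumptions, best_iteration = 3750 falls at the end of the second epoch, which is consistent with the final val_loss (about 0.56) being higher than best_loss (about 0.41).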

How to visualize metrics

ablator automatically records metrics so that you can visualize them in TensorBoard and observe how they change every epoch:

Install tensorboard and, if you are using a notebook, load it with %load_ext tensorboard.
Run the command %tensorboard --logdir <experiment_dir>/dashboard/tensorboard --port [port], where <experiment_dir> is the experiment directory that we passed to the running configuration (run_config.experiment_dir = "/tmp/experiments").

Conclusion#

That’s it! We have successfully built and tested a prototype model using ablator. In the later chapters, we will learn how to scale a prototype to a cluster of parallel processes to explore hyperparameter optimization with more complex models.

Why train with ProtoTrainer?

It provides a robust way to handle errors during training.
Ideal for prototyping experiments in a local environment.
Easily adaptable for hyperparameter optimization with larger configurations and horizontal scaling.
Quick transition to ParallelConfig and ParallelTrainer for parallel execution of trials using Ray.