Prototyping Models

The purpose of prototyping a model is to quickly build and test it, enabling later parallel scaling with minimal code change for hyperparameter optimization.
This chapter covers training a model with Ablator on the popular Fashion-MNIST dataset.

Running Experiments using Ablator

Running an experiment involves loading configurations, training the model, and producing metrics. Ablator utilizes configurations, a model wrapper, and a trainer class to run an experiment for the given prototype.

Setting up Ablator

Install ablator using the command: pip install ablator

Import the Configs, ModelWrapper and ProtoTrainer from ablator.

[1]:

from ablator import ModelConfig, OptimizerConfig, TrainConfig, RunConfig
from ablator import ModelWrapper, ProtoTrainer, Stateless, Derived, configclass

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
import torchvision
import torchvision.transforms as transforms

from sklearn.metrics import f1_score, accuracy_score

import os
import shutil

Configurations

Each config class has its own arguments and serves a specific purpose in defining the configuration for the experiment.

Defining Configs:

Optimizer Config: adam (lr = 0.001).
Train Config: batch_size = 32, epochs = 20, and random weight initialization enabled (rand_weights_init = True).
Run Config: device, experiment directory, and a random seed for the experiment.
Model Config: dimensions for the layers of the model.

[2]:

@configclass
class CustomModelConfig(ModelConfig):
    input_size :int
    hidden_size :int
    num_classes :int

model_config = CustomModelConfig(
    input_size = 28*28,
    hidden_size = 256,
    num_classes = 10
    )

optimizer_config = OptimizerConfig(
    name="adam",
    arguments={"lr": 0.001}
)

train_config = TrainConfig(
    dataset="Fashion-mnist",
    batch_size=32,
    epochs=20,
    optimizer_config=optimizer_config,
    scheduler_config=None,
    rand_weights_init = True
)

@configclass
class CustomRunConfig(RunConfig):
    model_config: CustomModelConfig

run_config = CustomRunConfig(
    train_config=train_config,
    model_config=model_config,
    metrics_n_batches = 800,
    experiment_dir = "/tmp/experiments",
    device="cpu",
    amp=False,
    random_seed = 42
)

Importing the dataset

Fashion-MNIST is a dataset of grayscale images of fashion items, split into 60,000 training images and 10,000 test images. Each image belongs to one of ten classes, such as T-shirts, trousers, and sneakers.

Image dimensions: 28 x 28 pixels (grayscale). Shape of the training data once every image is converted with ToTensor and stacked: [60000, 1, 28, 28].

[3]:

transform = transforms.ToTensor()

train_dataset = torchvision.datasets.FashionMNIST(
    root='./data',
    train=True,
    download=True,
    transform=transform
)

test_dataset = torchvision.datasets.FashionMNIST(
    root='./data',
    train=False,
    download=True,
    transform=transform
)

Creating the PyTorch Model

Model Architecture (Simple Neural Network with Linear Layers):

Linear_1_(28*28, 256) -> ReLU -> Linear_2_(256, 256) -> ReLU -> Linear_3_(256, 10). (where, ReLU is an Activation function)
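
As a quick sanity check on the architecture above, the number of trainable parameters can be worked out by hand. The sketch below is plain Python and only assumes that each linear layer has a weight matrix and a bias vector, which is the default for nn.Linear; the helper name count_linear_params is made up for this illustration.

```python
# Layer sizes taken from the architecture above: 784 -> 256 -> 256 -> 10.
layer_sizes = [28 * 28, 256, 256, 10]


def count_linear_params(sizes):
    """Count weights and biases for a stack of fully connected layers."""
    total = 0
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix + bias vector
    return total


total_params = count_linear_params(layer_sizes)
print(total_params)  # 269322
```

The ReLU activations add no parameters, so the three linear layers account for the whole total.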

MyModel wraps an existing model, FashionMNISTModel. It adds a loss function, runs the forward computation, and returns the predicted labels and the loss during training and evaluation.

The forward method of MyModel must return both the outputs and the loss. The outputs must be a dictionary, for example {"y_pred": out, "y_true": labels}.

[4]:

class FashionMNISTModel(nn.Module):
    def __init__(self, config: CustomModelConfig):
        super(FashionMNISTModel, self).__init__()

        input_size = config.input_size
        hidden_size = config.hidden_size
        num_classes = config.num_classes

        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.relu2 = nn.ReLU()
        self.fc3 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        x = self.relu1(x)
        x = self.fc2(x)
        x = self.relu2(x)
        x = self.fc3(x)
        return x

class MyModel(nn.Module):
    def __init__(self, config: CustomModelConfig) -> None:
        super().__init__()

        self.model = FashionMNISTModel(config)
        self.loss = nn.CrossEntropyLoss()

    def forward(self, x, labels=None):
        out = self.model(x)
        loss = None

        if labels is not None:
            loss = self.loss(out, labels)

        out = out.argmax(dim=-1)

        return {"y_pred": out, "y_true": labels}, loss
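
To make the two values returned by forward more concrete, here is a minimal pure-Python sketch of what they represent for a single sample: the predicted class is the argmax of the logits, and the loss is the negative log-probability of the true class under a softmax. The logits and label are hypothetical, and the helper functions are written for illustration only. nn.CrossEntropyLoss performs the same computation on tensors and, by default, averages it over the batch.

```python
import math


def cross_entropy(logits, label):
    """Negative log-probability of the true class under a softmax."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(
        sum(math.exp(z - max_logit) for z in logits)
    )
    return log_sum_exp - logits[label]


def argmax(values):
    """Index of the largest value, i.e. the predicted class."""
    return max(range(len(values)), key=lambda i: values[i])


logits = [2.0, 1.0, 0.1]  # hypothetical scores for a 3-class example
label = 0                 # hypothetical true class

y_pred = argmax(logits)
loss = cross_entropy(logits, label)
print(y_pred, round(loss, 3))  # 0 0.417
```

With ten classes and identical logits for every class, the same function returns ln(10), roughly 2.303, which is the loss of a model that guesses uniformly.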

Defining Custom Evaluation Metrics

We define evaluation functions for the classification problem. The F1 score uses average='weighted' to handle the multiclass setting.

[5]:

def my_accuracy(y_true, y_pred):
    return accuracy_score(y_true.flatten(), y_pred.flatten())

def my_f1_score(y_true, y_pred):
    return f1_score(y_true.flatten(), y_pred.flatten(), average='weighted')
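
For readers unfamiliar with the 'weighted' average, the following pure-Python sketch shows the idea: compute F1 per class, then average the per-class scores with weights proportional to how many true samples each class has. The function weighted_f1 and the toy labels are illustrative only; the tutorial itself relies on sklearn's f1_score.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    support = Counter(y_true)  # number of true samples per class
    total = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n_true - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n_true * f1  # weight each class by its support
    return total / len(y_true)


# Toy imbalanced example: four samples of class 0, two of class 1.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1]
print(round(weighted_f1(y_true, y_pred), 4))  # 0.8148
```

Here class 0 has F1 = 8/9 and class 1 has F1 = 2/3, so the weighted score (4 * 8/9 + 2 * 2/3) / 6 = 22/27 sits closer to the majority class than the unweighted mean would.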

Model Wrapper

This class serves as a comprehensive wrapper for PyTorch models, providing a high-level interface for the tasks involved in model training.
It takes care of importing parameters from configuration files into the model, setting up optimizers, schedulers, and checkpoints, logging metrics, handling interruptions, creating and using data loaders, evaluating the model, and more.
By encapsulating this functionality, it reduces development effort and the amount of boilerplate code that has to be written.

In PyTorch, a DataLoader is a utility class that provides an iterable over a dataset. It is commonly used for handling data loading and batching in machine learning and deep learning tasks.
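
The core behaviour of a DataLoader, splitting a dataset into batches with optional shuffling, can be sketched in a few lines of plain Python. This is a simplified stand-in for illustration, not PyTorch's implementation: it skips worker processes, tensor collation, and samplers, and the name iterate_batches is hypothetical.

```python
import random


def iterate_batches(dataset, batch_size, shuffle=False, seed=None):
    """Yield lists of samples, one list per batch."""
    indices = list(range(len(dataset)))
    if shuffle:
        # Shuffle the visiting order, not the dataset itself.
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


toy_dataset = list(range(10))
batches = list(iterate_batches(toy_dataset, batch_size=4))
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

The last batch is smaller when the dataset size is not a multiple of the batch size, which matches the DataLoader default of keeping the final partial batch.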

The data loaders and evaluation functions are supplied to the ModelWrapper by overriding its methods.

[6]:

class MyModelWrapper(ModelWrapper):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def make_dataloader_train(self, run_config: CustomRunConfig):
        return torch.utils.data.DataLoader(
            train_dataset,
            batch_size=32,
            shuffle=True
        )

    def make_dataloader_val(self, run_config: CustomRunConfig):
        return torch.utils.data.DataLoader(
            test_dataset,
            batch_size=32,
            shuffle=False
        )

    def evaluation_functions(self):
        return {
            "accuracy": my_accuracy,
            "f1": my_f1_score
        }

ProtoTrainer

This class starts training the model in the ModelWrapper and prepares resources for the model so that training does not stall or conflict with other trainers.
It provides logging and syncing facilities to the given directory or to external remote servers such as Google Cloud, and it also evaluates the model and syncs metrics to those directories.
To do this, it requires a ModelWrapper and a run_config as inputs.

First, we wrap the model (MyModel) in a ModelWrapper (MyModelWrapper). Then we create an instance of ProtoTrainer, passing the run_config and the wrapper as arguments, and call its launch() method to start the experiment. The launch() method returns a TrainMetrics object, which calculates metrics using the custom evaluation functions.

[7]:

# Start from a clean experiment directory.
# ignore_errors=True covers the case where the directory does not exist yet.
shutil.rmtree(run_config.experiment_dir, ignore_errors=True)

wrapper = MyModelWrapper(
    model_class=MyModel,
)

ablator = ProtoTrainer(
    wrapper=wrapper,
    run_config=run_config,
)
metrics = ablator.launch()

Results

The TrainMetrics object stores and manages predictions and calculates metrics using evaluation functions. We can access all the metrics from the TrainMetrics object using its to_dict() method.

A later chapter explores how to interpret these results in more detail.

[8]:

metrics_dict = metrics.to_dict()
max_key_length = max(len(str(k)) for k in metrics_dict.keys())

for k, v in metrics_dict.items():
    print(f"{k:{max_key_length}} : {v}")

train_loss        : 2.3299765326192716
val_loss          : 7.457184716395309
train_accuracy    : 0.849365234375
train_f1          : 0.8493114627262905
val_accuracy      : 0.816905
val_f1            : 0.816258432667835
best_iteration    : 35625
best_loss         : 7.65961674023207
current_epoch     : 20
current_iteration : 37500
epochs            : 20
learning_rate     : 0.001
total_steps       : 37500
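
The step counts in this output can be cross-checked against the configuration. The sketch below assumes that one iteration corresponds to one training batch and that the final partial batch of an epoch is kept; both are assumptions about how the counters are defined, not statements taken from the Ablator documentation.

```python
import math

train_samples = 60_000  # Fashion-MNIST training split
batch_size = 32         # from train_config
epochs = 20             # from train_config

steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 1875 37500

best_iteration = 35_625  # from the output above
print(best_iteration / steps_per_epoch)  # 19.0
```

Under these assumptions, total_steps matches the reported 37500, and best_iteration falls exactly at the end of the 19th epoch.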

Conclusion

We have now built and tested a prototype model using Ablator. Later chapters go deeper into hyperparameter optimization with more complex models.

Additional Info

Why train with ProtoTrainer?

It provides a robust way to handle errors during training.
Ideal for prototyping experiments in a local environment.
Easily adaptable for hyperparameter optimization with larger configurations and horizontal scaling.
Quick transition to ParallelConfig and ParallelTrainer for parallel execution of trials using Ray.

How to visualize metrics

We can also visualize the metrics for every epoch in TensorBoard.
Install tensorboard and, if you are working in a notebook, load the extension with %load_ext tensorboard.
Then run the command %tensorboard --logdir /tmp/experiments/[experiment_dir_name]/dashboard/tensorboard --port [port]