Prototyping Models
The purpose of prototyping a model is to quickly build and test it, enabling later parallel scaling with minimal code change for hyperparameter optimization.
This chapter covers training a model with Ablator on the popular Fashion-MNIST dataset.
Running Experiments using Ablator
Running an experiment involves loading configurations, training the model, and producing metrics. Ablator utilizes configurations, a model wrapper, and a trainer class to run an experiment for the given prototype.
Setting up Ablator
Install ablator using the command: pip install ablator
Import the Configs, ModelWrapper and ProtoTrainer from ablator.
[1]:
from ablator import ModelConfig, OptimizerConfig, TrainConfig, RunConfig
from ablator import ModelWrapper, ProtoTrainer, Stateless, Derived, configclass
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
import torchvision
import torchvision.transforms as transforms
from sklearn.metrics import f1_score, accuracy_score
import os
import shutil
Configurations
Each config class has its own arguments and serves a specific purpose in defining the configuration for the experiment.
Defining Configs:
Optimizer Config: adam (lr = 0.001).
Train Config: batch_size = 32, epochs = 20, with random weight initialization enabled (rand_weights_init = True).
Run Config: device details, experiment directory and a random seed for experiment.
Model Config: dimensions for the layers of the model.
[2]:
@configclass
class CustomModelConfig(ModelConfig):
    input_size: int
    hidden_size: int
    num_classes: int

model_config = CustomModelConfig(
    input_size=28 * 28,
    hidden_size=256,
    num_classes=10
)

optimizer_config = OptimizerConfig(
    name="adam",
    arguments={"lr": 0.001}
)

train_config = TrainConfig(
    dataset="Fashion-mnist",
    batch_size=32,
    epochs=20,
    optimizer_config=optimizer_config,
    scheduler_config=None,
    rand_weights_init=True
)

@configclass
class CustomRunConfig(RunConfig):
    model_config: CustomModelConfig

run_config = CustomRunConfig(
    train_config=train_config,
    model_config=model_config,
    metrics_n_batches=800,
    experiment_dir="/tmp/experiments",
    device="cpu",
    amp=False,
    random_seed=42
)
Importing the dataset
Fashion-MNIST is a dataset of 70,000 grayscale images of fashion items, split into 60,000 training images and 10,000 test images. Each image belongs to one of ten classes of clothing and accessories, such as T-shirts, trousers, and sneakers.
Image dimensions: 28 x 28 pixels (grayscale).
Shape of the training data tensor: [60000, 1, 28, 28].
[3]:
transform = transforms.ToTensor()

train_dataset = torchvision.datasets.FashionMNIST(
    root='./data',
    train=True,
    download=True,
    transform=transform
)

test_dataset = torchvision.datasets.FashionMNIST(
    root='./data',
    train=False,
    download=True,
    transform=transform
)
Creating Pytorch Model
Model Architecture (Simple Neural Network with Linear Layers):
Linear_1(28*28, 256) -> ReLU -> Linear_2(256, 256) -> ReLU -> Linear_3(256, 10), where ReLU is the activation function.
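As a quick sanity check on the architecture, the number of trainable parameters can be worked out by hand. The sketch below assumes every linear layer carries a bias term, which is the default for nn.Linear. The helper name linear_params is hypothetical and used only for illustration.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of one fully connected layer (bias assumed enabled)."""
    return in_features * out_features + out_features

# (input, output) sizes of the three linear layers described above
layer_shapes = [(28 * 28, 256), (256, 256), (256, 10)]

per_layer = [linear_params(i, o) for i, o in layer_shapes]
total_params = sum(per_layer)

print(per_layer)     # [200960, 65792, 2570]
print(total_params)  # 269322
```

Most of the parameters sit in the first layer, because it maps all 784 input pixels to the hidden size.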
FashionMNISTModel implements this architecture. MyModel wraps FashionMNISTModel, adds a loss function, runs the forward computation, and returns the predicted labels and the loss during training and evaluation.
The forward method of MyModel must return both the outputs and the loss. The outputs must be a dictionary, for example: {"y_pred": out, "y_true": labels}.
[4]:
class FashionMNISTModel(nn.Module):
    def __init__(self, config: CustomModelConfig):
        super(FashionMNISTModel, self).__init__()
        input_size = config.input_size
        hidden_size = config.hidden_size
        num_classes = config.num_classes

        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.relu2 = nn.ReLU()
        self.fc3 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        x = self.relu1(x)
        x = self.fc2(x)
        x = self.relu2(x)
        x = self.fc3(x)
        return x

class MyModel(nn.Module):
    def __init__(self, config: CustomModelConfig) -> None:
        super().__init__()
        self.model = FashionMNISTModel(config)
        self.loss = nn.CrossEntropyLoss()

    def forward(self, x, labels=None):
        out = self.model(x)
        loss = None

        if labels is not None:
            loss = self.loss(out, labels)

        out = out.argmax(dim=-1)

        return {"y_pred": out, "y_true": labels}, loss
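To make the two return values of MyModel.forward concrete, here is a minimal pure-Python sketch for a single sample. It assumes the loss is the usual softmax cross-entropy computed on raw logits, which is what nn.CrossEntropyLoss expects, and that the prediction is the index of the largest logit. The helpers cross_entropy and argmax are hypothetical stand-ins for illustration, not part of Ablator or PyTorch.

```python
import math

def cross_entropy(logits, label):
    """Negative log-softmax probability of the true class for one sample."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[label]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)

logits = [2.0, 0.5, -1.0]  # raw scores for three classes
label = 0                  # true class index

loss = cross_entropy(logits, label)                      # computed on the logits
outputs = {"y_pred": argmax(logits), "y_true": label}    # predictions after argmax

print(outputs)  # {'y_pred': 0, 'y_true': 0}
print(loss)
```

The loss is computed before the argmax because argmax discards the scores the loss depends on. The dictionary carries only what the evaluation functions need.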
Defining Custom Evaluation Metrics
The evaluation functions below compute accuracy and the F1 score for the classification task. The F1 score uses average='weighted', so each class contributes in proportion to its number of samples.
[5]:
def my_accuracy(y_true, y_pred):
    return accuracy_score(y_true.flatten(), y_pred.flatten())

def my_f1_score(y_true, y_pred):
    return f1_score(y_true.flatten(), y_pred.flatten(), average='weighted')
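For intuition about what the weighted average means, the following pure-Python sketch computes a per-class F1 score and averages the scores with weights equal to each class's share of the true labels. It illustrates the definition as I understand it and is not scikit-learn's implementation. The function name weighted_f1 and the toy labels are made up for this example.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_cls in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n_cls - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n_cls / total) * f1
    return score

y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]

# Class 0: F1 = 0.8 (support 3), class 1: F1 = 0.8 (support 2), class 2: F1 = 1.0 (support 1)
print(weighted_f1(y_true, y_pred))  # approximately 0.8333 (5/6)
```

Classes with more samples pull the score toward their own F1, which suits datasets where class sizes differ.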
Model Wrapper
This class serves as a comprehensive wrapper for PyTorch models, providing a high-level interface for handling various tasks involved in model training.
It takes care of importing parameters from configuration files into the model, setting up optimizers, schedulers, and checkpoints, logging metrics, handling interruptions, creating and using data loaders, evaluating the model, and more.
By encapsulating this functionality, it reduces development effort and the amount of boilerplate code you need to write.
In PyTorch, a DataLoader is a utility class that provides an iterable over a dataset. It is commonly used for handling data loading and batching in machine learning and deep learning tasks.
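To illustrate what iterating over a dataset in batches means, here is a minimal pure-Python sketch. It is a toy stand-in, not how torch.utils.data.DataLoader is implemented. The function iterate_batches and the toy dataset are hypothetical names for illustration.

```python
import random

def iterate_batches(dataset, batch_size, shuffle=False, seed=None):
    """Yield consecutive batches of samples, optionally in shuffled order."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]

# Toy dataset of (image placeholder, label) pairs
toy_dataset = [(f"image_{i}", i % 10) for i in range(70)]

batches = list(iterate_batches(toy_dataset, batch_size=32))
print([len(b) for b in batches])  # [32, 32, 6]
```

The last batch is smaller because 70 is not a multiple of 32. PyTorch's DataLoader also keeps this final partial batch by default (drop_last=False).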
The dataloaders and evaluation functions are provided to the ModelWrapper by overriding its methods.
[6]:
class MyModelWrapper(ModelWrapper):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def make_dataloader_train(self, run_config: CustomRunConfig):
        return torch.utils.data.DataLoader(
            train_dataset,
            batch_size=32,
            shuffle=True
        )

    def make_dataloader_val(self, run_config: CustomRunConfig):
        return torch.utils.data.DataLoader(
            test_dataset,
            batch_size=32,
            shuffle=False
        )

    def evaluation_functions(self):
        return {
            "accuracy": my_accuracy,
            "f1": my_f1_score
        }
ProtoTrainer
ProtoTrainer is responsible for starting the training of the model in the ModelWrapper and for preparing resources so that training does not stall or conflict with other trainers. It provides logging and syncing facilities to the provided directory or to external remote servers such as Google Cloud. It also runs evaluation and syncs metrics to the directories.
To do this, it requires a ModelWrapper and a run_config as inputs.
First, we wrap the model (MyModel) in a ModelWrapper (MyModelWrapper). Then we create an instance of ProtoTrainer, passing the run_config and the wrapper as arguments, and call its launch() method to start the experiment. The launch() method returns an object of class TrainMetrics, which calculates metrics using the custom evaluation functions.
[7]:
# Start from a clean experiment directory by removing any previous run.
if os.path.exists(run_config.experiment_dir):
    shutil.rmtree(run_config.experiment_dir)

wrapper = MyModelWrapper(
    model_class=MyModel,
)

ablator = ProtoTrainer(
    wrapper=wrapper,
    run_config=run_config,
)
metrics = ablator.launch()
Results
The TrainMetrics object stores and manages predictions and calculates metrics using evaluation functions. We can access all the metrics from the TrainMetrics object using its to_dict() method.
A more detailed exploration of interpreting results will be undertaken in a later chapter.
[8]:
metrics_dict = metrics.to_dict()
max_key_length = max(len(str(k)) for k in metrics_dict.keys())
for k, v in metrics_dict.items():
    print(f"{k:{max_key_length}} : {v}")
train_loss : 2.3299765326192716
val_loss : 7.457184716395309
train_accuracy : 0.849365234375
train_f1 : 0.8493114627262905
val_accuracy : 0.816905
val_f1 : 0.816258432667835
best_iteration : 35625
best_loss : 7.65961674023207
current_epoch : 20
current_iteration : 37500
epochs : 20
learning_rate : 0.001
total_steps : 37500
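The iteration counts in the output above can be checked by hand. The sketch below assumes that one iteration corresponds to one training batch and that best_iteration is counted in the same units. Neither assumption is stated explicitly in this chapter.

```python
import math

train_samples = 60_000  # Fashion-MNIST training images
batch_size = 32
epochs = 20

steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

best_iteration = 35_625  # value reported in the output above
best_epoch = best_iteration / steps_per_epoch

print(steps_per_epoch)  # 1875
print(total_steps)      # 37500
print(best_epoch)       # 19.0
```

Under these assumptions, total_steps matches the reported 37500, and the best iteration falls at the end of epoch 19.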
Conclusion
We have now built and tested a prototype model using Ablator. Later chapters explore hyperparameter optimization in more depth with more complex models.
Additional Info
Why train with ProtoTrainer?
It provides a robust way to handle errors during training.
Ideal for prototyping experiments in a local environment.
Easily adaptable for hyperparameter optimization with larger configurations and horizontal scaling.
Quick transition to ParallelConfig and ParallelTrainer for parallel execution of trials using Ray.
How to visualize metrics
We can also visualize metrics on TensorBoard with respect to every epoch.
Install tensorboard, then load it with %load_ext tensorboard if using a notebook. Run the command:
%tensorboard --logdir /tmp/experiments/[experiment_dir_name]/dashboard/tensorboard --port [port]