ablator.config package#

Submodules#

ablator.config.hpo module#

class ablator.config.hpo.FieldType(value)[source]#

Bases: Enum

Type of search space.

continuous = 'float'#

discrete = 'int'#

class ablator.config.hpo.SearchSpace(*args: Any, **kwargs: Any)[source]#

Bases: ConfigBase

Search space configuration, required in ParallelConfig, is used to define the search space for a hyperparameter. Its constructor takes as input keyword arguments that correspond to parameters defined in the Parameters section.

Parameters:

value_range: Optional[Tuple[str, str]]: Value range of the parameter.
categorical_values: Optional[List[str]]: Categorical values for the parameter.
subspaces: Optional[List[Self]]: A list of search spaces.
sub_configuration: Optional[SubConfiguration]: Subconfiguration for a SearchSpace.
value_type: FieldType: Value type of the parameter’s values (continuous or discrete), by default FieldType.continuous.
n_bins: Optional[int]: Total bins for grid sampling, optional.
log: bool: To log, by default False.

Examples

In ablator, search space is defined for parallel ablation studies. For example, we want to run an ablation study on the model’s hidden size and activation function:

Given the following model configuration:

>>> @configclass
... class CustomModelConfig(ModelConfig):
...     hidden_size: int
...     activation: str
>>> my_model_config = CustomModelConfig(hidden_size=100, activation="relu")

The search space, which will be passed to ParallelConfig as a dictionary (notice how the key is expressed as model_config.<model-hyperparameter>), should look like this:

>>> search_space = {
...     "model_config.hidden_size": SearchSpace(value_range = [32, 64], value_type = 'int'),
...     "model_config.activation": SearchSpace(categorical_values = ["relu", "elu", "leakyRelu"])
... }

Attributes:

value_range: Optional[Tuple[str, str]]: Value range of the parameter.
categorical_values: Optional[List[str]]: Categorical values for the parameter.
subspaces: Optional[List[Self]]: A list of search spaces.
sub_configuration: Optional[SubConfiguration]: Subconfiguration for a SearchSpace.
value_type: FieldType = FieldType.continuous: Value type of the parameter’s values (continuous or discrete).
n_bins: Optional[int]: Total bins for grid sampling.
log: bool: To log, by default False.

categorical_values: List[str]#

config_class#: alias of SearchSpace

contains(value: float | int | str | dict[str, Any]) → bool[source]#

Check whether the value is in the search space.

Parameters:

value: float | int | str | dict[str, ty.Any]: Value to search for.

Returns:

bool: Whether the search space contains the value.

Raises:

ValueError: Raised if value is not of specified types.
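As a rough illustration of what a containment check involves, the sketch below tests a value against either a numeric range or a list of categorical values. The helper in_search_space and its arguments are hypothetical and not part of ablator, and the inclusive bounds are an assumption. The real contains method also handles subspaces and sub-configurations, which this sketch ignores.

```python
from typing import Optional, Sequence, Tuple, Union

Number = Union[int, float]


def in_search_space(
    value: Union[Number, str],
    value_range: Optional[Tuple[Number, Number]] = None,
    categorical_values: Optional[Sequence[str]] = None,
) -> bool:
    """Hypothetical helper: check a value against a range or a set of categories."""
    if value_range is not None:
        # A ranged space only makes sense for numbers (bool is excluded on purpose).
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"Expected a number for a ranged space, got {value!r}")
        low, high = value_range
        return low <= value <= high  # assumption: both bounds are inclusive
    if categorical_values is not None:
        return str(value) in {str(v) for v in categorical_values}
    raise ValueError("Either value_range or categorical_values must be provided")


inside = in_search_space(48, value_range=(32, 64))
outside = in_search_space("tanh", categorical_values=["relu", "elu", "leakyRelu"])
print(inside, outside)
```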

log: bool = False#

make_dict(annotations: dict[str, ablator.config.types.Annotation], ignore_stateless: bool = False, flatten: bool = False) → dict[source]#

Create a dictionary representation of the configuration object.

Parameters:

annotations: dict[str, Annotation]: A dictionary of annotations.
ignore_stateless: bool: Whether to ignore stateless values, by default False.
flatten: bool: Whether to flatten nested dictionaries, by default False.

Returns:

dict: The dictionary representation of the configuration object.

Raises:

NotImplementedError: If the type of annot.collection is not supported.

make_paths() → list[str][source]#

n_bins: int#

parsed_value_range() → tuple[int, int] | tuple[float, float][source]#

Extract the lower and upper bound in the search space, values are cast to int or float.

Returns:

tuple[int, int] | tuple[float, float]: tuple representing the range of SearchSpace’s value_range.

Examples

>>> ss = SearchSpace(value_range=[0.05, 0.1], value_type="float")
>>> bounds = ss.parsed_value_range()  # avoid shadowing the built-in range
>>> bounds
(0.05, 0.1)
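Since value_range is annotated as Tuple[str, str], the bounds are stored as strings and have to be cast back before use. A minimal sketch of that cast, assuming a hypothetical helper parse_value_range (not ablator's implementation) that takes the value type as the string "int" or "float":

```python
from typing import Tuple, Union


def parse_value_range(
    value_range: Tuple[str, str], value_type: str
) -> Union[Tuple[int, int], Tuple[float, float]]:
    """Hypothetical helper: cast string bounds to int or float."""
    low, high = value_range
    if value_type == "int":
        return int(low), int(high)
    if value_type == "float":
        return float(low), float(high)
    raise ValueError(f"Unsupported value_type: {value_type!r}")


int_bounds = parse_value_range(("32", "64"), "int")
float_bounds = parse_value_range(("0.05", "0.1"), "float")
print(int_bounds, float_bounds)
```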

sub_configuration: SubConfiguration#

subspaces: List[Self]#

value_range: Tuple[str, str]#

value_type: FieldType#

class ablator.config.hpo.SubConfiguration(**kwargs: Any)[source]#

Bases: object

Subconfiguration for a SearchSpace. As the name suggests, its arguments typically correspond to the attributes of the main config class that we’re creating a SearchSpace for. For example, if the main config class is OptimizerConfig, the keys of the sub_configuration object should be name and arguments. Refer to the example for more details on how to use it.

Parameters:

**kwargs: ty.Any: Keyword arguments for the subconfiguration, which typically correspond to the attributes of the main config class that we’re creating a SearchSpace for. You can also create extra search spaces for any of the arguments.

Examples

The example below defines the optimizer config as a search space of two subspaces: an SGD optimizer with a fixed learning rate, and an Adam optimizer whose learning rate is drawn from a search space.

>>> search_space = {
...     "train_config.optimizer_config": SearchSpace(
...         subspaces=[
...             {"sub_configuration": {"name": "sgd", "arguments": {"lr": 0.1}}},
...             {"sub_configuration": {
...                 "name": "adam",
...                 "arguments": {
...                     "lr": {"value_range": (0, 1), "value_type": "float"},
...                     "weight_decay": 0.9,
...                 },
...             }}
...         ]
...     )
... }

Note that the keys of "sub_configuration" come from the constructor arguments of the optimizer_config class, which in ablator is OptimizerConfig. Those arguments are "name" and "arguments".

Attributes:

arguments: dict[str, ty.Any]: arguments for the subconfigurations.

contains(value: dict[str, Any]) → bool[source]#

ablator.config.main module#

class ablator.config.main.ConfigBase(*args: Any, debug: bool = False, **kwargs: Any)[source]#

Bases: object

This class is the building block for all configuration objects within ablator. It serves as the base class for configurations such as ModelConfig, TrainConfig, OptimizerConfig, and more. Together with @configclass, it allows for the creation of config classes of customized attributes without the need to define a constructor. ConfigBase and @configclass take care of the initialization and parsing of the attributes. The example section below shows this in more detail.

In summary, to customize configurations for specific needs, you can create your own configuration class by inheriting from ConfigBase and decorating it with @configclass, which is required. In the tutorial Search space for different types of optimizers and scheduler, a custom optimizer config class is created to enable an ablation study on various optimizers and schedulers. You can refer to this tutorial for a realistic example of how to create your custom configuration class.

Note

One key takeaway is that when initializing a config object, you can look into the list of attributes defined in the config class to see what arguments you can pass.

Parameters:

*args: Any: Present only to reject positional arguments; passing any raises an error.
debug: bool, optional: Whether to load the configuration in debug mode and ignore discrepancies/errors, by default False.
**kwargs: Any: Keyword arguments. Possible arguments are from the annotations of the configuration class. You can look into the Examples section for more details.

Raises:

ValueError: If positional arguments are provided or there are missing required values.
KeyError: If unexpected arguments are provided.
RuntimeError: If the class is not decorated with @configclass.

Note

All config classes must be decorated with @configclass.

Examples

>>> @configclass
... class MyCustomConfig(ConfigBase):
...     attr1: int = 1
...     attr2: Tuple[str, int, str]
>>> my_config = MyCustomConfig(attr1=4, attr2=("hello", 1, "world"))  # Pass by named arguments
>>> kwargs = {"attr1": 4, "attr2": ("hello", 1, "world")}   # Or unpack a dictionary of keyword arguments
>>> my_config = MyCustomConfig(**kwargs)

Note that since we defined MyCustomConfig as a config class with two annotated attributes attr1 and attr2 (without a constructor, which is automatically handled by ConfigBase and @configclass), when creating the config object, you can directly pass attr1 and attr2. You can also pass these arguments as keyword arguments.
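To make the annotation-driven initialization more concrete, here is a minimal sketch of a base class that builds its attributes from class annotations and raises the errors listed in the Raises section. MiniConfig is a hypothetical stand-in written for illustration only. It is not how ConfigBase is implemented, and it performs no type casting or @configclass check.

```python
from typing import Any, get_type_hints

_MISSING = object()  # sentinel meaning "no default declared on the class"


class MiniConfig:
    """Hypothetical stand-in for ConfigBase: keyword-only, annotation-driven init."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        if args:
            raise ValueError("Positional arguments are not supported.")
        hints = get_type_hints(type(self))
        unexpected = set(kwargs) - set(hints)
        if unexpected:
            raise KeyError(f"Unexpected arguments: {sorted(unexpected)}")
        for name in hints:
            # Fall back to the class-level default when the caller passes nothing.
            default = getattr(type(self), name, _MISSING)
            value = kwargs.get(name, default)
            if value is _MISSING:
                raise ValueError(f"Missing required value: {name}")
            setattr(self, name, value)


class MyCustomConfig(MiniConfig):
    attr1: int = 1
    attr2: str


config = MyCustomConfig(attr2="hello")
print(config.attr1, config.attr2)
```

The list of accepted keyword arguments is exactly the list of annotated attributes, which is why reading a config class's attributes tells you what its constructor accepts.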

Attributes:

config_class: Type: The class of the configuration object.

property annotations: dict[str, ablator.config.types.Annotation]#

Get the parsed annotations of the configuration object.

Returns:

dict[str, Annotation]: A dictionary of parsed annotations.

assert_unambigious()[source]#

Assert that the configuration object is unambiguous and has all the required values.

Raises:

AssertionError: If the configuration object is ambiguous or missing required values.

config_class#: alias of None

diff(config: ConfigBase, ignore_stateless: bool = False) → list[tuple[str, tuple[type, Any], tuple[type, Any]]][source]#

Get the differences between the current configuration object and another configuration object.

Parameters:

config: ConfigBase: The configuration object to compare.
ignore_stateless: bool: Whether to ignore stateless values, by default False.

Returns:

list[tuple[str, tuple[type, Any], tuple[type, Any]]]: The list of differences as tuples.

Examples

Let’s say we have two configuration objects config1 and config2 with the following attributes:

>>> config1:
    learning_rate: 0.01
    optimizer: 'Adam'
    num_layers: 3

>>> config2:
    learning_rate: 0.02
    optimizer: 'SGD'
    num_layers: 3

The diff between these two configurations would look like:

>>> config1.diff(config2)
[('learning_rate', (float, 0.01), (float, 0.02)), ('optimizer', (str, 'Adam'), (str, 'SGD'))]

In this example, the learning_rate and optimizer values are different between the two configuration objects.
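The shape of the returned tuples can be reproduced with plain dictionaries. The sketch below uses a hypothetical helper diff_flat over flat dictionaries. It is only meant to show the (name, (type, value), (type, value)) structure and does not reflect how ablator walks nested configuration objects.

```python
from typing import Any, Dict, List, Tuple

Diff = Tuple[str, Tuple[type, Any], Tuple[type, Any]]


def diff_flat(left: Dict[str, Any], right: Dict[str, Any]) -> List[Diff]:
    """Hypothetical helper: report keys whose values differ between two flat dicts."""
    diffs: List[Diff] = []
    for key in sorted(set(left) | set(right)):
        l_val, r_val = left.get(key), right.get(key)
        if l_val != r_val:
            diffs.append((key, (type(l_val), l_val), (type(r_val), r_val)))
    return diffs


config1 = {"learning_rate": 0.01, "optimizer": "Adam", "num_layers": 3}
config2 = {"learning_rate": 0.02, "optimizer": "SGD", "num_layers": 3}
for name, (l_type, l_val), (r_type, r_val) in diff_flat(config1, config2):
    print(f"{name}: {l_val!r} ({l_type.__name__}) -> {r_val!r} ({r_type.__name__})")
```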

diff_str(config: ConfigBase, ignore_stateless: bool = False) → list[str][source]#

Get the differences between the current configuration object and another configuration object as strings.

Parameters:

config: ConfigBase: The configuration object to compare.
ignore_stateless: bool: Whether to ignore stateless values, by default False.

Returns:

list[str]: The list of differences as strings.

freeze()[source]#

get_annot_type_with_dot_path(dot_path: str) → type[source]#

Get the type of a configuration object annotation using dot notation.

Parameters:

dot_path: str: The dot notation path to the annotation.

Returns:

Type: The type of the annotation.

get_type_with_dot_path(dot_path: str) → type[source]#

Get the type of a configuration object attribute using dot notation.

Parameters:

dot_path: str: The dot notation path to the attribute.

Returns:

Type: The type of the attribute.

get_val_with_dot_path(dot_path: str) → Any[source]#

Get the value of a configuration object attribute using dot notation.

Parameters:

dot_path: str: The dot notation path to the attribute.

Returns:

Any: The value of the attribute.
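Dot notation addresses a nested attribute by joining the attribute names along the way with dots, for example train_config.optimizer_config.arguments.lr. A minimal sketch of such a lookup over nested dictionaries, using a hypothetical helper get_by_dot_path (ablator resolves the path on configuration objects rather than on dictionaries):

```python
from functools import reduce
from typing import Any, Mapping


def get_by_dot_path(config: Mapping[str, Any], dot_path: str) -> Any:
    """Hypothetical helper: follow each path component one level deeper."""
    return reduce(lambda node, key: node[key], dot_path.split("."), config)


nested = {
    "train_config": {
        "batch_size": 32,
        "optimizer_config": {"name": "sgd", "arguments": {"lr": 0.5}},
    }
}
lr = get_by_dot_path(nested, "train_config.optimizer_config.arguments.lr")
print(lr)
```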

keys() → KeysView[str][source]#

Get the keys of the configuration dictionary.

Returns:

abc.KeysView[str]: The keys of the configuration dictionary.

classmethod load(path: Path | str, debug: bool = False) → Self[source]#

Load a configuration object from a file.

Parameters:

path: Union[Path, str]: The path to the configuration file.
debug: bool, optional: Whether to load the configuration in debug mode, and ignore discrepancies/errors, by default False.

Returns:

Self: The loaded configuration object.

make_dict(annotations: dict[str, ablator.config.types.Annotation], ignore_stateless: bool = False, flatten: bool = False) → dict[source]#

Create a dictionary representation of the configuration object.

Parameters:

annotations: dict[str, Annotation]: A dictionary of annotations.
ignore_stateless: bool: Whether to ignore stateless values, by default False.
flatten: bool: Whether to flatten nested dictionaries, by default False.

Returns:

dict: The dictionary representation of the configuration object.

Raises:

NotImplementedError: If the type of annot.collection is not supported.

to_dict(ignore_stateless: bool = False) → dict[source]#

Convert the configuration object to a dictionary.

Parameters:

ignore_stateless: bool: Whether to ignore stateless values, by default False.

Returns:

dict: The dictionary representation of the configuration object.

to_dot_path(ignore_stateless: bool = False) → str[source]#

Convert the configuration object to a YAML string that uses dot notation paths as keys.

Parameters:

ignore_stateless: bool: Whether to ignore stateless values, by default False.

Returns:

str: The YAML representation of the configuration object in dot notation paths.
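The idea of dot notation keys can be sketched by flattening a nested dictionary so that every leaf value is keyed by its full path. The helper flatten below is hypothetical, and the printed "key: value" lines only approximate what a YAML dump of the flattened mapping would look like.

```python
from typing import Any, Dict


def flatten(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Hypothetical helper: map each leaf value to its dot notation path."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))  # recurse into nested sections
        else:
            flat[path] = value
    return flat


nested_config = {
    "train_config": {"batch_size": 32, "optimizer_config": {"name": "sgd"}},
    "device": "cpu",
}
flat_config = flatten(nested_config)
for dot_path, value in flat_config.items():
    print(f"{dot_path}: {value}")
```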

to_yaml() → str[source]#

Convert the configuration object to YAML format.

Returns:

str: The YAML representation of the configuration object.

property uid: str#

Get the unique identifier for the configuration object.

Returns:

str: The unique identifier for the configuration object.

write(path: Path | str)[source]#

Write the configuration object to a file.

Parameters:

path: Union[Path, str]: The path to the file.

class ablator.config.main.Missing[source]#

Bases: object

This type is defined only for raising an error.

ablator.config.main.configclass(cls: type['ConfigBase']) → type['ConfigBase'][source]#

Decorator for ConfigBase subclasses, adds the config_class attribute to the class.

Parameters:

cls: type["ConfigBase"]: The class to be decorated.

Returns:

type[ConfigBase]: The decorated class with the config_class attribute.
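A decorator of this kind can be very small. The sketch below shows one way a config_class attribute could be attached and later used to detect undecorated subclasses. MiniBase and mini_configclass are hypothetical names, and the check inside __init__ is an assumption about how the RuntimeError documented for ConfigBase might be triggered, not a copy of ablator's code.

```python
from typing import Optional, Type, TypeVar

T = TypeVar("T", bound="MiniBase")


class MiniBase:
    """Hypothetical base class; config_class stays None until decorated."""

    config_class: Optional[type] = None

    def __init__(self) -> None:
        # A subclass that was not decorated inherits its parent's config_class,
        # so the attribute does not match the concrete class being instantiated.
        if type(self).config_class is not type(self):
            raise RuntimeError(f"{type(self).__name__} must be decorated.")


def mini_configclass(cls: Type[T]) -> Type[T]:
    """Hypothetical decorator: record the class on itself and return it."""
    cls.config_class = cls
    return cls


@mini_configclass
class Decorated(MiniBase):
    pass


class Undecorated(Decorated):
    pass


decorated = Decorated()
print(Decorated.config_class.__name__)
```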

ablator.config.mp module#

class ablator.config.mp.ParallelConfig(*args: Any, debug: bool = False, **kwargs: Any)[source]#

Bases: RunConfig

Parallel training configuration, extending from RunConfig, defines the settings of a parallel experiment (number of trials to run for, number of concurrent trials, search space for hyperparameter search, etc.).

ParallelConfig encapsulates every configuration (model config, optimizer-scheduler config, train config, and the search space) needed to run a parallel experiment. The entire umbrella of configuration is then passed to ParallelTrainer which launches the experiment.

Examples

There are several steps before defining a parallel run config. Let’s go through them one by one:

Define training config:

>>> my_optimizer_config = OptimizerConfig("sgd", {"lr": 0.5, "weight_decay": 0.5})
>>> my_scheduler_config = SchedulerConfig("step", arguments={"step_size": 1, "gamma": 0.99})
>>> train_config = TrainConfig(
...     dataset="[Dataset Name]",
...     batch_size=32,
...     epochs=10,
...     optimizer_config = my_optimizer_config,
...     scheduler_config = my_scheduler_config
... )

Define model config. Here we want to run HPO on activation functions and model hidden size:

>>> @configclass
... class CustomModelConfig(ModelConfig):
...     hidden_size: int
...     activation: str
>>> model_config = CustomModelConfig(hidden_size=100, activation="relu")

Define search space:

>>> search_space = {
...     "train_config.optimizer_config.arguments.lr": SearchSpace(
...         value_range=[0.001, 0.01], value_type="float"
...     ),
...     "model_config.hidden_size": SearchSpace(value_range=[32, 64], value_type="int"),
...     "model_config.activation": SearchSpace(
...         categorical_values=["relu", "elu", "leakyRelu"]
...     ),
... }

Lastly, we will define the run config from the previous config components (remember to redefine the parallel config to update the model config type to be CustomModelConfig):

>>> @configclass
... class CustomParallelConfig(ParallelConfig):
...     model_config: CustomModelConfig
>>> parallel_config = CustomParallelConfig(
...     train_config=train_config,
...     model_config=model_config,
...     metrics_n_batches = 800,
...     experiment_dir = "/tmp/experiments/",
...     device="cuda",
...     amp=True,
...     random_seed = 42,
...     total_trials = 20,
...     concurrent_trials = 20,
...     search_space = search_space,
...     optim_metrics = {"val_loss": "min"},
...     optim_metric_name = "val_loss",
...     gpu_mb_per_experiment = 1024
... )

Attributes:

total_trials: Optional[int]: total number of trials.
concurrent_trials: int: number of trials to run concurrently.
search_space: Dict[SearchSpace]: search space for hyperparameter search, e.g. {"train_config.optimizer_config.arguments.lr": SearchSpace(value_range=[0, 10], value_type="int"),}
gpu_mb_per_experiment: int: CUDA memory requirement per experimental trial in MB, e.g. a value of 100 is equivalent to 100 MB.
search_algo: SearchAlgo = SearchAlgo.random: type of search algorithm.
ignore_invalid_params: bool = False: whether to ignore invalid parameters when sampling or raise an error.
remote_config: Optional[RemoteConfig] = None: remote storage configuration.

concurrent_trials: Stateless[int]#

config_class#: alias of ParallelConfig

gpu_mb_per_experiment: Stateless[int] = None#

ignore_invalid_params: Stateless[bool] = False#

remote_config: Stateless[RemoteConfig] = None#

search_algo: Stateless[SearchAlgo] = 'random'#

search_space: Dict[SearchSpace]#

total_trials: int#

class ablator.config.mp.SearchAlgo(value)[source]#

Bases: Enum

Type of search algorithm.

Grid Sampling: Discretizes the search space into even intervals n_bins.
TPE Sampling: Tree-Structured Parzen Estimator [1] is a hyper-parameter optimization algorithm.
Random Sampling: Naively samples from the search space with a random probability.

The behavior of each algorithm depends highly on the budget allocated for each trial. For example, Grid Sampling will repeat sampled configurations only after it has exhaustively evaluated the current configuration space.

TPE and Random Sampling can repeat configurations at random.

References: [1] Bergstra, James S., et al. “Algorithms for hyper-parameter optimization.” Advances in Neural Information Processing Systems. 2011.
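The difference between grid and random sampling can be sketched with the standard library alone. The helpers below are hypothetical and not ablator's samplers. In particular, placing the grid points so that both bounds are included is an assumption about how n_bins divides the range.

```python
import random
from typing import List


def grid_values(low: float, high: float, n_bins: int) -> List[float]:
    """Hypothetical helper: n_bins evenly spaced values, bounds included."""
    if n_bins < 2:
        raise ValueError("n_bins must be at least 2")
    step = (high - low) / (n_bins - 1)
    return [low + i * step for i in range(n_bins)]


def random_value(low: float, high: float, rng: random.Random) -> float:
    """Hypothetical helper: draw uniformly at random from the range."""
    return rng.uniform(low, high)


grid = grid_values(0.0, 1.0, 5)
rng = random.Random(42)
samples = [random_value(0.0, 1.0, rng) for _ in range(3)]
print(grid)
print(samples)
```

A grid sampler can walk through the fixed list of values and only start over once the list is exhausted, whereas independent random draws carry no memory of earlier trials.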

grid = 'grid'#

random = 'random'#

tpe = 'tpe'#

ablator.config.proto module#

class ablator.config.proto.ModelConfig(*args: Any, debug: bool = False, **kwargs: Any)[source]#

Bases: ConfigBase

A base class for model configuration. This is used for defining model hyperparameters, so when initializing a model, it is passed to the model module constructor. The attributes from the model config object will be used to construct the model.

Examples

Define a custom model configuration class for your model:

>>> @configclass
... class CustomModelConfig(ModelConfig):
...     input_size: int
...     hidden_size: int
...     num_classes: int

Define your model class, pass the configuration to the constructor, and build the model:

>>> class FashionMNISTModel(nn.Module):
...     def __init__(self, config: CustomModelConfig):
...         super(FashionMNISTModel, self).__init__()
...         self.fc1 = nn.Linear(config.input_size, config.hidden_size) # model config attributes are used here
...         self.relu1 = nn.ReLU()
...         self.fc3 = nn.Linear(config.hidden_size, config.num_classes) # model config attributes are used here
...     def forward(self, x):
...         # code for forward pass
...         return x

RunConfig later requires a model config object, so we will create one. Remember to pass values for the hyperparameters, since we defined them to be Stateful:

>>> model_config = CustomModelConfig(input_size=512, hidden_size=100, num_classes=10)

config_class#: alias of ModelConfig

class ablator.config.proto.Optim(value)[source]#

Bases: Enum

Type of optimization direction.

It can take the values min and max, which indicate whether the HPO algorithm should minimize or maximize the corresponding metric.

max = 'max'#

min = 'min'#

class ablator.config.proto.RunConfig(*args: Any, debug: bool = False, **kwargs: Any)[source]#

Bases: ConfigBase

The base run configuration that defines the setting of an experiment (experiment main directory, number of checkpoints to maintain, hardware device to use, etc.). You can use this to configure the experiment of a single prototype model.

RunConfig encapsulates every configuration (model config, optimizer-scheduler config, train config) needed for an experiment. This entire umbrella of configurations is then passed to ProtoTrainer which launches the prototype experiment.

Examples

There are several steps before defining a run config. Let’s go through them one by one:

Define training config:

>>> my_optimizer_config = OptimizerConfig("sgd", {"lr": 0.5, "weight_decay": 0.5})
>>> my_scheduler_config = SchedulerConfig("step", arguments={"step_size": 1, "gamma": 0.99})
>>> train_config = TrainConfig(
...     dataset="[Dataset Name]",
...     batch_size=32,
...     epochs=10,
...     optimizer_config = my_optimizer_config,
...     scheduler_config = my_scheduler_config,
...     rand_weights_init = True
... )

Define model config. Here we use the default one with no custom hyperparameters. (Sometimes you will want to customize the model config to run HPO on your model’s hyperparameters in parallel experiments with ParallelTrainer, which requires ParallelConfig instead of RunConfig.)

>>> model_config = ModelConfig()

Lastly, we will create the run config, which has train config and model config as parameters:

>>> run_config = RunConfig(
...     train_config=train_config,
...     model_config=model_config,
...     metrics_n_batches = 800,
...     experiment_dir = "/tmp/experiments",
...     device="cpu",
...     amp=False,
...     random_seed = 42
... )

Attributes:

experiment_dir: Stateless[Optional[str]]: Location to store experiment artifacts, by default None.
random_seed: Optional[int]: Random seed, by default None.
train_config: TrainConfig: Training configuration.
model_config: ModelConfig: Model configuration.
keep_n_checkpoints: Stateless[int]: Number of latest checkpoints to keep, by default 3.
tensorboard: Stateless[bool]: Whether to use tensorboardLogger, by default True.
amp: Stateless[bool]: Whether to use automatic mixed precision when running on gpu, by default True.
device: Stateless[str]: Device to run on, by default "cuda".
verbose: Stateless[Literal["console", "progress", "silent"]]: Verbosity level, by default "console".
eval_subsample: Stateless[float]: Fraction of the dataset to use for evaluation, by default 1.
metrics_n_batches: Stateless[int]: Max number of batches stored in every tag (train, eval, test) for evaluation, by default 32.
metrics_mb_limit: Stateless[int]: Max number of megabytes stored in every tag (train, eval, test) for evaluation, by default 10_000 (10 GB).
early_stopping_iter: Stateless[Optional[int]]: The maximum allowed difference between the current iteration and the last iteration with the best metric before applying early stopping. Early stopping will be triggered if the difference (current_itr - best_itr) exceeds early_stopping_iter. If set to None, early stopping will not be applied. By default None.
eval_epoch: Stateless[float]: The epoch interval between two evaluations, by default 1.
log_epoch: Stateless[float]: The epoch interval between two logging, by default 1.
init_chkpt: Stateless[Optional[str]]: Path to a checkpoint to initialize the model with, by default None.
warm_up_epochs: Stateless[float]: Number of epochs marked as warm up epochs, by default 1.
divergence_factor: Stateless[Optional[float]]: If cur_loss > best_metric > divergence_factor, the model is considered to have diverged, by default 10.
optim_metrics: Stateless[Optional[Dict[Optim]]]: The optimization metric to use for meta-training procedures, such as for model saving and lr scheduling.
optim_metric_name: Stateless[Optional[str]]: The name of the metric to be optimized.
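The early stopping rule described for early_stopping_iter reduces to a single comparison. A minimal sketch, assuming a hypothetical helper should_stop_early that receives the current iteration and the iteration at which the best metric was observed:

```python
from typing import Optional


def should_stop_early(
    current_itr: int, best_itr: int, early_stopping_iter: Optional[int]
) -> bool:
    """Hypothetical helper: stop once the gap since the best iteration is too large."""
    if early_stopping_iter is None:
        return False  # early stopping is disabled
    return (current_itr - best_itr) > early_stopping_iter


print(should_stop_early(current_itr=120, best_itr=100, early_stopping_iter=10))
print(should_stop_early(current_itr=105, best_itr=100, early_stopping_iter=10))
print(should_stop_early(current_itr=500, best_itr=100, early_stopping_iter=None))
```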

amp: Stateless[bool] = True#

config_class#: alias of RunConfig

device: Stateless[str] = 'cuda'#

divergence_factor: Stateless[float] = 10#

early_stopping_iter: Stateless[int] = None#

eval_epoch: Stateless[float] = 1#

eval_subsample: Stateless[float] = 1#

experiment_dir: Stateless[str] = None#

init_chkpt: Stateless[str] = None#

keep_n_checkpoints: Stateless[int] = 3#

log_epoch: Stateless[float] = 1#

metrics_mb_limit: Stateless[int] = 10000#

metrics_n_batches: Stateless[int] = 32#

model_config: ModelConfig#

optim_metric_name: Stateless[str]#

optim_metrics: Stateless[Dict[Optim]]#

random_seed: int = None#

tensorboard: Stateless[bool] = True#

train_config: TrainConfig#

property uid: str#

Get the unique identifier for the configuration object.

Returns:

str: The unique identifier for the configuration object.

verbose: Stateless[Literal['console', 'progress', 'silent']] = 'console'#

warm_up_epochs: Stateless[float] = 1#

class ablator.config.proto.TrainConfig(*args: Any, debug: bool = False, **kwargs: Any)[source]#

Bases: ConfigBase

Training configuration that defines the training setting, e.g., batch size, number of epochs, the optimizer to use, etc. This configuration is required when creating the run configurations (RunConfig and ParallelConfig, which set up the running environment of the experiment).

Examples

The following example shows all the steps towards configuring an experiment:

Define model config: for simplicity, we use the default one with no custom hyperparameters (so we’re not running an ablation study on the model architecture):

>>> my_model_config = ModelConfig()

Define optimizer and scheduler config, as training config requires an optimizer config, and optionally a scheduler config:

>>> my_optimizer_config = OptimizerConfig("sgd", {"lr": 0.5, "weight_decay": 0.5})
>>> my_scheduler_config = SchedulerConfig("step", arguments={"step_size": 1, "gamma": 0.99})

Define training config:

>>> my_train_config = TrainConfig(
...     dataset="[Your Dataset]",
...     batch_size=32,
...     epochs=10,
...     optimizer_config = my_optimizer_config,
...     scheduler_config = my_scheduler_config
... )

We now define the run config for prototype training, which is the last configuration step. Refer to Configurations for single model experiments and Configurations for parallel models experiments for more details on running configs.

>>> run_config = RunConfig(
...     train_config=my_train_config,
...     model_config=my_model_config,
...     metrics_n_batches = 800,
...     experiment_dir = "/tmp/experiments",
...     device="cpu",
...     amp=False,
...     random_seed = 42
... )

Attributes:

dataset: str: Dataset name. May be used in custom dataset loader functions.
batch_size: int: Batch size.
epochs: int: Number of epochs to train.
optimizer_config: OptimizerConfig: Optimizer configuration.
scheduler_config: Optional[SchedulerConfig]: Scheduler configuration.

batch_size: int#

config_class#: alias of TrainConfig

dataset: str#

epochs: int#

optimizer_config: OptimizerConfig#

scheduler_config: SchedulerConfig#

ablator.config.types module#

Custom types for runtime checking

class ablator.config.types.Annotation(state, optional, collection, variable_type)#

Bases: tuple

collection#: Alias for field number 2

optional#: Alias for field number 1

state#: Alias for field number 0

variable_type#: Alias for field number 3

class ablator.config.types.Derived[source]#

Bases: Generic[T]

Derived is used for attributes that are derived during the experiment (after launching the experiment with trainer.launch()). To make an attribute derived, wrap Derived around its type definition, e.g. Derived[List[int]], Derived[str].

Examples

For example, suppose you want to test how different pre-trained word embeddings (e.g. word2vec 100d, word2vec 300d) affect the performance of a classification model, and you use ablator to run an ablation study on the effect of the word embeddings. The classification model architecture depends on the embedding length of each pre-trained set of word embeddings, so the model architecture is derived from the pre-trained word embeddings. You can define a model config class as follows:

>>> @configclass
... class MyModelConfig(ModelConfig):
...     embed_dim: Derived[int]

Then you can define a model class that takes in the model config as input and sets the input length using embed_dim:

>>> class MyModel(nn.Module):
...     def __init__(self, config: MyModelConfig):
...         super().__init__()
...         self.embed_dim = config.embed_dim

Finally, config_parser is used to set the value of the Derived attribute embed_dim based on the pre-trained word embeddings:

>>> class MyLMWrapper(ModelWrapper):
...     def config_parser(self, run_config: RunConfig):
...         run_config.model_config.embed_dim = len(self.train_dataloader.word2vec.wv.vocab)
...         return run_config

Note

When initializing config objects, you do not have to assign values to attributes that are of Derived type.

class ablator.config.types.Dict[source]#

Bases: Dict[str, T]

A class for the dictionary data type, with keys as strings. Used when you need to specify a config attribute as a dictionary (in fact, ablator defines search_space as a dictionary of SearchSpace in the config class ParallelConfig). Remember to wrap the type of the dictionary elements in Dict[], e.g. Dict[str] is a dictionary which has string values, Dict[int] is a dictionary which has integer values.

Examples

You can declare an attribute of type Dict as follows:

>>> @configclass
... class MyConfig(ConfigBase):
...     my_str_dict: Dict[str]
...     my_int_dict: Dict[int]
...     my_space_dict: Dict[SearchSpace]

When initializing a config object, you can pass a dictionary with keys as strings. For values, ablator will automatically cast them to the correct type if possible. For example:

>>> str_dict = {"str1": "val1", "str2": 2}
>>> int_dict = {"int1": 1, "int2": 2.5}
>>> space_dict = {"space1": SearchSpace(value_range = [0, 10], value_type = 'int')}
>>> MyConfig(my_str_dict=str_dict, my_int_dict=int_dict, my_space_dict=space_dict)
MyConfig(
    my_str_dict={'str1': 'val1', 'str2': '2'},
    my_int_dict={'int1': 1, 'int2': 2},
    my_space_dict={
        'space1': {
            'value_range': ('0', '10'),
            'categorical_values': None,
            'subspaces': None,
            'sub_configuration': None,
            'value_type': 'int',
            'n_bins': None,
            'log': False
        }
    }
)

Notice that the value at key str2 is cast to a string, and the value at key int2 is cast to an integer.
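The casting shown above behaves like calling the element type on every value. A minimal sketch with a hypothetical helper cast_dict_values, which is not ablator's parsing code, reproduces the two results for the string and integer dictionaries:

```python
from typing import Any, Callable, Dict, Mapping


def cast_dict_values(
    values: Mapping[str, Any], value_type: Callable[[Any], Any]
) -> Dict[str, Any]:
    """Hypothetical helper: keep string keys and cast every value."""
    return {str(key): value_type(value) for key, value in values.items()}


str_dict = cast_dict_values({"str1": "val1", "str2": 2}, str)
int_dict = cast_dict_values({"int1": 1, "int2": 2.5}, int)
print(str_dict)
print(int_dict)
```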

class ablator.config.types.Enum(value)[source]#

Bases: Enum

A custom Enum class that provides additional equality and hashing methods. This is useful when creating custom data types that take as value elements from a fixed set. In ablator, we use Enum to define Optim, which specifies the optimization direction: Optim.min or Optim.max. Optim is then used in config class RunConfig (optim_metrics attribute).

Examples

Create a custom Enum class by inheriting from Enum:

>>> from ablator import Enum
>>> class Color(Enum):
...     RED = 1
...     GREEN = 2
...     BLUE = 3

RED, GREEN, and BLUE form the fixed set of values for the Color type. Internally, these values are mapped to the integers 1, 2, and 3. The custom data type Color can now be used in config classes:

>>> @configclass
... class MyConfig(ConfigBase):
...     my_color: Color
>>> MyConfig(my_color=Color.RED)
MyConfig(my_color=1)
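One plausible reading of "additional equality and hashing methods" is that a member compares equal to its raw value, which would explain why Color.RED is displayed as 1 above. The sketch below builds such an enum on top of the standard library. ValueEnum is a hypothetical name, and the behavior shown is an assumption rather than a description of ablator's Enum.

```python
from enum import Enum as _Enum


class ValueEnum(_Enum):
    """Hypothetical enum whose members compare equal to their raw values."""

    def __eq__(self, other: object) -> bool:
        if isinstance(other, type(self)):
            return self.value == other.value
        return self.value == other

    def __hash__(self) -> int:
        # Hash by value so that equal objects also have equal hashes.
        return hash(self.value)


class Color(ValueEnum):
    RED = 1
    GREEN = 2
    BLUE = 3


print(Color.RED == 1, Color.RED == Color.GREEN, Color(3) == Color.BLUE)
```

Hashing by value keeps the enum usable as a dictionary key that can be looked up either by member or by raw value.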

class ablator.config.types.List(iterable=(), /)[source]#

Bases: List[T]

A class for the list data type, used when you need to annotate an attribute as a list. Remember to wrap the type of the list elements in List[], e.g. List[str], List[int].

Examples

You can declare an attribute of type List as follows:

>>> @configclass
... class MyConfig(ConfigBase):
...     my_str_list: List[str]  # list of strings
...     my_int_list: List[int]  # list of integers

When initializing a config object, you can pass a list of proper values. In addition, ablator will automatically cast them to the correct type if possible. For example:

>>> MyConfig(my_str_list=["a", "b", 1.5, 2],
...          my_int_list=[1, 2, -3.5, 4])
MyConfig(
    my_str_list=['a', 'b', '1.5', '2'],
    my_int_list=[1, 2, -3, 4]
)

Notice that the values of my_str_list[2] and my_str_list[3] are cast to strings, and the value of my_int_list[2] is cast to an integer.
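The phrase "if possible" matters here: a cast can change a value or fail outright. The sketch below uses a hypothetical helper cast_list, which is not ablator's parsing code, and relies on Python's built-in int, which truncates toward zero and raises ValueError for non-numeric strings.

```python
from typing import Any, Callable, List, Sequence


def cast_list(values: Sequence[Any], element_type: Callable[[Any], Any]) -> List[Any]:
    """Hypothetical helper: cast every element, letting failures propagate."""
    return [element_type(value) for value in values]


str_list = cast_list(["a", "b", 1.5, 2], str)
int_list = cast_list([1, 2, -3.5, 4], int)  # -3.5 becomes -3, not -4
print(str_list)
print(int_list)

try:
    cast_list(["a"], int)
except ValueError as error:
    print(f"cannot cast: {error}")
```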

class ablator.config.types.Optional[source]#

Bases: Generic[T]

A class for optional data types. This is helpful when a config attribute is optional, meaning that we can leave an optional config attribute empty. (In fact, ablator defines scheduler_config as optional in the config class TrainConfig).

Examples

You can declare an attribute of type Optional as follows:

>>> @configclass
>>> class MyConfig(ConfigBase):
>>>     my_optional_list: Optional[List[str]]

When initializing a config object, you can pass a List[str] value to my_optional_list, or pass no value at all:

>>> MyConfig(my_optional_list=["a"])
MyConfig(my_optional_list=['a'])
>>> MyConfig()
MyConfig(my_optional_list=None)

class ablator.config.types.Self[source]#

Bases: object

class ablator.config.types.Stateful[source]#

Bases: Generic[T]

This is for attributes that are fixed between experiments. By default, we assume that primitive-typed attributes are stateful. Unlike Derived and Stateless, where you have to annotate attributes with these classes, e.g. attr: Stateless[int] or attr: Stateless[List[str]], stateful attributes are defined without the Stateful wrapper, e.g. attr: int or attr: List[str].

Examples

The below example defines a model config that has stateful embedding dimensions, which means that in every experiment, the embedding dimension must be the same.

>>> @configclass
>>> class MyModelConfig(ModelConfig):
>>>     embed_dim: int
>>> model_config = MyModelConfig(embed_dim=100)  # Must provide values for ``embed_dim`` before launching experiment

Note

In contrast to Derived, when initializing config objects (i.e. before launching the experiment with trainer.launch()), you have to assign values to their stateful attributes.
Stateful only applies in the context of experiments, so a stateful attribute must be the same between different runs of the same experiment configuration. However, within each experiment, a search space on stateful attributes can be defined to run HPO on them.

class ablator.config.types.Stateless[source]#

Bases: Generic[T]

This type is for attributes that can take different value assignments between experiments. To make an attribute stateless, wrap Stateless around its type definition, e.g. Stateless[List[int]], Stateless[str].

Examples

>>> @configclass
>>> class MyModelConfig(ConfigBase):
>>>     attr: Stateless[List[int]]
>>> config = MyModelConfig(attr=[5,"6",7.25])  # Must provide values for ``attr`` before launching experiment

Note

Unlike Derived, when initializing config objects (before launching the experiment trainer.launch()) that have stateless attributes, you have to assign values to these attributes.

class ablator.config.types.Tuple(iterable=(), /)[source]#

Bases: Tuple[T]

A class for tuple data type, used when you need to annotate an attribute as a tuple. Remember to wrap the type of the tuple elements in Tuple[]. You also have the flexibility to specify the number of elements in the tuple and the data type for each of them.

Examples

You can declare an attribute of type Tuple as follows:

>>> @configclass
>>> class MyConfig(ConfigBase):
>>>     my_str_int_tuple: Tuple[str, int]   # Tuple of a string and an integer
>>>     my_2str_int_tuple: Tuple[str, int, str]   # Tuple of a string, an integer, and a string

When initializing a config object, you can pass a tuple of proper values. In addition, ablator will automatically cast them to the correct type if possible. For example:

>>> MyConfig(my_str_int_tuple=("a", 1.5), my_2str_int_tuple=("a", 1, 2))
MyConfig(
    my_str_int_tuple=('a', 1),
    my_2str_int_tuple=('a', 1, '2')
)

Notice how data are cast in my_str_int_tuple[1] and my_2str_int_tuple[2].

Note

The number of elements and their order in the tuple must match those types specified in Tuple[]. So for the example above, my_str_int_tuple must have exactly 2 elements in that order, and my_2str_int_tuple must have exactly 3 elements in that order.
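
The positional casting and the length requirement can be illustrated in plain Python. This is a minimal sketch under the assumption that each element is cast with the type declared at the same position; cast_tuple is a hypothetical helper, and the error ablator raises on a length mismatch may differ.

```python
from typing import Any, Sequence, Tuple


def cast_tuple(values: Sequence[Any], elem_types: Sequence[type]) -> Tuple[Any, ...]:
    """Hypothetical helper: cast each element with the type at the same position."""
    if len(values) != len(elem_types):
        # The element count must match the declared types exactly.
        raise ValueError(f"expected {len(elem_types)} elements, got {len(values)}")
    return tuple(t(v) for t, v in zip(elem_types, values))


pair = cast_tuple(("a", 1.5), (str, int))          # ('a', 1)
triple = cast_tuple(("a", 1, 2), (str, int, str))  # ('a', 1, '2')
```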

ablator.config.types.parse_type_hint(cls: Any, type_hint: type[Any]) → Annotation[source]#

Parses a type hint and returns a parsed annotation.

Parameters:

cls: ty.Any: The class being annotated.
type_hint: type[ty.Any]: The input type hint to parse.

Returns:

Annotation: A namedtuple containing state, optional, collection, and variable_type information.

Examples

>>> parse_type_hint(Optional[List[int]])
Annotation(state=Stateful, optional=True, collection=List, variable_type=int)
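
To show the kind of decomposition involved, the sketch below unpacks a type hint with the standard typing helpers get_origin and get_args. It is an illustration only: HintInfo and describe_hint are hypothetical names, the state field of Annotation is left out, and ablator's own parsing may work differently.

```python
from collections import namedtuple
from typing import List, Optional, Union, get_args, get_origin

# Hypothetical, reduced stand-in for Annotation (no ``state`` field).
HintInfo = namedtuple("HintInfo", ["optional", "collection", "variable_type"])


def describe_hint(type_hint):
    """Split a hint into (optional, collection, element type)."""
    optional = False
    if get_origin(type_hint) is Union:
        # Optional[X] is Union[X, None]; drop None and remember it was there.
        args = [a for a in get_args(type_hint) if a is not type(None)]
        optional = len(args) < len(get_args(type_hint))
        type_hint = args[0]
    origin = get_origin(type_hint)
    if origin is None:
        # A plain type such as int has no collection.
        return HintInfo(optional, None, type_hint)
    inner = get_args(type_hint)
    return HintInfo(optional, origin, inner[0] if inner else None)


info = describe_hint(Optional[List[int]])
plain = describe_hint(int)
```

Here info reports an optional list of int and plain reports a non-optional int with no collection.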

ablator.config.types.parse_value(val: Any, annot: Annotation, name: str | None = None, debug: bool = False) → Any[source]#

Parses a value based on the given annotation.

Parameters:

val: ty.Any: The input value to parse.
annot: Annotation: The annotation namedtuple to guide the parsing.
name: str | None: The name of the value, by default None.
debug: bool, optional: Whether to load the configuration in debug mode and ignore discrepancies and errors, by default False.

Returns:

ty.Any: The parsed value.

Raises:

RuntimeError: If a required value is missing and it is not optional, derived, or stateless.
ValueError: If the value type in a dict is not valid, or if the value of a list is not valid.

Examples

>>> annotation = parse_type_hint(Optional[List[int]])
>>> parse_value([1, 2, 3], annotation)
[1, 2, 3]

ablator.config.utils module#

ablator.config.utils.dict_hash(*dictionaries: list[dict[str, Any]] | dict[str, Any], hash_len: int = 4) → str[source]#

Calculates the MD5 hash of one or more dictionaries.

Parameters:

*dictionaries: list[dict[str, ty.Any]] | dict[str, ty.Any]: One or more dictionaries to calculate the hash for.
hash_len: int: The length of the hash to return, by default 4.

Returns:

str: The MD5 hash of the dictionaries.

Examples

>>> dict1 = {"a": 1, "b": 2}
>>> dict2 = {"c": 3, "d": 4}
>>> dict_hash(dict1, dict2)
'6d75e6'
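
One plausible way to compute such a truncated hash is sketched below. It assumes the dictionaries are serialized as JSON with sorted keys before hashing, which this reference does not state, so the digests it produces need not match those of dict_hash. sketch_dict_hash is a hypothetical name.

```python
import hashlib
import json
from typing import Any, Dict


def sketch_dict_hash(*dictionaries: Dict[str, Any], hash_len: int = 4) -> str:
    """Hypothetical sketch: MD5 over the JSON form of each dictionary, truncated."""
    digest = hashlib.md5()
    for d in dictionaries:
        # Sorting keys makes the hash independent of insertion order (an assumption).
        digest.update(json.dumps(d, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()[:hash_len]


short = sketch_dict_hash({"a": 1, "b": 2}, {"c": 3, "d": 4})
same = sketch_dict_hash({"b": 2, "a": 1}, {"d": 4, "c": 3})
longer = sketch_dict_hash({"a": 1}, hash_len=8)
```

In this sketch short and same are identical because only key order differs, and the returned string always has exactly hash_len characters.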

ablator.config.utils.flatten_nested_dict(dict_: dict, expand_list: bool = True, seperator: str = '.') → dict[str, Any][source]#

Flattens a nested dictionary, expanding lists and tuples if specified.

Parameters:

dict_: dict: The input dictionary to be flattened.
expand_list: bool: Whether to expand lists and tuples in the dictionary, by default True.
seperator: str: The separator used for joining the keys, by default ".".

Returns:

dict[str, ty.Any]: The flattened dictionary.

Examples

>>> nested_dict = {"a": {"b": 1, "c": {"d": 2}}, "e": [3, 4]}
>>> flatten_nested_dict(nested_dict)
{'a.b': 1, 'a.c.d': 2, 'e.0': 3, 'e.1': 4}
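
The flattening rule can be reproduced with a short recursive function. The following is a sketch, not ablator's source: sketch_flatten is a hypothetical name, it keeps the documented parameter names (including the spelling seperator), and it assumes list and tuple elements are keyed by their index.

```python
from typing import Any, Dict


def sketch_flatten(
    dict_: Dict[str, Any], expand_list: bool = True, seperator: str = "."
) -> Dict[str, Any]:
    """Hypothetical sketch: join nested keys with ``seperator``."""
    flat: Dict[str, Any] = {}
    for key, value in dict_.items():
        if expand_list and isinstance(value, (list, tuple)):
            # Treat a sequence as a dict keyed by element index.
            value = {str(i): v for i, v in enumerate(value)}
        if isinstance(value, dict):
            # Recurse and prefix every nested key with the parent key.
            for sub_key, sub_value in sketch_flatten(value, expand_list, seperator).items():
                flat[f"{key}{seperator}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


nested = {"a": {"b": 1, "c": {"d": 2}}, "e": [3, 4]}
expanded = sketch_flatten(nested)
kept = sketch_flatten(nested, expand_list=False)
```

With expand_list=False the list under "e" is kept as a single value instead of being split into "e.0" and "e.1".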

ablator.config.utils.parse_repr_to_kwargs(obj: Any) → tuple[tuple, dict[str, int | float | str | bool | None]][source]#

Parses a string or dictionary representation of an object to obtain the initialization arguments of that object. It first attempts to do so via the user-implemented to_dict, as_dict and __dict__ methods; when those fail, it resorts to evaluating the string representation, e.g. eval(str(obj)). If all attempts fail, it raises an error.

NOTE: the object obj must implement the equality operator __eq__ and, ideally, a user-implemented to_dict.

Parameters:

obj: ty.Any: The object to deconstruct.

Returns:

tuple[tuple, dict[str, int | float | str | bool | None]]: a tuple of (args, kwargs) to reconstruct obj from above.

Raises:

RuntimeError: If it is unable to obtain a representation that can reconstruct the original object. The reconstruction is checked with the equality operator.
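
The fallback order and the equality check can be sketched as follows. This is a simplified illustration and not ablator's implementation: sketch_repr_to_kwargs and Point are hypothetical names, the eval(str(obj)) step is deliberately omitted, and the sketch assumes the object is rebuilt from keyword arguments only.

```python
from typing import Any, Dict, Tuple


def sketch_repr_to_kwargs(obj: Any) -> Tuple[tuple, Dict[str, Any]]:
    """Hypothetical sketch: try to_dict, as_dict, then __dict__, verifying with ==."""
    candidates = []
    for name in ("to_dict", "as_dict"):
        method = getattr(obj, name, None)
        if callable(method):
            candidates.append(method)
    candidates.append(lambda: dict(vars(obj)))  # fall back to __dict__

    for make_kwargs in candidates:
        try:
            kwargs = make_kwargs()
            # Accept the kwargs only if they rebuild an equal object.
            if type(obj)(**kwargs) == obj:
                return (), kwargs
        except Exception:
            continue
    raise RuntimeError(f"Could not reconstruct {type(obj).__name__}.")


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)


args, kwargs = sketch_repr_to_kwargs(Point(1, 2))
```

Without __eq__ on Point, the comparison would fall back to identity and every reconstruction attempt would be rejected, which is why the note above requires the equality operator.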
