Data Type Configuration
These data type classes are used in configuration classes to specify the data type of each config attribute,
which gives ablator the flexibility to expand into various configuration formats.
Common data types
Ablator supports common structural data types such as lists, tuples, and dictionaries. You can use them to annotate configuration attributes. Each data type is described in the following sections:
- class ablator.config.types.List
A class for the list data type, used when you need to annotate an attribute as a list. Remember to wrap the type of the list elements in
List[], e.g. List[str], List[int].
Examples
You can declare an attribute of type List as follows:
>>> @configclass
... class MyConfig(ConfigBase):
...     my_str_list: List[str]  # list of strings
...     my_int_list: List[int]  # list of integers
When initializing a config object, you can pass a list of proper values. In addition, ablator will automatically cast them to the correct type if possible. For example:
>>> MyConfig(my_str_list=["a", "b", 1.5, 2],
...          my_int_list=[1, 2, -3.5, 4])
MyConfig(
    my_str_list=['a', 'b', '1.5', '2'],
    my_int_list=[1, 2, -3, 4]
)
Notice that the values of my_str_list[2] and my_str_list[3] are cast to strings, and the value of my_int_list[2] is cast to an integer.
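To make the casting above concrete, here is a minimal sketch of element-wise casting in plain Python. The helper cast_list is hypothetical and is not ablator's implementation; it only assumes that casting works like calling the element type's constructor on each value, which reproduces the outputs shown above (int() truncates -3.5 toward zero, giving -3).

```python
from typing import Any, Callable, List, TypeVar

T = TypeVar("T")


def cast_list(values: List[Any], elem_type: Callable[[Any], T]) -> List[T]:
    """Hypothetical helper (not ablator's API): cast every element with elem_type."""
    # The built-in constructors do the coercion: str(1.5) -> '1.5', int(-3.5) -> -3.
    # A value that cannot be converted (e.g. int("abc")) raises ValueError.
    return [elem_type(v) for v in values]


print(cast_list(["a", "b", 1.5, 2], str))  # ['a', 'b', '1.5', '2']
print(cast_list([1, 2, -3.5, 4], int))     # [1, 2, -3, 4]
```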
- class ablator.config.types.Tuple
A class for the tuple data type, used when you need to annotate an attribute as a tuple. Remember to wrap the types of the tuple elements in
Tuple[]. You can also specify the number of elements in the tuple and the data type of each one.
Examples
You can declare an attribute of type Tuple as follows:
>>> @configclass
... class MyConfig(ConfigBase):
...     my_str_int_tuple: Tuple[str, int]  # tuple of a string and an integer
...     my_2str_int_tuple: Tuple[str, int, str]  # tuple of a string, an integer, and a string
When initializing a config object, you can pass a tuple of proper values. In addition, ablator will automatically cast them to the correct type if possible. For example:
>>> MyConfig(my_str_int_tuple=("a", 1.5), my_2str_int_tuple=("a", 1, 2))
MyConfig(
    my_str_int_tuple=('a', 1),
    my_2str_int_tuple=('a', 1, '2')
)
Notice how the data are cast in my_str_int_tuple[1] and my_2str_int_tuple[2].
Note
The number and order of the elements in the tuple must match the types specified in
Tuple[]. So in the example above, my_str_int_tuple must have exactly 2 elements and my_2str_int_tuple exactly 3 elements, in the declared order.
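The positional rule in the note can be sketched in plain Python as follows. The helper cast_tuple is hypothetical and is not ablator's implementation; it assumes each value is cast by calling the type declared at the same position, and that a length mismatch is an error.

```python
from typing import Any, Callable, Sequence, Tuple


def cast_tuple(values: Sequence[Any],
               elem_types: Sequence[Callable[[Any], Any]]) -> Tuple[Any, ...]:
    """Hypothetical helper (not ablator's API): cast values[i] with elem_types[i]."""
    if len(values) != len(elem_types):
        # The tuple must have exactly as many elements as declared types.
        raise ValueError(f"expected {len(elem_types)} elements, got {len(values)}")
    # zip pairs each value with the type declared at the same position.
    return tuple(t(v) for t, v in zip(elem_types, values))


print(cast_tuple(("a", 1.5), (str, int)))        # ('a', 1)
print(cast_tuple(("a", 1, 2), (str, int, str)))  # ('a', 1, '2')
```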
- class ablator.config.types.Dict
A class for the dictionary data type, with keys as strings. Use it when you need to specify a config attribute as a dictionary (in fact, ablator defines
search_space as a dictionary of SearchSpace in config class ParallelConfig). Remember to wrap the type of the dictionary values in Dict[], e.g. Dict[str] is a dictionary with string values, and Dict[int] is a dictionary with integer values.
Examples
You can declare an attribute of type Dict as follows:
>>> @configclass
... class MyConfig(ConfigBase):
...     my_str_dict: Dict[str]
...     my_int_dict: Dict[int]
...     my_space_dict: Dict[SearchSpace]
When initializing a config object, you can pass a dictionary with keys as strings. For values, ablator will automatically cast them to the correct type if possible. For example:
>>> str_dict = {"str1": "val1", "str2": 2}
>>> int_dict = {"int1": 1, "int2": 2.5}
>>> space_dict = {"space1": SearchSpace(value_range=[0, 10], value_type='int')}
>>> MyConfig(my_str_dict=str_dict, my_int_dict=int_dict, my_space_dict=space_dict)
MyConfig(
    my_str_dict={'str1': 'val1', 'str2': '2'},
    my_int_dict={'int1': 1, 'int2': 2},
    my_space_dict={
        'space1': {
            'value_range': ('0', '10'),
            'categorical_values': None,
            'subspaces': None,
            'sub_configuration': None,
            'value_type': 'int',
            'n_bins': None,
            'log': False
        }
    }
)
Notice that the value at key str2 is cast to a string, and the value at key int2 is cast to an integer.
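A minimal sketch of the dictionary rule in plain Python: keys must be strings and every value is cast to the declared value type. The helper cast_dict is hypothetical and is not ablator's implementation; in particular, rejecting non-string keys (rather than casting them) is this sketch's assumption, and it does not handle nested config values such as SearchSpace.

```python
from typing import Any, Callable, Dict


def cast_dict(values: Dict[Any, Any],
              value_type: Callable[[Any], Any]) -> Dict[str, Any]:
    """Hypothetical helper (not ablator's API): check string keys, cast every value."""
    for key in values:
        if not isinstance(key, str):
            # This sketch rejects non-string keys instead of casting them.
            raise TypeError(f"dictionary keys must be strings, got {key!r}")
    return {key: value_type(value) for key, value in values.items()}


print(cast_dict({"str1": "val1", "str2": 2}, str))  # {'str1': 'val1', 'str2': '2'}
print(cast_dict({"int1": 1, "int2": 2.5}, int))     # {'int1': 1, 'int2': 2}
```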
- class ablator.config.types.Optional
A class for optional data types. This is helpful when a config attribute is optional, meaning that it can be left empty (in fact, ablator defines
scheduler_config as optional in config class TrainConfig).
Examples
You can declare an attribute of type Optional as follows:
>>> @configclass
... class MyConfig(ConfigBase):
...     my_optional_list: Optional[List[str]]
When initializing a config object, you can pass a List[str] value to my_optional_list, or pass no value at all:
>>> MyConfig(my_optional_list=["a"])
MyConfig(my_optional_list=['a'])
>>> MyConfig()
MyConfig(my_optional_list=None)
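For comparison, the same behavior can be approximated with the standard library alone. This sketch uses dataclasses and typing.Optional as an analogue; it is not ablator's Optional class, and the class name MyConfigSketch is hypothetical. The explicit None default is what lets the attribute be omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MyConfigSketch:
    # Standard-library analogue, not ablator: the None default means the
    # attribute may be omitted, mirroring MyConfig() -> my_optional_list=None.
    my_optional_list: Optional[List[str]] = None


print(MyConfigSketch(my_optional_list=["a"]))  # MyConfigSketch(my_optional_list=['a'])
print(MyConfigSketch())                        # MyConfigSketch(my_optional_list=None)
```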
- class ablator.config.types.Enum
A custom Enum class that provides additional equality and hashing methods. This is useful when creating custom data types whose values come from a fixed set. In ablator, we use
Enum to define Optim, which specifies the optimization direction: Optim.min or Optim.max. Optim is then used in config class RunConfig (optim_metrics attribute).
Examples
Create a custom Enum class by inheriting from Enum:
>>> from ablator import Enum
>>> class Color(Enum):
...     RED = 1
...     GREEN = 2
...     BLUE = 3
RED, GREEN, and BLUE form the fixed set of values for the Color type; internally, they are mapped to the integers 1, 2, and 3. The custom data type Color can now be used in config classes:
>>> @configclass
... class MyConfig(ConfigBase):
...     my_color: Color
>>> MyConfig(my_color=Color.RED)
MyConfig(my_color=1)
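The text does not spell out what the "additional equality and hashing methods" are. One plausible reading, consistent with MyConfig(my_color=1) above, is that members compare equal to their raw values. The sketch below implements that reading on top of the standard enum module; ValueEqEnum is a hypothetical name and this is an assumption, not ablator's actual Enum.

```python
import enum


class ValueEqEnum(enum.Enum):
    """Hypothetical sketch (not ablator's Enum): members also equal their raw values."""

    def __eq__(self, other):
        if isinstance(other, enum.Enum):
            # Members of different enum classes never compare equal.
            return type(self) is type(other) and self.value == other.value
        # Otherwise compare against the raw value, so Color.RED == 1.
        return self.value == other

    def __hash__(self):
        # Hash by value so that objects comparing equal also hash equally.
        return hash(self.value)


class Color(ValueEqEnum):
    RED = 1
    GREEN = 2
    BLUE = 3


print(Color.RED == 1, Color.RED == Color.GREEN)  # True False
print({Color.RED: "red"}[1])                     # red
```

Hashing by value keeps the usual invariant that equal objects have equal hashes, which is why a dictionary keyed by Color.RED can be looked up with the plain integer 1.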
Ablator custom data types
The following data classes are specific to the ablator framework: Derived, Stateless, and Stateful.
You can wrap them around common data types, Python primitive types, or custom
classes to further modify their behavior. The Configuration
Basics tutorial also discusses these data types.
- class ablator.config.types.Stateless
This type is for attributes that can take different values between experiments. To make an attribute stateless, wrap
Stateless around its type definition, e.g. Stateless[List[int]], Stateless[str].
Examples
>>> @configclass
... class MyModelConfig(ConfigBase):
...     attr: Stateless[List[int]]
>>> config = MyModelConfig(attr=[5, "6", 7.25])  # Must provide values for ``attr`` before launching the experiment
Note
Unlike
Derived, stateless attributes must be assigned values when you initialize the config object (i.e., before launching the experiment with trainer.launch()).
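The rule in the note can be illustrated with a small check, assuming a config is modeled as a plain dict of attribute values where None means "not assigned yet". The function unassigned_required is hypothetical and is not part of ablator; for simplicity it ignores Optional attributes, which may legitimately stay None.

```python
from typing import Any, Dict, Iterable, List


def unassigned_required(values: Dict[str, Any], derived: Iterable[str]) -> List[str]:
    """Hypothetical check (not ablator's API): non-Derived attributes still lacking a value."""
    derived_names = set(derived)
    # Derived attributes may stay unset until the experiment is launched;
    # every other attribute (e.g. a Stateless one) needs a value up front.
    return [name for name, value in values.items()
            if value is None and name not in derived_names]


# `attr` plays the role of a Stateless attribute and `embed_dim` of a Derived one.
print(unassigned_required({"attr": None, "embed_dim": None}, derived={"embed_dim"}))       # ['attr']
print(unassigned_required({"attr": [5, 6, 7], "embed_dim": None}, derived={"embed_dim"}))  # []
```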
- class ablator.config.types.Derived
Derived is used for attributes that are derived during the experiment (after launching the experiment with trainer.launch()). To make an attribute derived, wrap Derived around its type definition, e.g. Derived[List[int]], Derived[str].
Examples
For example, suppose you want to test how different pre-trained word embeddings (e.g. word2vec 100d, word2vec 300d) affect the performance of a classification model, and you use ablator to run an ablation study on the effect of the word embeddings. The classification model architecture depends on the embedding dimension of each set of pre-trained word embeddings, so the model architecture is derived from the pre-trained embeddings. You can define a model config class as follows:
>>> @configclass
... class MyModelConfig(ModelConfig):
...     embed_dim: Derived[int]
Then you can define a model class that takes the model config as input and sets its input dimension using
embed_dim:
>>> class MyModel(nn.Module):
...     def __init__(self, config: MyModelConfig):
...         super().__init__()
...         self.embed_dim = config.embed_dim
Finally,
config_parser is used to set the value of the Derived attribute embed_dim based on the pre-trained word embeddings:
>>> class MyLMWrapper(ModelWrapper):
...     def config_parser(self, run_config: RunConfig):
...         # The embedding dimension is the length of each word vector, not the vocabulary size
...         run_config.model_config.embed_dim = self.train_dataloader.word2vec.wv.vector_size
...         return run_config
Note
When initializing config objects, you do not have to assign values to attributes of
Derived type.
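The config_parser pattern above can be sketched without ablator or gensim. In this sketch, ModelConfigSketch and config_parser_sketch are hypothetical names, and a toy dict of word vectors stands in for the pre-trained embeddings; the point is that embed_dim starts unset and is filled in from the data, with the embedding dimension taken as the length of each vector.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass
class ModelConfigSketch:
    # Stands in for a Derived[int] attribute: left unset when the config is created.
    embed_dim: Optional[int] = None


def config_parser_sketch(config: ModelConfigSketch,
                         word_vectors: Dict[str, List[float]]) -> ModelConfigSketch:
    """Hypothetical: derive embed_dim from the length of the pre-trained vectors."""
    dims = {len(vec) for vec in word_vectors.values()}
    if len(dims) != 1:
        raise ValueError(f"expected a single embedding size, found {sorted(dims)}")
    # replace() returns a new config with only embed_dim changed.
    return replace(config, embed_dim=dims.pop())


vectors = {"cat": [0.1, 0.2, 0.3], "dog": [0.4, 0.5, 0.6]}  # toy 3-d "pre-trained" vectors
config = config_parser_sketch(ModelConfigSketch(), vectors)
print(config.embed_dim)  # 3
```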
- class ablator.config.types.Stateful
This is for attributes that are fixed between experiments. By default, we assume that primitive-typed attributes are stateful. Unlike
Derived and Stateless, where you have to annotate attributes with these classes (e.g. attr: Stateless[int] or attr: Stateless[List[str]]), stateful attributes are simply defined without Stateful, e.g. attr: int or attr: List[str].
Examples
The example below defines a model config with a stateful embedding dimension, which means that the embedding dimension must be the same in every experiment.
>>> @configclass
... class MyModelConfig(ModelConfig):
...     embed_dim: int
>>> model_config = MyModelConfig(embed_dim=100)  # Must provide a value for ``embed_dim`` before launching the experiment
Note
Unlike
Derived, stateful attributes must be assigned values when you initialize the config object (i.e., before launching the experiment with trainer.launch()). Statefulness applies only in the context of experiments: a stateful attribute must have the same value across different runs of the same experiment configuration. However, within each experiment, you can still define a search space over stateful attributes to run HPO on them.
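The difference between stateful and stateless attributes across runs can be sketched as a comparison of two runs' attribute values. The function stateful_mismatches is hypothetical and is not ablator's API; it assumes each run is a plain dict of attribute values and that only the names listed as stateful must match.

```python
from typing import Any, Dict, Iterable, List


def stateful_mismatches(previous: Dict[str, Any], current: Dict[str, Any],
                        stateful: Iterable[str]) -> List[str]:
    """Hypothetical check (not ablator's API): stateful attributes that differ between runs."""
    # Stateless attributes are not listed, so they may change freely.
    return [name for name in stateful if previous.get(name) != current.get(name)]


run_a = {"embed_dim": 100, "attr": [5, 6, 7]}
run_b = {"embed_dim": 100, "attr": [1, 2]}     # only the Stateless `attr` changed
run_c = {"embed_dim": 300, "attr": [5, 6, 7]}  # the stateful `embed_dim` changed

print(stateful_mismatches(run_a, run_b, stateful=["embed_dim"]))  # []
print(stateful_mismatches(run_a, run_c, stateful=["embed_dim"]))  # ['embed_dim']
```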