Welcome to MagNet’s documentation!

magnet

magnet.eval(*modules)[source]

A context manager that makes it easy to run computations in eval mode.

It sets modules in their eval mode and ensures that gradients are not computed.

This is a more complete option than torch.no_grad() since many Modules (BatchNorm, Dropout etc.) behave differently during training and testing.

Examples:

>>> import magnet as mag
>>> import magnet.nodes as mn
>>> import torch
>>> model = mn.Linear(10)
>>> x = torch.randn(4, 3)
>>> # Using eval() as context manager
>>> with mag.eval(model):
>>>     model(x)
>>> # Use as decorator
>>> @mag.eval(model)
>>> def foo():
>>>     return model(x)
>>> foo()
>>> # The modules can also be given at runtime by specifying no arguments
>>> @mag.eval
>>> def foo(model):
>>>     return model(x)
>>> foo()
>>> # The method then takes modules from the arguments
>>> # to the decorated function.

magnet.data

Data

class magnet.data.Data(train, val=None, test=None, val_split=0.2, **kwargs)[source]

A container which holds the Training, Validation and Test Sets and provides DataLoaders on call.

This is a convenient abstraction which is used downstream with the Trainer and various debuggers.

It works in tandem with the custom Dataset, DataLoader and Sampler sub-classes that MagNet defines.

Parameters:
  • train (Dataset) – The training set
  • val (Dataset) – The validation set. Default: None
  • test (Dataset) – The test set. Default: None
  • val_split (float) – The fraction of training data to hold out as validation if validation set is not given. Default: 0.2
Keyword Arguments:
 
  • num_workers (int) – how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. Default: 0
  • collate_fn (callable) – merges a list of samples to form a mini-batch. Default: pack_collate()
  • pin_memory (bool) – If True, the data loader will copy tensors into CUDA pinned memory before returning them. Default: False
  • timeout (numeric) – if positive, the timeout value for collecting a batch from workers. Should always be non-negative. Default: 0
  • worker_init_fn (callable) – If not None, this will be called on each worker subprocess with the worker id (an int in [0, num_workers - 1]) as input, after seeding and before data loading. Default: None
  • transforms (list or callable) – A list of transforms to be applied to each datapoint. Default: None
  • fetch_fn (callable) – A function which is applied to each datapoint before collating. Default: None
__call__(batch_size=1, shuffle=False, replace=False, probabilities=None, sample_space=None, mode='train')[source]

Returns a MagNet DataLoader that iterates over the dataset.

Parameters:
  • batch_size (int) – How many samples per batch to load. Default: 1
  • shuffle (bool) – Set to True to have the data reshuffled at every epoch. Default: False
  • replace (bool) – If True, datapoints are sampled with replacement (a datapoint can be drawn more than once per epoch). Default: False
  • probabilities (list or numpy.ndarray) – An array of probabilities of drawing each member of the dataset. Default: None
  • sample_space (float or int or list) – The fraction / length / indices of the sample to draw from. Default: None
  • mode (str) – One of ['train', 'val', 'test']. Default: 'train'
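
For illustration, a minimal sketch of constructing a Data object from a plain PyTorch Dataset and drawing a batch from it (the TensorDataset below is an arbitrary stand-in; any Dataset works):

>>> import torch
>>> from torch.utils.data import TensorDataset
>>> from magnet.data import Data

>>> train_set = TensorDataset(torch.randn(100, 3), torch.randint(0, 2, (100,)))
>>> data = Data(train_set, val_split=0.2)

>>> # A shuffled training loader with a batch size of 16
>>> dataloader = data(batch_size=16, shuffle=True)
>>> x, y = next(dataloader)

>>> # A validation loader
>>> val_loader = data(batch_size=16, mode='val')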

Core Datasets

magnet.data.core.MNIST(val_split=0.2, path=None, **kwargs)[source]

The MNIST Dataset.

Parameters:
  • val_split (float) – The fraction of training data to hold out as validation if validation set is not given. Default: 0.2
  • path (pathlib.Path or str) – The path to save the dataset to. Default: Magnet Datapath
Keyword Arguments:
 See Data for more details.
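
For instance, a quick sketch (the dataset is downloaded on first use; keyword arguments are forwarded to Data):

>>> from magnet.data.core import MNIST

>>> data = MNIST(val_split=0.2)
>>> x, y = next(data(batch_size=64, shuffle=True))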

Transforms

magnet.data.transforms.augmented_image_transforms(d=0, t=0, s=0, sh=0, ph=0, pv=0, resample=2)[source]

Returns a list of augmented transforms to be applied to natural images.

Parameters:
  • d (sequence or float or int) – Range of degrees to select from. Default: 0
  • t (tuple) – Tuple of maximum absolute fraction for horizontal and vertical translations. Default: 0
  • s (tuple, optional) – Scaling factor interval. Default: 0
  • sh (sequence or float or int, optional) – Range of shear. Default: 0
  • ph (float) – The probability of flipping the image horizontally. Default: 0
  • pv (float) – The probability of flipping the image vertically. Default: 0
  • resample (int) – An optional resampling filter. Default: 2

See torchvision.transforms for more details.

magnet.data.transforms.image_transforms(augmentation=0, direction='horizontal')[source]

Returns a list of transforms to be applied to natural images.

Parameters:
  • augmentation (float) – The percentage of augmentation to be applied. Default: 0
  • direction (str) – The direction to flip the image at random. Default: 'horizontal'
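
As a sketch, these transforms can be handed to a Data object through its transforms keyword (train_set here stands in for any Dataset; the 10% augmentation is illustrative):

>>> from magnet.data import Data
>>> from magnet.data.transforms import image_transforms

>>> data = Data(train_set, transforms=image_transforms(augmentation=0.1))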

Nodes

Node

class magnet.nodes.Node(*args, **kwargs)[source]

Abstract base class that defines MagNet’s Node implementation.

A Node is a ‘self-aware Module’. It can dynamically parametrize itself at runtime.

For instance, a Linear Node can infer the input features automatically when first called; a Conv Node can infer the dimensionality (1, 2, 3) of the input automatically.

MagNet’s Nodes strive to help the developer as much as possible by finding the right hyperparameter values automatically. Ideally, the developer shouldn’t need to define anything except the basic architecture and the inputs and outputs.

The arguments passed to the constructor are stored in a _args attribute as a dictionary.

This is later modified by the build() method which gets automatically called on the first forward pass.

Keyword Arguments:
 name (str) – Class Name
build(*args, **kwargs)[source]

Builds the Node. Ideally, should not be called manually.

When an unbuilt module is first called, this method gets invoked.

_mul_list(n)[source]

A useful overload of the * operator that can create similar copies of the node.

Parameters:n (tuple or list) –

The modifier n should be used to change the arguments of the node in a meaningful way.

For instance, in the case of a Linear node, the items in n can be interpreted as the output dimensions of each layer.
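
For example, as the Linear and Conv examples below also demonstrate:

>>> import magnet.nodes as mn

>>> # Two Linear nodes with o set to 50 and 10 respectively
>>> layers = mn.Linear() * (50, 10)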

Core

class magnet.nodes.Lambda(fn, **kwargs)[source]

Wraps a Node around any function.

Parameters:fn (callable) – The function which gets called in the forward pass

Examples:

>>> import magnet.nodes as mn

>>> import torch

>>> model = mn.Lambda(lambda x: x.mean())

>>> model(torch.arange(5, dtype=torch.float)).item()
2.0

>>> def subtract(x, y):
>>>     return x - y

>>> model = mn.Lambda(subtract)

>>> model(2 * torch.ones(1), torch.ones(1)).item()
1.0
class magnet.nodes.Conv(c=None, k=3, p='half', s=1, d=1, g=1, b=True, ic=None, act='relu', bn=False, **kwargs)[source]

Applies a convolution over an input tensor.

Parameters:
  • c (int) – Number of channels produced by the convolution. Default: Inferred
  • k (int or tuple) – Size of the convolving kernel. Default: 3
  • p (int, tuple or str) – Zero-padding added to both sides of the input. Default: 'half'
  • s (int or tuple) – Stride of the convolution. Default: 1
  • d (int or tuple) – Spacing between kernel elements. Default: 1
  • g (int) – Number of blocked connections from input channels to output channels. Default: 1
  • b (bool) – If True, adds a learnable bias to the output. Default: True
  • ic (int) – Number of channels in the input image. Default: Inferred
  • act (str or None) – The activation function to use. Default: 'relu'
  • bn (bool) – Whether to use Batch Normalization immediately after the layer. Default: False
  • p can be conveniently set to 'half', 'same' or 'double' padding to halve, preserve or double the image size respectively. The arguments are inferred accordingly at runtime. For 'half' padding, the output channels (if not provided) are set to twice the input channels to make up for the lost information, and vice-versa for 'double' padding. For 'same' padding, the output channels are kept equal to the input channels. In all three cases, the dilation is set to 1 and the stride is modified as required.
  • ic is inferred from the second dimension of the input tensor.
  • act is set to 'relu' by default, unlike the PyTorch implementation where activation functions need to be separately defined. Take care to manually set the activation to None where needed.

Note

The dimensions (1, 2 or 3) of the convolutional kernels are inferred from the corresponding shape of the input tensor.

Note

One can also create multiple Nodes using the convenient multiplication (*) operation.

Multiplication with an integer \(n\), gives \(n\) copies of the Node.

Multiplication with a list or tuple of integers, \((c_1, c_2, ..., c_n)\) gives \(n\) copies of the Node with c set to \(c_i\)

Shape:
  • Input: \((N, C_{in}, *)\) where \(*\) is any non-zero number of trailing dimensions
  • Output: \((N, C_{out}, *)\)

Variables:layer (nn.Module) – The Conv module built from torch.nn

Examples:

>>> import torch

>>> from torch import nn

>>> import magnet.nodes as mn
>>> from magnet.utils import summarize

>>> # A Conv layer with 32 channels and half padding
>>> model = mn.Conv(32)

>>> model(torch.randn(4, 16, 28, 28)).shape
torch.Size([4, 32, 14, 14])

>>> # Alternatively, the 32 in the constructor may be omitted
>>> # since it is inferred at runtime.

>>> # The same conv layer with 'double' padding
>>> model = mn.Conv(p='double')

>>> model(torch.randn(4, 16, 28, 28)).shape
torch.Size([4, 8, 56, 56])

>>> layers = mn.Conv() * 3
[Conv(), Conv(), Conv()]

>>> model = nn.Sequential(*layers)
>>> summarize(model)
+-------+------------+----------------------+
| Node  |   Shape    | Trainable Parameters |
+-------+------------+----------------------+
| input | 16, 28, 28 |          0           |
+-------+------------+----------------------+
| Conv  | 32, 14, 14 |        4,640         |
+-------+------------+----------------------+
| Conv  |  64, 7, 7  |        18,496        |
+-------+------------+----------------------+
| Conv  | 128, 4, 4  |        73,856        |
+-------+------------+----------------------+
Total Trainable Parameters: 96,992
class magnet.nodes.Linear(o=1, b=True, flat=True, i=None, act='relu', bn=False, **kwargs)[source]

Applies a linear transformation to the incoming tensor.

Parameters:
  • o (int or tuple) – Output dimensions. Default: \(1\)
  • b (bool) – Whether to include a bias term. Default: True
  • flat (bool) – Whether to flatten out the input to 2 dimensions. Default: True
  • i (int) – Input dimensions. Default: Inferred
  • act (str or None) – The activation function to use. Default: 'relu'
  • bn (bool) – Whether to use Batch Normalization immediately after the layer. Default: False
  • flat is used by default to flatten the input to a vector. This is useful, say, in the case of CNNs where a 3-D image-based output with multiple channels needs to be fed to several dense layers.
  • i is inferred from the last dimension of the input tensor.
  • act is set to 'relu' by default, unlike the PyTorch implementation where activation functions need to be separately defined. Take care to manually set the activation to None where needed.

Note

One can also create multiple Nodes using the convenient multiplication (*) operation.

Multiplication with an integer \(n\), gives \(n\) copies of the Node.

Multiplication with a list or tuple of integers, \((o_1, o_2, ..., o_n)\) gives \(n\) copies of the Node with o set to \(o_i\)

Note

If o is a tuple, the output features are its product and the output is inflated to this shape.

Shape:
If flat is True
  • Input: \((N, *)\) where \(*\) means any number of trailing dimensions
  • Output: \((N, out\_features)\)
Else
  • Input: \((N, *, in\_features)\) where \(*\) means any number of additional dimensions
  • Output: \((N, *, out\_features)\) where all but the last dimension are the same shape as the input.
Variables:layer (nn.Module) – The Linear module built from torch.nn

Examples:

>>> import torch

>>> from torch import nn

>>> import magnet.nodes as mn
>>> from magnet.utils import summarize

>>> # A Linear mapping to 10-dimensional space
>>> model = mn.Linear(10)

>>> model(torch.randn(64, 3, 28, 28)).shape
torch.Size([64, 10])

>>> # Don't flatten the input
>>> model = mn.Linear(10, flat=False)

>>> model(torch.randn(64, 3, 28, 28)).shape
torch.Size([64, 3, 28, 10])

>>> # Make a Deep Neural Network
>>> # Don't forget to turn the activation to None in the final layer
>>> layers = mn.Linear() * (10, 50) + [mn.Linear(10, act=None)]
[Linear(), Linear(), Linear()]

>>> model = nn.Sequential(*layers)
>>> summarize(model)
+------+---------+--------------------+----------------------------------------------------+
| Node |  Shape  |Trainable Parameters|                   Arguments                        |
+------+---------+--------------------+----------------------------------------------------+
|input |3, 28, 28|         0          |                                                    |
+------+---------+--------------------+----------------------------------------------------+
|Linear|   10    |       23,530       |bn=False, act=relu, i=2352, flat=True, b=True, o=10 |
+------+---------+--------------------+----------------------------------------------------+
|Linear|   50    |        550         |bn=False, act=relu, i=10, flat=True, b=True, o=50   |
+------+---------+--------------------+----------------------------------------------------+
|Linear|   10    |        510         |bn=False, act=None, i=50, flat=True, b=True, o=10   |
+------+---------+--------------------+----------------------------------------------------+
Total Trainable Parameters: 24,590
class magnet.nodes.RNN(h, n=1, b=False, bi=False, act='tanh', d=0, batch_first=False, i=None, **kwargs)[source]

Applies a multi-layer RNN to an input tensor.

Parameters:
  • h (int, Required) – The number of features in the hidden state h
  • n (int) – Number of layers. Default: 1
  • b (bool) – Whether to include a bias term. Default: False
  • bi (bool) – If True, becomes a bidirectional RNN. Default: False
  • act (str or None) – The activation function to use. Default: 'tanh'
  • d (int) – The dropout probability of the outputs of each layer. Default: 0
  • batch_first (bool) – If True, then the input and output tensors are provided as (batch, seq, feature). Default: False
  • i (int) – Input dimensions. Default: Inferred
  • i is inferred from the last dimension of the input tensor.

Note

One can also create multiple Nodes using the convenient multiplication (*) operation.

Multiplication with an integer \(n\), gives \(n\) copies of the Node.

Multiplication with a list or tuple of integers, \((h_1, h_2, ..., h_n)\) gives \(n\) copies of the Node with h set to \(h_i\)

Variables:layer (nn.Module) – The RNN module built from torch.nn

Examples:

>>> import torch

>>> from torch import nn

>>> import magnet.nodes as mn
>>> from magnet.utils import summarize

>>> # A recurrent layer with 32 hidden dimensions
>>> model = mn.RNN(32)

>>> model(torch.randn(7, 4, 300))[0].shape
torch.Size([7, 4, 32])

>>> # Attach a linear head
>>> model = nn.Sequential(model, mn.Linear(1000, act=None))
class magnet.nodes.LSTM(h, n=1, b=False, bi=False, d=0, batch_first=False, i=None, **kwargs)[source]

Applies a multi-layer LSTM to an input tensor.

See mn.RNN for more details.

class magnet.nodes.GRU(h, n=1, b=False, bi=False, d=0, batch_first=False, i=None, **kwargs)[source]

Applies a multi-layer GRU to an input tensor.

See mn.RNN for more details.

class magnet.nodes.BatchNorm(e=1e-05, m=0.1, a=True, track=True, i=None, **kwargs)[source]

Applies Batch Normalization to the input tensor.

Parameters:
  • e (float) – A small value added to the denominator for numerical stability. Default: 1e-5
  • m (float or None) – The value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average). Default: 0.1
  • a (bool) – Whether to have learnable affine parameters. Default: True
  • track (bool) – Whether to track the running mean and variance. Default: True
  • i (int) – Input channels. Default: Inferred
  • i is inferred from the second dimension of the input tensor.

Note

The dimensions (1, 2 or 3) of the running mean and variance are inferred from the corresponding shape of the input tensor.

Note

One can also create multiple Nodes using the convenient multiplication (*) operation.

Multiplication with an integer \(n\), gives \(n\) copies of the Node.

Multiplication with a list or tuple of integers, \((i_1, i_2, ..., i_n)\) gives \(n\) copies of the Node with i set to \(i_i\)

Shape:
  • Input: \((N, C, *)\) where \(*\) means any number of trailing dimensions
  • Output: \((N, C, *)\) (same shape as input)
Variables:layer (nn.Module) – The BatchNorm module built from torch.nn

Examples:

>>> import torch

>>> import magnet.nodes as mn

>>> # A BatchNorm layer whose input channels are inferred on the first call
>>> model = mn.BatchNorm()

>>> # Batch Normalization preserves the shape of its input
>>> model(torch.randn(4, 16, 28, 28)).shape
torch.Size([4, 16, 28, 28])

magnet.training

Trainer

class magnet.training.Trainer(models, optimizers)[source]

Abstract base class for training models.

The Trainer class makes it incredibly simple and convenient to train, monitor, debug and checkpoint entire Deep Learning projects.

Simply define your training loop by implementing the optimize() method.

Parameters:
  • models (list of nn.Module) – All the models that need to be trained
  • optimizers (list of optim.Optimizer) – Any optimizers that are used

Note

If any model is in eval() mode, the trainer is switched off. This means that, as per protocol, none of the models will train.

Variables:callbacks (list) – A list of callbacks attached to the trainer.

Take a look at SupervisedTrainer for an idea on how to extend this class.

optimize()[source]

Defines the core optimization loop. This method is called on each iteration.

Two quick protocols that one needs to follow are:

1. Do NOT backpropagate or step() the optimizers if the trainer is not training. Use the is_training() method to find out. This is essential to ensure that the trainer behaves as expected when is_training() is False. Useful, for example, in cases like callbacks.ColdStart

2. Send a callback the signal 'gradient' with a keyword argument 'models' that is the list of models that accumulate a gradient. Usually, it’s all the modules (self.modules).

Any callbacks that listen to this signal are interested in the gradient information (eg. callbacks.Babysitter).
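
A rough sketch of a custom optimize() that follows both protocols. The attribute names used on self (models, optimizers, dataloader, callbacks) and the loss function my_loss are illustrative assumptions; see SupervisedTrainer for the actual implementation.

>>> from magnet.training import Trainer

>>> class MyTrainer(Trainer):
>>>     def optimize(self):
>>>         model, optimizer = self.models[0], self.optimizers[0]
>>>         x, y = next(self.dataloader)   # Assumed way of fetching the next batch
>>>         loss = my_loss(model(x), y)    # my_loss is a hypothetical loss function
>>>
>>>         if self.is_training():         # Protocol 1: only optimize while training
>>>             loss.backward()
>>>             # Protocol 2: broadcast the 'gradient' signal with the models
>>>             # (assuming the callbacks are held in a CallbackQueue)
>>>             self.callbacks('gradient', trainer=self, models=self.models)
>>>             optimizer.step()
>>>             optimizer.zero_grad()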

train(dataloader, epochs=1, callbacks=None, **kwargs)[source]

Starts the training process.

Parameters:
  • dataloader (DataLoader) – The MagNet dataloader that iterates over the training set
  • epochs (float or int) – The number of epochs to train for. Default: 1
  • callbacks (list) – Any callbacks to be attached. Default: None
Keyword Arguments:
 

iterations (int) – The number of iterations to train for. Overrides epochs.

Note

PyTorch DataLoaders are not supported.

Ideally, encapsulate your dataset in the Data class.
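
For example, training for a fixed number of iterations instead of epochs (trainer and data stand for any Trainer and Data objects):

>>> trainer.train(data(batch_size=64, shuffle=True), iterations=1000)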

mock(path=None)[source]

A context manager that creates a temporary ‘safe’ scope for training.

All changes to stateful objects (models, optimizers and the trainer itself) are forgotten once out of this scope.

This is very useful if you need to try out what-if experiments.

Parameters:path (pathlib.Path) – The path to save temporary states into. Default: {System temp directory}/.mock_trainer
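
A sketch of a what-if experiment inside mock() (trainer and data are assumed from the surrounding examples; the short 0.1-epoch run is illustrative):

>>> with trainer.mock():
>>>     trainer.train(data(64, shuffle=True), epochs=0.1)
>>> # Outside the scope, the models, optimizers and trainer are back to their previous state.
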
epochs(mode=None)[source]

The number of epochs completed.

Parameters:mode (str or None) – If the mode is 'start' or 'end', a boolean is returned signalling whether it is the start or end of an epoch
register_parameter(name, value)[source]

Use this to register 'stateful' parameters that are serialized.

SupervisedTrainer

class magnet.training.SupervisedTrainer(model, optimizer='adam', loss='cross_entropy', metrics=None)[source]

A simple trainer that implements a supervised approach where a model \(\hat{y} = f(x)\) is trained to match its prediction \(\hat{y}\) to the ground truth \(y\) according to some specified loss.

This is the training routine that most high-level deep learning frameworks implement.

Parameters:
  • model (nn.Module) – The model that needs to be trained
  • optimizer (str or optim.Optimizer) – The optimizer used to train the model. Default: 'adam'
  • loss (str or callable) – A loss function that gives the objective to be minimized. Default: 'cross_entropy'
  • metrics (list) – Any other metrics that need to be monitored. Default: None
  • optimizer can be an actual optim.Optimizer instance or the name of a popular optimizer (eg. 'adam').
  • loss can be a function or the name of a popular loss function (eg. 'cross_entropy'). It should accept 2 arguments (\(\hat{y}\), \(y\)).
  • metrics should contain a list of functions which accept 2 arguments (\(\hat{y}\), \(y\)), like the loss function.

Note

A static validate() function is provided for the validation callback

Note

The metrics are of no use unless there is some callback (eg. callbacks.Monitor) to receive them

Examples:

>>> import magnet as mag
>>> import magnet.nodes as mn

>>> from magnet.data import Data
>>> from magnet.training import callbacks, SupervisedTrainer

>>> data = Data.get('mnist')

>>> model = mn.Linear(10, act=None)
>>> model.build(x=next(data())[0])

>>> trainer = SupervisedTrainer(model)
>>> callbacks=[callbacks.Monitor(),
               callbacks.Validate(data(64, mode='val'), SupervisedTrainer.validate)]
>>> trainer.train(data(64, shuffle=True), 1, callbacks)
magnet.training.finish_training(path, names=None)[source]

A helper function for cleaning up the training logs and other checkpoints and retaining only the state_dicts of the trained models.

Parameters:
  • path (pathlib.Path) – The path where the trainer was checkpointed
  • names (list) – The names of the models in the order given to the trainer. Default: None
  • names can be used if the models themselves did not have names prior to training. The checkpoints default to an ordered naming scheme. If passed, the files are additionally renamed to these names.

Note

Does nothing / fails silently if the path does not exist.

Example:

>>> # Assume that we've defined two models - encoder and decoder,
>>> # and a suitable trainer. The models do not have a 'name' attribute.

>>> trainer.save_state(checkpoint_path / 'my-trainer')

>>> # Suppose the checkpoint directory contains the following files:
>>> # my-trainer/
>>> #     models/
>>> #         0.pt
>>> #         1.pt
>>> #     callbacks/
>>> #         monitor/
>>> #         babysitter/
>>> #     state.p

>>> finish_training(checkpoint_path / 'my-trainer', names=['encoder', 'decoder'])

>>> # Now the directory contains these files:
>>> # encoder.pt
>>> # decoder.pt

magnet.training.callbacks

CallbackQueue

class magnet.training.callbacks.CallbackQueue[source]

A container for multiple callbacks that can be called in parallel.

If multiple callbacks need to be called together (as intended), they can be registered via this class.

Since callbacks need to be unique (by their name), this class also ensures that there are no duplicates.

__call__(signal, *args, **kwargs)[source]

Broadcasts a signal to all registered callbacks along with payload arguments.

Parameters:signal (object) – Any object that is broadcast as a signal.

Note

Any other arguments will be sent as-is to the callbacks.

find(name)[source]

Scans through the registered list and finds the callback with the given name.

If not found, returns None.

Raises:RuntimeError – If multiple callbacks are found.

Monitor

class magnet.training.callbacks.Monitor(frequency=10, show_progress=True, **kwargs)[source]

Allows easy monitoring of the training process.

Stores any metric / quantity broadcast using the 'write_stats' signal.

Also adds a nice progress bar!

Parameters:
  • frequency (int) – The number of times per epoch to flush the buffer. Default: 10
  • show_progress (bool) – If True, adds a progress bar. Default: True
Keyword Arguments:
 

name (str) – Name of this callback. Default: 'monitor'

  • frequency is useful only if there are buffered metrics.

Examples:

>>> import torch

>>> import magnet as mag
>>> import magnet.nodes as mn

>>> from magnet.training import callbacks, SupervisedTrainer

>>> model = mn.Linear(10, act=None)
>>> with mag.eval(model): model(torch.randn(4, 1, 28, 28))

>>> trainer = SupervisedTrainer(model)

>>> callbacks = callbacks.CallbackQueue([callbacks.Monitor()])
>>> callbacks(signal='write_stats', trainer=trainer, key='loss', value=0.1)

>>> callbacks[0].history
{'loss': [{'val': 0.1}]}
__call__(trainer, signal, **kwargs)[source]

Responds to the following signals:

  • 'write_stats': Any keyword arguments will be passed to the History.append() method.
  • 'on_training_start': To be called before start of training. Initializes the progress bar.
  • 'on_batch_start': Called before the training loop. Updates the progress bar.
  • 'on_batch_end': Called after the training loop. Flushes the history buffer if needed and sets the progress bar description.
  • 'on_training_end': To be called after training. Closes the progress bar.
  • 'load_state': Loads the state of this callback from path.
  • 'save_state': Saves the state of this callback to path.
show(metric=None, log=False, x_key='epochs', **kwargs)[source]

Calls the corresponding History.show() method.

Validate

class magnet.training.callbacks.Validate(dataloader, validate, frequency=10, batches=None, drop_last=False, **kwargs)[source]

Runs a validation function over a dataset during the course of training.

Most Machine Learning research uses a held out validation set as a proxy for the test set / real-life data. Hyperparameters are usually tuned on the validation set.

Often, this is done during training in order to view the simultaneous learning on the validation set and catch any overfitting / underfitting.

This callback enables you to run a custom validate function over a dataloader.

Parameters:
  • dataloader (DataLoader) – DataLoader containing the validation set
  • validate (callable) – A callable that does the validation
  • frequency (int) – The number of times per epoch to run the function. Default: \(10\)
  • batches (int or None) – The number of times / batches to call the validate function in each run. Default: None
  • drop_last (bool) – If True, the last batch is not run. Default: False
Keyword Arguments:
 

name (str) – Name of this callback. Default: 'validate'

  • validate is a function which takes two arguments: (trainer, dataloader).

  • batches defaults to a value which ensures that an epoch of the validation set matches an epoch of the training set.

    For instance, if the training set has \(80\) datapoints and the validation set has \(20\) and the batch size is \(1\) for both, an epoch consists of \(80\) iterations for the training set and \(20\) for the validation set.

    If the validate function is run \(10\) times (the frequency) per epoch of the training set, then batches must be \(2\).
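
A typical attachment, mirroring the SupervisedTrainer example above:

>>> from magnet.training import callbacks, SupervisedTrainer

>>> validate = callbacks.Validate(data(64, mode='val'), SupervisedTrainer.validate)
>>> trainer.train(data(64, shuffle=True), epochs=1, callbacks=[validate])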

__call__(trainer, signal, **kwargs)[source]

Responds to the following signals:

  • 'on_training_start': To be called before start of training. Automatically finds the number of batches per run.
  • 'on_batch_end': Called after the training loop. Calls the validate function.
  • 'on_training_end': To be called after training. If drop_last, calls the validate function.
  • 'load_state': Loads the state of this callback from path.
  • 'save_state': Saves the state of this callback to path.

Checkpoint

class magnet.training.callbacks.Checkpoint(path, interval='5 m', **kwargs)[source]

Serializes stateful objects during the training process.

For many practical Deep Learning projects, training takes many hours, even days.

As such, it is only natural that you’d want to save the progress every once in a while.

This callback saves the models, optimizers, schedulers and the trainer itself periodically and automatically loads from those states if found.

Parameters:
  • path (pathlib.Path) – The root path to save to
  • interval (str) – The time between checkpoints. Default: ‘5 m’
Keyword Arguments:
 

name (str) – Name of this callback. Default: 'checkpoint'

  • interval should be a string of the form '{duration} {unit}'. Valid units are: 'us' (microseconds), 'ms' (milliseconds), 's' (seconds), 'm' (minutes), 'h' (hours), 'd' (days).
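
For example (the path below is illustrative; trainer and data are assumed from the earlier examples):

>>> from pathlib import Path
>>> from magnet.training import callbacks

>>> checkpoint = callbacks.Checkpoint(Path('checkpoints/my-trainer'), interval='10 m')
>>> trainer.train(data(64, shuffle=True), epochs=5, callbacks=[checkpoint])
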
__call__(trainer, signal, **kwargs)[source]

Responds to the following signals:

  • 'on_training_start': To be called before start of training. Creates the path if it doesn’t exist and loads from it if it does. Also sets the starting time.
  • 'on_batch_end': Called after the training loop. Checkpoints if the interval is crossed and resets the clock.
  • 'on_training_end': To be called after training. Checkpoints one last time.
  • 'load_state': Loads the state of this callback from path.
  • 'save_state': Saves the state of this callback to path.

ColdStart

class magnet.training.callbacks.ColdStart(epochs=0.1, **kwargs)[source]

Starts the trainer in eval mode for a few iterations.

Sometimes, you may want to find out how the model performs prior to any training. This callback freezes the training initially.

Parameters:epochs (float) – The number of epochs to freeze the trainer. Default: \(0.1\)
Keyword Arguments:
 name (str) – Name of this callback. Default: 'coldstart'
__call__(trainer, signal, **kwargs)[source]

Responds to the following signals:

  • 'on_training_start': To be called before start of training. Sets the models in eval mode.
  • 'on_batch_end': Called after the training loop. If the epochs is exhausted, unfreezes the trainer and removes this callback from the queue.

LRScheduler

class magnet.training.callbacks.LRScheduler(scheduler, **kwargs)[source]

A helper callback for adding optimizer schedulers.

Parameters:scheduler (LRScheduler) – The scheduler.
Keyword Arguments:
 name (str) – Name of this callback. Default: 'lr_scheduler'
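
A sketch using a standard PyTorch scheduler. Accessing the optimizer through trainer.optimizers is an assumption for illustration; trainer and data are assumed from the earlier examples.

>>> from torch.optim.lr_scheduler import StepLR
>>> from magnet.training import callbacks

>>> scheduler = StepLR(trainer.optimizers[0], step_size=1, gamma=0.5)
>>> trainer.train(data(64, shuffle=True), epochs=5,
                  callbacks=[callbacks.LRScheduler(scheduler)])
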
__call__(trainer, signal, **kwargs)[source]

Responds to the following signals:

  • 'on_batch_start': Called before the training loop. If it is the start of an epoch, steps the scheduler.

magnet.training.history

class magnet.training.history.History[source]

A dictionary-like repository which is used to store several metrics of interest in training in the form of snapshots.

This object can be utilized to collect, store and analyze training metrics against a variety of features of interest (epochs, iterations, time etc.)

Since this is a subclass of dict, it can be used as such. However, it is preferred to operate it using the class-specific methods.

Examples:

>>> from time import time
>>> from magnet.training.history import History

>>> history = History()

>>> # Add a simple value with a time stamp.
>>> # This is like the statement: history['loss'] = 69
>>> # However, any additional stamps can also be attached.
>>> history.append('loss', 69, time=time())
{'loss': [{'val': 69, 'time': 1535095251.6717412}]}

>>> history.clear()

>>> # Use a small buffer-size of 10.
>>> # This means that only the latest 10 values are kept.
>>> for i in range(100): history.append('loss', i, buffer_size=10)

>>> # Flush the buffer with a time stamp.
>>> history.flush(time=time())

>>> # The mean of the last 10 values is now stored.
{'loss': [{'val': 94.5, 'time': 1535095320.9745226}]}
find(key)[source]

A helper method that returns a filtered dictionary with a search key.

Parameters:key (str) – The filter key

Examples:

>>> # Assume the history has the keys: ['loss', 'val_loss',
>>> # 'encoder_loss', 'accuracy', 'weird-metric']

>>> history.find('loss')
{'loss': [], 'val_loss': [], 'encoder_loss': []}
append(key, value, validation=False, buffer_size=None, **stamps)[source]

Append a new snapshot to the history.

Parameters:
  • key (str) – The dictionary key / name of the object
  • value (object) – The actual object
  • validation (bool) – Whether this is a validation metric. Default: False
  • buffer_size (int or None) – The size of the buffer of the key. Default: None
  • validation is just a convenient key-modifier. It appends 'val_' to the key.

  • buffer_size defines the size of the storage buffer for the specific key.

    The latest buffer_size snapshots are stored.

    If None, the key is stored as is.

Note

Any further keyword arguments define stamps that are essentially the signatures for the snapshot.

show(key=None, log=False, x_key=None, validation=True, legend=None, **kwargs)[source]

Plot the snapshots for a key against a stamp.

Parameters:
  • key (str) – The key of the record
  • log (bool) – If True, the y-axis will be log-scaled. Default: False
  • x_key (str or None) – The stamp to use as the x-axis. Default: None
  • validation (bool) – Whether to plot the validation records (if they exist) as well. Default: True
  • legend (str or None) – The legend entry. Default: None
Keyword Arguments:
 
  • ax (pyplot axes object) – The axis to plot into. Default: None
  • smoothen (bool) – If True, smoothens the plot. Default: True
  • window_fraction (float) – How much of the plot to use as a window for smoothing. Default: \(0.3\)
  • gain (float) – How much more dense to make the plot. Default: \(10\)
  • replace_outliers (bool) – If True, replaces outlier datapoints by a sensible value. Default: True
  • key can be None, in which case this method is successively called for all existing keys. The log attribute is overridden, however. It is only set to True for any key with 'loss' in it.
  • legend can be None, in which case the default legends 'training' and 'validation' are applied respectively.
flush(key=None, **stamps)[source]

Flush the buffer (if exists) and append the mean.

Parameters:key (str or None) – The key to flush. Default: None
  • key can be None, in which case this method is successively called for all existing keys.

Note

Any further keyword arguments define stamps that are essentially the signatures for the snapshot.

class magnet.training.history.SnapShot(buffer_size=-1)[source]

A list of stamped values (snapshots).

This is used by the History object to store a repository of training metrics.

Parameters:buffer_size (int) – The size of the buffer. Default: \(-1\)
  • If buffer_size is negative, then the snapshots are stored as is.
append(value, buffer=False, **stamps)[source]

Add a new snapshot.

Parameters:
  • value (object) – The value to add
  • buffer (bool) – If True, adds to the buffer instead. Default: False

Note

Any further keyword arguments define stamps that are essentially the signatures for the snapshot.

flush(**stamps)[source]

Flush the buffer (if exists) and append the mean.

Note

Any keyword arguments define stamps that are essentially the signatures for the snapshot.

show(ax, x=None, label=None, **kwargs)[source]

Plot the snapshots against a stamp.

Parameters:
  • ax (pyplot axes object) – The axis to plot into
  • x (str or None) – The stamp to use as the x-axis. Default: None
  • label (str or None) – The label for the line. Default: None
Keyword Arguments:
 See History.show() for more details.

Note

Any further keyword arguments are passed to the plot function.

magnet.training.utils

magnet.training.utils.load_object(path, **kwargs)[source]

A convenience method to unpickle a file.

Parameters:path (pathlib.Path) – The path to the pickle file
Keyword Arguments:
 default (object) – A default value to be returned if the file does not exist. Default: None
Raises:RuntimeError – If a default keyword argument is not provided and the file is not found.
magnet.training.utils.load_state(module, path, alternative_name=None)[source]

Loads the state_dict of a PyTorch object from a specified path.

This is a more robust version of the PyTorch way in the sense that the device mapping is automatically handled.

Parameters:
  • module (object) – Any PyTorch object that has a state_dict
  • path (pathlib.Path) – The path to the folder containing the state_dict file
  • alternative_name (str or None) – A fallback name for the file if the module object does not have a name attribute. Default: None
Raises:

RuntimeError – If no alternative_name is provided and the module does not have a name.

Note

If you already know the file name, set alternative_name to that.

This is just a convenience method that assumes that the file name will be the same as the name of the module (if there is one).

magnet.training.utils.save_object(obj, path)[source]

A convenience method to pickle an object.

Parameters:
  • obj (object) – The object to pickle
  • path (pathlib.Path) – The path to save the pickle file to

Note

If the path does not exist, it is created.

magnet.training.utils.save_state(module, path, alternative_name=None)[source]

Saves the state_dict of a PyTorch object to a specified path.

Parameters:
  • module (object) – Any PyTorch object that has a state_dict
  • path (pathlib.Path) – The path to a folder to save the state_dict to
  • alternative_name (str or None) – A fallback name for the file if the module object does not have a name attribute. Default: None
Raises:

RuntimeError – If no alternative_name is provided and the module does not have a name.
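
A sketch of a save/load round trip (the path and the 'encoder' fallback name are illustrative; model stands for any PyTorch module):

>>> from pathlib import Path
>>> from magnet.training.utils import save_state, load_state

>>> path = Path('checkpoints/models')
>>> save_state(model, path, alternative_name='encoder')
>>> load_state(model, path, alternative_name='encoder')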

Debugging

magnet.debug.overfit(trainer, data, batch_size, epochs=1, metric='loss', **kwargs)[source]

Runs training on small samples of the dataset in order to overfit.

If you can’t overfit a small sample, you can’t model the data well.

This debugger tries to overfit on multiple small samples of the data. The sample size and batch sizes are varied and the training is done for a fixed number of epochs.

This usually gives an insight on what to expect from the actual training.

Parameters:
  • trainer (magnet.training.Trainer) – The Trainer object
  • data (magnet.data.Data) – The data object used for training
  • batch_size (int) – The intended batch size
  • epochs (float) – The expected epochs for convergence for 1% of the data. Default: 1
  • metric (str) – The metric to plot. Default: 'loss'

Note

The maximum sample size is 1% of the size of the dataset.

Examples:

>>> import magnet as mag
>>> import magnet.nodes as mn
>>> import magnet.debug as mdb

>>> from magnet.data import Data
>>> from magnet.training import SupervisedTrainer

>>> data = Data.get('mnist')

>>> model = mn.Linear(10)
>>> with mag.eval(model): model(next(data())[0])

>>> trainer = SupervisedTrainer(model)

>>> mdb.overfit(trainer, data, batch_size=64)
_images/overfit-fail.png
>>> # Oops! Looks like there was something wrong.
>>> # Loss does not considerably decrease for sample sizes >= 4.
>>> # Of course, the activation was 'relu'.
>>> model = mn.Linear(10, act=None)
>>> with mag.eval(model): model(next(data())[0])

>>> trainer = SupervisedTrainer(model)

>>> mdb.overfit(trainer, data, batch_size=64)
>>> # Should be much better now.
_images/overfit-pass.png
magnet.debug.check_flow(trainer, data)[source]

Checks if any trainable parameter is not receiving gradients.

Super useful for large architectures that use the detach() function.

Parameters:
  • trainer (magnet.trainer.Trainer) – The Trainer object
  • data (magnet.data.Data) – The data object used for training
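
For example, reusing the trainer and data from the overfit() example above:

>>> # Flags any trainable parameter that is not receiving gradients
>>> mdb.check_flow(trainer, data)
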
class magnet.debug.Babysitter(frequency=10, **kwargs)[source]

A callback which monitors the mean relative gradients for all parameters.

Parameters:frequency (int) – The number of times per epoch to monitor. Default: \(10\)
Keyword Arguments:
 name (str) – Name of this callback. Default: 'babysitter'
magnet.debug.shape(debug=True)[source]

The shape of every tensor is printed out if a module is called within this context manager.

Useful for debugging the flow of tensors through layers and finding the values of various hyperparameters.

Parameters:debug (bool or str) – If str, only the tensor with this name is tracked. If True, all tensors are tracked. Else, nothing is tracked.
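
For instance (model stands for any built module; the printed shapes depend on it):

>>> import torch
>>> import magnet.debug as mdb

>>> with mdb.shape():
>>>     model(torch.randn(4, 3, 28, 28))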

magnet.utils

magnet.utils.summarize(module, x, parameters='trainable', arguments=False, batch=False, max_width=120)[source]

Prints a pretty picture of how a one-input, one-output sequential model works.

Similar to the model.summary() method found in Keras.

Parameters:
  • module (nn.Module) – The module to summarize
  • x (torch.Tensor) – A sample tensor sent as input to the module.
  • parameters (str or True) – Which kind of parameters to enumerate. Default: 'trainable'
  • arguments (bool) – Whether to show the arguments to a node. Default: False
  • batch (bool) – Whether to show the batch dimension in the shape. Default: False
  • max_width (int) – The maximum width of the table. Default: 120
  • parameters is one of ['trainable', 'non-trainable', 'all', True].

    ‘trainable’ parameters are the ones which require gradients and can be optimized by SGD.

    Setting this to True will print both types as a tuple.
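
For instance, with the arguments column enabled (model and the input shape are illustrative; the printed table is omitted here):

>>> import torch
>>> from magnet.utils import summarize

>>> summarize(model, torch.randn(1, 3, 28, 28), arguments=True)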

magnet.utils.images

magnet.utils.images.show_images(images, **kwargs)[source]

A nifty helper function to show images represented by tensors.

Parameters: images – The images to show
  • images can be anything from which you could conceivably harvest an image. If it’s a torch.Tensor, it is converted to a numpy.ndarray. The first dimension of the tensor is treated as a batch dimension. If it’s a str, it is treated as a glob path from which all images are extracted. More commonly, a list of numpy arrays can be given.
Keyword Arguments:
 
  • pixel_range (tuple or 'auto') – The range of pixel values to be expected. Default: 'auto'
  • cmap (str or None) – The color map for the plots. Default: 'gray'
  • merge (bool) – If True, all images are merged into one giant image. Default: True
  • titles (list or None) – The titles for each image. Default: None
  • shape (str) – The shape of the merge tile. Default: 'square'
  • resize (str) – The common shape to which images are resized. Default: 'smean'
  • retain (bool) – If True, the plot is retained. Default: False
  • savepath (str or None) – If given, the image is saved to this path. Default: None
  • pixel_range defaults to the range in the image.
  • titles should only be given if merge is True.

Note

The merge shape is controlled by shape which can be either 'square', 'row', 'column' or a tuple which explicitly specifies this shape. 'square' automatically finds a shape with least difference between the number of rows and columns. This is aesthetically pleasing. In the explicit case, the product of the tuple needs to equal the number of images.
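
For instance (random arrays, purely to illustrate merging into a square tile):

>>> import numpy as np
>>> from magnet.utils.images import show_images

>>> images = [np.random.rand(28, 28) for _ in range(16)]
>>> show_images(images, shape='square')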

magnet.utils.plot

magnet.utils.plot.smooth_plot(*args, **kwargs)[source]

Same as the plot function from matplotlib… only smoother!

This function plots a modified, smoothened version of the data. Useful when data is jagged and one is interested in the average trends.

Keyword Arguments:
 
  • window_fraction (float) – The fraction of the data to use as window to the smoothener. Default: \(0.3\)
  • gain (float) – The amount of artificial datapoints inserted per raw datapoint. Default: \(10\)
  • replace_outliers (bool) – If True, replaces outlier datapoints by a sensible value. Default: True
  • ax (Pyplot axes object) – The axis to plot onto. Default: None

Note

Uses a Savitzky-Golay filter to smooth out the data.
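
For instance, with a jagged random walk (illustrative data):

>>> import numpy as np
>>> from magnet.utils.plot import smooth_plot

>>> y = np.random.randn(1000).cumsum()
>>> smooth_plot(y, window_fraction=0.3, gain=10)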

magnet.utils.varseq

magnet.utils.varseq.pack(sequences, lengths=None)[source]

Packs a list of variable-length Tensors.

Parameters:
  • sequences (list or torch.Tensor) – The list of Tensors to pack
  • lengths (list) – list of lengths of each tensor. Default: None

Note

If sequences is a tensor, lengths needs to be provided.

Note

The packed sequence that is returned has a convenient unpack() method as well as shape and order attributes. The order attribute stores the sorting order which should be used for unpacking.

Shapes:
sequences should be a list of Tensors of size L x *, where L is the length of a sequence and * is any number of trailing dimensions, including zero.
magnet.utils.varseq.unpack(sequence, as_list=False)[source]

Unpacks a PackedSequence object.

Parameters:
  • sequence (PackedSequence) – The tensor to unpack.
  • as_list (bool) – If True, returns a list of tensors. Default: False

Note

The sequence should have an order attribute that stores the sorting order.
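
A sketch of a pack/unpack round trip with two variable-length sequences:

>>> import torch
>>> from magnet.utils.varseq import pack, unpack

>>> sequences = [torch.randn(5, 8), torch.randn(3, 8)]
>>> packed = pack(sequences)
>>> unpacked = unpack(packed, as_list=True)  # A list of the original variable-length tensors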

magnet.utils.varseq.sort(sequences, order, dim=0)[source]

Sorts a tensor in a certain order along a certain dimension.

Parameters:
  • sequences (torch.Tensor) – The tensor to sort
  • order (numpy.ndarray) – The sorting order
  • dim (int) – The dimension to sort. Default: 0
magnet.utils.varseq.unsort(sequences, order, dim=0)[source]

Unsorts a tensor in a certain order along a certain dimension.

Parameters:
  • sequences (torch.Tensor) – The tensor to unsort
  • order (numpy.ndarray) – The sorting order
  • dim (int) – The dimension to unsort. Default: 0
