Welcome to jdit documentation!¶
Quick Start¶
You can get a quick start by following these steps.
After building and installing the jdit package, you can make a new directory for a quick test.
Assuming your new directory is ``example``, run this code in ipython (creating a ``main.py`` file is also acceptable).
Fashion-mnist Classification¶
To start a simple classification task:
from jdit.trainer.instances.fashionClassification import start_fashionClassTrainer
start_fashionClassTrainer()
Then you will see output like the following.
===> Build dataset
use 8 thread!
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Processing...
Done!
===> Building model
ResNet Total number of parameters: 2776522
ResNet model use CPU!
apply kaiming weight init!
===> Building optimizer
===> Training
using `tensorboard --logdir=log` to see learning curves and net structure.
training and valid_epoch data, configures info and checkpoint were save in `log` directory.
0%| | 0/10 [00:00<?, ?epoch/s]
0step [00:00, ?step/s]
- It will search for a fashion-mnist dataset.
- Then it builds a resnet18 for classification.
- During training, you can find the learning curves in tensorboard.
- It will create a ``log`` directory in ``example/``, which saves the training process data and configurations.
Fashion-mnist Generation GAN¶
To start a simple GAN generation task:
from jdit.trainer.instances import start_fashionGenerateGanTrainer
start_fashionGenerateGanTrainer()
Then you will see output like the following.
===> Build dataset
use 2 thread!
===> Building model
Discriminator Total number of parameters: 100865
Discriminator model use GPU(0)!
apply kaiming weight init!
Generator Total number of parameters: 951361
Generator model use GPU(0)!
apply kaiming weight init!
===> Building optimizer
===> Training
0%| | 0/200 [00:00<?, ?epoch/s]
0step [00:00, ?step/s]
You can get the training process info from tensorboard and the log directory. It contains:
- Learning curves
- Input and output visualization
- The configurations of ``Model``, ``Trainer``, ``Optimizer``, ``Dataset`` and ``Performance`` in ``.csv`` files
- Model checkpoints
Let’s build your own task¶
Although this is just an example, you can still build your own project easily using the jdit framework. The jdit framework can deal with:
- Data visualization (learning curves, images during the pilot process).
- CPU, GPU or multiple GPUs (training your model on specified devices).
- Intermediate data storage (saving training data into a csv file).
- Automatic model checkpoints.
- Flexible templates that can be integrated and customized with overrides.
So, let's build your own task by using jdit.
Build your own trainer¶
To build your own trainer, you need to prepare these sections:
- ``dataset``: the dataset you want to use.
- ``Model``: a wrapper of your own pytorch ``module``.
- ``Optimizer``: a wrapper of a pytorch ``optimizer``.
- ``trainer``: a training pipeline that assembles the sections above.
jdit.dataset¶
In this section, you build the dataset that you want to use, following the steps below.
Common dataset¶
Many open datasets are widely shared, so you can easily build a standard common dataset, such as:
- Fashion mnist
- Cifar10
- Lsun
For these common datasets, the only parameter you need to set is ``batch_size``.
>>> from jdit.dataset import FashionMNIST
>>> fashion_data = FashionMNIST(batch_size=64)  # now you get a ``dataset``
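Once built, the factory exposes ready-made loaders and step counts. A minimal usage sketch (the attributes ``loader_train`` and ``nsteps_train`` are documented in the ``DataLoadersFactory`` reference below):
>>> fashion_data.nsteps_train   # number of steps per epoch at this batch size
>>> for batch in fashion_data.loader_train:
...     images, labels = batch[0], batch[1]  # one batch of images and labels
...     break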
Custom dataset¶
If you want to build a dataset from your own data, you need to inherit the class ``jdit.dataset.DataLoadersFactory`` and rewrite its ``build_transforms()`` and ``build_datasets()`` (if you want to use the default transforms, rewriting ``build_transforms()`` is not necessary).
Follow these steps:
- Assign your own transforms to ``self.train_transform_list`` and ``self.valid_transform_list``. (Not necessary.)
- Register your training dataset to ``self.dataset_train``, using ``self.train_transform_list``.
- Register your valid_epoch dataset to ``self.dataset_valid``, using ``self.valid_transform_list``.
Example:
class FashionMNIST(DataLoadersFactory):
    def __init__(self, root=r'.\datasets\fashion_data', batch_size=128, num_workers=-1):
        super(FashionMNIST, self).__init__(root, batch_size, num_workers)

    def build_transforms(self, resize=32):
        # This is the default set; you can rewrite it.
        self.train_transform_list = self.valid_transform_list = [
            transforms.Resize(resize),
            transforms.ToTensor(),
            transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])]

    def build_datasets(self):
        # ``self.root`` comes from the constructor (CIFAR10 here is just the example dataset).
        self.dataset_train = datasets.CIFAR10(self.root, train=True, download=True,
                                              transform=transforms.Compose(self.train_transform_list))
        self.dataset_valid = datasets.CIFAR10(self.root, train=False, download=True,
                                              transform=transforms.Compose(self.valid_transform_list))
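A hypothetical usage of the class above; the first call downloads the data into ``root``:
>>> data = FashionMNIST(root=r'.\datasets\fashion_data', batch_size=64)
>>> data.dataset_train   # registered by build_datasets()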
Now you have your own dataset.
Model¶
In this section, you build your own network.
First, you need to build a pytorch ``module`` like this:
>>> class SimpleModel(nn.Module):
... def __init__(self):
... super(SimpleModel, self).__init__()
... self.layer1 = nn.Linear(32, 64)
... self.layer2 = nn.Linear(64, 1)
...
... def forward(self, input):
... out = self.layer1(input)
... out = self.layer2(out)
... return out
>>> network = SimpleModel()
Note
You don't need to convert it to a gpu or apply data parallel. ``jdit.Model`` will do this for you.
Second, wrap your model by using ``jdit.Model``.
Set which gpus you want to use and the weight init method.
Note
In pytorch, the visible gpu ids are still renumbered from 0, but this Model handles absolute ids for you. If you have gpus ``[0, 1, 2, 3]`` and you only want to use 2 and 3, just set ``gpu_ids_abs=[2, 3]``.
>>> from jdit import Model
>>> network = SimpleModel()
>>> jdit_model = Model(network, gpu_ids_abs=[2,3], init_method="kaiming")
SimpleModel Total number of parameters: 2177
SimpleModel dataParallel use GPUs[2, 3]!
apply kaiming weight init!
Now you have your own Model.
Optimizer¶
In this section, you build your optimizer.
Compared with the optimizer in pytorch, this wrapper adds a convenient function for learning rate decay and reset.
``do_lr_decay()`` will be called automatically at the end of every epoch, or at certain epochs.
Actually, you don't need to do anything to apply learning rate decay.
If you don't want any decay, just set ``lr_decay = 1.`` or set a decay epoch larger than the number of training epochs.
The following shows how it works; you can also implement special strategies of your own.
>>> from jdit import Optimizer
>>> from torch.nn import Linear
>>> network = Linear(10, 1)
>>> #set params
>>> #`optimizer` is equal to pytorch class name (torch.optim.RMSprop).
>>> hparams = {
...     "optimizer": "RMSprop",
...     "lr": 0.001,
...     "lr_decay": 0.5,
...     "weight_decay": 2e-5,
...     "momentum": 0}
>>> #define optimizer
>>> opt = Optimizer(network.parameters(),**hparams)
>>> opt.lr
0.001
>>> opt.do_lr_decay()
>>> opt.lr
0.0005
>>> opt.do_lr_decay(reset_lr = 1)
>>> opt.lr
1
You can pass an optimizer name to select it, such as ``"Adam"``, ``"RMSprop"`` or ``"SGD"``.
Note
As with spectral normalization, the optimizer will filter out the parameters that don't require grad.
So, you don't need to write something like ``filter(lambda p: p.requires_grad, params)``.
Merely passing ``model.parameters()`` is enough, as the sketch below shows.
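A minimal sketch of this behavior, assuming the ``Optimizer`` signature shown in the reference section below:
>>> from torch.nn import Linear
>>> from jdit import Optimizer
>>> net = Linear(10, 1)
>>> net.bias.requires_grad = False  # freeze a parameter
>>> # no manual filter(lambda p: p.requires_grad, ...) is needed;
>>> # the wrapper keeps only the trainable weight internally
>>> opt = Optimizer(net.parameters(), "SGD", lr=0.01)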
Now you have an Optimizer.
trainer¶
The final section is a little more complex. It supplies several templates, such as ``SupTrainer``, ``SupGanTrainer``, ``ClassificationTrainer`` and ``instances``.
The inheritance hierarchy is as follows:
- SupTrainer
  - ClassificationTrainer
    - instances.FashionClassTrainer
  - SupGanTrainer
    - Pix2pixGanTrainer
      - instances.CifarPix2pixGanTrainer
    - GenerateGanTrainer
      - instances.FashionGenerateGanTrainer
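The leaves of this tree are ready-made instances, which can be launched directly. A short sketch reusing the entry points from the quick start above:
from jdit.trainer.instances.fashionClassification import start_fashionClassTrainer
from jdit.trainer.instances import start_fashionGenerateGanTrainer

start_fashionClassTrainer()          # classification demo
# start_fashionGenerateGanTrainer() # GAN demo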
Top level SupTrainer¶
``SupTrainer`` is the top class of these templates.
It defines tools for recording logs, data visualization and so on. Besides, it contains the main loop over epochs, which the second-level templates inherit to fill in the contents of each epoch of training.
Something like this:
def train(self):
    for epoch in range(nepochs):
        self._record_configs()  # record info
        self.train_epoch()
        self.valid_epoch()
        # do learning rate decay
        self._change_lr()
        # save a model checkpoint
        self._check_point()
    self.test()
Each method will be overridden by the second-level templates. This only defines a rough framework.
Second level ClassificationTrainer¶
At this level the task becomes clearer: a classification task.
We have one ``model``, one ``optimizer`` and one ``dataset``, and the data consists of images and labels.
So, init a ``ClassificationTrainer`` like this:
class ClassificationTrainer(SupTrainer):
    def __init__(self, logdir, nepochs, gpu_ids, net, opt, datasets, num_class):
        super(ClassificationTrainer, self).__init__(nepochs, logdir, gpu_ids_abs=gpu_ids)
        self.net = net
        self.opt = opt
        self.datasets = datasets
        self.num_class = num_class
        self.labels = None
        self.output = None
Next, build a training loop for one epoch.
You must use ``self.step`` to record the training step.
def train_epoch(self, subbar_disable=False):
    # display training images every epoch
    self._watch_images(show_imgs_num=3, tag="Train")
    for iteration, batch in tqdm(enumerate(self.datasets.loader_train, 1),
                                 unit="step", disable=subbar_disable):
        self.step += 1  # necessary!
        # unzip the data from one batch and move it to a certain device
        self.input, self.ground_truth, self.labels = self.get_data_from_batch(batch, self.device)
        self.output = self.net(self.input)
        # ``_train_iteration`` is defined in SupTrainer.
        # It uses ``self.compute_loss`` and ``self.opt`` to do a backward pass.
        self._train_iteration(self.opt, self.compute_loss, tag="Train")
@abstractmethod
def compute_loss(self):
    """Compute the main loss and observed variables.
    Rewritten by the next templates.
    """

@abstractmethod
def compute_valid(self):
    """Compute the valid_epoch variables for visualization.
    Rewritten by the next templates.
    """
``compute_loss()`` and ``compute_valid()`` should be rewritten in the next template.
Third level FashionClassTrainer¶
At this level everything is clear. So, inherit ``ClassificationTrainer`` and fill in the specific methods.
class FashionClassTrainer(ClassificationTrainer):
    def __init__(self, logdir, nepochs, gpu_ids, net, opt, dataset, num_class=10):
        super(FashionClassTrainer, self).__init__(logdir, nepochs, gpu_ids, net, opt, dataset, num_class)
        data, label = self.datasets.samples_train
        # show the dataset in tensorboard
        self.watcher.embedding(data, data, label, 1)

    def compute_loss(self):
        var_dic = {}
        var_dic["CEP"] = loss = nn.CrossEntropyLoss()(self.output, self.labels.squeeze().long())
        return loss, var_dic

    def compute_valid(self):
        var_dic = {}
        var_dic["CEP"] = cep = nn.CrossEntropyLoss()(self.output, self.labels.squeeze().long())
        _, predict = torch.max(self.output.detach(), 1)  # e.g. 0100 => 1, 0010 => 2
        total = predict.size(0) * 1.0
        labels = self.labels.squeeze().long()
        correct = predict.eq(labels).cpu().sum().float()
        acc = correct / total
        var_dic["ACC"] = acc
        return var_dic
``compute_loss()`` will be called at every training step for the backward pass. It returns two values.
- The first one, ``loss``, is the main loss, on which ``loss.backward()`` will be called to update the model weights.
- The second one, ``var_dic``, is a dictionary of values that will be visualized on tensorboard and depicted as curves.
In this example, ``compute_loss()`` uses ``nn.CrossEntropyLoss()`` for the backward propagation and visualizes the loss on tensorboard under the name ``"CEP"``.
``compute_valid()`` will be called at every validation step. It returns one value.
- ``var_dic`` is the same kind of dictionary as the ``var_dic`` in ``compute_loss()``.
Note
``compute_valid()`` will be called under ``torch.no_grad()``.
So, grads will not be computed in this method. But if you need grads,
please use ``torch.enable_grad()`` to make grad computation available, as in the sketch below.
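A hypothetical sketch of re-enabling grads inside ``compute_valid()``; the ``INPUT_GRAD`` key is made up for illustration:
def compute_valid(self):
    var_dic = {}
    labels = self.labels.squeeze().long()
    var_dic["CEP"] = nn.CrossEntropyLoss()(self.output, labels)
    # valid_epoch runs under torch.no_grad(); re-enable autograd locally
    with torch.enable_grad():
        inp = self.input.detach().requires_grad_(True)
        loss = nn.CrossEntropyLoss()(self.net(inp), labels)
        grad, = torch.autograd.grad(loss, inp)
    var_dic["INPUT_GRAD"] = grad.norm()
    return var_dic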
Finally, you have a trainer.
You have everything you need. Put them together and train!
>>> mnist = FashionMNIST(batch_size=64)
>>> net = Model(SimpleModel(), gpu_ids_abs=[], init_method="kaiming")
>>> opt = Optimizer(net.parameters(), "RMSprop", lr=0.001)
>>> Trainer = FashionClassTrainer("log", 10, [], net, opt, mnist, 10)
>>> Trainer.train()
jdit.dataset¶
DataLoadersFactory¶
class jdit.dataset.DataLoadersFactory(root: str, batch_size: int, num_workers=-1, shuffle=True, subdata_size=1)[source]¶
This is a super class of dataloaders.
It defines some basic attributes and methods.
- For training data: ``dataset_train``, ``loader_train``, ``nsteps_train``. Others, such as valid_epoch and test, follow the same naming format.
- For transforms, you can define your own.
- If you don't have a test set, it will be replaced by the valid_epoch dataset.
It will build the datasets following these steps:
- ``build_transforms()``: builds the transforms for the training and valid_epoch datasets. You can rewrite this method for your own transforms. It will be used in ``build_datasets()``.
- ``build_datasets()``: you must rewrite this method to load your own dataset, by assigning datasets to ``self.dataset_train`` and ``self.dataset_valid``. ``self.dataset_test`` is optional; if you don't assign a test dataset, it will be replaced by ``self.dataset_valid``.
Example:
    def build_transforms(self, resize=32):
        self.train_transform_list = self.valid_transform_list = [
            transforms.Resize(resize),
            transforms.ToTensor(),
            transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])]

    # Inherit this class and write this method.
    def build_datasets(self):
        self.dataset_train = datasets.CIFAR10(root, train=True, download=True,
                                              transform=transforms.Compose(self.train_transform_list))
        self.dataset_valid = datasets.CIFAR10(root, train=False, download=True,
                                              transform=transforms.Compose(self.valid_transform_list))
- ``build_loaders()``: uses the datasets and the passed parameters to build dataloaders for ``self.loader_train``, ``self.loader_valid`` and ``self.loader_test``.
Parameters:
- ``root`` is the root path of the datasets.
- ``batch_size`` is the batch size of the dataloaders.
- ``num_workers`` is the number of threads used to load data. If you pass -1, the maximum number of threads is used, according to your cpu. Default: -1.
- ``shuffle`` is whether to shuffle the data. Default: ``True``.
build_datasets()[source]¶
You must rewrite this method to load your own datasets.
- ``self.dataset_train``: assign a training ``dataset`` to this.
- ``self.dataset_valid``: assign a valid_epoch ``dataset`` to this.
- ``self.dataset_test`` is optional: assign a test ``dataset`` to this. If not, it will be replaced by ``self.dataset_valid``.
Example:
    self.dataset_train = datasets.CIFAR10(root, train=True, download=True,
                                          transform=transforms.Compose(self.train_transform_list))
    self.dataset_valid = datasets.CIFAR10(root, train=False, download=True,
                                          transform=transforms.Compose(self.valid_transform_list))
build_loaders()[source]¶
Build the dataloaders. The previous method, ``self.build_datasets()``, has created the datasets; this uses them to build their dataloaders.
build_transforms(resize: int = 32)[source]¶
This builds the transforms for training and valid_epoch.
You can rewrite this method to build your own transforms. Don't forget to register your transforms to ``self.train_transform_list`` and ``self.valid_transform_list``. The following is the default set.
    self.train_transform_list = self.valid_transform_list = [
        transforms.Resize(resize),
        transforms.ToTensor(),
        transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])]
HandMNIST¶
class jdit.dataset.HandMNIST(root='datasets/hand_data', batch_size=64, num_workers=-1)[source]¶
Handwritten digits MNIST dataset.
Example:
>>> data = HandMNIST(r"../datasets/mnist")
use 8 thread!
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Processing...
Done!
>>> data.dataset_train
Dataset MNIST
    Number of datapoints: 60000
    Split: train
    Root Location: data
    Transforms (if any): Compose(
                             Resize(size=32, interpolation=PIL.Image.BILINEAR)
                             ToTensor()
                             Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
                         )
    Target Transforms (if any): None
>>> # We don't set a test dataset, so they are the same.
>>> data.dataset_valid is data.dataset_test
True
>>> # Number of steps at batch size 128.
>>> data.nsteps_train
469
>>> # Total samples of the training dataset.
>>> len(data.dataset_train)
60000
>>> # The batch size of the sample loader is 1, so the length of the loader equals the number of samples drawn.
>>> len(data.samples_train)
6000
build_transforms(resize: int = 32)[source]¶
This builds the transforms for training and valid_epoch.
You can rewrite this method to build your own transforms. Don't forget to register your transforms to ``self.train_transform_list`` and ``self.valid_transform_list``. The following is the default set.
    self.train_transform_list = self.valid_transform_list = [
        transforms.Resize(resize),
        transforms.ToTensor(),
        transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])]
FashionMNIST¶
class jdit.dataset.FashionMNIST(root='datasets/fashion_data', batch_size=64, num_workers=-1)[source]¶
build_datasets()[source]¶
You must rewrite this method to load your own datasets.
- ``self.dataset_train``: assign a training ``dataset`` to this.
- ``self.dataset_valid``: assign a valid_epoch ``dataset`` to this.
- ``self.dataset_test`` is optional: assign a test ``dataset`` to this. If not, it will be replaced by ``self.dataset_valid``.
Example:
    self.dataset_train = datasets.CIFAR10(root, train=True, download=True,
                                          transform=transforms.Compose(self.train_transform_list))
    self.dataset_valid = datasets.CIFAR10(root, train=False, download=True,
                                          transform=transforms.Compose(self.valid_transform_list))
build_transforms(resize: int = 32)[source]¶
This builds the transforms for training and valid_epoch.
You can rewrite this method to build your own transforms. Don't forget to register your transforms to ``self.train_transform_list`` and ``self.valid_transform_list``. The following is the default set.
    self.train_transform_list = self.valid_transform_list = [
        transforms.Resize(resize),
        transforms.ToTensor(),
        transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])]
Cifar10¶
class jdit.dataset.Cifar10(root='datasets/cifar10', batch_size=32, num_workers=-1)[source]¶
build_datasets()[source]¶
You must rewrite this method to load your own datasets.
- ``self.dataset_train``: assign a training ``dataset`` to this.
- ``self.dataset_valid``: assign a valid_epoch ``dataset`` to this.
- ``self.dataset_test`` is optional: assign a test ``dataset`` to this. If not, it will be replaced by ``self.dataset_valid``.
Example:
    self.dataset_train = datasets.CIFAR10(root, train=True, download=True,
                                          transform=transforms.Compose(self.train_transform_list))
    self.dataset_valid = datasets.CIFAR10(root, train=False, download=True,
                                          transform=transforms.Compose(self.valid_transform_list))
Lsun¶
class jdit.dataset.Lsun(root, batch_size=32, num_workers=-1)[source]¶
build_datasets()[source]¶
You must rewrite this method to load your own datasets.
- ``self.dataset_train``: assign a training ``dataset`` to this.
- ``self.dataset_valid``: assign a valid_epoch ``dataset`` to this.
- ``self.dataset_test`` is optional: assign a test ``dataset`` to this. If not, it will be replaced by ``self.dataset_valid``.
Example:
    self.dataset_train = datasets.CIFAR10(root, train=True, download=True,
                                          transform=transforms.Compose(self.train_transform_list))
    self.dataset_valid = datasets.CIFAR10(root, train=False, download=True,
                                          transform=transforms.Compose(self.valid_transform_list))
build_transforms(resize: int = 32)[source]¶
This builds the transforms for training and valid_epoch.
You can rewrite this method to build your own transforms. Don't forget to register your transforms to ``self.train_transform_list`` and ``self.valid_transform_list``. The following is the default set.
    self.train_transform_list = self.valid_transform_list = [
        transforms.Resize(resize),
        transforms.ToTensor(),
        transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])]
get_mnist_dataloaders¶
get_fashion_mnist_dataloaders¶
jdit.model¶
Model¶
class jdit.Model(proto_model: torch.nn.Module, gpu_ids_abs: Union[list, tuple] = (), init_method: Union[str, FunctionType, None] = 'kaiming', show_structure=False, check_point_pos=None, verbose=True)[source]¶
A wrapper of a pytorch ``module``.
In the simplest case, we use a raw pytorch ``module`` to assemble a ``Model`` of this class. It makes it more convenient to use some feature methods, such as ``_check_point``, ``load_weights`` and so on.
- ``proto_model`` is the core model of this class. It is not necessary to pass a ``module`` when you init a ``Model``. You can build the model later by using ``Model.define(module)``, or load a model from a file.
- ``gpu_ids_abs`` controls the gpus you want to use. You should use absolute gpu ids.
- ``init_method`` controls the weight init method.
  - With init_method="xavier", it uses ``init.xavier_normal_``, from ``torch.nn.init``, to init the Conv layers of the model.
  - With init_method="kaiming", it uses ``init.kaiming_normal_``, from ``torch.nn.init``, to init the Conv layers of the model.
  - With init_method=your_own_method, the given callable is applied to the weights, just like a ``torch.nn.init`` method.
- ``show_structure`` controls whether to show your network structure.
Note
Don't pass a ``DataParallel`` model. Only a plain ``module`` is acceptable. It will be changed to a ``DataParallel`` class automatically when you pass multiple gpu ids, like ``[0, 1]``.
Note
``gpu_ids_abs`` must be a tuple or list. If you want to use the cpu, just pass an empty list like ``[]``.
Args:
- proto_model (module): A pytorch module. Default: ``None``
- gpu_ids_abs (tuple or list): The absolute ids of the gpus; if ``[]``, the cpu is used. Default: ``()``
- init_method (str or callable): Weight init method. Default: ``"kaiming"``
- show_structure (bool): Whether the structure is shown. Default: ``False``
Attributes:
- num_params (int): The total number of weights in this model.
- gpu_ids_abs (list or tuple): Which devices this model is on.
Examples:
>>> from torch.nn import Sequential, Conv3d
>>> # using square kernels and equal stride
>>> module = Sequential(Conv3d(16, 33, (3, 5, 2), stride=(2, 1, 1), padding=(4, 2, 0)))
>>> # using the cpu to init a Model from the module.
>>> net = Model(module, [], show_structure=False)
Sequential Total number of parameters: 15873
Sequential model use CPU!
apply kaiming weight init!
>>> input_tensor = torch.randn(20, 16, 10, 50, 100)
>>> output = net(input_tensor)
convert_to_distributed(device_ids=None, output_device=None, dim=0, broadcast_buffers=True, process_group=None, bucket_cap_mb=25, find_unused_parameters=False, check_reduction=False)[source]¶
Args:
- device_ids (list of int or torch.device): CUDA devices. This should only be provided when the input module resides on a single CUDA device. For single-device modules, the i-th module replica is placed on ``device_ids[i]``. For multi-device modules and CPU modules, device_ids must be None or an empty list, and the input data for the forward pass must be placed on the correct device. (default: all devices for single-device modules)
- output_device (int or torch.device): device location of the output for single-device CUDA modules. For multi-device modules and CPU modules, it must be None, and the module itself dictates the output location. (default: ``device_ids[0]`` for single-device modules)
- broadcast_buffers (bool): flag that enables syncing (broadcasting) of the module's buffers at the beginning of the forward function. (default: ``True``)
- process_group: the process group to be used for distributed data all-reduction. If ``None``, the default process group, which is created by ``torch.distributed.init_process_group``, will be used. (default: ``None``)
- bucket_cap_mb: DistributedDataParallel will bucket parameters into multiple buckets, so that the gradient reduction of each bucket can potentially overlap with the backward computation. ``bucket_cap_mb`` controls the bucket size in MegaBytes (MB). (default: 25)
- find_unused_parameters (bool): traverse the autograd graph of all tensors contained in the return value of the wrapped module's ``forward`` function. Parameters that don't receive gradients as part of this graph are preemptively marked as ready to be reduced. (default: ``False``)
- check_reduction: when set to ``True``, DistributedDataParallel automatically checks, at the beginning of every iteration's forward function, whether the previous iteration's backward reductions were successfully issued. You normally don't need this option enabled unless you observe weird behaviors, such as different ranks getting different gradients, which should not happen if DistributedDataParallel is used correctly. (default: ``False``)
Attributes:
- module (Module): the module to be parallelized
Example:
>>> torch.distributed.init_process_group(backend='nccl', world_size=4, init_method='...')
>>> net.convert_to_distributed(pg)
>>> # the same thing
>>> net.model = torch.nn.DistributedDataParallel(net.model, pg)
static count_params(proto_model: torch.nn.Module)[source]¶
Count the total parameters of the model.
Parameters: proto_model – pytorch module
Returns: number of parameters
define(proto_model: torch.nn.Module, gpu_ids_abs: Union[list, tuple], init_method: Union[str, FunctionType, None], show_structure: bool)[source]¶
Define and wrap a pytorch module, according to CPU, GPU or multi-GPUs.
- Print the module's info.
- Move this module to the specified device.
- Apply the weight init method.
Parameters:
- proto_model – Network, of type ``module``.
- gpu_ids_abs – The ids of the GPUs to be used, of type ``tuple`` or ``list``. If not using a GPU, pass ``()``.
- init_method – Weight init method (e.g. "kaiming"), or ``False`` to skip init.
- show_structure – Whether to print the structure of the model.
load_point(model_name: str, epoch: int, logdir='log')[source]¶
Load the model and weights from a certain checkpoint.
This method cooperates with the checkpoint method ``self._check_point()``.
load_weights(weights: Union[dict, str], strict=True)[source]¶
Assemble a model and weights from paths or passed parameters.
You can load a model from a file, from passed parameters, or both.
Parameters:
- weights – Pytorch weights or a weights file path.
- strict – Same as in pytorch's ``model.load_state_dict(weights, strict=strict)``. Default: ``True``
Returns: module
Example:
>>> from torchvision.models.resnet import resnet18
>>> model = Model(resnet18())
ResNet Total number of parameters: 11689512
ResNet model use CPU!
apply kaiming weight init!
>>> model.save_weights("model.pth",)
try to remove 'module.' in keys of weights dict...
>>> model.load_weights("model.pth", True)
Try to remove `moudle.` to keys of weights dict
print_network(proto_model: torch.nn.Module, show_structure=False)[source]¶
Print the total number of parameters and the structure of the network.
Parameters:
- proto_model – Pytorch module
- show_structure – Whether to show the network's structure. Default: ``False``
Returns: Total number of parameters
save_weights(weights_path: str, fix_weights=True)[source]¶
Save a model and weights to files.
You can save a model, weights, or both to file.
Note
This method deals well with different devices when saving the model. You don't need to care about which devices your model was saved from.
Parameters:
- weights_path – Pytorch weights or a weights file path.
- fix_weights – If true, it will remove the 'module.' prefix in the keys when you save a ``DataParallel``, without any moving operation. Otherwise, it will move the model to cpu, especially for ``DataParallel``. Default: ``True``
Example:
>>> from torch.nn import Linear
>>> model = Model(Linear(10,1))
Linear Total number of parameters: 11
Linear model use CPU!
apply kaiming weight init!
>>> model.save_weights("weights.pth")
try to remove 'module.' in keys of weights dict...
>>> model.load_weights("weights.pth")
Try to remove `moudle.` to keys of weights dict
jdit.optimizer¶
Optimizer¶
class jdit.Optimizer(params, optimizer, lr_decay: float = 1.0, decay_position: Union[int, tuple, list] = -1, lr_reset: Dict[int, float] = None, position_type: ('epoch', 'step') = 'epoch', **kwargs)[source]¶
This is a wrapper of the ``optimizer`` classes in pytorch.
It adds some new features for finer control of the optimizer.
- ``params`` is the parameters of the model which need to be updated. A filter like ``filter(lambda p: p.requires_grad, params)`` is applied automatically to keep only the parameters that require grad, so you can pass ``model.parameters()`` without any filter.
- Learning rate decay: calling ``do_lr_decay()`` applies a learning rate decay, like: \[lr = lr * decay\]
- Learning rate reset: resets the learning rate; it can change the learning rate and the decay directly.
Parameters:
- params (dict) – parameters of the model which need to be updated.
- optimizer (torch.optim.Optimizer) – an optimizer class in pytorch, such as ``torch.optim.Adam``.
- lr_decay (float, optional) – learning rate decay. Default: 0.92
- decay_position (int, list, optional) – the decay position of the lr. Default: None
- lr_reset (Dict[position(int), lr(float)]) – reset the learning rate at a certain position. Default: None
- position_type ('epoch', 'step') – position type. Default: 'epoch'
- **kwargs – hyper-parameters passed to the optimizer, such as ``lr``, ``betas``, ``weight_decay``.
Example:
>>> from torch.nn import Sequential, Conv3d
>>> from torch.optim import Adam
>>> module = Sequential(Conv3d(16, 33, (3, 5, 2), stride=(2, 1, 1), padding=(4, 2, 0)))
>>> opt = Optimizer(module.parameters(), "Adam", 0.5, 10, {4: 0.99}, "epoch",
...                 lr=1.0, betas=(0.9, 0.999), weight_decay=1e-5)
>>> print(opt)
(Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 1.0
    weight_decay: 1e-05
)
lr_decay:0.5
decay_position:10
lr_reset:{4: 0.99}
position_type:epoch
))
>>> opt.lr
1.0
>>> opt.lr_decay
0.5
>>> opt.do_lr_decay()
>>> opt.lr
0.5
>>> opt.do_lr_decay(reset_lr=1)
>>> opt.lr
1
>>> opt.opt
Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 1
    weight_decay: 1e-05
)
>>> opt.is_decay_lr(1)
False
>>> opt.is_decay_lr(10)
True
>>> opt.is_decay_lr(20)
True
>>> opt.is_reset_lr(4)
0.99
>>> opt.is_reset_lr(5)
False
do_lr_decay(reset_lr_decay: float = None, reset_lr: float = None)[source]¶
Do a learning rate decay, or reset the values.
- If both parameters are None: do a learning rate decay, by ``self.lr = self.lr * self.lr_decay``.
- If reset_lr_decay or reset_lr is passed: do a learning rate or decay reset, by ``self.lr = reset_lr`` and ``self.lr_decay = reset_lr_decay``.
Parameters:
- reset_lr_decay – if not None, use this value to reset ``self.lr_decay``. Default: None.
- reset_lr – if not None, use this value to reset ``self.lr``. Default: None.
jdit.trainer¶
SupTrainer¶
class jdit.trainer.SupTrainer(nepochs: int, logdir: str, gpu_ids_abs: Union[list, tuple] = ())[source]¶
This is the super class of all trainers.
It defines:
- The basic tools: ``Performance()``, ``Watcher()``, ``Loger()``.
- The basic loop of epochs.
- Learning rate decay and model checkpoints.
debug()[source]¶
Debug the trainer.
It will check these functions:
- ``self._record_configs()``: saves all modules' configurations.
- ``self.train_epoch()``: trains one epoch with several samples, so it is very fast.
- ``self.valid_epoch()``: validates one epoch using dataset_valid.
- ``self._change_lr()``: does the learning rate change.
- ``self._check_point()``: does a model checkpoint.
- ``self.test()``: does a test using dataset_test.
Before debugging, it will reset the ``datasets`` and pick up only several samples for a fast test. For the test, it builds a ``log_debug`` directory to save the log.
Returns: bool. It will return ``True`` if it passes all the tests.
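A hypothetical smoke test before a long run, reusing the names from the quick-start assembly above:
>>> trainer = FashionClassTrainer("log", 10, [], net, opt, mnist, 10)
>>> trainer.debug()   # fast dry run on a few samples, logged to ``log_debug``
True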
dist_train(process_bar_header: str = None, process_bar_position: int = None, subbar_disable=False, record_configs=True, show_network=False, **kwargs)[source]¶
The main training loop of epochs.
Parameters:
- process_bar_header – the tag name of the process bar header, which is used in ``tqdm(desc=process_bar_header)``.
- process_bar_position – the process bar's position. It is useful in multitask, and is used in ``tqdm(position=process_bar_position)``.
- subbar_disable – whether to disable the info of every training epoch.
- record_configs – whether to record the training process data.
- show_network – whether to show the structure of the network. It will cost extra memory.
- kwargs – any other parameters that are passed to ``tqdm()`` to control the behavior of the process bar.
get_data_from_batch(batch_data: list, device: torch.device)[source]¶
Split the data from one batch into specific variables. If your dataset returns something like ``return input_data, label``, it means two values need unpacking, so you need to split the batch data into two parts, like this: ``input, ground_truth = batch_data[0], batch_data[1]``
Caution
Don't forget to move the data to the device, by using ``input.to(device)``.
Parameters:
- batch_data – one batch of data from the dataloader.
- device – the device that the data will be located on.
Returns: the variables with the correct device location.
Example:
    # load and unzip the data from one batch tuple (input, ground_truth)
    input, ground_truth = batch_data[0], batch_data[1]
    # move the data to the device
    return input.to(device), ground_truth.to(device)
plot_graphs_lazy()[source]¶
Plot the model graphs on tensorboard. It plots all models' graphs in the trainer, using the variable names as the model names.
train(process_bar_header: str = None, process_bar_position: int = None, subbar_disable=False, record_configs=True, show_network=False, **kwargs)[source]¶
The main training loop of epochs.
Parameters:
- process_bar_header – the tag name of the process bar header, which is used in ``tqdm(desc=process_bar_header)``.
- process_bar_position – the process bar's position. It is useful in multitask, and is used in ``tqdm(position=process_bar_position)``.
- subbar_disable – whether to disable the info of every training epoch.
- record_configs – whether to record the training process data.
- show_network – whether to show the structure of the network. It will cost extra memory.
- kwargs – any other parameters that are passed to ``tqdm()`` to control the behavior of the process bar.
train_epoch(subbar_disable=False)[source]¶
Get the train loader and loop over it to deal with the data.
Caution
You must record your training step in ``self.step`` inside your loop, by doing something like ``self.step += 1``.
Example:
    for iteration, batch in tqdm(enumerate(self.datasets.loader_train, 1)):
        self.step += 1
        self.input_cpu, self.ground_truth_cpu = self.get_data_from_batch(batch, self.device)
        self._train_iteration(self.opt, self.compute_loss, tag="Train")
Single Model Trainer¶
SupSingleModelTrainer¶
class jdit.trainer.SupSingleModelTrainer(logdir, nepochs, gpu_ids_abs, net: jdit.model.Model, opt: jdit.optimizer.Optimizer, datasets: jdit.dataset.DataLoadersFactory)[source]¶
This is a single-model trainer. It means you only have one model, used as:
    output = model(input)
    loss(output, ground_truth)
compute_loss() -> (Tensor, dict)[source]¶
Rewrite this method to compute your own loss. Use ``self.input``, ``self.output`` and ``self.ground_truth`` to compute the loss. You should return the loss in the first position. You can return a ``dict`` of losses that you want to visualize in the second position.
Example:
    var_dic = {}
    var_dic["LOSS"] = loss = (self.output ** 2 - self.ground_truth ** 2) ** 0.5
    return loss, var_dic
compute_valid() → dict[source]¶
Rewrite this method to compute your validation values. Use ``self.input``, ``self.output`` and ``self.ground_truth`` to compute the validation loss. You can return a ``dict`` of validation values that you want to visualize.
Example:
    # It will do the same thing as ``compute_loss()``
    _, var_dic = self.compute_loss()
    return var_dic
get_data_from_batch(batch_data: list, device: torch.device)[source]¶
Load and wrap data from the dataloader.
Split one batch of data into specific variables.
Example:
    # batch_data is like [input_Data, ground_truth_Data]
    input_cpu, ground_truth_cpu = batch_data[0], batch_data[1]
    # then move them to the device and return them
    return input_cpu.to(self.device), ground_truth_cpu.to(self.device)
Parameters:
- batch_data – one batch of data loaded from ``DataLoader``.
- device – a device variable, ``torch.device``.
Returns: input Tensor, ground_truth Tensor
train_epoch(subbar_disable=False)[source]¶
Get the train loader and loop over it to deal with the data.
Caution
You must record your training step in ``self.step`` inside your loop, by doing something like ``self.step += 1``.
Example:
    for iteration, batch in tqdm(enumerate(self.datasets.loader_train, 1)):
        self.step += 1
        self.input_cpu, self.ground_truth_cpu = self.get_data_from_batch(batch, self.device)
        self._train_iteration(self.opt, self.compute_loss, tag="Train")
valid_epoch()[source]¶
Validate the model each epoch.
It will be called at the end of each training epoch. So, do your verification here.
Example:
avg_dic: dict = {}
self.net.eval()
for iteration, batch in enumerate(self.datasets.loader_valid, 1):
    self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
    with torch.no_grad():
        self.output = self.net(self.input)
        dic: dict = self.compute_valid()
    if avg_dic == {}:
        avg_dic: dict = dic
    else:
        for key in dic.keys():
            avg_dic[key] += dic[key]
for key in avg_dic.keys():
    avg_dic[key] = avg_dic[key] / self.datasets.nsteps_valid
self.watcher.scalars(avg_dic, self.step, tag="Valid")
self.loger.write(self.step, self.current_epoch, avg_dic, "Valid", header=self.step <= 1)
self._watch_images(tag="Valid")
self.net.train()
ClassificationTrainer¶
class jdit.trainer.ClassificationTrainer(logdir, nepochs, gpu_ids, net, opt, datasets, num_class)[source]¶
This is a classification trainer.
compute_loss()[source]¶
Compute the main loss and observed values.
Compute the loss and the other values shown in the tensorboard scalars visualization. You should return a main loss for the backward propagation.
If you want some values visualized, make a ``dict()`` whose keys are the variables' names. The training logic is:
    self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
    self.output = self.net(self.input)
    self._train_iteration(self.opt, self.compute_loss, csv_filename="Train")
So, you have ``self.net``, ``self.input``, ``self.output`` and ``self.ground_truth`` to compute your own loss here.
Note
Only the main loss, the first returned variable, will do backward propagation. If you have a joint loss, please add the parts up and return one main loss (a sketch follows the example below).
Note
None of the variables in the returned ``dict()`` will do backward propagation under ``model.train()``. However, their grads are still computed, since ``torch.autograd.no_grad()`` is not used. So, you can compute any grad variables for visualization.
Example:
var_dic = {}
labels = self.ground_truth.squeeze().long()
var_dic["MSE"] = loss = nn.MSELoss()(self.output, labels)
return loss, var_dic
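A hedged sketch of the joint-loss case from the note above; the ``L1`` term and its weight are made up for illustration:
var_dic = {}
labels = self.ground_truth.squeeze().long()
var_dic["CEP"] = cep = nn.CrossEntropyLoss()(self.output, labels)
var_dic["L1"] = l1 = self.output.abs().mean() * 1e-4  # hypothetical regularizer
loss = cep + l1  # add the parts up; only this summed main loss is backpropagated
return loss, var_dic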
compute_valid()[source]¶
Compute the valid_epoch variables for visualization.
The validation values will only be used in the tensorboard scalars visualization. So, if you want some variables visualized, make a ``dict()`` whose keys are the variables' names. You have ``self.net``, ``self.input``, ``self.output`` and ``self.ground_truth`` to compute your own validations here.
Note
None of the variables in the returned ``dict()`` will do backward propagation under ``model.eval()``. However, their grads are still computed, since ``torch.autograd.no_grad()`` is not used. So, you can compute some grad variables for visualization.
Example:
    var_dic = {}
    labels = self.ground_truth.squeeze().long()
    var_dic["CEP"] = nn.CrossEntropyLoss()(self.output, labels)
    return var_dic
get_data_from_batch(batch_data, device)[source]¶
If you need different behavior, rewrite this method and the method ``self.train_epoch()``.
Parameters:
- batch_data – a Tensor loaded from the dataset
- device – the compute device
Returns: Tensors
valid_epoch()[source]¶
Validate the model each epoch.
It will be called at the end of each training epoch. So, do your verification here.
Example:
avg_dic: dict = {}
self.net.eval()
for iteration, batch in enumerate(self.datasets.loader_valid, 1):
    self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
    with torch.no_grad():
        self.output = self.net(self.input)
        dic: dict = self.compute_valid()
    if avg_dic == {}:
        avg_dic: dict = dic
    else:
        for key in dic.keys():
            avg_dic[key] += dic[key]
for key in avg_dic.keys():
    avg_dic[key] = avg_dic[key] / self.datasets.nsteps_valid
self.watcher.scalars(avg_dic, self.step, tag="Valid")
self.loger.write(self.step, self.current_epoch, avg_dic, "Valid", header=self.step <= 1)
self._watch_images(tag="Valid")
self.net.train()
AutoEncoderTrainer¶
class jdit.trainer.AutoEncoderTrainer(logdir, nepochs, gpu_ids, net, opt, datasets)[source]¶
This is an autoencoder-decoder trainer, for image-to-image tasks.
compute_loss()[source]¶
Compute the main loss and observed values.
Compute the loss and the other values shown in the tensorboard scalars visualization. You should return a main loss for the backward propagation.
If you want some values visualized, make a ``dict()`` whose keys are the variables' names. The training logic is:
    self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
    self.output = self.net(self.input)
    self._train_iteration(self.opt, self.compute_loss, csv_filename="Train")
So, you have ``self.net``, ``self.input``, ``self.output`` and ``self.ground_truth`` to compute your own loss here.
Note
Only the main loss, the first returned variable, will do backward propagation. If you have a joint loss, please add the parts up and return one main loss.
Note
None of the variables in the returned ``dict()`` will do backward propagation under ``model.train()``. However, their grads are still computed, since ``torch.autograd.no_grad()`` is not used. So, you can compute any grad variables for visualization.
Example:
var_dic = {}
var_dic["CEP"] = loss = nn.MSELoss(reduction="mean")(self.output, self.ground_truth)
return loss, var_dic
compute_valid()[source]¶
Compute the valid_epoch variables for visualization.
The values you care about here will only be used in the tensorboard scalars visualization. So, if you want some variables visualized, make a ``dict()`` whose keys are the variables' names.
Note
None of the variables in the returned ``dict()`` will do backward propagation under ``model.eval()``. However, their grads are still computed, since ``torch.autograd.no_grad()`` is not used. So, you can compute some grad variables for visualization.
Example:
    var_dic = {}
    var_dic["CEP"] = loss = nn.MSELoss(reduction="mean")(self.output, self.ground_truth)
    return var_dic
get_data_from_batch(batch_data, device)[source]¶
If you need different behavior, rewrite this method and the method ``self.train_epoch()``.
Parameters:
- batch_data – a Tensor loaded from the dataset
- device – the compute device
Returns: Tensors
valid_epoch()[source]¶
Validate the model each epoch.
It will be called at the end of each training epoch. So, do your verification here.
Example:
avg_dic: dict = {}
self.net.eval()
for iteration, batch in enumerate(self.datasets.loader_valid, 1):
    self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
    with torch.no_grad():
        self.output = self.net(self.input)
        dic: dict = self.compute_valid()
    if avg_dic == {}:
        avg_dic: dict = dic
    else:
        for key in dic.keys():
            avg_dic[key] += dic[key]
for key in avg_dic.keys():
    avg_dic[key] = avg_dic[key] / self.datasets.nsteps_valid
self.watcher.scalars(avg_dic, self.step, tag="Valid")
self.loger.write(self.step, self.current_epoch, avg_dic, "Valid", header=self.step <= 1)
self._watch_images(tag="Valid")
self.net.train()
Generative Adversarial Networks Trainer¶
SupGanTrainer¶
class jdit.trainer.SupGanTrainer(logdir, nepochs, gpu_ids_abs, netG: jdit.model.Model, netD: jdit.model.Model, optG: jdit.optimizer.Optimizer, optD: jdit.optimizer.Optimizer, datasets: jdit.dataset.DataLoadersFactory)[source]¶
compute_d_loss() -> (Tensor, dict)[source]¶
Rewrite this method to compute your own Discriminator loss.
You should return the loss in the first position. You can return a ``dict`` of losses that you want to visualize in the second position.
Example:
    d_fake = self.netD(self.fake.detach())
    d_real = self.netD(self.ground_truth)
    var_dic = {}
    var_dic["GP"] = gp = gradPenalty(self.netD, self.ground_truth, self.fake, input=self.input,
                                     use_gpu=self.use_gpu)
    var_dic["WD"] = w_distance = (d_real.mean() - d_fake.mean()).detach()
    var_dic["LOSS_D"] = loss_d = d_fake.mean() - d_real.mean() + gp
    return loss_d, var_dic
compute_g_loss() -> (Tensor, dict)[source]¶
Rewrite this method to compute your own Generator loss.
You should return the loss in the first position. You can return a ``dict`` of losses that you want to visualize in the second position.
Example:
    d_fake = self.netD(self.fake)
    var_dic = {}
    var_dic["JC"] = jc = jcbClamp(self.netG, self.input, use_gpu=self.use_gpu)
    var_dic["LOSS_G"] = loss_g = -d_fake.mean() + jc
    return loss_g, var_dic
compute_valid() → dict[source]¶
Rewrite this method to compute your validation values.
You can return a ``dict`` of validation values that you want to visualize.
Example:
    # It will do the same thing as ``compute_g_loss()`` and ``self.compute_d_loss()``
    g_loss, _ = self.compute_g_loss()
    d_loss, _ = self.compute_d_loss()
    var_dic = {"LOSS_D": d_loss, "LOSS_G": g_loss}
    return var_dic
d_turn = 1¶
The number of Discriminator training iterations per Generator training iteration.
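A hedged sketch of a WGAN-style schedule: a hypothetical subclass simply overrides the attribute:
class MyGanTrainer(SupGanTrainer):
    # train the Discriminator 5 times for every Generator update
    d_turn = 5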
get_data_from_batch(batch_data: list, device: torch.device)[source]¶
Load and wrap data from the dataloader.
Split one batch of data into specific variables.
Example:
    # batch_data is like [input_Data, ground_truth_Data]
    input_cpu, ground_truth_cpu = batch_data[0], batch_data[1]
    # then move them to the device and return them
    return input_cpu.to(self.device), ground_truth_cpu.to(self.device)
Parameters:
- batch_data – one batch of data loaded from ``DataLoader``.
- device – a device variable, ``torch.device``.
Returns: input Tensor, ground_truth Tensor
train_epoch(subbar_disable=False)[source]¶
Get the train loader and loop over it to deal with the data.
Caution
You must record your training step in ``self.step`` inside your loop, by doing something like ``self.step += 1``.
Example:
    for iteration, batch in tqdm(enumerate(self.datasets.loader_train, 1)):
        self.step += 1
        self.input_cpu, self.ground_truth_cpu = self.get_data_from_batch(batch, self.device)
        self._train_iteration(self.opt, self.compute_loss, tag="Train")
valid_epoch()[source]¶
Validate the models each epoch.
It will be called at the end of each training epoch. So, do your verification here.
Example:
avg_dic: dict = {}
self.netG.eval()
self.netD.eval()
# Load data from loader_valid.
for iteration, batch in enumerate(self.datasets.loader_valid, 1):
    self.input, self.ground_truth = self.get_data_from_batch(batch)
    with torch.no_grad():
        self.fake = self.netG(self.input)
        # You can write this function to apply your computation.
        dic: dict = self.compute_valid()
    if avg_dic == {}:
        avg_dic: dict = dic
    else:
        for key in dic.keys():
            avg_dic[key] += dic[key]
for key in avg_dic.keys():
    avg_dic[key] = avg_dic[key] / self.datasets.nsteps_valid
self.watcher.scalars(avg_dic, self.step, tag="Valid")
self._watch_images(tag="Valid")
self.netG.train()
self.netD.train()
Pix2pixGanTrainer¶
class jdit.trainer.Pix2pixGanTrainer(logdir, nepochs, gpu_ids_abs, netG, netD, optG, optD, datasets)[source]¶
compute_d_loss()[source]¶
Rewrite this method to compute your own Discriminator loss.
You should return the loss in the first position. You can return a ``dict`` of losses that you want to visualize in the second position. The training logic is:
    self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
    self.fake = self.netG(self.input)
    self._train_iteration(self.optD, self.compute_d_loss, csv_filename="Train_D")
    if (self.step % self.d_turn) == 0:
        self._train_iteration(self.optG, self.compute_g_loss, csv_filename="Train_G")
So, you can use self.input, self.ground_truth, self.fake, self.netD and self.optD to compute the loss.
Example:
    d_fake = self.netD(self.fake.detach())
    d_real = self.netD(self.ground_truth)
    var_dic = {}
    var_dic["LS_LOSSD"] = loss_d = 0.5 * (torch.mean((d_real - 1) ** 2) + torch.mean(d_fake ** 2))
    return loss_d, var_dic
compute_g_loss()[source]¶
Rewrite this method to compute your own Generator loss.
You should return the loss in the first position. You can return a ``dict`` of losses that you want to visualize in the second position. The training logic is:
    self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
    self.fake = self.netG(self.input)
    self._train_iteration(self.optD, self.compute_d_loss, csv_filename="Train_D")
    if (self.step % self.d_turn) == 0:
        self._train_iteration(self.optG, self.compute_g_loss, csv_filename="Train_G")
So, you can use self.input, self.ground_truth, self.fake, self.netG and self.optG to compute the loss.
Example:
    d_fake = self.netD(self.fake, self.input)
    var_dic = {}
    var_dic["LS_LOSSG"] = loss_g = 0.5 * torch.mean((d_fake - 1) ** 2)
    return loss_g, var_dic
compute_valid()[source]¶
Rewrite this method to compute valid_epoch values.
You can return a ``dict`` of values that you want to visualize.
Note
This method runs under ``torch.no_grad()``. So, it will never compute grads. If you want to compute grads, please use ``torch.enable_grad()`` to wrap your operations.
Example:
    d_fake = self.netD(self.fake.detach())
    d_real = self.netD(self.ground_truth)
    var_dic = {}
    var_dic["WD"] = w_distance = (d_real.mean() - d_fake.mean()).detach()
    return var_dic
get_data_from_batch(batch_data: list, device: torch.device)[source]¶
Load and wrap data from the dataloader.
Split one batch of data into specific variables.
Example:
    # batch_data is like [input_Data, ground_truth_Data]
    input_cpu, ground_truth_cpu = batch_data[0], batch_data[1]
    # then move them to the device and return them
    return input_cpu.to(self.device), ground_truth_cpu.to(self.device)
Parameters:
- batch_data – one batch of data loaded from ``DataLoader``.
- device – a device variable, ``torch.device``.
Returns: input Tensor, ground_truth Tensor
test()[source]¶
Test your model after you finish all epochs.
This method will be called when all epochs finish.
Example:
    for index, batch in enumerate(self.datasets.loader_test, 1):
        # For the test, we only have input without ground truth
        input = batch.to(self.device)
        self.netG.eval()
        with torch.no_grad():
            fake = self.netG(input)
        self.watcher.image(fake, self.current_epoch, tag="Test/fake",
                           grid_size=(4, 4), shuffle=False)
    self.netG.train()
valid_epoch()[source]¶
Validate the models each epoch.
It will be called at the end of each training epoch. So, do your verification here.
Example:
avg_dic: dict = {}
self.netG.eval()
self.netD.eval()
# Load data from loader_valid.
for iteration, batch in enumerate(self.datasets.loader_valid, 1):
    self.input, self.ground_truth = self.get_data_from_batch(batch)
    with torch.no_grad():
        self.fake = self.netG(self.input)
        # You can write this function to apply your computation.
        dic: dict = self.compute_valid()
    if avg_dic == {}:
        avg_dic: dict = dic
    else:
        for key in dic.keys():
            avg_dic[key] += dic[key]
for key in avg_dic.keys():
    avg_dic[key] = avg_dic[key] / self.datasets.nsteps_valid
self.watcher.scalars(avg_dic, self.step, tag="Valid")
self._watch_images(tag="Valid")
self.netG.train()
self.netD.train()
GenerateGanTrainer¶
class jdit.trainer.GenerateGanTrainer(logdir, nepochs, gpu_ids_abs, netG, netD, optG, optD, datasets, latent_shape)[source]¶
compute_d_loss()[source]¶
Rewrite this method to compute your own Discriminator loss.
You should return the loss in the first position. You can return a ``dict`` of losses that you want to visualize in the second position. The training logic is:
    self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
    self.fake = self.netG(self.input)
    self._train_iteration(self.optD, self.compute_d_loss, csv_filename="Train_D")
    if (self.step % self.d_turn) == 0:
        self._train_iteration(self.optG, self.compute_g_loss, csv_filename="Train_G")
So, you can use self.input, self.ground_truth, self.fake, self.netD and self.optD to compute the loss.
Example:
    d_fake = self.netD(self.fake.detach())
    d_real = self.netD(self.ground_truth)
    var_dic = {}
    var_dic["LS_LOSSD"] = loss_d = 0.5 * (torch.mean((d_real - 1) ** 2) + torch.mean(d_fake ** 2))
    return loss_d, var_dic
compute_g_loss()[source]¶
Rewrite this method to compute your own Generator loss.
You should return the loss in the first position. You can return a ``dict`` of losses that you want to visualize in the second position. The training logic is:
    self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
    self.fake = self.netG(self.input)
    self._train_iteration(self.optD, self.compute_d_loss, csv_filename="Train_D")
    if (self.step % self.d_turn) == 0:
        self._train_iteration(self.optG, self.compute_g_loss, csv_filename="Train_G")
So, you can use self.input, self.ground_truth, self.fake, self.netG and self.optG to compute the loss.
Example:
    d_fake = self.netD(self.fake, self.input)
    var_dic = {}
    var_dic["LS_LOSSG"] = loss_g = 0.5 * torch.mean((d_fake - 1) ** 2)
    return loss_g, var_dic
compute_valid()[source]¶
The training logic is:
    self.input, self.ground_truth = self.get_data_from_batch(batch, self.device)
    self.fake = self.netG(self.input)
    self._train_iteration(self.optD, self.compute_d_loss, csv_filename="Train_D")
    if (self.step % self.d_turn) == 0:
        self._train_iteration(self.optG, self.compute_g_loss, csv_filename="Train_G")
So, you can use self.input, self.ground_truth, self.fake, self.netG and self.netD to compute your validation values.
d_turn = 1¶
The number of Discriminator training iterations per Generator training iteration.
get_data_from_batch(batch_data: list, device: torch.device)[source]¶
Load and wrap data from the dataloader.
Split one batch of data into specific variables.
Example:
    # batch_data is like [input_Data, ground_truth_Data]
    input_cpu, ground_truth_cpu = batch_data[0], batch_data[1]
    # then move them to the device and return them
    return input_cpu.to(self.device), ground_truth_cpu.to(self.device)
Parameters:
- batch_data – one batch of data loaded from ``DataLoader``.
- device – a device variable, ``torch.device``.
Returns: input Tensor, ground_truth Tensor
valid_epoch()[source]¶
Validate the models each epoch.
It will be called at the end of each training epoch. So, do your verification here.
Example:
avg_dic: dict = {}
self.netG.eval()
self.netD.eval()
# Load data from loader_valid.
for iteration, batch in enumerate(self.datasets.loader_valid, 1):
    self.input, self.ground_truth = self.get_data_from_batch(batch)
    with torch.no_grad():
        self.fake = self.netG(self.input)
        # You can write this function to apply your computation.
        dic: dict = self.compute_valid()
    if avg_dic == {}:
        avg_dic: dict = dic
    else:
        for key in dic.keys():
            avg_dic[key] += dic[key]
for key in avg_dic.keys():
    avg_dic[key] = avg_dic[key] / self.datasets.nsteps_valid
self.watcher.scalars(avg_dic, self.step, tag="Valid")
self._watch_images(tag="Valid")
self.netG.train()
self.netD.train()
jdit.assessment¶
FID¶
jdit.assessment.FID_score(source, target, sample_prop=1.0, gpu_ids=(), dim=2048, batchsize=128, verbose=True)[source]¶
Compute the FID score from a ``Tensor``, a ``DataLoader`` or a directory ``path``.
Parameters:
- source – source data.
- target – target data.
- sample_prop – if passing a ``Tensor`` source, set this rate to sample a part of the data from the source.
- gpu_ids – gpu ids.
- dim – the number of features. Four options are available:
  - 64: the first max pooling features of Inception.
  - 192: the second max pooling features of Inception.
  - 768: the pre-aux classifier features of Inception.
  - 2048: the final average pooling features of Inception. Default: 2048.
- batchsize – only used when passing paths as source and target.
- verbose – whether to show the processing log.
Returns: fid score
Attention
If you are passing a ``Tensor`` as source and target, make sure you have enough memory to load the data into _InceptionV3. Otherwise, please pass a ``path`` or a ``DataLoader`` to compute them step by step.
Example:
>>> from jdit.dataset import Cifar10
>>> loader = Cifar10(root=r"../../datasets/cifar10", batch_size=32)
>>> target_tensor = loader.samples_train[0]
>>> source_tensor = loader.samples_valid[0]
>>> # using Tensors to compute the FID score
>>> fid_value = FID_score(source_tensor, target_tensor, sample_prop=0.01, dim=768)
>>> print('FID: ', fid_value)
>>> # using DataLoaders to compute the FID score
>>> fid_value = FID_score(loader.loader_test, loader.loader_valid, dim=768)
>>> print('FID: ', fid_value)
jdit.parallel¶
SupParallelTrainer¶
class jdit.parallel.SupParallelTrainer(unfixed_params_list: list, train_func=None)[source]¶
Train in parallel.
Parameters:
- default_params – a ``dict()`` like ``{param_1:d1, param_2:d2 ...}``
- unfixed_params_list – a ``list`` like ``[{param_1:a1, param_2:a2}, {param_1:b1, param_2:b2}, ...]``.
Note
You must set the values of ``task_id`` and ``gpu_ids_abs``, either in ``default_params`` or in ``unfixed_params_list``, e.g. ``{'task_id': 1}``, ``{'gpu_ids_abs': [0, 1]}``.
- Tasks with the same ``task_id`` will be executed sequentially on the given devices.
- Tasks with different ``task_id`` will be executed in parallel on the given devices.
Example:
unfixed_params_list = [
    {'task_id': 1, 'lr': 1e-3, 'gpu_ids_abs': [0]},
    {'task_id': 1, 'lr': 1e-4, 'gpu_ids_abs': [0]},
    {'task_id': 2, 'lr': 1e-5, 'gpu_ids_abs': [2, 3]}]
This set of ``unfixed_params_list`` means:

time | 'task_id': 1                   | 'task_id': 2
t    | 'lr':1e-3, 'gpu_ids_abs': [0]  | 'lr':1e-5, 'gpu_ids_abs': [2,3]   (executed in parallel)
t+1  | 'lr':1e-4, 'gpu_ids_abs': [0]  |                                   (executed sequentially)
build_task_trainer(unfixed_params: dict)[source]¶
You need to write this method to build your own ``Trainer``.
It will run in a certain subprocess. The keys of ``params`` are compatible with ``dataset``, ``Model``, ``Optimizer`` and ``Trainer``. You can see the parameters in the following example.
These two parameters are special:
- ``params["logdir"]`` controls the log directory.
- ``params["gpu_ids_abs"]`` controls the running devices.
You should return a ``Trainer`` when you finish your building.
Parameters: params – parameters dictionary.
Returns: Trainer
Example:
# Using ``params['key']`` to build your Trainer.
logdir = params["logdir"]  # necessary!
gpu_ids_abs = params["gpu_ids_abs"]  # necessary!
use_benchmark = params["use_benchmark"]
data_root = params["data_root"]
batch_shape = params["batch_shape"]
opt_name = params["opt_name"]
lr = params["lr"]
lr_decay = params["lr_decay"]
lr_minimum = params["lr_minimum"]
weight_decay = params["weight_decay"]
momentum = params["momentum"]
betas = params["betas"]
init_method = params["init_method"]
depth = params["depth"]
mid_channels = params["mid_channels"]
nepochs = params["nepochs"]
torch.backends.cudnn.benchmark = use_benchmark
mnist = FashionMNIST(root=data_root, batch_shape=batch_shape)
T_net = Model(Tresnet18(depth=depth, mid_channels=mid_channels),
              gpu_ids_abs=gpu_ids_abs, init_method=init_method)
opt = Optimizer(T_net.parameters(), lr, lr_decay, weight_decay, momentum, betas,
                opt_name, lr_minimum=lr_minimum)
Trainer = FashingClassTrainer(logdir, nepochs, gpu_ids_abs, T_net, opt, mnist)
# You must return a Trainer!
return Trainer
error(msg)[source]¶
Called when a subprocess fails.
You can rewrite this method for your own purpose.
Parameters: msg – error message
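A hypothetical override that appends failure messages to a file instead of only printing them:
class MyParallelTrainer(SupParallelTrainer):
    def error(self, msg):
        # record the failed subprocess message for later inspection
        with open("parallel_errors.log", "a") as f:
            f.write(str(msg) + "\n")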
Jdit is a research-process-oriented framework based on pytorch. Only care about your ideas: you don't need to write long, boring code just to run a deep learning project and verify them. You only need to implement your ideas, and you don't have to deal with the training framework, multiple gpus, checkpoints, process visualization, performance evaluation and so on.