RotateOnly
- class rotograd.RotateOnly(backbone, heads, rotation_shape, *args, post_shape=(), normalize_losses=False, burn_in_period=20)[source]
Bases: Module
Implementation of the rotating part of RotoGrad, as described in the original paper [1].
The module takes as input a tensor of shape … x rotation_shape x post_shape; only the rotation_shape part of the representation is rotated, while the trailing post_shape part (if any) is left unchanged.
- Parameters:
  - backbone (Module) – Shared module.
  - heads (Sequence[Module]) – Task-specific modules.
  - rotation_shape (Union[int, Size]) – Shape of the shared representation to be rotated, which is usually just the size of the backbone's output. Passing a shape is useful, for example, if you want to rotate an image with shape width x height.
  - post_shape (optional, default=()) – Shape of the shared representation following the part to be rotated (if any). This part will be kept as it is. This is useful, for example, if you want to rotate only the channels of an image.
  - normalize_losses (optional, default=False) – Whether to use the normalized losses to back-propagate through the task-specific parameters as well.
  - burn_in_period (optional, default=20) – When back-propagating towards the shared parameters, each task loss is normalized by dividing by its initial value, \({L_k(t)}/{L_k(t_0 = 0)}\). This parameter sets a number of iterations after which the denominator is replaced by the value of the loss at that iteration, that is, \(t_0 = \text{burn\_in\_period}\). This is done to overcome problems with losses changing quickly in the first iterations.
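As a plain-Python sketch of the burn-in behaviour described for burn_in_period (illustrative only, not the library's actual code): the denominator is the loss at iteration 0 until burn_in_period is reached, after which it is frozen at the loss value from that iteration.

```python
# Hypothetical sketch of loss normalization with a burn-in period.
# `loss_history` holds one task's loss L_k(t) for t = 0, 1, 2, ...
def normalized_loss(loss_history, t, burn_in_period=20):
    """Return L_k(t) / L_k(t_0), where t_0 = 0 before the burn-in
    period is reached and t_0 = burn_in_period afterwards."""
    t0 = burn_in_period if t >= burn_in_period else 0
    return loss_history[t] / loss_history[t0]

history = [4.0, 2.0, 1.0, 0.5]
# With burn_in_period=2: t = 0, 1 divide by L_k(0) = 4.0,
# while t = 2, 3 divide by L_k(2) = 1.0.
scaled = [normalized_loss(history, t, burn_in_period=2) for t in range(4)]
# scaled == [1.0, 0.5, 1.0, 0.5]
```

Freezing the denominator after the burn-in avoids dividing by a reference value taken while the losses are still changing rapidly.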
- num_tasks – Number of tasks/heads of the module.
- backbone – Shared module.
- heads – Sequence with the (rotated) task-specific heads.
- rep – Current output of the backbone (after calling forward during training).
References
[1] Adrián Javaloy and Isabel Valera. “RotoGrad: Gradient Homogenization in Multitask Learning”. ICLR 2022.
- backward(losses, backbone_loss=None, **kwargs)[source]
Performs the backward pass for the entire model (that is, shared and task-specific modules). It also computes the gradients for the rotation matrices.
- Parameters:
  - losses (Sequence[Tensor]) – Sequence of the task losses from which to back-propagate.
  - backbone_loss (optional) – Loss exclusive to the backbone (for example, a regularization term).
- Return type:
None
- forward(x)[source]
Forwards the input x through the backbone and all heads, returning a list with all the task predictions. It can be thought of as something similar to:
preds = []
z = backbone(x)
for R_i, head in zip(rotations, heads):
    z_i = rotate(R_i, z)
    preds.append(head(z_i))
return preds
- Return type:
Sequence[Any]
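The loop above can be made concrete with a small, self-contained sketch in plain Python. The names make_rotation and the toy backbone/heads below are illustrative assumptions, not part of the library; the point is only that each head receives its own rotated copy of the shared representation z.

```python
import math

def rotate(R, z):
    # Apply a 2x2 rotation matrix R (list of rows) to a 2-vector z.
    return [R[0][0] * z[0] + R[0][1] * z[1],
            R[1][0] * z[0] + R[1][1] * z[1]]

def make_rotation(theta):
    # Standard 2D rotation matrix by angle theta (radians).
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]

def forward(x, backbone, rotations, heads):
    z = backbone(x)                  # shared representation
    preds = []
    for R_i, head in zip(rotations, heads):
        z_i = rotate(R_i, z)         # task-specific rotated view of z
        preds.append(head(z_i))
    return preds

# Toy setup: two tasks, one rotated by 0 and one by 90 degrees.
backbone = lambda x: [x, 0.0]
rotations = [make_rotation(0.0), make_rotation(math.pi / 2)]
heads = [lambda z: z[0], lambda z: z[1]]
preds = forward(1.0, backbone, rotations, heads)  # preds ≈ [1.0, 1.0]
```

Each rotation changes only the orientation of z, so every head sees a representation of the same norm but aligned with its own task.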
- property rotation: Sequence[Tensor]
List of rotation matrices, one per task. These are trainable, so make sure to call detach().
- to(*args, **kwargs)[source]
Moves and/or casts the parameters and buffers.
This can be called as
- to(device=None, dtype=None, non_blocking=False)
- to(dtype, non_blocking=False)
- to(tensor, non_blocking=False)
- to(memory_format=torch.channels_last)
Its signature is similar to torch.Tensor.to(), but only accepts floating point or complex dtypes. In addition, this method will only cast the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.
See below for examples.
Note
This method modifies the module in-place.
- Parameters:
  - device (torch.device) – the desired device of the parameters and buffers in this module
  - dtype (torch.dtype) – the desired floating point or complex dtype of the parameters and buffers in this module
  - tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
  - memory_format (torch.memory_format) – the desired memory format for 4D parameters and buffers in this module (keyword only argument)
- Returns:
self
- Return type:
Module
Examples:
>>> # xdoctest: +IGNORE_WANT("non-deterministic")
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> # xdoctest: +REQUIRES(env:TORCH_DOCTEST_CUDA1)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)
>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)
- train(mode=True)[source]
Sets the module in training mode.
This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters:
  - mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns:
self
- Return type:
Module