Questions
What is the difference between `torch.nn.Parameter` and a `torch.Tensor`?
The Scenario
You are building a custom `nn.Module` in PyTorch and need to define the layer's weights. You are unsure whether to store the weights in a `torch.Tensor` or a `torch.nn.Parameter`.
The Challenge
Explain the difference between a `torch.nn.Parameter` and a `torch.Tensor`. When would you use one over the other?
A junior engineer might treat the two as interchangeable, unaware that `torch.nn.Parameter` is a special tensor subclass that a module automatically registers as one of its parameters.
A senior engineer would know that a `torch.nn.Parameter` is automatically registered as a module parameter when it is assigned as an attribute of an `nn.Module`, and would be able to explain why this matters: it lets an optimizer discover all of a model's trainable weights through `model.parameters()`.
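A minimal sketch of why that registration matters (the model here is a hypothetical example, not from the scenario above):

```python
import torch
import torch.nn as nn

# Hypothetical two-layer model for illustration.
model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 1))

# model.parameters() yields every registered nn.Parameter, so the
# optimizer knows exactly which tensors to update.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

print(sum(p.numel() for p in model.parameters()))  # total trainable weights
```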
`torch.Tensor` vs. `torch.nn.Parameter`
A `torch.nn.Parameter` is a subclass of `torch.Tensor` with one special property: when it is assigned as an attribute of an `nn.Module`, it is automatically added to the module's list of parameters.
| Feature | `torch.Tensor` | `torch.nn.Parameter` |
|---|---|---|
| Registration | Not automatically registered as a module parameter. | Automatically registered as a module parameter when assigned as an attribute. |
| `requires_grad` | `False` by default. | `True` by default. |
| Purpose | Stores data that is not a trainable parameter of a model. | Stores the trainable weights of a model. |
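A quick check of the `requires_grad` defaults from the table:

```python
import torch
import torch.nn as nn

t = torch.randn(3)
p = nn.Parameter(torch.randn(3))

print(t.requires_grad)  # False: plain tensors do not track gradients by default
print(p.requires_grad)  # True: parameters are trainable by default
```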
When to use `torch.nn.Parameter`
You should use `torch.nn.Parameter` to store the trainable weights of an `nn.Module`. Registration makes every weight discoverable through `module.parameters()`, which is exactly what an optimizer iterates over when updating the model during training.
Example:
```python
import torch
import torch.nn as nn

class MyLayer(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.Parameter attributes are registered automatically.
        self.my_weights = nn.Parameter(torch.randn(10, 20))
        self.my_bias = nn.Parameter(torch.zeros(20))

    def forward(self, x):
        return torch.matmul(x, self.my_weights) + self.my_bias

layer = MyLayer()

# The parameters are automatically added to the list of the layer's parameters.
for name, param in layer.named_parameters():
    print(name, param.size())
# my_weights torch.Size([10, 20])
# my_bias torch.Size([20])
```

If you were to use a plain `torch.Tensor` instead of a `torch.nn.Parameter`, the weights would not appear in the layer's parameter list and would not be updated during training.
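To see that contrast directly, here is a sketch of the same layer with a plain tensor attribute (a hypothetical `BrokenLayer`, not part of the example above):

```python
import torch
import torch.nn as nn

class BrokenLayer(nn.Module):
    def __init__(self):
        super().__init__()
        # A plain tensor attribute is NOT registered as a parameter.
        self.my_weights = torch.randn(10, 20)

    def forward(self, x):
        return torch.matmul(x, self.my_weights)

layer = BrokenLayer()
print(list(layer.named_parameters()))  # [] -- an optimizer would find nothing to train
```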
Practice Question
You are building a custom layer and need to store a buffer that is not a parameter of the model, but should be moved to the GPU along with the model. What should you do?
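For reference, the usual answer involves `register_buffer`. A minimal sketch (the `Normalizer` module is a hypothetical example):

```python
import torch
import torch.nn as nn

class Normalizer(nn.Module):
    def __init__(self):
        super().__init__()
        # A buffer is not a parameter (no gradients, not returned by
        # parameters()), but it is part of the module's state: it moves
        # with .to()/.cuda() and is saved in the state_dict.
        self.register_buffer("running_mean", torch.zeros(20))

    def forward(self, x):
        return x - self.running_mean

model = Normalizer()
# model.to("cuda")  # would move running_mean to the GPU along with the module
print(list(model.named_parameters()))  # [] -- the buffer is not trainable
print(list(model.named_buffers()))     # [('running_mean', tensor([...]))]
```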