Questions
What is the difference between `torch.nn.Parameter` and a `torch.Tensor`?
The Scenario
You are building a custom `nn.Module` in PyTorch and need to define the layer's weights. You are unsure whether to store the weights in a `torch.Tensor` or a `torch.nn.Parameter`.
The Challenge
Explain the difference between a `torch.nn.Parameter` and a `torch.Tensor`. When would you use one over the other?
A junior engineer might treat the two as interchangeable, unaware that `torch.nn.Parameter` is a special tensor subclass that a module automatically registers as one of its parameters.
A senior engineer would know that a `torch.nn.Parameter` is automatically registered as a module parameter when it is assigned as an attribute of an `nn.Module`, and would be able to explain why this matters: it lets an optimizer discover all of a model's trainable weights through `model.parameters()`.
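A minimal sketch of why that registration matters (the model here is a hypothetical example, not from the scenario above):

```python
import torch
import torch.nn as nn

# Hypothetical two-layer model for illustration.
model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 1))

# model.parameters() yields every registered nn.Parameter, so the
# optimizer knows exactly which tensors to update.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

print(sum(p.numel() for p in model.parameters()))  # total trainable weights
```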
`torch.Tensor` vs. `torch.nn.Parameter`
A `torch.nn.Parameter` is a subclass of `torch.Tensor` with one special property: when it is assigned as an attribute of an `nn.Module`, it is automatically added to the module's list of parameters.
| Feature | `torch.Tensor` | `torch.nn.Parameter` |
|---|---|---|
| Registration | Not automatically registered as a module parameter. | Automatically registered as a module parameter when assigned as an attribute. |
| `requires_grad` | `False` by default. | `True` by default. |
| Purpose | Stores data that is not a trainable parameter of a model. | Stores the trainable weights of a model. |
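A quick check of the `requires_grad` defaults from the table:

```python
import torch
import torch.nn as nn

t = torch.randn(3)
p = nn.Parameter(torch.randn(3))

print(t.requires_grad)  # False: plain tensors do not track gradients by default
print(p.requires_grad)  # True: parameters are trainable by default
```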
When to use `torch.nn.Parameter`
You should use `torch.nn.Parameter` to store the trainable weights of an `nn.Module`. Registration makes every weight discoverable through `module.parameters()`, which is exactly what an optimizer iterates over when updating the model during training.
Example:
```python
import torch
import torch.nn as nn

class MyLayer(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.Parameter attributes are registered automatically.
        self.my_weights = nn.Parameter(torch.randn(10, 20))
        self.my_bias = nn.Parameter(torch.zeros(20))

    def forward(self, x):
        return torch.matmul(x, self.my_weights) + self.my_bias

layer = MyLayer()

# The parameters are automatically added to the list of the layer's parameters.
for name, param in layer.named_parameters():
    print(name, param.size())
# my_weights torch.Size([10, 20])
# my_bias torch.Size([20])
```

If you were to use a plain `torch.Tensor` instead of a `torch.nn.Parameter`, the weights would not appear in the layer's parameter list and would not be updated during training.
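To see that contrast directly, here is a sketch of the same layer with a plain tensor attribute (a hypothetical `BrokenLayer`, not part of the example above):

```python
import torch
import torch.nn as nn

class BrokenLayer(nn.Module):
    def __init__(self):
        super().__init__()
        # A plain tensor attribute is NOT registered as a parameter.
        self.my_weights = torch.randn(10, 20)

    def forward(self, x):
        return torch.matmul(x, self.my_weights)

layer = BrokenLayer()
print(list(layer.named_parameters()))  # [] -- an optimizer would find nothing to train
```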
Practice Question
You are building a custom layer and need to store a buffer that is not a parameter of the model, but should be moved to the GPU along with the model. What should you do?
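For reference, the usual answer involves `register_buffer`. A minimal sketch (the `Normalizer` module is a hypothetical example):

```python
import torch
import torch.nn as nn

class Normalizer(nn.Module):
    def __init__(self):
        super().__init__()
        # A buffer is not a parameter (no gradients, not returned by
        # parameters()), but it is part of the module's state: it moves
        # with .to()/.cuda() and is saved in the state_dict.
        self.register_buffer("running_mean", torch.zeros(20))

    def forward(self, x):
        return x - self.running_mean

model = Normalizer()
# model.to("cuda")  # would move running_mean to the GPU along with the module
print(list(model.named_parameters()))  # [] -- the buffer is not trainable
print(list(model.named_buffers()))     # [('running_mean', tensor([...]))]
```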