What is the difference between `torch.nn.Parameter` and a `torch.Tensor`?


The Scenario

You are building a custom `nn.Module` in PyTorch and need to define the layer's weights. You are not sure whether to use a `torch.Tensor` or a `torch.nn.Parameter` to store them.

The Challenge

Explain the difference between a `torch.nn.Parameter` and a `torch.Tensor`. When would you use one over the other?

Wrong Approach

A junior engineer might assume the two are interchangeable, unaware that `torch.nn.Parameter` is a special tensor subclass that a module automatically registers as one of its parameters.

Right Approach

A senior engineer would explain that a `torch.nn.Parameter` is a tensor subclass that is automatically registered as a module parameter the moment it is assigned as an attribute of an `nn.Module`, and why this matters: registration is what lets an optimizer retrieve all of a model's trainable parameters via `model.parameters()`.
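
For instance, here is a minimal sketch of that handoff, using a built-in `nn.Linear` (whose `weight` and `bias` are themselves `nn.Parameter`s):

import torch
import torch.nn as nn

model = nn.Linear(10, 20)
# parameters() yields every registered nn.Parameter, which is
# exactly the iterable the optimizer expects.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

loss = model(torch.randn(4, 10)).sum()
loss.backward()
optimizer.step()  # updates weight and bias in place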

`torch.Tensor` vs. `torch.nn.Parameter`

A `torch.nn.Parameter` is a subclass of `torch.Tensor` with one special property: when it is assigned as an attribute of an `nn.Module`, it is automatically added to the module's list of parameters.

| Feature | `torch.Tensor` | `torch.nn.Parameter` |
| --- | --- | --- |
| Registration | Not automatically registered as a module parameter. | Automatically registered when assigned as a module attribute. |
| `requires_grad` | `False` by default. | `True` by default. |
| Purpose | Stores data that is not a trainable parameter of a model. | Stores the trainable weights of a model. |
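
You can verify the defaults in the table directly (a minimal sketch to run in a REPL):

import torch
import torch.nn as nn

t = torch.randn(3)
p = nn.Parameter(torch.randn(3))
print(t.requires_grad)  # False -- plain tensors do not track gradients by default
print(p.requires_grad)  # True  -- Parameters are marked trainable by default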

When to use `torch.nn.Parameter`

Use `torch.nn.Parameter` to store the trainable weights of an `nn.Module`. Registration is what makes every weight reachable through `model.parameters()`, which is exactly the iterable an optimizer consumes when training the model.

Example:

import torch
import torch.nn as nn

class MyLayer(nn.Module):
    def __init__(self):
        super().__init__()
        # Both attributes are nn.Parameter, so the module registers them
        # automatically and requires_grad is True by default.
        self.my_weights = nn.Parameter(torch.randn(10, 20))
        self.my_bias = nn.Parameter(torch.zeros(20))

    def forward(self, x):
        # x: (batch, 10) -> output: (batch, 20)
        return torch.matmul(x, self.my_weights) + self.my_bias

layer = MyLayer()

# The parameters are automatically added to the list of the layer's parameters
for name, param in layer.named_parameters():
    print(name, param.size())
# my_weights torch.Size([10, 20])
# my_bias torch.Size([20])

If you stored the weights in a plain `torch.Tensor` instead, they would not appear in the layer's parameter list, the optimizer would never see them, and they would never be updated during training, as the sketch below demonstrates.
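
Here is a hedged sketch of that failure mode; the class name `BrokenLayer` is ours, not PyTorch's:

import torch
import torch.nn as nn

class BrokenLayer(nn.Module):
    def __init__(self):
        super().__init__()
        # A plain tensor attribute: the module does NOT register it,
        # so it is invisible to optimizers and to state_dict().
        self.my_weights = torch.randn(10, 20)

layer = BrokenLayer()
print(list(layer.named_parameters()))  # [] -- the weights are missing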

Practice Question

You are building a custom layer and need to store a buffer that is not a parameter of the model, but should be moved to the GPU along with the model. What should you do?
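
One way to answer (a minimal sketch; the class name `LayerWithBuffer` and the buffer name `running_mean` are illustrative): use `nn.Module.register_buffer`. A buffer is saved in the `state_dict` and moves with `.to()` / `.cuda()`, but is excluded from `parameters()`, so the optimizer never touches it. This is how built-in layers such as `BatchNorm` store their running statistics.

import torch
import torch.nn as nn

class LayerWithBuffer(nn.Module):
    def __init__(self):
        super().__init__()
        # Registered as a buffer: part of the module's state,
        # but not a trainable parameter.
        self.register_buffer("running_mean", torch.zeros(20))

layer = LayerWithBuffer()
print(list(layer.named_parameters()))        # [] -- buffers are not parameters
print("running_mean" in layer.state_dict())  # True -- but they are saved/loaded
# layer.cuda() would move running_mean to the GPU along with the module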