Questions
What is the difference between a `tf.Variable` and a `tf.Tensor`?
The Scenario
You are an ML engineer at a robotics company. You are debugging a custom Keras layer that is not behaving as expected. The layer is supposed to maintain an internal state that is updated on each forward pass, but the state is not being updated correctly.
Here is the code for the layer:
import tensorflow as tf

class MyLayer(tf.keras.layers.Layer):
    def __init__(self):
        super(MyLayer, self).__init__()
        self.my_state = tf.zeros(shape=(1,))

    def call(self, inputs):
        self.my_state = self.my_state + tf.reduce_sum(inputs)
        return inputs
You have written a test case to check the behavior of the layer, but it is failing.
layer = MyLayer()
x = tf.constant([[1, 2, 3]], dtype=tf.float32)
y = layer(x)
print(layer.my_state.numpy()) # Expected: [6.], Actual: [0.]
The Challenge
Explain why the layer is not behaving as expected. What is the difference between a `tf.Variable` and a `tf.Tensor`, and how does this relate to the problem? How would you fix the layer?
A junior engineer might be confused about the difference between `tf.Variable` and `tf.Tensor`. They might try to debug the problem by adding `print` statements to the `call` method, but that only shows the values computed on each pass and does not reveal why the stored state never changes.
A senior engineer would immediately recognize that the problem is with the use of a `tf.Tensor` to store the layer's state. They would be able to explain that `tf.Tensor` is immutable and that a `tf.Variable` should be used instead. They would also know how to use the `assign` method to update the value of a `tf.Variable`.
Step 1: Understand the Core Problem: Mutability
The root cause of the problem is that `tf.Tensor` is immutable: once a tensor is created, its value cannot be changed. In the `call` method of `MyLayer`, the line `self.my_state = self.my_state + tf.reduce_sum(inputs)` computes a brand-new `tf.Tensor` and rebinds the Python attribute `self.my_state` to it. Nothing is modified in place, so the layer has no mutable state that TensorFlow can track across calls.
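You can see the distinction outside of Keras entirely. The minimal sketch below (standalone, not part of the layer) shows that adding to a tensor produces a new object and leaves the original untouched, while a variable is updated in place:

import tensorflow as tf

# Tensors are immutable: arithmetic produces a new tensor.
t = tf.zeros(shape=(1,))
t_plus = t + 6.0
print(t.numpy())       # [0.] -- the original tensor is unchanged
print(t_plus.numpy())  # [6.] -- a brand-new tensor holds the result

# Variables wrap mutable storage that can be updated in place.
v = tf.Variable(tf.zeros(shape=(1,)))
v.assign_add(tf.constant([6.0]))
print(v.numpy())       # [6.]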
Step 2: tf.Variable vs. tf.Tensor
| Feature | tf.Tensor | tf.Variable |
|---|---|---|
| Mutability | Immutable | Mutable |
| Purpose | Holds values that are computed and then read, such as inputs, constants, and intermediate results. | Holds state that persists and changes across calls, such as the model's weights. |
| Gradients | Watched by tf.GradientTape only if you call tape.watch() on it explicitly. | Trainable variables are watched by tf.GradientTape automatically. |
| Creation | tf.constant(), tf.zeros(), or as the output of any TensorFlow op. | tf.Variable() |
| Updating | Cannot be updated in place; every operation returns a new tensor. | Updated in place with the assign, assign_add, and assign_sub methods. |
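A short sketch (standalone, using only standard TensorFlow APIs) makes the gradient and updating rows concrete: trainable variables are watched by tf.GradientTape automatically, plain tensors must be watched explicitly, and only variables support in-place assignment.

import tensorflow as tf

w = tf.Variable(2.0)   # trainable by default, watched automatically
c = tf.constant(3.0)   # plain tensor, not watched by default

with tf.GradientTape() as tape:
    y = w * c + c * c

dw, dc = tape.gradient(y, [w, c])
print(dw.numpy())  # 3.0 -- gradient w.r.t. the variable is tracked
print(dc)          # None -- the constant was never watched

# To differentiate w.r.t. a plain tensor, watch it explicitly.
with tf.GradientTape() as tape:
    tape.watch(c)
    y = w * c
print(tape.gradient(y, c).numpy())  # 2.0, since w is still 2.0 here

# Only variables can be updated in place.
w.assign(5.0)
w.assign_add(1.0)  # w is now 6.0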
Step 3: Fix the Layer
To fix the layer, store the state in a `tf.Variable` and update it in place with the `assign_add` method.
import tensorflow as tf

class MyLayer(tf.keras.layers.Layer):
    def __init__(self):
        super(MyLayer, self).__init__()
        self.my_state = tf.Variable(0.0)

    def call(self, inputs):
        self.my_state.assign_add(tf.reduce_sum(inputs))
        return inputs

Now, when we run the test case, it will pass.
layer = MyLayer()
x = tf.constant([[1, 2, 3]], dtype=tf.float32)
y = layer(x)
print(layer.my_state.numpy())  # Expected: 6.0, Actual: 6.0
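As a side note, the more idiomatic way to create per-layer state in Keras is add_weight with trainable=False, which also registers the variable with the layer so it appears in layer.variables and is saved in checkpoints. The sketch below is one possible variant of the fix, not the only correct one:

import tensorflow as tf

class MyLayer(tf.keras.layers.Layer):
    def __init__(self):
        super().__init__()
        # add_weight creates a tf.Variable and registers it with the layer,
        # so it shows up in layer.variables and is checkpointed with the model.
        self.my_state = self.add_weight(
            name="my_state",
            shape=(),
            initializer="zeros",
            trainable=False,
        )

    def call(self, inputs):
        self.my_state.assign_add(tf.reduce_sum(inputs))
        return inputs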
Practice Question
You are building a custom optimizer and need to store the moving averages of the gradients. Which of the following would you use?